A Cognitive Theory of Reasoning and Choice

Bordalo, Gennaioli, Lanzani, and Shleifer develop a cognitive theory of choice in which a decision maker’s attention to the features of options is determined by her categorization of the current problem against a memory database of problems she solved in the past. The core claim is that before solving a problem, the decision maker asks “what kind of problem is this?” and resolves it by selecting the category — indexed by a prototype attention-plus-context vector and a time-discounted frequency — whose similarity to the current problem is maximized. This problem recognition step then pins down which features (price, quality, probabilities) receive attention, which in turn shapes valuation and choice.

The model formalizes two-step choice. In step one (recognition), the decision maker jointly chooses an attention vector alpha_P and a category c* to maximize a separable similarity function S[(alpha_P, kappa_P), (alpha_c, kappa_c)] weighted by category frequency F_c, plus a Type I extreme-value shock that yields a logit probability over categories. In step two, she maximizes perceived value over the menu using the endogenously determined weights. Perceived hedonic value of feature i shrinks toward the menu average when alpha_{P,i} < 1; perceived probabilities compress toward uniform when the event-attention weight falls below 1, producing probability overweighting of unlikely events. Full attention recovers expected utility.
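
The recognition step can be illustrated with a minimal sketch. The functional forms are assumptions made for illustration, not the paper's: similarity is taken to be minus a weighted squared distance between context vectors, attention is held at each category's prototype, and frequency F_c scales the exponentiated similarity so that the extreme-value shock yields a logit over categories. The names `category_probabilities`, `context_weights`, and the numbers are hypothetical.

```python
import math

def category_probabilities(kappa_p, categories):
    """Logit probabilities over categories for a problem with context kappa_p.

    Assumed form (not taken from the paper):
    score_c = log(F_c) - sum_i w_{c,i} * (kappa_{P,i} - kappa_{c,i})^2
    """
    scores = {}
    for name, cat in categories.items():
        distance = sum(
            w * (kp - kc) ** 2
            for w, kp, kc in zip(cat["context_weights"], kappa_p, cat["kappa"])
        )
        scores[name] = math.log(cat["frequency"]) - distance
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(expd.values())
    return {name: v / total for name, v in expd.items()}

# Hypothetical two-category memory: buying is experienced twice as often.
categories = {
    "buying": {"kappa": [0.0], "context_weights": [1.0], "frequency": 2.0},
    "consuming": {"kappa": [1.0], "context_weights": [1.0], "frequency": 1.0},
}
probs_near_buying = category_probabilities([0.1], categories)
probs_near_consuming = category_probabilities([0.9], categories)
```

In this sketch, moving the context feature from 0.1 to 0.9 lowers the probability of recognizing the problem as buying, and raising the buying frequency raises it at any context value.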

The model yields three structural predictions that hold without changing tastes or information. First, within-person multi-modal attention: because categorization is stochastic, the same person can cluster on entirely different features (e.g., the base rate vs. the likelihood in an inference problem) across otherwise identical choice occasions. Second, systematic context-driven instability: when an irrelevant context feature kappa_{P,i} drifts away from a category’s diagnostic kappa_{c,i}, the probability of that category falls discontinuously, causing a discrete switch in the attention profile and hence in valuation. Third, experience-driven heterogeneity: people more frequently exposed to a category (higher F_c) are more likely to use it, producing persistent differences in price elasticities or probability weighting at constant income and tastes.

Applied to riskless consumer choice, the paper introduces two categories: “buying” (full attention to price, partial attention to quality: alpha_{M_g}=1 > alpha_{Q_g}=alpha) and “consuming” (full attention to quality, partial attention to price: alpha_{Q_g}=1 > alpha_{M_g}=alpha). A jam problem categorized as buying yields valuation v = alpha * q - eta * p; categorized as consuming, it yields v = q - alpha * eta * p. The valuation jumps discontinuously as context crosses a threshold kappa*, which shifts when the relative category frequency F_{buy}/F_{con} changes. This framework accounts for context-dependent price elasticities (Wakefield and Inman 2003), poverty-driven excess price focus (Shah et al. 2018), de-commoditization through advertising, and mental accounting anomalies including opportunity cost neglect and the sunk cost fallacy. The former arises because consuming neglects capital gains (alpha_{con,Delta_M}=0), the latter because buying neglects quality shocks (alpha_{buy,Delta_Q}=0).

Applied to statistical judgment, the paper introduces two categories — “frequency estimation” (attention alpha_1=1 to a single i.i.d. draw from a known DGP) and “agnostic inference” (attention alpha_S=1 to the share of heads as a sufficient statistic). The threshold N* separates recognition: for sequence length N_P < N*(F_{freq}/F_{inf}), the decision maker categorizes as frequency and correctly assesses odds; for N_P >= N*, she switches to inference and overweights balanced sequences, producing the Gambler’s Fallacy. The same competition between categories also accounts for base rate neglect, conjunction fallacy, and correlation neglect, with the bias strengthening as sequences grow longer.

Applied to risky choice, bottom-up salience — sensory prominence and contrast — interacts with categorization. A publicity shock drawing attention to a low-probability contamination risk raises similarity to “consuming,” triggering a category switch that amplifies attention to quality broadly and reduces attention to price, producing large valuation drops disproportionate to the actual probability shift. This mechanism generates the framing effects of prospect theory without a stable S-shaped utility function: gains and losses frames correspond to different contexts activating different categories.

Scope conditions: the theory applies when features and their values are fully known to the decision maker (no uncertainty about attributes), so the distortions take the form of altered sensitivity to known features rather than missing information. The set of categories C is taken as given in the formal analysis, though the authors discuss endogenization as future work.

Q: What is the paper’s central departure from standard rational inattention and noisy-perception models?

A: Standard models (Sims 2003, Woodford 2012, Enke and Graeber 2023) produce unimodal, stably weighted valuations — the decision maker’s weighting of features is a smooth function of payoff-relevant costs or priors. In this paper, the weighting is determined by problem recognition, which is discrete and stochastic, producing within-person multi-modal attention: the same person can cluster on entirely different features across identical problems. The authors cite direct evidence from Bordalo, Conlon, Gennaioli, Kwon, and Shleifer [20] showing bimodal clustering on base rates vs. likelihoods in statistical problems, a pattern inconsistent with stable-weighting models.

Q: How is perceived value distorted when the attention weight on a hedonic feature is below 1?

A: The perceived value of hedonic feature i is u_i(alpha_P) = alpha_{P,i} * u_i + (1 - alpha_{P,i}) * u_bar_i, where u_bar_i is the average value of that feature across options in the menu. An attention weight of zero collapses perceived variation in that feature to zero; full attention recovers the true value. The implication is that under-attention shrinks the decision maker’s effective sensitivity to a known attribute, causing systematic under- or over-valuation relative to a rational benchmark while tastes (marginal utilities) are held fixed.
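
A small numerical sketch of this formula, using hypothetical quality values for a two-option menu (the function name and the numbers are illustrative only):

```python
def perceived_values(true_values, attention):
    """Apply u_i(alpha) = alpha * u_i + (1 - alpha) * u_bar to each option,
    where u_bar is the menu average of the feature."""
    menu_average = sum(true_values) / len(true_values)
    return [attention * u + (1 - attention) * menu_average for u in true_values]

quality = [2.0, 6.0]  # hypothetical quality of two jams; menu average is 4.0
full = perceived_values(quality, 1.0)  # full attention: true values
half = perceived_values(quality, 0.5)  # partial attention: halfway to the average
none = perceived_values(quality, 0.0)  # zero attention: no perceived variation
```

With attention 0.5 the perceived quality gap between the two jams is half the true gap, which is the sense in which sensitivity to a known attribute shrinks.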

Q: How is perceived probability distorted?

A: With attention weight alpha_{P,W} on event W, the perceived probability of event e is P(e)^{alpha_{P,W}} / sum_{e'} P(e')^{alpha_{P,W}}, which compresses the distribution toward uniform as alpha_{P,W} falls toward 0 and recovers the true distribution at alpha_{P,W}=1. In the jam example, under-attention to the small probability of spoilage causes the decision maker to overestimate the risk of contamination. For multi-dimensional event vectors the formula generalizes multiplicatively, allowing “editing out” of entire event dimensions (e.g., urn selection in a balls-and-urns problem) when their attention weight hits zero.
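
The compression can be checked numerically. The sketch below applies the single-dimension formula to a hypothetical 1% spoilage probability; the function name and the probabilities are illustrative assumptions.

```python
def perceived_probabilities(probs, attention):
    """Raise each true probability to the attention weight and renormalize."""
    weights = [p ** attention for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

true_probs = [0.01, 0.99]  # hypothetical: spoiled vs. not spoiled
at_full = perceived_probabilities(true_probs, 1.0)  # true distribution
at_half = perceived_probabilities(true_probs, 0.5)  # compressed toward uniform
at_zero = perceived_probabilities(true_probs, 0.0)  # exactly uniform
```

At attention 0.5 the perceived spoilage probability is roughly nine times the true one, illustrating the overweighting of unlikely events under partial attention.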

Q: What is the mechanism for context-dependent price elasticity?

A: When context kappa_P is below the threshold kappa*(F_{buy}/F_{con}), the decision maker categorizes the problem as “buying” and her valuation is v = alpha * q - eta * p, giving high price sensitivity (coefficient eta) and attenuated quality sensitivity (coefficient alpha < 1). Above kappa*, she categorizes the problem as “consuming” and her valuation is v = q - alpha * eta * p, reversing the emphasis. Because kappa* is increasing in the relative frequency F_{buy}/F_{con}, a decision maker with more buying experience has a higher threshold and is therefore more likely to be in the price-sensitive buying regime at any given context level. These elasticity differences arise without any change in the true marginal utility of money eta or in quality q.
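
A minimal sketch of the threshold rule, treating recognition as deterministic for simplicity (the stochastic logit shock is ignored here) and using hypothetical parameter values:

```python
def valuation(q, p, eta, alpha, kappa_p, kappa_star):
    """Return (category, perceived value) under the two attention profiles.

    Below the threshold the problem is recognized as buying; otherwise
    as consuming. Deterministic recognition is a simplifying assumption.
    """
    if kappa_p < kappa_star:
        return "buying", alpha * q - eta * p
    return "consuming", q - alpha * eta * p

# Hypothetical jam: quality 10, price 4, eta = 1, partial attention 0.5.
as_buying = valuation(q=10.0, p=4.0, eta=1.0, alpha=0.5, kappa_p=0.2, kappa_star=0.5)
as_consuming = valuation(q=10.0, p=4.0, eta=1.0, alpha=0.5, kappa_p=0.8, kappa_star=0.5)
```

The same good at the same price is valued at 1.0 in the buying regime and 8.0 in the consuming regime. The price coefficient is eta in the first case and alpha * eta in the second, so price sensitivity differs across contexts while eta itself is unchanged.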

Q: How does the model generate the sunk cost fallacy and opportunity cost neglect as a unified phenomenon?

A: Both anomalies arise because buying and consuming categories selectively neglect shocks. In the football example, recognizing the problem as “buying” activates alpha_{buy,Delta_Q}=0, so the blizzard quality shock Delta_q<0 is ignored and the decision maker drives to the game as if the shock did not occur — the sunk cost fallacy. In the wine example, recognizing the problem as “consuming” activates alpha_{con,Delta_M}=0, so the capital gain Delta_p is ignored and the decision maker reports a zero or purchase-price cost — opportunity cost neglect. The unifying mechanism is that each category attends only to the features diagnostic of its prototypical experiences: buying attends to price paid and normal quality; consuming attends to realized quality and partly to price, but not to capital gains.

Q: What comparative static does the model predict for sunk cost susceptibility based on experience?

A: People with higher F_{buy} (more buying experiences, e.g. poverty experiences or having recently purchased but not yet consumed the good) exhibit more sunk cost fallacy and less opportunity cost neglect. Conversely, season ticket holders face many consuming experiences relative to one buying event, raising F_{con} and thus reducing susceptibility to the sunk cost fallacy for sports events. Making the blizzard more salient in the description shifts similarity toward “consuming,” also reducing the sunk cost fallacy through a different channel (bottom-up salience rather than experience).

Q: What is the paper’s explanation for the Gambler’s Fallacy, and what distinguishes it from prior accounts?

A: The Gambler’s Fallacy arises when sequence length N_P exceeds threshold N*(F_{freq}/F_{inf}), causing the decision maker to switch from the frequency category (which attends to the 50:50 fairness of the coin) to the inference category (which attends to the share of heads). Under inference, the decision maker treats balanced and unbalanced sequences as representatives of their “share of heads equivalence class,” and the class of balanced sequences is larger, so balanced sequences receive higher estimated probability — the Gambler’s Fallacy. This differs from Rabin and Vayanos (2010), where the bias stems from a belief that the coin is drawn from a pool; here the decision maker knows the coin is fair (kappa_{P,U}=0.5) but the inference representation causes question substitution rather than a wrong model of the DGP.
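
The counting argument behind the equivalence classes can be sketched as follows. This only illustrates why the balanced class is larger; it is not the paper's full estimator, and the function names are hypothetical.

```python
from math import comb

def sequence_probability(n):
    """Probability of any one specific sequence of n fair coin flips."""
    return 0.5 ** n

def class_probability(n, heads):
    """Probability of the whole class of length-n sequences with `heads` heads."""
    return comb(n, heads) * 0.5 ** n

n = 6
one_sequence = sequence_probability(n)   # same for HTHTHT and HHHHHH
balanced_class = class_probability(n, 3)  # 20 sequences share 3 heads
all_heads_class = class_probability(n, 6)  # only 1 sequence has 6 heads
```

Every specific six-flip sequence has probability 1/64, but a decision maker who answers with the probability of the share-of-heads class assigns 20/64 to a balanced sequence and 1/64 to all heads.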

Q: How does the model make the Gambler’s Fallacy testable beyond length effects?

A: The model predicts the bias is stronger for decision makers who recently solved many inference problems (lower F_{freq}/F_{inf}), and weaker when the 50:50 nature of flips is made bottom-up salient in the choice context (because salience raises similarity to the frequency category, hindering recognition of inference). These cognitive proxies — experience frequencies and bottom-up salience — are orthogonal to the statistical content of the problem and thus allow identification of the mechanism separately from changes in information or incentives.

Q: How does the model produce framing effects in risky choice without a stable S-shaped utility function?

A: Gains and losses frames are modeled as different context vectors kappa_P that differentially increase similarity to a “safe outcome” category or a “risk” category. Recognizing the problem as the safe-outcome category shifts attention toward the certain option; recognizing it as the risk category shifts attention toward variance. The reversal of preferences between gain and loss frames (the Asian Disease problem, Tversky and Kahneman 1981) thus emerges from context-driven re-categorization rather than from a fixed probability weighting function. The novel prediction is that framing effects should be stronger for decision makers with more experience with the category activated by each frame, and weaker when bottom-up salience of the alternative frame’s features is raised.

Q: How does bottom-up salience interact with top-down categorization in the contamination example?

A: A publicity shock alpha_{delta,Q_b}>0 raises baseline attention to the spoiled-jam quality feature, increasing the similarity of the current problem to the “consuming” category (where quality is focal). This triggers a category switch for marginal agents, activating the full consuming attention profile — which attends to quality broadly, not just to contamination specifically, and reduces attention to price. The resulting valuation drop is therefore disproportionate to the actual probability of contamination and exhibits price insensitivity, because re-categorization shifts the entire attention profile rather than just updating a single probability.

Q: How does the model relate to and distinguish itself from case-based decision theory (Gilboa and Schmeidler 1995) and analogical reasoning (Mullainathan 2002, Fryer and Jackson 2008)?

A: In Gilboa-Schmeidler and related models, the decision maker uses past cases to resolve uncertainty about unknown attributes of current options; attention is full and the mechanism is extrapolation of payoffs from similar cases. In Mullainathan’s (2002) memory-based model, categories again serve to fill in missing information. In this paper, there is no uncertainty about attributes (features and their values are fully known), and the distortion instead takes the form of altered sensitivity to known features through selective attention. This allows the model to produce biases even in simple problems with full data disclosure, and to explain phenomena like base rate neglect and price insensitivity that are not primarily about missing information.

Q: What does the model predict about within-person versus across-person distributions of valuations?

A: Within a person, attention is multi-modal (bimodal in the two-category case) because categorization is stochastic. However, if many categories are possible across the population, the aggregate distribution of valuations can appear approximately unimodal even though each individual’s distribution is not. This distinction is empirically important: a researcher observing average choices may incorrectly infer smooth preference heterogeneity when the underlying mechanism is discrete category switching.

Q: What cognitive proxies does the model propose for empirical identification?

A: The theory links endogenous attention and choice to three observable (or measurable) proxies: (1) past experience frequencies F_c, measurable from administrative histories, surveys about past exposure, or experimental manipulation of training; (2) contextual similarity, measurable from field or experimental variation in irrelevant context features; and (3) bottom-up salience, experimentally controllable via prominence or contrast manipulations. The key identification logic is that these proxies are payoff-irrelevant — they do not change tastes, information, or the objective choice problem — yet predict systematic shifts in choice through their effect on recognition.

Problem Recognition: The first step in the decision maker’s choice process, in which she jointly selects an attention vector alpha_P and a category c* by maximizing weighted similarity between the current problem (characterized by its context vector kappa_P) and the prototype of a past category (alpha_c, kappa_c), multiplied by the category’s time-discounted frequency F_c. Recognition is not about resolving uncertainty over attributes but about selecting which known attributes to attend to.

Category: A partition element of the decision maker’s memory database, indexed by a prototype attention-plus-context vector (alpha_c, kappa_c) and a frequency scalar F_c. The prototype encodes both the context features diagnostic of experiences in that category (binary alpha_{c,i} for i in Phi_K) and the attention to hedonic and event features (alpha_{c,i} for i in Phi_H union Phi_E) used when solving problems in that category. Examples in the paper: “buying” and “consuming” for riskless choice; “frequency estimation” and “agnostic inference” for statistical judgment.

Attention Weight (alpha_{P,i}): A scalar in [0,1] assigned to feature i of the current problem P. For hedonic features, alpha_{P,i}<1 collapses perceived variation toward the menu average; for event features, alpha_{P,i}<1 compresses perceived probabilities toward uniform. Full attention alpha_{P,i}=1 recovers expected utility. Attention weights are the endogenous output of the recognition step, not fixed preference parameters.

Contextual Similarity S: A separable function measuring how close the current problem (alpha_P, kappa_P) is to a category prototype (alpha_c, kappa_c). It decreases in discrepancies in the attention vector (measured by a strictly increasing, convex function d) and in discrepancies in the values of context features diagnostic of the category (d_i(kappa_{P,i}, kappa_{c,i}) * alpha_{c,i}). Endogenous attention to context is set to reduce sensitivity to discrepancies, not to eliminate them.

Mental Accounting (as categorization): In the paper’s account, non-fungibility, sunk cost fallacy, and opportunity cost neglect all arise because buying and consuming categories selectively attend to different monetary and quality features. The sunk cost effect is alpha_{buy,Delta_Q}=0; opportunity cost neglect is alpha_{con,Delta_M}=0. Mental accounts are not separate budget constraints but the by-product of category-specific attention profiles that were calibrated to normal-state experiences and do not generalize to shocks.

Bottom-up Salience: Exogenous attention to a feature driven by sensory prominence (described by alpha_{delta,i} in the problem’s presentation vector) or payoff contrast (the DM attends more to features where her option’s value deviates more from the menu average relative to total menu variance). Bottom-up salience raises baseline attention to a feature before top-down categorization acts, and can trigger a category switch by raising similarity to the category for which that feature is focal.

Gambler’s Fallacy via Question Substitution: In the model, the Gambler’s Fallacy arises when a long sequence length kappa_{P,N} causes recognition of the “agnostic inference” category, which focuses attention on the share of heads alpha_S=1. The decision maker then treats sequences as representatives of a “share of heads equivalence class,” and since the balanced class is larger than the unbalanced class, balanced sequences are assigned higher estimated probability. This is not a belief that the coin is unfair; it is question substitution induced by the inference representation.

Central bank communication by ??? The economics of monetary policy leaks

Layer 1 — Overview

Research Question

This paper investigates the economics of monetary policy leaks — anonymous disclosures of confidential information by insiders to the media — focusing on three central questions: (1) Are leaks random accidents, strategic individual disclosures, or institutionally authorized “plants”? (2) Do leaks shape public (financial market) views, and by how much? (3) Can attributed (named) communication by central bank officials mitigate the effects of leaks?

Data and Setting

The authors study the Eurosystem (ECB and euro area National Central Banks) over January 2002 to December 2021. Their primary data source is a novel database of 368 unique policy-relevant leaks — assembled by manually filtering and classifying more than a million news items from Reuters, Bloomberg, and Market News International archives — with precise minute-level timestamps. Topics covered include: policy rates (178 leaks), unconventional monetary policy/UMP (207 leaks), economic growth (47), inflation (41), and euro exchange rate (36); individual leaks may cover multiple topics. They complement this with a dataset of 7,883 attributable public statements by ECB Governing Council members, identified via keyword filtering and machine learning classification of the Reuters News Archive.

Methodology

The paper employs four main empirical strategies. First, high-frequency event studies using asymmetric windows (5 minutes before to 30 minutes after an event) compare absolute market reactions in OIS rates across the full term structure (3M to 10Y) and in the EURO STOXX 50 across leaks, 5,000 randomly sampled placebo events, and attributable statements. Second, Poisson regression models relate the number of leaks per policy meeting to proxies for Governing Council disagreement (Italian-German sovereign yield spread, inter-quartile range of national inflation rates, number of attributable statements per meeting) and a dummy for quarterly macroeconomic projection releases. Third, a regression framework tests whether leaks move market expectations toward the subsequent policy outcome — identifying whether leaks are informative about the direction of policy. Fourth, an augmented version of the Tillmann (2021) model relates end-of-day changes in longer-term OIS rates to high-frequency monetary policy surprises, interacted with dummies for post-announcement leaks and attributable statements.

Main Findings

Incidence and timing. The number of Eurosystem leaks peaked at 36 in 2019 (more than four per policy meeting on average) before declining by more than one third following the start of Christine Lagarde’s presidency in November 2019. Leaks cluster around policy meetings and, since 2015, have shifted notably from before meetings to after meetings, a shift driven by leaks related to UMP. Leaks occur even during the ECB’s quiet period, when policy-makers are formally restricted from public statements on policy-sensitive topics.

Leaks are not accidents. Poisson regressions reveal that the number of leaks per meeting is significantly and positively associated with proxies for Governing Council disagreement: every additional percentage point in the Italian-German sovereign yield spread is associated with approximately half an additional leak per meeting. The probability of a policy change increases by four to six percentage points with each additional pre-meeting leak (statistically significant at the 5% or 10% level). The Poisson specification explains around 15% of the variation in leak counts.

Market impact. Market movements around leaks are up to 85% larger than those around placebo events. Leaks trigger market reactions that are consistently larger than those of attributable statements by individual Governing Council members across the entire OIS term structure and in equities — a result robust to controlling for distance to policy meetings. Rate leaks mainly move the short and medium end of the yield curve; UMP leaks affect the long end and equities. Leaks about general economic conditions (growth, inflation, exchange rate) produce little statistically significant market response.

Leaks are uninformative about policy direction. Conditional on a pre-meeting leak occurring, the average leak does not move market rates closer to the levels prevailing directly after the subsequent policy announcement. By contrast, attributable statements systematically do reduce this distance. This asymmetry implies that leaks predominantly reflect minority opinions within the Governing Council. Consistent with this, leaks counteract prevailing trends in market expectations at the short end of the yield curve (as established by a negative coefficient on the interaction between the prevailing seven-day pre-leak trend and the leak dummy).

Leaks are not plants; attributed communication mitigates their effects. Post-announcement leaks dampen the transmission of monetary policy surprises to longer-term rates (negative and significant interaction coefficient in the augmented Tillmann framework). Attributed statements by ECB Executive Board members, by contrast, systematically move markets in the direction opposite to the preceding leak across most of the yield curve, partially reversing leak-induced market moves. More intense pre-leak attributable communication is also associated with lower market impact of the subsequent leak, across most maturities. These results jointly indicate that most Eurosystem leaks originate from individual insiders with minority opinions rather than constituting institutional plants.

Scope Conditions

Results pertain to the Eurosystem committee setting, where decision-making is broadly consensus-based and voting records are not published; they may not fully generalize to institutions with concentrated decision-making power. The study measures effects on financial markets, not broader public opinion.

Layer 2 — Q&A

Q1: How is a “leak” defined in this paper, and how are Eurosystem leaks identified empirically?

A leak is defined as a disclosure of confidential information by an insider to the media with an expectation of anonymity. Eurosystem leaks are identified from Reuters, Bloomberg, and Market News International archives (2002–2021) using keyword-driven pre-filtering followed by manual classification of “candidate” items. The resulting database contains 1,253 news items that aggregate to 368 unique policy-relevant leaks with minute-level timestamps. Policy-relevant leaks touch on: policy rates, unconventional monetary policy tools, economic growth, inflation, or the euro exchange rate; leaks about local economic conditions, banking regulation, or managerial appointments are excluded.

Q2: What are the broad trends in the number and topic composition of Eurosystem leaks over 2002–2021?

The number of leaks rose sharply in the second half of the sample, peaking at 36 in 2019 (more than four per meeting on average). Since Christine Lagarde took over the ECB presidency in November 2019, leaks fell by more than one third from that peak. The topic composition shifted substantially over time: policy-rate leaks predominated in the earlier period, while leaks related to UMP came to dominate in the 2015–2021 sub-period.

Q3: How does the timing of leaks within the policy meeting cycle change across sub-periods?

In the full sample, leaks cluster in the run-up to policy meetings and immediately after announcements (both on the announcement day itself and on the following Friday). Since 2015, a notable shift occurs from pre-meeting to post-meeting timing, driven specifically by leaks related to UMP. The authors attribute this shift to the expectation-management role of UMP: post-meeting leaks allow dissenting insiders to reshape market expectations that are otherwise guided by official press releases and press conferences.

Q4: What regression evidence supports the view that leaks are not random accidents?

Poisson regressions of the number of leaks per meeting on disagreement proxies find significant positive coefficients on: the lagged Italian-German sovereign yield spread (about half a leak more per meeting for each additional percentage point of spread), the inter-quartile range of national inflation rates, and the number of attributable statements per meeting. Meetings coinciding with the release of quarterly macroeconomic projections also attract significantly more leaks. These results are robust to replacing the disagreement proxies with a binary dissent index based on Q&A sessions at ECB press conferences (Tillmann, 2021), even after excluding disagreement-related leaks from the dependent variable to address endogeneity. The model explains about 15% of the variation in leak counts.
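
To make the estimation approach concrete, here is a minimal sketch of a Poisson regression with one regressor, fitted by Newton-Raphson using only the standard library. The data are synthetic and purely illustrative (they are not the paper's), and the single-regressor specification is an assumption; the paper's models include several proxies and a projection dummy.

```python
import math

def fit_poisson(x, y, iterations=50):
    """Newton-Raphson maximum likelihood for log E[y] = b0 + b1 * x."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (Fisher information), a 2x2 symmetric matrix
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic example: yield spread (percentage points) and leaks per meeting.
spread = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
leaks = [1, 1, 2, 2, 3, 4]
b0, b1 = fit_poisson(spread, leaks)
fitted = [math.exp(b0 + b1 * s) for s in spread]
# Average marginal effect of one extra percentage point of spread on leak counts
average_marginal_effect = b1 * sum(fitted) / len(fitted)
```

Because the Poisson model is log-linear, a statement such as "half a leak more per percentage point" corresponds to a marginal effect like `average_marginal_effect`, not to the raw coefficient `b1`.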

Q5: Does the number of pre-meeting leaks predict policy changes?

Yes. The probability of a monetary policy change increases by four to six percentage points with each additional pre-meeting leak (significant at the 5% or 10% level). This signal about the likelihood of a change (not its direction) is hard to square with the random-accidents hypothesis.

Q6: How large are the financial market reactions to leaks relative to placebo events and to attributable statements?

Market movements around leaks are up to 85% larger than the average size of market reactions to 5,000 randomly sampled placebo events. When leaks are compared directly to attributable statements (with leaks as the baseline and fixed effects for year, month, weekday, and hour), average absolute market moves around leaks are consistently larger across the entire term structure of OIS rates and for the EURO STOXX 50. This result is robust to differences in distance to policy meetings, with size differences across the full term structure persisting for periods far from meetings; near meetings, differences narrow but the average market reaction to leaks never falls below that to attributable statements.
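
A minimal sketch of the event-window comparison, assuming a minute-by-minute rate series in basis points and the asymmetric window described in the summary (5 minutes before to 30 minutes after). The series, function names, and numbers are synthetic and hypothetical.

```python
def absolute_window_move(series_bp, event_minute, before=5, after=30):
    """Absolute change from `before` minutes pre-event to `after` minutes post-event."""
    return abs(series_bp[event_minute + after] - series_bp[event_minute - before])

def relative_excess(leak_moves, placebo_moves):
    """How much larger the mean leak move is than the mean placebo move (0.85 = 85%)."""
    mean_leak = sum(leak_moves) / len(leak_moves)
    mean_placebo = sum(placebo_moves) / len(placebo_moves)
    return mean_leak / mean_placebo - 1.0

# Synthetic OIS rate in basis points: flat, then a 3 bp jump at minute 100.
ois_bp = [100] * 100 + [103] * 100
leak_move = absolute_window_move(ois_bp, event_minute=100)    # window spans the jump
placebo_move = absolute_window_move(ois_bp, event_minute=40)  # window misses it
excess = relative_excess([3, 4], [2, 2])  # illustrative moves, not the paper's data
```

Using absolute moves means the comparison measures the size of market reactions regardless of their direction.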

Q7: Do the market effects of leaks differ by topic?

Yes. Leaks about policy rates primarily move the short and medium end of the yield curve. Leaks about UMP tools affect the long end of the curve and equities. Leaks about general economic conditions (growth, inflation, euro exchange rate) do not produce statistically significant market reactions, consistent with the interpretation that economic condition leaks require more interpretation before their implications for the policy path become apparent.

Q8: Do leaks move market expectations in the direction of the subsequent policy outcome?

No. The average pre-meeting leak does not reduce the absolute distance of market rates to post-announcement levels. This result holds across maturities from 3M to 10Y and is robust to separating leaks inside and outside the ECB’s quiet period. Attributable statements, by contrast, systematically reduce this distance (Table 7). The failure of leaks to align expectations with outcomes is interpreted as evidence that leaks predominantly reflect minority views within the Governing Council rather than information held by the decisive voter.
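
The distance test can be sketched as below, assuming rates are measured just before the event, just after it, and directly after the subsequent policy announcement. The function name and the numbers (in basis points) are hypothetical, and the paper's regression framework adds controls that this sketch omits.

```python
def distance_change(rate_before, rate_after, rate_post_announcement):
    """Change in absolute distance to the post-announcement level.

    Negative: the event moved the market closer to the eventual outcome.
    Positive: the event moved the market further away.
    """
    return abs(rate_post_announcement - rate_after) - abs(
        rate_post_announcement - rate_before
    )

# Synthetic examples with a post-announcement level of 110 bp.
moves_closer = distance_change(rate_before=100, rate_after=104, rate_post_announcement=110)
moves_away = distance_change(rate_before=100, rate_after=97, rate_post_announcement=110)
```

In these terms, the finding is that the average change is not negative for leaks, while it is systematically negative for attributable statements.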

Q9: Do leaks counteract or reinforce prevailing trends in market expectations?

Leaks counteract prevailing trends. The regression of market reactions to leaks and placebo events on the seven-day pre-event trend reveals a significantly negative interaction between the trend and the leak dummy at the short end of the yield curve. This result is driven specifically by leaks about policy rates.

Q10: Do post-announcement leaks dampen the transmission of monetary policy surprises to longer-term rates?

Yes. In the augmented Tillmann (2021) framework, the interaction of the high-frequency 2Y monetary policy surprise with a dummy for post-announcement leaks is negative and significant for 2Y, 5Y, and 10Y OIS rates. In contrast, the interaction with a dummy for post-announcement attributable statements is positive and significant across maturities, indicating that attributed communication reinforces the official policy signal. These two results jointly show that leaks weaken official policy announcements while attributed communication strengthens them.

Q11: Does more intense pre-leak attributable communication reduce the market impact of subsequent leaks?

Yes. Using an intensity measure that weights each attributable statement by the inverse of its distance in hours to the subsequent leak (covering a window from 36 hours to 30 minutes before the leak), the paper finds a significant negative relationship between pre-leak communication intensity and the absolute market reaction to the leak, controlling for year, month, weekday, and hour fixed effects. This holds across most maturities.
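
A minimal sketch of such an intensity measure, assuming statements are summarized by their distance in hours before the leak and that the window bounds are inclusive (the inclusivity is an assumption, as is the function name):

```python
def communication_intensity(hours_before_leak, max_hours=36.0, min_hours=0.5):
    """Sum of inverse distances (in hours) over statements inside the window."""
    return sum(
        1.0 / h for h in hours_before_leak if min_hours <= h <= max_hours
    )

# Hypothetical statements 1h, 2h, 48h, and 15 minutes before a leak.
intensity = communication_intensity([1.0, 2.0, 48.0, 0.25])
```

Only the statements at 1 and 2 hours fall inside the window, so the intensity is 1/1 + 1/2; statements closer to the leak receive more weight.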

Q12: Does the market impact evidence support the “plant” hypothesis?

No. If leaks were institutional plants intended to prepare markets for new policy, one would expect the ECB Executive Board — which controls official communication — to subsequently reinforce the signal from leaks. Instead, attributable statements by ECB-affiliated Governing Council members are systematically negatively correlated with the market direction of the preceding leak across the yield curve, with significant coefficients at medium maturities. NCB Governor statements show weaker and more ambiguous effects, potentially because their statements generate smaller average market movements rather than reflecting a lack of willingness to counteract leaks.

Q13: Why do markets react to leaks even though leaks are generally uninformative about policy outcomes?

The paper offers three candidate explanations: (1) automated trading algorithms do not distinguish between attributed and anonymous communication; (2) leaks serve as a coordination device in the spirit of Morris and Shin (2002), amplifying even noisy signals; (3) media-reporting models such as Nimark (2014) and Chahrour et al. (2021) predict that “man-bites-dog” news (unusual events such as revelations of committee disagreement) shifts beliefs beyond its true information content. Leaks are unusual both in frequency (far less common than attributed statements) and in content (they reveal disagreement that rarely surfaces in official communication).

Q14: What are the implications for the measurement of monetary policy shocks from high-frequency identification?

The paper notes that Eurosystem leaks frequently occur shortly before or after official policy announcements. Pre-announcement leaks can shift market expectations before the start of standard event windows, reducing the measured surprise component of official announcements. Post-meeting leaks dampen the end-of-day effects of announcements. In both cases, standard high-frequency surprise instruments extracted from official announcements alone may miss the full extent of new information available to market participants, suggesting that accounting for leaks could improve the relevance of high-frequency instruments used in monetary policy identification.

Q15: What are the implications for the design of central bank quiet periods?

The ECB’s quiet period ends with the policy announcement, whereas the Federal Reserve’s extends to the day after the meeting. Based on the finding that post-announcement leaks dampen policy announcement effects while post-announcement attributed statements reinforce them, the paper suggests that permitting attributed communication shortly after policy decisions may help mitigate the market impact of post-announcement leaks.

Key Concepts

Monetary policy leak (“sources story”): In this paper, a leak is defined as a disclosure of confidential information emanating from an insider within the Eurosystem (ECB or NCB staff or policy-makers) that is transmitted to financial media with an expectation of anonymity for the source. The paper excludes whistle-blower cases and focuses on leaks where anonymity keeps attention on the content rather than the identity of the source. Leaks are distinct from “plants” (formally authorized institutional disclosures intended to advance the institution’s goals) and from “pleaks” (the middle ground).

Plant: An authorized or semi-authorized anonymous disclosure of confidential information made for the purpose of advancing the public institution’s own goals and interests, as distinct from a leak that originates from an individual insider’s personal agenda. The paper tests and rejects the plant hypothesis for most Eurosystem leaks on the basis that ECB Executive Board members’ attributed statements systematically counteract the market impact of leaks.

Single voice principle: The ECB’s communication norm requiring that Governing Council members discuss and resolve disagreements internally while publicly representing the official policy stance. This principle creates a setting where individual members with minority views may resort to anonymous communication as a way to express dissent “off-protocol.”

Quiet period (purdah): The ECB’s rule requiring policy-makers to refrain from public statements on policy-related topics in the seven days before each Governing Council monetary policy meeting. Leaks cluster during this period despite the restriction, supporting the non-random interpretation of leaks.

Attributable (named) statement: A public statement clearly attributed to a specific, named member of the ECB Governing Council, reported as a breaking-news headline. Attributable statements serve both as a comparison benchmark for measuring the market impact of leaks and as a mitigation instrument when they counteract leak-induced market moves.

Pre-leak communication intensity (lambda): The paper’s measure of the intensity of attributable communication in the 36-hour window before a given leak, defined as the sum of inverse time distances (in hours) from each attributable statement to the leak. A higher value means more recent and/or more numerous attributed statements precede the leak.
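
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name `pre_leak_intensity`, its parameters, and the timestamps are hypothetical, and the window bounds (36 hours to 30 minutes before the leak) are taken from the description in Q11.

```python
from datetime import datetime, timedelta


def pre_leak_intensity(leak_time, statement_times, max_hours=36.0, min_hours=0.5):
    """Sum of 1/hours over attributable statements inside the pre-leak window."""
    total = 0.0
    for stamp in statement_times:
        hours_before = (leak_time - stamp).total_seconds() / 3600.0
        # Keep only statements between 36 hours and 30 minutes before the leak.
        if min_hours <= hours_before <= max_hours:
            total += 1.0 / hours_before
    return total


# Hypothetical timestamps, for illustration only.
leak = datetime(2020, 1, 15, 12, 0)
statements = [leak - timedelta(hours=h) for h in (1.0, 4.0, 40.0)]
intensity = pre_leak_intensity(leak, statements)
print(intensity)  # 1/1 + 1/4 = 1.25; the statement 40 hours earlier is outside the window
```

A statement one hour before the leak contributes four times as much as one four hours before, which is what makes the measure sensitive to recency as well as to the number of statements.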

High-frequency event study window: The paper uses an asymmetric window starting 5 minutes before and ending 30 minutes after a leak’s timestamp. The market reaction is the median OIS quote in the 10 minutes after the window minus the median quote in the 10 minutes before it. The same methodology is applied to leaks and attributable statements so that reactions are comparable across communication types.
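
A hedged sketch of this measurement follows. It assumes quotes arrive as `(minute, rate)` pairs on a common clock and that the two 10-minute bands sit immediately outside the event window; the function name `leak_reaction`, its parameters, and the synthetic quotes are illustrative assumptions rather than the paper's implementation.

```python
from statistics import median


def leak_reaction(quotes, leak_minute, pre=5, post=30, band=10):
    """Median quote in the band after the window minus the median in the band before."""
    start, end = leak_minute - pre, leak_minute + post
    before = [rate for minute, rate in quotes if start - band <= minute < start]
    after = [rate for minute, rate in quotes if end < minute <= end + band]
    if not before or not after:
        return None  # not enough quotes to measure a reaction
    return median(after) - median(before)


# Synthetic quotes: the rate steps from 1.00 to 1.05 at the leak (minute 20).
quotes = [(minute, 1.00 if minute < 20 else 1.05) for minute in range(0, 61)]
reaction = leak_reaction(quotes, leak_minute=20)
print(round(reaction, 4))
```

Using medians rather than single quotes makes the measured reaction less sensitive to an isolated stale or erroneous quote in either band.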

Post-announcement leak dummy: An indicator taking the value of one if at least one leak occurs between the end of the official ECB monetary policy announcement window (15:50 CET) and end of trading hours on the announcement day. Used in the augmented Tillmann (2021) regression to measure whether leaks dampen the transmission of monetary policy surprises to longer-term rates.

Customer accumulation, returns to scale, and secular trends

This paper asks how rising returns to scale in production contributed to three concurrent U.S. secular trends since 1980: declining business dynamism, rising markups, and growing firm expenditures on customer acquisition. The author constructs a firm dynamics model in the Hopenhayn (1992) tradition with endogenous entry and exit, heterogeneous markups, and customer accumulation grounded in directed search in the product market. Firms compete for customers through both prices and selling activities; larger firms gain a competitive edge when returns to scale rise because their marginal costs fall more than those of smaller firms—even though the technological shift is uniform across firms. This demand-based channel triggers winners-and-losers dynamics and the rise of superstar firms.

The empirical foundation rests on Compustat data for U.S. publicly traded firms (1977–2014) and Business Dynamics Statistics (BDS) for aggregate and sector-level dynamism measures. Production-function estimation using Ackerberg, Caves, and Frazer (2015) augmented with sales-share controls documents that aggregate returns to scale rose from approximately 1.0 in 1980 to approximately 1.05 by 2014—a within-sector increase, not a reallocation effect. Over the same period, the cost-weighted markup rose by 42%, the firm entry rate fell by 33%, the excess reallocation rate fell by 29%, and selling costs relative to production costs rose by 60%–90% depending on the measure used.

The model is calibrated to 1980 steady-state moments (firm life-cycle patterns, markups, entry and reallocation rates). A 5% increase in returns to scale, matching the empirical estimate, accounts for: a 15% rise in the average cost-weighted markup (vs. 42% in the data); a 33% decline in the entry rate (exactly matching the data); a 21% decline in the reallocation rate (vs. 29% in the data); and a 23% increase in selling costs relative to production costs (vs. 60%–90% in the data). The model also generates a 53% rise in the share of firms aged 11 years or older (vs. 50% in the data) and a 58% decline in the employment share of firms aged 5 years or younger (vs. 56% in the data), closely tracking the aging of the U.S. firm population. Firm-level responsiveness to productivity shocks declines by 0.08 in the model, versus about 0.01 in Compustat and 0.09 in Decker et al. (2020).

Sector-level panel regressions with sector fixed effects confirm the model’s directional predictions: within-sector increases in returns to scale are associated with lower entry rates (coefficient −2.89, significant at 1%), lower reallocation rates (−1.16, significant at 1%), higher markups (+3.15, significant at 1%), and higher selling costs relative to production costs (+1.85 for the advertising-based measure; +8.52 for adjusted SG&A).

A key scope condition is that the model yields a constrained-efficient allocation: directed search and full internalization of returns to scale imply decentralized equilibrium efficiency, making the paper a laboratory for assessing how far efficient firm responses to technological change can explain the secular trends without invoking market failures. The model fits the post-2000 transition dynamics better than the 1980s–1990s period, and explains a substantial but incomplete share of the trends, suggesting complementary—possibly inefficient—forces also contributed.

Q: What is the core mechanism through which rising returns to scale generate winners-and-losers dynamics?

A: The marginal cost of production under increasing returns to scale (alpha > 1) is MC(z,n) = l(n,z)^(1−alpha) × (1/alpha) × (W/e^z), which depends on firm size l(n,z). A uniform rise in alpha rotates the marginal cost schedule clockwise by firm size: larger firms see a proportionally larger cost reduction than smaller firms, even though the technological change is identical across all firms. Because firms compete for the same pool of customers, this asymmetric cost advantage allows large firms to offer lower prices while sustaining higher margins, attracting customers away from small firms. The result is a demand-based channel that generates winners-and-losers dynamics and increases market concentration.
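
The asymmetry can be checked numerically with the marginal cost formula above. The sketch below holds firm size fixed while alpha rises, which is a partial-equilibrium simplification (in the model, size responds endogenously); the firm sizes, the wage normalization, and the function name are hypothetical choices for illustration.

```python
import math


def marginal_cost(size, z, alpha, wage=1.0):
    """MC = l^(1 - alpha) * (1 / alpha) * (W / e^z), as in the formula above."""
    return size ** (1.0 - alpha) * (1.0 / alpha) * (wage / math.exp(z))


small, large = 10.0, 1000.0  # hypothetical firm sizes (labor input)

# Proportional fall in marginal cost when alpha rises from 1.00 to 1.05.
drop_small = 1.0 - marginal_cost(small, 0.0, 1.05) / marginal_cost(small, 0.0, 1.0)
drop_large = 1.0 - marginal_cost(large, 0.0, 1.05) / marginal_cost(large, 0.0, 1.0)

print(f"small firm: {drop_small:.1%}, large firm: {drop_large:.1%}")
```

With these illustrative sizes the large firm's marginal cost falls roughly twice as much in proportional terms as the small firm's, even though both face the same change in alpha.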

Q: How does the model capture customer accumulation, and why is it central to the paper’s argument?

A: The model introduces directed search in the product market, where firms post advertisements and customers (including those already matched with a firm) choose which submarket to enter by trading off offered utility against matching probability. A constant-returns-to-scale matching function governs match creation; in a submarket with tightness theta, customers match with probability m(theta) = theta(1+theta)^(−1) and firms attract customers with probability q(theta) = (1+theta)^(−1). The customer accumulation motive creates an investment-harvest trade-off: firms can either post high promised utility (low prices) to grow their customer base or extract surplus through high prices. Rising returns to scale amplify large firms’ ability to resolve this trade-off favorably, linking the technological change directly to markup dynamics, entry incentives, and selling expenditures.
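
The two matching probabilities can be tabulated directly. The sketch below also checks the identity m(theta) = theta × q(theta), which is what constant returns to scale in matching implies if theta is read as advertisements per searching customer; that reading is an inference from the two formulas, not something the summary states. Function names are hypothetical.

```python
def customer_match_prob(theta):
    """m(theta) = theta / (1 + theta): chance a searching customer is matched."""
    return theta / (1.0 + theta)


def firm_match_prob(theta):
    """q(theta) = 1 / (1 + theta): chance an advertisement attracts a customer."""
    return 1.0 / (1.0 + theta)


tightness_grid = [0.25, 1.0, 4.0]
table = [(t, customer_match_prob(t), firm_match_prob(t)) for t in tightness_grid]
for theta, m, q in table:
    # Under constant returns, total matches can be counted from either side.
    print(f"theta={theta}: m={m:.2f}, q={q:.2f}, theta*q={theta * q:.2f}")
```

Tighter submarkets (higher theta) are better for customers and worse for firms, which is the margin customers trade off against the promised utility on offer.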

Q: What is the directed search framework’s role in ensuring equilibrium uniqueness and efficiency?

A: The author introduces firm-side commitment contracts—specifying price, separation probability, and continuation utility contingent on productivity realizations—combined with directed search. Because search is directed on both sides and firms fully internalize returns to scale, the decentralized equilibrium is constrained-efficient. This delivers uniquely determined heterogeneous prices in equilibrium (solving the indeterminacy problem common in customer-market models) and establishes the paper’s efficient-mechanism benchmark: it tests how far profit-maximizing firm responses to technological change—without any market failure—can account for the secular trends.

Q: How are prices structured in the model, and what life-cycle pattern do they generate?

A: Each firm charges two distinct prices in each period: one to incumbent customers (the same for all incumbents, since they are identical conditional on being attached to the same firm) and one to newly acquired customers (which varies based on the promised utility in the submarket searched). Firms that are expanding their customer base offer greater promised utility and therefore charge lower prices to attract customers; firms harvesting their existing base charge higher prices. Because firms enter small and grow, this dynamic generates a price life cycle: young firms invest via low prices and mature firms harvest through higher prices, which the model reproduces as a rising markup pattern over the firm life cycle—an untargeted moment the model fits well.

Q: What does the calibration target and what untargeted moments does the model reproduce?

A: The model is calibrated to 1980 using: the number of employees of entrant firms (pinning entry customer base n_e), employees of age-5 firms (pinning convex cost chi_1), share of firms aged 11+ years (pinning chi_2), average firm size (operating cost f), entry rate (entry cost kappa), excess reallocation rate (exit shock delta), and average cost-weighted markup (linear cost c). Untargeted moments reproduced include: a sales-weighted markup of 0.28 (vs. 0.25 in De Loecker et al. 2020), endogenous customer turnover of approximately 9% (vs. 15% in Gourio and Rudanko 2014), and an elasticity of customer base shrinkage to price of 0.08 (within the 0.01–0.16 range from Paciello et al. 2019). The model also matches markup and selling-cost life-cycle patterns that are typically overlooked.

Q: How large is the quantitative contribution of the 5% rise in returns to scale to each secular trend?

A: Comparing the 1980 steady state (alpha = 1) to the 2014 steady state (alpha = 1.05): the average cost-weighted markup rises by 15% in the model versus 42% in the data; the entry rate declines by 33% in the model, exactly matching the data; the reallocation rate declines by 21% in the model versus 29% in the data; and selling costs relative to production costs rise by 23% in the model versus 60%–90% in the data. The model thus explains a substantial share of each trend while leaving a residual requiring additional mechanisms.
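
As a rough back-of-the-envelope reading of these numbers, the share of each trend reproduced by the model is the model change divided by the data change. The sketch below uses only the figures quoted above; the dictionary layout and function name are illustrative, and the simple ratio is an assumption about how to summarize the fit, not a decomposition reported in the paper.

```python
# (model change, data change low, data change high), all in percent.
trends = {
    "cost-weighted markup": (15.0, 42.0, 42.0),
    "entry rate decline": (33.0, 33.0, 33.0),
    "reallocation rate decline": (21.0, 29.0, 29.0),
    "selling costs / production costs": (23.0, 60.0, 90.0),
}


def share_explained(model, data_low, data_high):
    """Return the (lowest, highest) share of the data change matched by the model."""
    return model / data_high, model / data_low


shares = {name: share_explained(*values) for name, values in trends.items()}
for name, (low, high) in shares.items():
    print(f"{name}: {low:.0%} to {high:.0%}")
```

On this reading the entry rate is matched in full, the reallocation rate by roughly seven tenths, and the markup and selling-cost trends by roughly a quarter to two fifths.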

Q: How does the model explain the aging of U.S. firms, and how well does it match the data?

A: The winners-and-losers mechanism shifts activity toward larger, older firms, which mechanically ages the firm population. The model generates a 53% increase in the share of firms aged 11 years or older (vs. 50% in the data) and a 58% decline in the employment share of firms aged 5 years or younger (vs. 56% in the data). This aging arises because rising returns to scale increase the cost of customer acquisition, acting as a barrier to entry that disproportionately hurts new, small firms while allowing large incumbents to remain viable at lower productivity thresholds.

Q: What is the channel through which rising returns to scale reduce business dynamism specifically?

A: The unequal reduction in marginal costs intensifies competition for customers and raises customer acquisition costs. This operates through two simultaneous effects on the exit threshold: (i) lower marginal costs allow large firms to remain viable at lower productivity levels despite higher customer acquisition costs; and (ii) heightened competition forces smaller firms to require higher productivity to survive in a market that has become increasingly costly to operate in. Higher customer acquisition costs therefore function as an endogenous barrier to entry, reducing the entry rate and the reallocation of resources across firms.

Q: Does the model attribute the secular trends entirely to efficient firm behavior, and what does it conclude about residual explanations?

A: No. The model is explicitly designed as a constrained-efficient benchmark, and the paper finds that while rising returns to scale account for a substantial share of the trends—particularly in magnitude—the transition dynamics show a less accurate fit before the 2000s. The author concludes that complementary mechanisms, likely involving inefficiencies (such as market power from horizontal product differentiation or barriers to entry beyond those captured by the model), played a significant role in the earlier evolution of these trends and in the portion of the trends not explained by the efficient channel.

Q: What evidence supports the rising returns to scale finding, and what are its limitations?

A: Production-function estimation using the Ackerberg-Caves-Frazer method with sales-share controls on Compustat data shows returns to scale rising from approximately 1.0 in 1980 to approximately 1.05 by 2014, driven primarily by within-sector increases rather than reallocation toward high-returns sectors. A translog production function finds limited evidence of heterogeneous increases across firm sizes within Compustat. However, Compustat predominantly covers large publicly traded firms; smaller firms outside the sample may have experienced minimal or no increase in returns to scale. If technology adoption involves fixed costs, the aggregate impact could be larger than estimated, meaning the quantitative exercises likely represent a conservative lower bound.

Q: How does the paper relate to and extend the directed search literature in product markets?

A: The paper builds on Gourio and Rudanko (2014) and Roldan-Blanco and Gilbukh (2020), where customers are locked in once matched, by introducing labor-search tools from Schaal (2017) to allow: (i) incumbent customer switching between firms at rates of 10%–25% annually (Gourio and Rudanko 2014), and (ii) a non-zero price sensitivity of incumbent customers (Paciello et al. 2019). It also allows firms to invest in demand through selling expenditures, which prior directed search models in product markets typically abstracted from, making it possible to study how technological changes affect customer reallocation and firms’ cost structures jointly.

Key Concepts

Customer capital: The stock of customers a firm has accumulated through prior selling and pricing decisions; treated as a state variable that firms invest in (by offering low prices and spending on advertisements) or harvest from (by charging high markups), with a customer turnover rate estimated at 10%–25% annually in the literature.

Directed search in the product market: A market structure in which both firms and customers choose which submarket (indexed by the promised utility level) to enter, trading off match probability against terms; delivers constrained-efficient equilibrium and uniquely determined heterogeneous prices.

Investment-harvest trade-off: The firm’s dynamic choice between offering high promised utility (low prices, low current markups) to grow the customer base versus extracting surplus through high prices from an existing customer base; shaped by the firm’s current size, productivity, and the cost structure implied by returns to scale.

Returns to scale (alpha): The curvature of the production function y = e^z × l^alpha; equals 1.0 under constant returns and approximately 1.05 by 2014 in the empirical estimates; the paper’s central technological change parameter, whose rise disproportionately reduces marginal costs for larger firms.

Winners-and-losers dynamics: The reallocation of customers and market share from small to large firms triggered by the asymmetric cost advantage large firms obtain when returns to scale rise; the demand-based channel through which superstar firms emerge.

Cost-weighted markup: The average markup aggregated using each firm’s costs as weights, as opposed to sales-weighted markup; the primary measure of market power used in the paper, rising by 42% in the data between 1980 and 2014.

Constrained-efficient allocation: An equilibrium outcome in which, given the frictions present (search-and-matching in the product market), no social planner operating under the same constraints could improve welfare; the paper uses this as a benchmark to assess how far efficient firm responses explain secular trends without invoking market failures.

Selling costs relative to production costs: The ratio of customer acquisition expenditures (advertising or adjusted SG&A) to cost of goods sold; rose by 60%–90% in the data between 1980 and 2014 and by 23% in the model’s steady-state comparison.

Dynamic Concern for Misspecification

Layer 1 — Overview

Research Question

This paper asks how an agent who fears that none of their probabilistic models is the correct description of the data-generating process (DGP) should update that fear as evidence accumulates, and what long-run behavior such an agent exhibits. The central contribution is making the concern for misspecification endogenous: the better the agent’s structured models explain past observations, the less concerned the agent becomes.

Decision Criterion

The agent posits a finite-dimensional parametric set of structured models Θ, holds a prior µ over Θ, and evaluates each action according to an average robust control criterion. This criterion takes a weighted average (over models) of robust control assessments, where each assessment penalizes expected utility for probability distributions that deviate from the structured model in terms of relative entropy, scaled by a misspecification concern parameter λ > 0. A standard subjective expected utility maximizer is the limiting case as λ → 0 (no concern), and a maxmin agent is approached as λ → ∞.

Endogenous Misspecification Concern

The concern parameter λ is updated each period as a function of the likelihood ratio test (LRT) statistic of the structured models against unstructured alternatives, scaled by a time-normalizing sequence βₜ: λ(hₜ) = LRT(hₜ, Θ) / (2βₜ). The sequence βₜ determines how demanding the agent is in evaluating model fit.

Taxonomy of Agent Types

Three types emerge based on the speed of βₜ:

Statistician type (βₜ = ct, linear): applies a time scaling that keeps the LRT asymptotically informative about the degree of misspecification. This is the unique type satisfying both safety (long-run average payoff at least ε-close to the maxmin guarantee, almost surely) and consistency under almost correct specification (no ε-regret when misspecification is small).
Lenient type (t = o(βₜ)): attributes unexplained evidence to sampling variability; corresponds to the Law of Large Numbers intuition.
Demanding type (βₜ = o(t)): overly penalizes small discrepancies, analogous to the Law of Small Numbers fallacy (Tversky and Kahneman, 1971).

Standard SEU maximization fails safety; robust control with an invariant λ (Hansen and Sargent, 2001; 2022) fails consistency under almost correct specification.
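
The taxonomy can be illustrated with an idealized calculation. The sketch assumes that, under misspecification, the LRT grows like 2·t·K, where K is the minimum relative entropy between the true DGP and the structured models; this noise-free growth rate is an assumption chosen to be consistent with the statistician type's limit concern of K/c, and the values of K and c and the particular lenient and demanding sequences are hypothetical.

```python
import math


def concern(t, min_rel_entropy, beta):
    """lambda_t = LRT_t / (2 * beta_t), with an idealized LRT_t = 2 * t * K."""
    lrt = 2.0 * t * min_rel_entropy
    return lrt / (2.0 * beta(t))


K, c = 0.02, 0.5  # hypothetical misspecification gap and scaling constant
types = {
    "statistician": lambda t: c * t,       # beta_t = c * t
    "lenient": lambda t: float(t) ** 2,    # t = o(beta_t)
    "demanding": lambda t: math.sqrt(t),   # beta_t = o(t)
}

horizons = (10, 1_000, 100_000)
paths = {name: [concern(t, K, beta) for t in horizons] for name, beta in types.items()}
for name, values in paths.items():
    print(name, ["%.4g" % v for v in values])
```

Under these assumptions the statistician's concern stays at K/c, the lenient type's concern vanishes, and the demanding type's concern grows without bound, matching the SEU-like and maxmin-like limits described for the other two types.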

Long-Run Convergence Results (Theorem 1)

For a misspecified agent (no θ ∈ Θ with qθ_{a*} = p*_{a*}), the nature of the limit action a* depends on the agent type:

Lenient type: a* is a Berk-Nash equilibrium, that is, an SEU best reply to beliefs supported on the models with minimum relative entropy from the true DGP.
Demanding type: a* is a maxmin equilibrium, that is, a worst-case best reply to all models absolutely continuous with respect to the true DGP.
Statistician type: if behavior converges, a* is a c-robust equilibrium, that is, a robust control best reply to beliefs on the relative entropy minimizers, with the concern for misspecification endogenously set at minθ R(p*_{a*} || qθ_{a*}) / c.

For a correctly specified agent (Proposition 2), every limit action is a self-confirming equilibrium, regardless of the agent type.

Cycles and Limit Frequency (Section 4, Theorem 2)

The statistician type’s behavior need not converge. In natural settings, the agent cycles between actions: playing a “safe” action whose consequences are well-explained by Θ reduces concern for misspecification, eventually leading to a riskier action whose poorly-explained consequences raise concern again, inducing a return to the safe action. The paper proves that every limit frequency (empirical distribution over actions) is a mixed c-robust equilibrium — a generalization that allows mixing while tying the concern for misspecification to the frequency-weighted average relative entropy of each action.

Empirical Applications

Monetary policy cycles (Sargent 1999, 2008): In a central bank model where the true DGP includes increased inflation variability under aggressive policy (a feature absent from the bank’s structured models), no pure c-robust equilibrium exists for small c. The model predicts persistent cycles between conservative and aggressive policy. The frequency of the conservative policy is increasing in the strength of the exploitable inflation-unemployment trade-off (θ₁π + θ₁a).
Labor supply under complex tax schedules (Rees-Jones and Taubinsky, 2020): Agents with a “schmeduling” heuristic (linearizing the tax schedule) are misspecified. Berk-Nash equilibrium predicts these agents exert excess effort, with the bias increasing in the complexity (convexity) of the tax code. The c-robust equilibrium attenuates this bias: conditional on the equilibrium, minθ R(p*_a || qθ_a) > 0, so agents maintain positive concern for misspecification and pull back from the biased recommendation. The paper rationalizes the empirical finding that approximately 40% of agents hold the schmeduling belief but about 20% fewer agents act on it, which is consistent with endogenous concern reducing the behavioral impact of the biased model.

Axiomatization (Section 5)

The paper axiomatizes the static average robust control criterion (Theorem 3) using: a Variational Axiom (from Maccheroni, Marinacci, and Rustichini, 2006a), a Structured Savage axiom (Sure-Thing Principle for bets on the model identity), an Intramodel Sure-Thing Principle (STP for bets conditional on the model), and Uniform Misspecification Concern (the agent is equally concerned about misspecification regardless of which model is identified as best-fitting). Three additional dynamic axioms characterize preference evolution: Constant Preference Invariance (utility index stable over time), Dynamic Consistency over Models (Bayesian updating over structured models), and Q-Likelihood (misspecification concern increases in the LRT). A novel Asymptotic Frequentism axiom characterizes the statistician type: preferences must become arbitrarily similar (in a precise quantitative sense) after sufficiently long histories with the same outcome frequency.

Layer 2 — Q&A

Q1: What is the average robust control criterion and how does it generalize prior decision criteria?

A: An agent evaluates action a by averaging over structured models θ a robust control assessment: for each θ, minimize expected utility over probability distributions within relative entropy distance (penalized by 1/λ) of qθ_a, then integrate over θ with prior µ. This nests SEU (λ → 0, perfect trust in models), standard robust control of Hansen and Sargent (2001) (µ is Dirac, single benchmark model), and maxmin expected utility of Gilboa and Schmeidler (λ → ∞). The key extension is allowing µ to be nondegenerate, so the agent is simultaneously uncertain about the best-fitting model and about whether any model is exact.
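
For a finite outcome set, the inner minimization has a well-known closed form: the minimum over p of E_p[u] + (1/λ)·R(p || q) equals −(1/λ)·log E_q[exp(−λu)]. The sketch below uses that identity to illustrate the two limits named above. It is an illustration under stated assumptions, not the paper's formulation: the function names, the two-outcome utilities, the two structured models, and the prior are all hypothetical.

```python
import math


def robust_value(utils, model, lam):
    """Closed form of min_p E_p[u] + (1/lam) * R(p || model) on a finite outcome set."""
    return -(1.0 / lam) * math.log(sum(q * math.exp(-lam * u) for u, q in zip(utils, model)))


def average_robust_control(utils, models, prior, lam):
    """Prior-weighted average of the robust control assessments across models."""
    return sum(w * robust_value(utils, q, lam) for w, q in zip(prior, models))


utils = [0.0, 1.0]                     # hypothetical utilities of two outcomes
models = [[0.5, 0.5], [0.2, 0.8]]      # hypothetical structured models
prior = [0.5, 0.5]

near_seu = average_robust_control(utils, models, prior, lam=1e-6)
moderate = average_robust_control(utils, models, prior, lam=2.0)
cautious = average_robust_control(utils, models, prior, lam=50.0)
print(near_seu, moderate, cautious)
```

As λ shrinks the value approaches the prior-weighted expected utility (0.65 here), and as λ grows it approaches the worst outcome's utility (0 here), which is the sense in which the criterion spans SEU and maxmin.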

Q2: What is the role of the likelihood ratio test statistic in driving misspecification concern?

A: The LRT statistic compares the maximum likelihood of the structured models against the best unstructured alternative. It diverges almost surely when the agent is misspecified, regardless of how close the structured models are to the true DGP. The concern parameter λ(hₜ) = LRT(hₜ, Θ) / (2βₜ) uses a time-scaling sequence βₜ to keep this statistic interpretable. Without scaling, a misspecified agent’s concern would always explode to infinity.
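
A minimal sketch of this computation for a single action with finitely many outcomes follows. It assumes the usual convention that the LRT is twice the log-likelihood gap between the unrestricted maximum (the empirical frequencies) and the best structured model; the factor of two, the finite grid of models, the counts, and the function names are assumptions made for illustration.

```python
import math


def lrt_statistic(counts, models):
    """Twice the log-likelihood gap between empirical frequencies and the best model."""
    t = sum(counts)
    unstructured = sum(n * math.log(n / t) for n in counts if n > 0)
    structured = max(
        sum(n * math.log(q[i]) for i, n in enumerate(counts) if n > 0) for q in models
    )
    return 2.0 * (unstructured - structured)


def concern(counts, models, beta_t):
    """lambda(h_t) = LRT(h_t, Theta) / (2 * beta_t)."""
    return lrt_statistic(counts, models) / (2.0 * beta_t)


counts = [60, 40]                      # hypothetical outcome counts after t = 100 periods
models = [[0.5, 0.5], [0.7, 0.3]]      # hypothetical structured models
lrt = lrt_statistic(counts, models)
lam = concern(counts, models, beta_t=1.0 * sum(counts))  # statistician type with c = 1
print(lrt, lam)
```

With linear scaling and c = 1, the resulting λ equals the relative entropy between the empirical frequencies and the best-fitting model, so the statistic stays bounded and interpretable instead of growing with the sample size.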

Q3: Why does linear time scaling (βₜ = ct) uniquely characterize the statistician type as rational?

A: Proposition 1 establishes two properties: (1) ε-safety: every βₜ = ct-optimal policy achieves a long-run average payoff no more than ε below the maxmin guarantee, almost surely; (2) ε-consistency under almost correct specification: for DGPs sufficiently close to Θ, the agent avoids long-run regret. Part 2 of Proposition 1 shows that no βₜ with βₜ = o(t) or t = o(βₜ) satisfies both properties simultaneously. SEU fails safety; invariant-λ robust control fails consistency.

Q4: What is a c-robust equilibrium and how does it differ from a Berk-Nash equilibrium?

A: A Berk-Nash equilibrium (Esponda and Pouzo, 2016) requires the action to be an SEU best reply to beliefs supported on the relative entropy minimizers of the true DGP. A c-robust equilibrium requires the same support condition but with the best reply taken under the average robust control criterion, where the concern for misspecification λ equals minθ R(p*_{a*} || qθ_{a*}) / c, that is, the minimum relative entropy scaled by 1/c. The endogenous λ is positive whenever the agent is misspecified, so the agent does not fully trust even the best-fitting model.

Q5: How does the paper explain that misspecified lenient types converge to Berk-Nash while demanding types converge to maxmin?

A: For the lenient type (t = o(βₜ)), the time scaling makes the concern for misspecification converge to 0: the LRT grows at rate t, while βₜ grows faster than t. The agent therefore effectively behaves as an SEU maximizer with beliefs on the KL-minimizing models, which is the Berk-Nash condition. For the demanding type (βₜ = o(t)), the LRT diverges relative to βₜ, so λ → ∞ and the agent’s preferences converge to worst-case evaluation over all models absolutely continuous with respect to the true DGP, which is the maxmin condition. These are Theorem 1, parts 1 and 2.

Q6: Why does the statistician type exhibit cycles rather than convergence?

A: Section 4 and Corollary 1 show in the monetary policy application that no pure c-robust equilibrium exists for small c. Intuitively, the conservative policy (a=0) is a best reply to a high misspecification concern, but it produces outcomes well-explained by Θ, which drives concern down. The aggressive policy (a=1) is a best reply to a low concern, but it generates increased inflation variability not captured in Θ, which drives concern up sharply. There is no fixed point that is self-sustaining, so the agent cycles. Theorem 2 shows that the empirical frequency of actions still converges to a mixed c-robust equilibrium.

Q7: What are the quantitative comparative statics for the monetary policy cycles?

A: Corollary 1 establishes that there exists a threshold c̄ > 0 such that for all c ≤ c̄: (1) no pure c-robust equilibrium exists; (2) a mixed c-robust equilibrium exists; and (3) in the maximal and minimal equilibria, the frequency of the conservative policy α*(0) is increasing in θ₁π + θ₁a, so a larger exploitable trade-off between inflation and unemployment implies more time spent on the conservative policy.

Q8: How does the model rationalize the Rees-Jones and Taubinsky (2020) labor supply finding?

A: Rees-Jones and Taubinsky (2020) find that approximately 40% of agents have incentive-compatible beliefs consistent with the schmeduling heuristic (linearizing a convex tax schedule), but approximately 20% fewer agents act according to that heuristic. In a Berk-Nash equilibrium, the schmeduling agent exerts excess effort relative to the optimum; the more convex the tax code, the larger the excess. In a c-robust equilibrium, the agent retains a positive misspecification concern proportional to the deviation between the convex tax schedule and the linear approximation. Higher effort levels are more exposed to uncertainty in the marginal rate (the misspecified term θ+ε multiplies a higher average income z), so the concern for misspecification provides a natural force that reduces effort below the Berk-Nash prediction. The paper notes this finding is also consistent with an alternative interpretation in Rees-Jones and Taubinsky where all agents hold schmeduling beliefs but under-respond behaviorally.

Q9: What is the mixed c-robust equilibrium and why does it always exist?

A: A mixed c-robust equilibrium is a mixed action α* ∈ Δ(A) such that beliefs ν are supported on the relative entropy minimizers Θ(α*) — computed as the parameter minimizing the α*-weighted average relative entropy across actions — and every action in the support of α* is a best reply under the average robust control criterion with λ = minθ Σ_a α*(a) R(p*_a || qθ_a) / c. Proposition 3 proves existence by mapping this fixed-point condition to a Nash equilibrium in an auxiliary game between the agent and two adversarial Nature players, then invoking Reny (1999) on that game. A pure c-robust equilibrium need not exist, but mixing over actions allows the concern for misspecification to be calibrated to the frequency of poorly-explained actions.

Q10: How does Theorem 2 formally connect cycles to mixed c-robust equilibria?

A: Theorem 2 states that if βₜ = ct for all t and α* is a βₜ-limit frequency (i.e., the empirical action distribution converges to α* with positive probability under some optimal policy), then α* is a mixed c-robust equilibrium. The intuition is that when α* places weight on both a well-explained action and a poorly-explained action, the time-averaged relative entropy stabilizes at a fixed level, producing a stable endogenous concern for misspecification that makes the agent asymptotically indifferent between the actions in the support — sharply reducing the incentive to break the cycle.

Q11: What does the axiomatization contribute beyond the learning results?

A: The axiomatization (Section 5, Theorem 3) provides behavioral foundations observable from choices, without assuming the internal LRT mechanism. Two primary axioms pin down the average robust control criterion within the variational class: Structured Savage (Sure-Thing Principle for bets over model identity) and Uniform Misspecification Concern (equal concern for misspecification regardless of which model is revealed as best-fitting). Dynamic Consistency over Models pins down Bayesian updating. Q-Likelihood axiomatizes that the concern for misspecification is ordinally increasing in the LRT. The novel Asymptotic Frequentism axiom (Axiom 9) pins down the quantitative speed of adjustment: long histories with the same empirical frequency must induce asymptotically similar preferences, and Proposition 5 shows this implies that λ_{hₜ} / (LRT(hₜ, Q) / (2t)) converges to a finite limit, which is exactly the statistician type’s linear scaling.

Q12: What is the correlation between behavioral biases that the model predicts?

A: The paper derives three novel empirical predictions about the cross-sectional and time-series correlation of uncertainty attitudes: (1) long-run uncertainty aversion positively correlates with initial misspecification and with belief in the Law of Small Numbers; (2) these correlations are causal — repeated model failures and overly demanding evaluation induce a shift toward cautious behavior; (3) even holding misspecification and probability reasoning fixed, limit uncertainty attitudes are stochastic, depending on whether the limit action’s outcomes are well-explained by the structured models.

Q13: How does Example 2 (Correlation Neglect) show that endogenous concern can amplify rather than attenuate biases?

A: In a double auction, a buyer who mistakenly treats their own valuation and the ask price as independent (Correlation Neglect; Esponda, 2008) bids below the optimum in Berk-Nash equilibrium. In a c-robust equilibrium, the positive correlation between valuations and prices produces a strictly positive minθ R(p*_{a*} || qθ_{a*}), so the agent maintains misspecification concern. Since lower bids are accepted with lower probability (and thus are less sensitive to model misspecification), the endogenous concern drives the agent to bid even lower, amplifying the bias rather than attenuating it. This example illustrates that the direction of the correction depends on the geometry of how the misspecification interacts with the payoff structure.

Key Concepts

Average Robust Control Criterion: The decision criterion proposed in the paper. An agent evaluates action a by taking the expectation over structured models θ (with prior µ) of min_{p_a ∈ Δ(Y)} [E_{p_a}[u(a,y)] + (1/λ) R(p_a || qθ_a)]. This is a weighted average of robust control assessments, each penalizing distributions that deviate from a structured model in relative entropy. The parameter λ > 0 governs the intensity of misspecification concern, with SEU as the limit at λ → 0 and maxmin at λ → ∞.
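
On a finite outcome space the inner minimization has a standard closed form, min_p {E_p[u] + (1/λ) R(p || q)} = -(1/λ) log E_q[exp(-λ u)], so the criterion is easy to evaluate. The sketch below uses that identity; the function names and the numbers are illustrative assumptions, not the paper's.

```python
import math

def robust_value(u, q, lam):
    """Closed form of min_p E_p[u] + (1/lam) R(p || q) for lam > 0."""
    m = min(u)  # factor out the worst payoff for numerical stability
    total = sum(qi * math.exp(-lam * (ui - m)) for qi, ui in zip(q, u))
    return m - math.log(total) / lam

def average_robust_value(u, models, prior, lam):
    """Prior-weighted average of the robust values across structured models."""
    return sum(w * robust_value(u, q, lam) for q, w in zip(models, prior))

# Hypothetical example: a bet paying 1 on the second outcome, two models.
u = [0.0, 1.0]
models = [[0.5, 0.5], [0.2, 0.8]]
prior = [0.5, 0.5]

for lam in (1e-6, 1.0, 10.0, 200.0):
    print(lam, round(average_robust_value(u, models, prior, lam), 4))
```

As λ shrinks, the value approaches the subjective expected utility under the prior (0.65 in this example); as λ grows, it approaches the worst-case payoff of 0, matching the two limits described above.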

Endogenous Misspecification Concern: Unlike prior robust control models where λ is fixed or set externally, here λ(hₜ) = LRT(hₜ, Θ) / (2βₜ) is a function of how well the structured models explain the observed history hₜ via the likelihood ratio test statistic. The better the models explain past data, the smaller λ becomes and the less the agent hedges.
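
A minimal sketch of how λ could be computed from a history, assuming a single action with i.i.d. outcomes, a finite set of structured models, and the textbook likelihood ratio statistic (twice the log-likelihood gap between the unrestricted best fit and the best structured model). These simplifications and the function names are assumptions made for illustration. Under them, with βₜ = ct, the concern equals the minimum relative entropy between the empirical frequency and the structured models, divided by c.

```python
import math
from collections import Counter

def lrt_statistic(history, models):
    """2 * (unrestricted max log-likelihood - best structured log-likelihood)."""
    t = len(history)
    counts = Counter(history)
    # The unrestricted maximum likelihood estimate is the empirical frequency.
    unrestricted = sum(n * math.log(n / t) for n in counts.values())
    structured = max(
        sum(n * math.log(q[y]) for y, n in counts.items()) for q in models
    )
    return 2.0 * (unrestricted - structured)

def misspecification_concern(history, models, c):
    """lambda(h_t) = LRT / (2 * beta_t) with the linear scaling beta_t = c * t."""
    beta_t = c * len(history)
    return lrt_statistic(history, models) / (2.0 * beta_t)

def kl(p, q):
    """Relative entropy R(p || q) for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical history: outcome 1 occurs 70 times out of 100.
history = [1] * 70 + [0] * 30
misspecified = [[0.5, 0.5], [0.4, 0.6]]   # q[y] = probability of outcome y
well_specified = misspecified + [[0.3, 0.7]]

print(misspecification_concern(history, misspecified, c=1.0))
print(misspecification_concern(history, well_specified, c=1.0))
```

When one structured model matches the empirical frequency exactly, the statistic and hence the concern are zero, which is the sense in which a well-explained history switches hedging off.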

Statistician Type: An agent who scales the likelihood ratio test statistic with a linear time sequence βₜ = ct for some c > 0. This is the unique agent type satisfying both ε-safety (guaranteed long-run average payoff above the maxmin guarantee minus ε) and ε-consistency under almost correct specification (no long-run regret when misspecification is small). The statistician type’s linear scaling is the only one for which the LRT statistic retains asymptotic informativeness about the degree of misspecification.

c-Robust Equilibrium: A fixed-point concept for the long-run behavior of the statistician type. Action a* is a c-robust equilibrium if it is an average robust control best reply to beliefs supported on Θ(a*) = argmin_θ R(p*_{a*} || qθ_{a*}), with misspecification concern λ = minθ R(p*_{a*} || qθ_{a*}) / c. This generalizes Berk-Nash equilibrium by incorporating an endogenous hedging motive proportional to the minimum relative entropy between the true DGP and the best structured model.
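
The definition can be checked mechanically in a finite problem. The sketch below is a toy illustration under stated assumptions: the primitives are invented, beliefs are taken to be uniform over the entropy minimizers (one admissible choice among beliefs supported on Θ(a*)), and the robust value uses the standard entropic closed form. The function names are hypothetical.

```python
import math

def kl(p, q):
    """Relative entropy R(p || q) for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def robust_value(u, q, lam):
    """min_p E_p[u] + (1/lam) R(p || q); reduces to E_q[u] when lam = 0."""
    if lam < 1e-12:
        return sum(qi * ui for qi, ui in zip(q, u))
    m = min(u)
    total = sum(qi * math.exp(-lam * (ui - m)) for qi, ui in zip(q, u))
    return m - math.log(total) / lam

def is_c_robust(a_star, actions, true_dist, models, utility, c):
    """Check the pure c-robust equilibrium condition for action a_star."""
    divergences = [kl(true_dist[a_star], model[a_star]) for model in models]
    min_div = min(divergences)
    # Theta(a*): structured models attaining the minimum relative entropy.
    best_fit = [m for m, d in zip(models, divergences) if d - min_div < 1e-12]
    lam = min_div / c

    def value(a):
        # Assumption: uniform belief over the minimizers.
        return sum(robust_value(utility[a], m[a], lam) for m in best_fit) / len(best_fit)

    return all(value(a_star) >= value(a) - 1e-12 for a in actions)

# Hypothetical primitives (outcome order: y = 0, y = 1).
actions = ["safe", "risky"]
true_dist = {"safe": [0.5, 0.5], "risky": [0.5, 0.5]}
models = [{"safe": [0.5, 0.5], "risky": [0.3, 0.7]}]
utility = {"safe": [0.4, 0.4], "risky": [0.0, 1.0]}

for c in (10.0, 0.01):
    print(c, {a: is_c_robust(a, actions, true_dist, models, utility, c) for a in actions})
```

With a large c the risky action, which is the SEU best reply under the structured model, passes the check. With a small c neither action passes, which illustrates why a pure c-robust equilibrium need not exist.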

Mixed c-Robust Equilibrium: A generalization of c-robust equilibrium to mixed actions α* ∈ Δ(A) for environments where no pure equilibrium exists. The beliefs are supported on the models minimizing the α*-weighted average relative entropy, and the misspecification concern is tied to that average entropy. Every βₜ-limit frequency is a mixed c-robust equilibrium (Theorem 2). This concept characterizes the long-run time-average behavior when the statistician type cycles.

Law of Small Numbers (LSN) Type / Demanding Type: An agent for whom βₜ = o(t), meaning the time scaling grows sub-linearly. This agent is excessively sensitive to early model failures (analogously to the Law of Small Numbers fallacy of Tversky and Kahneman, 1971, where short-run frequencies are treated as the long-run norm). The long-run behavior of such a type converges to maxmin behavior rather than robust control.

Asymptotic Frequentism (Axiom 9): A novel axiom requiring that conditional preferences after sufficiently long histories with the same empirical outcome frequency must be arbitrarily similar (in a quantitative sense defined by measuring rods x, y, E) to a limiting preference. This axiom pins down the statistician type’s linear time scaling: it implies that the ratio λ_{hₜ} / (LRT(hₜ, Q) / (2t)) converges to a finite limit c, exactly characterizing βₜ = ct.

Berk-Nash Equilibrium: The equilibrium concept (Esponda and Pouzo, 2016) that describes the long-run behavior of lenient (SEU) agents learning under misspecification. An action a* is a Berk-Nash equilibrium if it is an SEU best reply to beliefs supported on Θ(a*) — the KL-minimizing models — without any additional hedging against misspecification. The current paper shows that lenient types converge to Berk-Nash equilibria, while statistician types converge to c-robust equilibria that differ by incorporating a positive misspecification concern.

From Doubt to Devotion: Trials and Learning-Based Pricing

This paper studies a dynamic mechanism design problem in which an informed seller sells an experience good to a skeptical buyer who learns about the product through consumption. The central question is: how does a seller leverage proprietary data about product-buyer match quality together with the buyer’s ability to learn, and what are the welfare implications in equilibrium?

The model features a seller who privately observes a binary match quality (theta in {H, L}) between their service and the buyer. The buyer does not observe match quality and has an initially unknown private value v for the good, drawn from a Myerson-regular distribution F with support [v_low, v_high] and normalized mean E[v] = 1. If the match is high, the buyer receives instantaneous utility rewards according to a Poisson process with flow rate lambda*I, where I in [0,1] is the seller-controlled access level. Upon receiving the first reward, the buyer perfectly learns both match quality theta and their own value v. The seller commits to a dynamic mechanism over time horizon T = [0, T] specifying access and prices conditional on reported histories. Both parties are risk-neutral and there is no discounting in the baseline.

Two benchmark cases show that the first-best is attainable whenever either key feature is absent. If trade is static (prices set only at time 0) or if the seller is uninformed about theta, the seller achieves first-best revenue of lambda*mu_0*T by selling the entire service upfront. Proposition 1 establishes both cases; this implies that consumer data on theta is not required for maximizing social welfare, and it is weakly dominant for a seller to never collect consumer data in static environments.

The central result is that the combination of dynamic pricing and seller private information breaks the first-best. A high-type seller can deviate by offering a “Myersonian free trial”: provide full access up to time tM (defined as argmax_t {(1 - exp(-lambda*t))*(T - t)}), then offer the remaining service at post-trial price lambda*vM*(T - tM), where vM is the Myerson monopoly price. The buyer accepts the trial regardless of beliefs (participation is weakly dominant) and purchases the post-trial service if and only if v >= vM. This deviation yields payoff pi_F = (1 - exp(-lambda*tM))*(1 - F(vM))*lambda*vM*(T - tM). Proposition 2 states that the first-best cannot be implemented in any equilibrium if and only if pi_F > lambda*mu_0*T. Corollary 1 concerns sufficiently large T: both pi_F and the first-best revenue grow roughly in proportion to T, so whether the condition holds depends on the ratio pi_F / (lambda*mu_0*T), which exceeds 1 for some parameter configurations and stays below 1 for others.

Theorem 1 (the main mechanism design result) characterizes the boundary of the IC-IR feasible payoff set: any payoff pair on this boundary is outcome-uniquely implemented by a trial mechanism. A trial mechanism is defined by a triple (v0, t0, p0): a post-trial value threshold, a trial length, and a trial price. During [0, t0] uninformed buyers receive full access; after t0 only buyers who received a reward with v >= v0 continue at a premium. Trial length t0 is weakly increasing in the weight placed on the low-type seller and in the prior mu_0; post-trial threshold v0 is weakly decreasing in the same objects (Proposition 3).

Equilibrium payoffs (Proposition 5) are precisely the IC-IR feasible pairs satisfying pi_H >= pi_F, implemented by pooling trial mechanisms in which both seller types propose identical mechanisms and the buyer updates beliefs only through private consumption signals. Under the D1 refinement (Proposition 6), only mechanisms with trial length tM and post-trial threshold vM survive. These have the shortest trial and highest post-trial price of all equilibrium mechanisms, minimize social surplus, and may leave both seller types strictly worse off than in a world without private information — directly contrasting the static informed principal result of Koessler and Skreta (2016) where data always helps the seller.

When the seller can control service quality q in addition to access I (Section 6), the relevant equilibrium mechanisms become dynamic tiered pricing rather than binary trials: a low-quality, high-ad-load free tier provides learning opportunities while reducing information rents; convinced buyers upgrade to a premium ad-free tier. Counterintuitively, enriching the seller’s screening technology can reduce both revenue and social efficiency in equilibrium because additional instruments create additional signaling opportunities that distort outcomes further.

Q: What is the core tension that prevents the first-best from being an equilibrium?

A: When the seller is privately informed and pricing is dynamic, the high-type seller anticipates a greater likelihood of the buyer receiving a utility shock than the buyer’s own prior implies. This belief gap makes it profitable for the high-type seller to deviate from a proposed first-best mechanism by offering a free trial that “proves” high match quality and then extracting rent from convinced buyers. Because this deviation is profitable under some parameters (pi_F > lambda*mu_0*T), the first-best pooling contract unravels. The interaction of both ingredients (dynamic pricing and informed seller) is necessary: either ingredient alone is insufficient to break the first-best (Proposition 1).

Q: What exactly is the Myersonian free trial and why does the buyer always accept it?

A: The Myersonian free trial provides full service access up to time tM = argmax_t {(1 - exp(-lambda*t))*(T - t)} at (approximately) zero price, then offers the remaining service at price lambda*vM*(T - tM), where vM is the Myerson monopoly price. The buyer accepts the trial regardless of their prior belief about match quality because the trial itself is free and provides non-negative payoff. After the trial, the buyer purchases the post-trial service if and only if they received a reward with v >= vM; otherwise they exit. The deviation payoff is pi_F = (1 - exp(-lambda*tM))*(1 - F(vM))*lambda*vM*(T - tM).

Q: Under what parametric conditions can the first-best not be supported in equilibrium?

A: By Proposition 2, the first-best cannot be implemented if and only if pi_F > lambda*mu_0*T. Corollary 1 addresses sufficiently large T: as T grows, pi_F grows roughly in proportion to T (the post-trial term (T - tM) dominates) because tM grows much more slowly than T. For large T, pi_F / (lambda*mu_0*T) is therefore approximately (1 - exp(-lambda*tM)) * (1 - F(vM)) * vM / mu_0, and the first-best fails whenever this ratio exceeds 1, which happens under appropriate parameter configurations. Conversely, when mu_0 is high or the service horizon is short, the first-best may remain implementable.
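
The condition in Proposition 2 is easy to evaluate numerically. The sketch below assumes values uniform on [1 - delta, 1 + delta] (the family used in Proposition 4) and takes the Myerson monopoly price to be the maximizer of v*(1 - F(v)); the parameter values and function names are illustrative assumptions. The trial length tM is found by bisection on the first-order condition of the concave objective (1 - exp(-lambda*t))*(T - t).

```python
import math

def myersonian_trial_length(lam, T, tol=1e-12):
    """argmax_t (1 - exp(-lam*t)) * (T - t) on [0, T], via the first-order condition."""
    def slope(t):
        return lam * math.exp(-lam * t) * (T - t) - (1.0 - math.exp(-lam * t))
    lo, hi = 0.0, T  # slope is positive at 0, negative at T, and decreasing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def uniform_myerson_price(delta):
    """Maximizer of v * (1 - F(v)) for v ~ U[1 - delta, 1 + delta]."""
    return max((1.0 + delta) / 2.0, 1.0 - delta)

def uniform_survival(v, delta):
    """1 - F(v) for v ~ U[1 - delta, 1 + delta]."""
    return min(1.0, max(0.0, (1.0 + delta - v) / (2.0 * delta)))

def deviation_payoff(lam, T, delta):
    """pi_F = (1 - exp(-lam*tM)) * (1 - F(vM)) * lam * vM * (T - tM)."""
    t_m = myersonian_trial_length(lam, T)
    v_m = uniform_myerson_price(delta)
    return ((1.0 - math.exp(-lam * t_m)) * uniform_survival(v_m, delta)
            * lam * v_m * (T - t_m))

def first_best_fails(lam, T, delta, mu0):
    """Proposition 2 condition: pi_F > lam * mu0 * T."""
    return deviation_payoff(lam, T, delta) > lam * mu0 * T

for mu0 in (0.3, 0.5):
    print(mu0, first_best_fails(lam=1.0, T=10.0, delta=0.5, mu0=mu0))
```

With these hypothetical parameters the deviation is profitable at the pessimistic prior mu_0 = 0.3 but not at mu_0 = 0.5, in line with the observation that a high prior helps the first-best survive.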

Q: What is a trial mechanism and how does Theorem 1 characterize it?

A: A trial mechanism is defined by a triple (v0, t0, p0): uninformed buyers receive full access on [0, t0] and no access thereafter; a buyer who reports a reward of value v >= v0 at time t receives full service for the remainder [t, T] at a price increment of lambda*v0*(T - t0); the trial itself is priced at p0. Theorem 1 states that any payoff pair on the boundary of the IC-IR feasible set is outcome-uniquely attained by such a trial mechanism with appropriately determined (v0, t0, p0). The proof uses a relaxed problem retaining only two key constraint families: local incentive constraints on value reporting (IC-V) and a global intertemporal constraint preventing buyers from hiding the arrival of rewards forever (IC-U).
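
The access and payment rule of a trial mechanism can be written down as a small data structure. The sketch below is a reading of the verbal description above under two labeled assumptions: reports are truthful, and a buyer whose reward value falls below v0 keeps access until the trial ends at t0. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TrialMechanism:
    v0: float       # post-trial value threshold
    t0: float       # trial length
    p0: float       # trial price
    lam: float      # Poisson arrival rate under full access
    horizon: float  # service horizon T

    def qualifies(self, reward_time: Optional[float], reward_value: Optional[float]) -> bool:
        """A buyer qualifies if a reward arrived during the trial with value >= v0."""
        return (reward_time is not None and reward_value is not None
                and reward_time <= self.t0 and reward_value >= self.v0)

    def has_access(self, t: float, reward_time: Optional[float] = None,
                   reward_value: Optional[float] = None) -> bool:
        """Full access during the trial; afterwards only for qualifying buyers."""
        if not 0.0 <= t <= self.horizon:
            return False
        if t <= self.t0:
            return True  # assumption: everyone keeps access until the trial ends
        return self.qualifies(reward_time, reward_value)

    def total_payment(self, reward_time: Optional[float] = None,
                      reward_value: Optional[float] = None) -> float:
        """Trial price p0, plus lam * v0 * (T - t0) for buyers who qualify."""
        upgrade = self.lam * self.v0 * (self.horizon - self.t0)
        return self.p0 + (upgrade if self.qualifies(reward_time, reward_value) else 0.0)

# Hypothetical parameters for illustration.
mech = TrialMechanism(v0=0.75, t0=2.0, p0=0.0, lam=1.0, horizon=10.0)
print(mech.has_access(5.0))                                     # uninformed buyer after the trial
print(mech.has_access(5.0, reward_time=1.5, reward_value=0.9))  # convinced buyer
print(mech.total_payment(reward_time=1.5, reward_value=0.9))
```

Setting p0 to zero and (t0, v0) to (tM, vM) gives the limiting Myersonian free trial discussed earlier.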

Q: How does the trial length respond to changes in prior belief mu_0 and distributional spread?

A: Proposition 3 states that t0 is weakly increasing in mu_0: as market belief becomes more optimistic, both seller types extract higher revenue from the trial, so the mechanism designer extends the trial. Proposition 4 adds that for a uniform distribution on [1-delta, 1+delta], trial length t0 is weakly increasing in delta (greater spread). The post-trial threshold v0 is weakly decreasing in mu_0, meaning that a more optimistic prior leads to a less exclusive post-trial cutoff.

Q: What are the equilibrium payoffs and how does the high-type seller’s free-trial option constrain them?

A: Proposition 5 states that (pi_L, pi_H) is an equilibrium payoff if and only if it lies in the IC-IR feasible set and pi_H >= pi_F. The lower bound pi_H >= pi_F reflects the high-type seller’s outside option: they can always deviate to the Myersonian free trial. Corollary 4 then shows that all “reasonable” equilibrium payoffs (those with pi_H >= pi_L, surviving a mild off-path refinement) are implemented by trial mechanisms with complete pooling — both seller types propose the same mechanism and the buyer updates beliefs only through private consumption signals, not the mechanism’s structure.

Q: What does the D1 refinement select and why does it lead to worse outcomes?

A: Proposition 6 shows that the only equilibrium trial mechanisms surviving the D1 criterion have trial length tM and post-trial threshold vM — the Myersonian free trial parameters. These have the shortest trial and highest post-trial price among all equilibrium mechanisms, resulting in the minimum social surplus. The intuition is that the high-type seller signals credibly by proposing mechanisms that generate high revenue from post-trial price discrimination (which the low type cannot profit from), pushing toward maximum learning-based discrimination. All D1-surviving payoffs are Pareto dominated by the point H (the unconstrained IC-IR optimum) for any prior mu_0, and Pareto dominated by point B when mu_0 is small.

Q: Can having consumer preference data hurt the seller, and under what conditions?

A: Yes. The distortion from signaling incentives can be so large that both seller types earn strictly less in the D1-surviving equilibrium than they would if neither possessed private information (where the first-best is attained). This result holds when the condition of Proposition 2 is satisfied, i.e., when pi_F > lambda*mu_0*T. This contrasts sharply with the static result of Koessler and Skreta (2016), in which the ex-ante profit-maximizing mechanism is always supportable in equilibrium and data always (weakly) helps sellers.

Q: How do trial mechanisms differ from the prior literature on signaling through introductory prices?

A: The earlier literature (Milgrom and Roberts 1986; Bagwell 1987; Bagwell and Riordan 1991; Judd and Riordan 1994) uses two-period models with no seller commitment, so all pricing behavior is necessarily trial-like by model restriction. The present model instead allows the seller full flexibility to design any dynamic mechanism — including selling everything ex-ante, which would prevent buyers from gaining information rent. Trials emerge endogenously as the equilibrium outcome rather than being imposed by the model structure, and the paper provides new economic content on what determines trial length and price thresholds.

Q: What happens when the seller controls service quality in addition to access?

A: Section 6 extends the baseline by allowing the seller to choose (I, q) from a subset of [0,1]^2, where I governs the Poisson arrival rate and q scales the reward value (utility from a reward is v*q). Theorem 2 shows that the relevant equilibrium mechanisms now take the form of dynamic tiered pricing: a low-quality tier (interpreted as high ad load) provides learning opportunities while reducing information rents; once convinced, buyers upgrade to a premium high-quality tier. Enriching the screening technology in this way can reduce both revenue and social efficiency in equilibrium, because additional instruments create additional signaling opportunities that distort outcomes further from the revenue-maximizing benchmark.

Q: What are the two sources of welfare loss relative to the first-best in D1-surviving equilibria?

A: The welfare analysis in Appendix F identifies two sources. First, exclusion inefficiency: buyers with values v in [v_low, vM) who would generate positive surplus are excluded from post-trial service. Second, service truncation inefficiency: service access is cut off after trial length tM for buyers who were never convinced (theta = L type realizations and high-type buyers with v < vM), reducing total surplus below the first-best of mu_0 * lambda * T. Both losses are minimized (welfare is maximized) among trial mechanisms by longer trials and lower post-trial cutoffs, precisely the opposite of what D1 selects.

Q: Does the model extend to continuous seller types or multiple buyer types?

A: Appendix K outlines an extension to continuous seller types theta drawn from a distribution G on [theta_low, theta_high], where rewards arrive at rate lambda*I*theta. The main economic forces persist: higher seller types anticipate faster buyer learning and have stronger incentives to offer trials. The main results generalize: equilibrium mechanisms are trial mechanisms, and under D1, pooling equilibria with maximum post-trial discrimination are selected. Appendix G similarly notes that the multiple-buyer-type extension preserves complete pooling and the D1 selection result.

Q: What is the role of the “global intertemporal constraint” (IC-U) in the proof of Theorem 1?

A: The canonical approach to dynamic mechanism design (Eso and Szentes 2007; Pavan, Segal, and Toikka 2014) relaxes the problem to only local incentive constraints on the initial report. This fails here because the informed seller causes buyer and seller to disagree on the evolution of buyer beliefs, making the timing of trade matter and requiring tracking of incentive constraints at every point in time. The paper identifies two key binding constraints in the relaxed problem: (IC-V) the buyer does not misreport their reward value, and (IC-U) the buyer does not remain silent about the arrival of a reward forever. Retaining only these two constraint families yields a tractable bang-bang solution for the optimal access policy, which is then verified to satisfy all original IC-IR constraints.

Q: What are the implications for platform design and data collection strategy?

A: The results imply that the value of consumer data depends critically on market dynamics. In static markets, collecting data about consumer match quality is weakly beneficial for sellers (Proposition 1, first point). In dynamic markets with buyer learning and sufficiently long service horizons, the same data can strictly reduce seller revenue by enabling a deviation that unravels first-best pricing. This suggests platforms in dynamic digital markets should weigh whether possessing and acting on proprietary match data improves or worsens their equilibrium position, and that regulatory attention to consumer data collection in dynamic markets may have welfare-ambiguous effects.

Key Concepts

Trial mechanism: A dynamic mechanism parameterized by (v0, t0, p0) in which the seller provides full service access during [0, t0] for uninformed buyers, offers continued service after t0 only to buyers who received a reward with value v >= v0, and charges a post-trial price of p0 + lambda*v0*(T - t0) for those who qualify. In the paper’s usage, this is the unique outcome-implementing mechanism on the boundary of the IC-IR feasible payoff set.

Myersonian free trial: The limiting trial mechanism as the trial price epsilon approaches zero, with trial length tM = argmax_t {(1 - exp(-lambda*t))*(T - t)} and post-trial threshold vM equal to the Myerson monopoly price. It yields payoff pi_F = (1 - exp(-lambda*tM))*(1 - F(vM))*lambda*vM*(T - tM) to the high-type seller, and constitutes the binding outside option constraining equilibrium payoffs.

Belief gap: The divergence between the seller’s and buyer’s beliefs about the rate at which the buyer will receive Poisson rewards. Because the high-type seller knows theta = H, they anticipate a higher probability of reward arrival than the buyer’s prior implies. This gap makes the buyer’s belief process non-martingale from the seller’s perspective, breaking the standard dynamic mechanism design approach and creating profitable deviation incentives.

IC-IR feasible payoff set: The set of seller payoff pairs (pi_L, pi_H) achievable by mechanisms satisfying both incentive compatibility (for seller type reports and buyer learning reports) and individual rationality (non-negative ex-ante payoffs for all parties). Theorem 1 establishes that the boundary of this set is uniquely implemented by trial mechanisms.

Dynamic tiered pricing: The equilibrium mechanism form that emerges when the seller controls both access I and service quality q. It features a low-quality tier (high ad load) providing learning opportunities at reduced information rent, and a premium tier offering full quality to buyers convinced of high match quality. This generalizes trial mechanisms to settings with richer screening technology.

Global intertemporal constraint (IC-U): The constraint requiring that, upon receiving a Poisson reward, the buyer finds it suboptimal to remain silent about its arrival forever. Together with the local value-reporting incentive constraint (IC-V), these two constraints constitute the binding restrictions in the paper’s relaxed mechanism design problem, replacing the full continuum of incentive constraints that would otherwise be intractable.

D1 criterion: A standard equilibrium refinement from signaling games applied here to the space of mechanism proposals. Among all pooling equilibrium trial mechanisms, D1 selects only those with parameters (tM, vM) — the shortest trial length and highest post-trial threshold — because the high-type seller has a strictly larger set of buyer responses for which deviation to a high-discrimination mechanism is profitable. These surviving mechanisms Pareto dominate no other equilibrium mechanism and minimize social surplus.

Ideological Alignment and Evidence-Based Policy Adoption

This paper investigates how the ideological alignment between knowledge-disseminating institutions and policymakers affects the adoption of evidence-based policies. The core research question is whether, and through which mechanisms, the ideology of the messenger — rather than the content of the message — determines whether local policymakers act on rigorous research evidence.

The authors conduct a country-wide randomized controlled trial (RCT) across 5,678 touristic Spanish municipalities. The policy recommendation derives from Hinnosaar et al. (2021), an RCT demonstrating that minor improvements to municipalities’ Wikipedia pages (adding photographs, local festival information, touristic landmark details) increased overnight tourist stays by 9%. This policy was chosen because it is ideologically neutral, low cost, within local policymakers’ remit, and its implementation is directly traceable via Wikipedia edit histories.

Municipalities were randomized into five treatment arms and a control group (approximately 950 municipalities each), stratified by ruling party ideology, population, and touristic accommodation count. Three arms received the same policy brief endorsed by: (1) an ideologically aligned think tank (FAES for right-wing municipalities, Fundación Alternativas for left-wing), (2) the ideologically opposite think tank, or (3) an ideologically nonsalient researcher from the London School of Economics. Two further arms received links to newspaper articles covering the same research from either an ideologically aligned outlet (El Mundo for right, Eldiario.es for left) or an ideologically opposite outlet. The control group received no information. The experiment ran from May to December 2022, with multiple reminder emails sent across the period.

The main outcome is a binary indicator for whether a municipality’s Wikipedia page was changed in line with the recommended guidelines during the study period, coded blind to treatment status by two independent coders.

Key findings: Pooled across all treatment arms, information provision increased the probability of policy adoption by approximately 0.98 percentage points (a 38% relative increase over the control group baseline), but this effect falls short of conventional significance thresholds (p-value = 0.13). The aggregate effect masks sharp heterogeneity by ideological alignment. When the informing institution’s ideology aligns with the policymaker’s, policy adoption increases by 1.68 percentage points (think tank) and 1.67 percentage points (newspaper) relative to the control group, equivalent to a 66% and 65% relative increase, respectively, both statistically significant at the 5% level. By contrast, information from an ideologically opposite institution produces a coefficient that is negligible and statistically indistinguishable from zero, indicating that misaligned information is no more effective than receiving no information at all. The ideologically nonsalient LSE researcher arm produced an intermediate effect (0.94 percentage points, 37% relative increase), but the p-value (0.27) exceeds conventional thresholds, and the effect is not statistically distinguishable from either the aligned or the control condition. Policy briefs and newspaper articles are equally effective when ideologically aligned (difference of 0.1 percentage points, p-value = 0.82).

To decompose mechanisms, the authors propose a three-stage framework: (1) selective exposure to information, (2) belief updating, and (3) policy implementation. Email click-through rates (access to the full policy brief or article once the informing institution is revealed) do not differ significantly across treatment arms, ruling out selective exposure as the operative mechanism. A post-intervention online survey experiment with 1,600 policymakers from 1,196 municipalities shows that those receiving information from an aligned or nonsalient institution updated their beliefs about policy effectiveness significantly more than those receiving information from an opposite institution, implicating belief updating as one operative channel. However, comparing the survey experiment (where nonsalient and aligned treatments produce similar belief updating) with the main experiment (where the aligned arm’s treatment effect is nearly twice that of the nonsalient arm, though the two are not statistically distinguishable) suggests that ideological alignment also affects the third stage, policy implementation, beyond mere belief updating.

The estimated monetary cost of ideological misalignment is 2,192 euros per municipality per year, calculated using the impact of Wikipedia changes on touristic revenues from Hinnosaar et al. (2021).

Scope conditions: The setting is Spanish local government, and the policy is explicitly non-ideological, low-cost, and easily implemented. Generalizability to ideologically charged or costly policies is not established. Left-wing municipalities show larger responses to aligned information, though this heterogeneity is not statistically significant at conventional levels.

Q: What is the baseline rate of policy adoption in the control group, and what does the aligned-institution treatment achieve in absolute terms?

A: The paper reports that ideologically aligned institutions increase the share of municipalities implementing recommended Wikipedia changes by 1.68 percentage points (think tank) and 1.67 percentage points (newspaper) relative to the control group. Working backward from the stated 66% and 65% relative increases, this implies a control group adoption rate of approximately 2.5%, so roughly 4.2% of municipalities in the aligned arms adopt. The aligned effects are statistically significant at the 5% level.
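
The back-of-the-envelope calculation is simply the absolute effect divided by the relative increase. The short sketch below applies it to each arm using only the figures quoted in this summary; the function name is hypothetical, and the implied baselines are approximations because the reported effects and percentages are rounded.

```python
def implied_baseline(effect_pp, relative_increase):
    """Control-group adoption rate (in percent) implied by an absolute and a relative effect."""
    return effect_pp / relative_increase

# (effect in percentage points, relative increase) as quoted in the summary.
arms = {
    "aligned think tank": (1.68, 0.66),
    "aligned newspaper": (1.67, 0.65),
    "nonsalient researcher": (0.94, 0.37),
    "pooled treatments": (0.98, 0.38),
}

baselines = {name: implied_baseline(*vals) for name, vals in arms.items()}
for name, rate in baselines.items():
    print(f"{name}: implied control adoption rate of about {rate:.2f}%")
```

All four arms imply a control adoption rate between 2.5% and 2.6%, so the reported absolute and relative effects are mutually consistent.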

Q: Does information from an ideologically opposite institution have any effect on policy adoption?

A: No. The coefficient for opposite-ideology treatment arms is negligible in magnitude, closely resembling the near-zero coefficients from the placebo analysis conducted for the same months in 2019 (pre-intervention). The authors conclude that receiving information from an ideologically opposite institution is statistically indistinguishable from receiving no information at all. This null result is consistent across heterogeneity analyses by mayor ideology, municipality population, Wikipedia page length, and party type.

Q: How does the ideologically nonsalient (LSE researcher) treatment compare to aligned and opposite arms?

A: The nonsalient arm increases policy adoption by 0.94 percentage points (a 37% relative increase), slightly more than half the effect of the aligned arm (1.68 percentage points). However, the p-value is 0.27, and the effect is not statistically different from either the aligned arm (p-value = 0.34) or the control group at conventional significance levels. The result should therefore be interpreted with caution.

Q: Are policy briefs or newspaper articles more effective in promoting policy adoption?

A: Neither format is significantly more effective than the other. Conditional on ideological alignment, the difference between policy brief and newspaper article effects is 0.1 percentage points with a p-value of 0.82. Both are equally effective when ideologically aligned with the receiving policymaker, a finding the authors describe as a novel contribution to the policy communication literature.

Q: Does ideological alignment affect whether policymakers choose to access the full information (selective exposure)?

A: No. Click-through rates on the links to policy briefs or newspaper articles — measured after policymakers have seen the informing institution’s identity — do not differ significantly across treatment arms. The observed average click-through rate is 6.42%. This null result is consistent with the hypothesis that policymakers do not strategically filter information acquisition based on the messenger’s ideology, at least for non-ideological policies.

Q: What does the survey experiment reveal about belief updating?

A: In the post-intervention survey experiment with 1,600 policymakers, participants first reported beliefs about a purportedly beneficial (but actually harmful) policy, then were randomly assigned to receive information about its negative effects from an aligned, opposite, or nonsalient think tank. Those receiving information from an aligned or nonsalient institution updated their beliefs significantly more than those receiving information from an ideologically opposite institution. This implicates belief updating — not just selective exposure — as a channel through which ideological alignment affects policy adoption.

Q: Why do the authors conclude that ideological alignment also affects the third stage (policy implementation) beyond belief updating?

A: In the survey experiment, aligned and nonsalient institutions produce statistically similar belief updating. Yet in the main field experiment, the aligned arm's effect on adoption is nearly twice that of the nonsalient arm (1.68 vs. 0.94 percentage points). The authors interpret this gap as suggestive evidence that ideological alignment affects policy implementation through channels beyond belief updating, such as career concerns, party cues, or the political economy of implementation. They acknowledge that the evidence is indirect and that the difference between the two arms is not statistically significant.

Q: What is the estimated economic cost of ideological misalignment?

A: The authors estimate a cost of 2,192 euros per municipality per year attributable to ideological misalignment between the informing institution and the receiving policymaker. This calculation uses the estimated impact of Wikipedia changes on touristic revenues from Hinnosaar et al. (2021) and reflects not the cost of not implementing the policy, but the marginal cost of using an ideologically opposite rather than aligned institution to disseminate the research evidence.

Q: How did outside researchers’ predictions compare to actual results?

A: Researchers surveyed on the Social Science Prediction Platform correctly anticipated the rank ordering of treatment effectiveness (aligned > nonsalient > opposite > control) but substantially overestimated adoption rates in every arm. They predicted relative increases of 144%, 103%, and 48% for aligned, nonsalient, and opposite conditions respectively, compared to actual relative increases of roughly 65%, 37%, and ~0%. Email opening rates were the most accurately predicted (49% predicted vs. 38% actual). The results highlight the difficulty of translating evidence into policy even for simple, low-cost interventions.

Q: What are the main threats to validity and how are they addressed?

A: Three main threats are considered. First, differential email opening rates across treatment arms: addressed by showing the informing institution was revealed only after email opening, and confirmed by finding no significant differences in opening rates across groups. Second, spillovers between municipalities: the endline survey shows only 5 of 236 control-group respondents reported receiving any information from external sources; spillover distance analyses in Table D.II find no significant effect on control municipalities’ adoption rates. Third, contamination bias in multi-arm RCTs with strata fixed effects: addressed by replicating main results using the Goldsmith-Pinkham et al. (2022) method, yielding nearly identical estimates.

Q: What heterogeneity is observed across left- and right-wing municipalities?

A: The positive effect of receiving information from an ideologically aligned institution appears larger for left-wing municipalities, with coefficients approximately three times larger than for right-wing municipalities, but this difference is not statistically significant at conventional confidence levels. The authors caution that the strength of ideological alignment may differ systematically between the partner think tanks on the left and right, making direct comparisons between left- and right-wing effects difficult to interpret cleanly.

Q: How does the paper relate to prior work on evidence-based policymaking?

A: The closest prior work is Hjort et al. (2021) and Mehmood et al. (2024), which examine the impact of scientific evidence access on actual policy adoption, and DellaVigna and Kim (2022), which identifies ideology as a factor in the diffusion of innovative policies across governments. The present paper’s main contribution is being the first to isolate the causal effect of ideological alignment on policy adoption using a large-scale field experiment with real, authoritative ideological institutions — rather than surveys or hypothetical scenarios — while using a non-ideological policy recommendation to avoid confounding messenger ideology with policy ideology.

Ideological alignment: In this paper’s usage, the congruence between the political ideology of the institution disseminating research evidence (think tank or newspaper) and the political ideology of the local government receiving that information. Alignment is operationalized by matching right-wing municipalities with right-leaning institutions (FAES, El Mundo) and left-wing municipalities with left-leaning institutions (Fundación Alternativas, Eldiario.es).

Evidence-based policy adoption: The actual implementation by local policymakers of a policy recommendation derived from published peer-reviewed research — measured here as whether a municipality’s Wikipedia page was edited in line with specific recommended guidelines during the study period, not merely expressed intention or stated support.

Knowledge brokers: Institutions, such as think tanks, that serve as intermediaries between academic researchers and policymakers, translating and disseminating research findings in accessible formats (policy briefs) to bridge the gap between evidence and policy.

Nonsalient ideology: A condition in which the informing institution carries no salient or recognizable partisan affiliation, operationalized here by a foreign research university professor (LSE) whose institutional identity does not carry a clear left-right signal in the Spanish political context.

Three-stage policy adoption framework: The authors’ conceptual structure positing that ideology can interfere at three sequential stages: (1) selective exposure — whether policymakers choose to access information once the messenger’s ideology is revealed; (2) belief updating — whether policymakers revise their assessment of a policy’s effectiveness upon receiving evidence; and (3) policy implementation — whether policymakers act on updated beliefs to adopt the policy.

Selective exposure: The tendency of individuals to avoid information from sources whose ideology conflicts with their own prior beliefs; in this paper, operationalized as differential click-through rates on links to policy briefs or news articles after the informing institution’s identity is revealed.

Motivated reasoning: A documented tendency, also observed in policymakers, to reject or discount evidence that contradicts ideologically held prior beliefs — the mechanism proposed to explain why opposite-ideology information fails to update beliefs as effectively as aligned-ideology information.

Market Segmentation through Information

This paper asks what market outcomes an information designer — modeled as an internet platform that knows consumers’ preferences — can achieve by choosing what information to disclose to competing oligopolistic firms who then make personalized price offers. The model features n firms each producing a single differentiated product at zero cost, a continuum of consumers with unit demand and multidimensional valuations (one per product), and a designer who commits to a mapping from consumer types to joint distributions over messages sent to firms before they play a simultaneous pricing game. The designer’s objective spans the full range from maximizing producer surplus to maximizing consumer surplus.

The paper establishes two main results. First, under a necessary and sufficient condition called Aggregate Incentive Compatibility (AIC), the designer can implement full surplus extraction by firms — the producer-optimal outcome — in which every consumer buys her most preferred product at a price exactly equal to her valuation for it, capturing 100% of available surplus for producers. The AIC condition requires, for each firm i and each candidate deviation price p_hat_i, that the infra-marginal losses firm i would bear on its natural customers (those in Ei who value i most) from lowering price to p_hat_i must be weakly greater than the maximum business-stealing profit available from consumers who prefer other products but have valuation for i above p_hat_i. The condition is easier to satisfy when consumer preferences are more polarized, i.e., when consumers have stronger relative preferences for their most-preferred product. When firms offer homogeneous products the condition fails everywhere and no information structure can generate any producer surplus — Bertrand competition drives all profits to zero under any signal structure.

Second, the paper characterizes the consumer-optimal information structure, which achieves the maximum possible consumer surplus across all equilibria induced by any information structure. The upper bound on consumer surplus is CS* = (total surplus) minus sum_i Pi*_i, where Pi*_i is the profit firm i can guarantee itself by ignoring the designer’s signal and setting the best uniform price assuming all rivals price at zero. This bound is tight: the designer can implement it by publicly partitioning consumers into groups by most-preferred product, inducing rival firms to price at marginal cost (zero) for consumers who prefer another firm’s product, and then applying the Bergemann-Brooks-Morris (2015) extremal segmentation within each firm’s natural customer set to preserve each firm’s guarantee profit while achieving efficiency.

The illustrative two-firm example shows the quantitative stakes concretely. With no information disclosure, firms charge 4/5 and total producer surplus is about 76% of total surplus S*, consumer surplus is just under 10% of S*, and some consumers are excluded. With full disclosure, producer surplus rises to about 81% of S* and consumer surplus to 19%. The producer-optimal information structure (Case 3) achieves 100% of S* as producer surplus by pooling consumers who prefer different products into the same message submarket, giving each firm an incentive to price for its highest-valuing customers and ignore the others. The consumer-optimal information structure (Case 4) brings producer surplus down to about 57% of S* — its guaranteed lower bound — and delivers roughly 43% of S* to consumers, an outcome unattainable by full disclosure alone.

Both producer-optimal and consumer-optimal outcomes are efficient: all consumers buy their most-preferred product in both cases. The paper further characterizes the full efficient frontier between consumer- and producer-optimal outcomes, showing that mixing the consumer-optimal and full-information structures (or consumer-optimal, full-information, and producer-optimal structures when the latter is implementable) spans every point on the frontier.

The model assumes firms will price-discriminate if they can, that the designer has full knowledge of consumer types, and that the game is played once. The core results extend to continuous type distributions as shown in Online Appendix B.2. The analysis is restricted to a monopoly platform; competition among platforms is left for future work.

Q: What is the central research question and why does the two-benchmark comparison used by antitrust authorities miss important possibilities?

A: The paper asks what market outcomes — combinations of consumer and producer surplus — an information designer (a platform) can achieve by choosing among all possible information structures, not just the two benchmarks of no-information and full-information. Antitrust analysis that compares only those two cases misses a vast middle ground: an intermediary can package information in ways that, for instance, implement perfect collusion (extracting all surplus as producer surplus) while appearing to use privacy-protective technologies, or can intensify competition well beyond the full-information benchmark to benefit consumers.

Q: What is the producer-optimal information structure and when does it exist?

A: A producer-optimal information structure is one that induces an equilibrium in which every consumer buys her most-preferred product at a price exactly equal to her valuation — full surplus extraction. It exists if and only if, for every firm i and every candidate deviation price p_hat_i, the Aggregate Incentive Compatibility (AIC) condition holds: the aggregate infra-marginal losses firm i would suffer on its natural customers Ei from lowering price to p_hat_i must be at least as large as the maximum business-stealing profit from consumers outside Ei who have valuation for i weakly above p_hat_i. This is a condition on the distribution of consumer valuations, not on the information structure per se.

Q: What is the economic mechanism behind the producer-optimal structure — how does pooling consumers implement full surplus extraction?

A: The designer assigns consumers who prefer product A to the same message submarket as consumers who prefer another product but have a lower valuation for A. Firm A is then recommended a price equal to the willingness to pay of its highest-valuing customers in that submarket. The presence of the “outside” consumers in the same message makes it unprofitable for firm A to deviate downward to capture them, because the infra-marginal loss on the natural customers exceeds the additional revenue. Simultaneously, the rival firm cannot single out and undercut A on A’s natural customers, because the messages do not allow it to distinguish them. The result is that each firm plays a niche strategy, setting price equal to the valuation of its highest-type natural customers and excluding the others from its offer.

Q: When does polarization of consumer preferences help achieve the producer-optimal outcome?

A: Proposition 1 states that if a producer-optimal information structure exists under distribution f, it also exists under any distribution f_tilde that is more polarized than f. Here, more polarized means that the mass of consumers who prefer i and have valuation above any threshold for i increases, while the mass of consumers who prefer j but have valuation above that threshold for i decreases. Intuitively, polarization slackens the Firm IC constraints because it reduces the business-stealing temptation: fewer consumers with high cross-product valuations are available for firm i to capture by undercutting. Concrete continuous-distribution examples include uniform over the unit square (producer-optimal always exists), Hotelling anti-correlated values (exists everywhere), and truncated normal with mean 1/2 (producer-optimal is feasible for all standard deviations sigma > 0.15).

Q: Why does the producer-optimal outcome fail entirely when products are homogeneous?

A: Proposition 2 states that when all consumer types have equal valuations across products (the support of f lies on the diagonal of V^n), then for any information structure and any induced equilibrium, every consumer buys at price zero and all firms earn zero profit. The logic extends the standard Bertrand undercutting argument: with homogeneous products, any positive price a firm charges is undercut by a rival who can always profitably steal demand, and this applies to any posterior distribution induced by any signal realization. Even private signals cannot prevent this outcome because no signal realization can give a firm a non-contestable position.

Q: How is the consumer-optimal information structure constructed, and what is its key economic logic?

A: Theorem 2 shows the consumer-optimal structure has three layers. First, consumers are partitioned into n groups by most-preferred product (Ei). Second, firms j not equal to i are induced — by publicly revealing which group a consumer belongs to — to set price zero for consumers outside their group, because competing for those consumers is hopeless when their preferred firm is identified. Third, within each Ei, consumers are further partitioned into submarkets using the Bergemann-Brooks-Morris (2015) extremal segmentation applied to residual valuations (theta_i minus the maximum of competing valuations), ensuring firm i earns exactly its guarantee profit Pi*_i. By holding each firm down to its guarantee profit, the residual goes to consumers, maximizing CS.
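
The third layer can be illustrated with a greedy construction of an extremal segmentation for a single seller. This is a sketch under assumptions, not the paper's code: the function names are hypothetical, the input stands in for firm i's distribution of residual valuations within Ei, values are assumed strictly positive, and the toy market (values 1, 2, 3 with equal mass) is chosen purely for illustration.

```python
from fractions import Fraction as F

def extremal_market(support):
    """Distribution on `support` (sorted, positive values) that leaves a
    seller indifferent among all prices in the support."""
    lo = support[0]
    # demand[k] is the mass of consumers with value >= support[k].
    demand = [F(lo) / F(v) for v in support] + [F(0)]
    return {v: demand[k] - demand[k + 1] for k, v in enumerate(support)}

def greedy_segmentation(market):
    """Split `market` (value -> mass) into a list of (weight, segment) pairs."""
    residual = {v: F(m) for v, m in market.items() if m > 0}
    segments = []
    while residual:
        support = sorted(residual)
        x = extremal_market(support)
        # Largest weight that keeps every residual mass nonnegative.
        weight = min(residual[v] / x[v] for v in support)
        segments.append((weight, x))
        residual = {v: residual[v] - weight * x[v] for v in support
                    if residual[v] - weight * x[v] > 0}
    return segments

# Toy market (hypothetical): three values with equal mass.
market = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}
segments = greedy_segmentation(market)
weights = [w for w, _ in segments]

# Seller revenue if it charges the lowest value in each segment.
profit = sum(w * min(seg) for w, seg in segments)
# Best revenue from a single uniform price on the unsegmented market.
uniform_profit = max(p * sum(m for v, m in market.items() if v >= p)
                     for p in market)
print(weights, profit, uniform_profit)
```

In this toy market the segmented seller earns exactly its uniform-price profit while every consumer is served, which is the property the consumer-optimal construction relies on.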

Q: What is the guarantee profit Pi*_i and how does it bound consumer surplus?

A: Pi*_i is the maximum profit firm i can achieve by ignoring all designer signals and setting a single uniform price to all consumers, against the worst-case scenario in which all other firms price at zero. Formally, Pi*_i = max_{p_i} sum_{theta in Ei : theta_i - p_i >= max_{j != i} theta_j} p_i * f(theta). Since firm i can always achieve Pi*_i regardless of the information structure (by simply ignoring signals), no information structure can push firm i’s profit below Pi*_i. The sum of these guarantee profits across all firms provides a lower bound on total producer surplus, and therefore an upper bound on consumer surplus, achievable by any information structure.
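
For a discrete type distribution the guarantee profit and the implied consumer-surplus bound can be computed directly. The sketch below is illustrative only: the function name, the strict tie-breaking rule for membership in Ei, and the four-type toy distribution are assumptions, not the paper's example.

```python
from fractions import Fraction as F

def guarantee_profit(types, i):
    """Best uniform-price profit of firm i when all rivals price at zero.
    `types` maps valuation tuples theta to their mass f(theta)."""
    residuals = []
    for theta, mass in types.items():
        best_rival = max(v for j, v in enumerate(theta) if j != i)
        if theta[i] > best_rival:  # theta is in E_i (ties excluded by assumption)
            residuals.append((theta[i] - best_rival, mass))
    # Profit p * mass(residual >= p) rises in p between thresholds,
    # so the optimum is at one of the residual valuations.
    return max((p * sum(m for r, m in residuals if r >= p)
                for p, _ in residuals), default=F(0))

# Toy two-firm distribution (hypothetical), each type with mass 1/4.
types = {
    (F(1), F(1, 4)): F(1, 4),
    (F(1, 2), F(1, 4)): F(1, 4),
    (F(1, 4), F(1, 2)): F(1, 4),
    (F(1, 4), F(1)): F(1, 4),
}
profits = [guarantee_profit(types, i) for i in range(2)]
total_surplus = sum(mass * max(theta) for theta, mass in types.items())
cs_upper_bound = total_surplus - sum(profits)
print(profits, total_surplus, cs_upper_bound)
```

Here each firm prefers to serve only its high-residual customers, and the difference between total surplus and the two guarantee profits is the most consumers can obtain under any information structure.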

Q: In the two-firm numerical example, what is the quantitative comparison across the four cases?

A: Total available surplus S* = 0.84. Under no information (Case 1): producer surplus approximately 76% of S*, consumer surplus just under 10% of S*, and consumers of types (3/5, 2/5) and (2/5, 3/5) do not trade. Under full disclosure (Case 2): producer surplus approximately 81% of S*, consumer surplus 19% of S*, efficient. Under the producer-optimal structure (Case 3): producer surplus = 100% of S* (all surplus extracted), consumer surplus = 0%, efficient. Under the consumer-optimal structure (Case 4): producer surplus approximately 57% of S*, consumer surplus approximately 43% of S*, efficient. All cases except Case 1 are efficient; the no-information case excludes some consumers from trading.

Q: Is the full-information disclosure structure consumer-optimal?

A: Not in general. Proposition 3 states that full information is consumer-optimal if and only if all consumers in Ei have identical residual valuations (theta_i minus their second-best alternative) — a condition that generically fails. When residual valuations within Ei are heterogeneous, the designer can do strictly better for consumers by applying the extremal segmentation within each Ei rather than revealing full information, which would allow firms to price-discriminate on individual residual valuations and extract more surplus.

Q: Can the designer trace out the entire efficient frontier between consumer- and producer-optimal outcomes?

A: Yes, in two parts. First, by mixing the consumer-optimal structure (point A) with the full-information structure (point B) using fractions lambda and 1-lambda respectively, the designer can implement any point on the efficient frontier between A and B. Second, when the producer-optimal outcome (point C) is also implementable, mixing the full-information structure with the producer-optimal structure by applying them to fractions lambda and 1-lambda of the consumer population respectively spans every point between B and C. The key insight is that the AIC condition, if it holds for f, also holds for any rescaled sub-distribution of f (it is scale-invariant), so the producer-optimal sub-problem remains feasible.
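
The mixing argument can be made concrete with the approximate surplus shares from the two-firm example. This is a sketch under the assumption that surpluses scale linearly with the share of consumers served by each structure; the function name is illustrative and the percentages are the rounded values reported in the summary.

```python
from fractions import Fraction as F

# (consumer surplus, producer surplus) as shares of S*, approximate.
A = (F(43, 100), F(57, 100))  # consumer-optimal structure
B = (F(19, 100), F(81, 100))  # full-information structure
C = (F(0), F(1))              # producer-optimal structure

def mix(first, second, lam):
    """Outcome when `first` applies to a share lam of consumers
    and `second` applies to the remaining 1 - lam."""
    return tuple(lam * x + (1 - lam) * y for x, y in zip(first, second))

midpoint_ab = mix(A, B, F(1, 2))
midpoint_bc = mix(B, C, F(1, 2))
print(midpoint_ab, midpoint_bc)
```

Every mixture keeps the two shares summing to one, so the outcome stays on the efficient frontier while the split between consumers and producers moves continuously from A through B to C.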

Q: What are the regulatory implications of the analysis?

A: The paper identifies a fundamental tension: banning information use sacrifices efficiency (some consumers excluded, wrong products purchased), but unrestricted use permits platforms to implement perfect collusion through information design. Critically, the paper shows that privacy-enhancing technologies that pool consumers into cohorts — like Google’s Privacy Sandbox — are equally consistent with the producer-optimal (collusive) and consumer-optimal (competitive) structures; the two differ only in the principle by which consumers are grouped. The paper suggests regulators could mandate that consumers in the same cohort share the same most-preferred product and that information be disclosed symmetrically across firms — the defining features of the consumer-optimal structure. This would block the producer-optimal grouping (which mixes consumers with different most-preferred products) while preserving efficiency.

Q: How does this paper relate to and extend Bergemann, Brooks, and Morris (2015)?

A: Bergemann, Brooks, and Morris (2015) characterize achievable consumer and producer surplus outcomes when a designer discloses information to a single monopolist who can price-discriminate. The present paper extends this to oligopoly, where competition between firms creates both additional constraints (firms may undercut each other) and additional instruments (the designer can play firms against each other). The consumer-optimal construction directly applies the BBM (2015) extremal segmentation within each firm’s natural customer set Ei, but the outer layer — using public revelation of group membership to induce rival firms to price at zero — is new and arises specifically from the oligopoly setting.

Information designer: An entity (modeled as a platform) that observes the full joint distribution of consumer valuations over all products and commits, before firms price, to a mapping from consumer types to joint distributions over messages sent to competing firms; the designer can be interpreted as an internet intermediary choosing how to package and share consumer data.

Aggregate Incentive Compatibility (AIC): The necessary and sufficient condition on the distribution of consumer valuations for the existence of a producer-optimal information structure; for each firm i and each candidate deviation price p_hat_i, the aggregate infra-marginal losses firm i would incur on its natural customers by lowering price to p_hat_i must weakly exceed the maximum revenue firm i could gain by attracting consumers who prefer rival products but have valuation for i above p_hat_i.

Producer-optimal information structure: An information structure that induces an equilibrium in which every consumer buys her most-preferred product at a price exactly equal to her full valuation for it, extracting 100% of available surplus as producer surplus — the outcome equivalent to the firms’ fully collusive joint surplus maximum.

Consumer-optimal information structure: An information structure that achieves the maximum consumer surplus attainable across all equilibria induced by any information structure, holding each firm to its guarantee profit Pi*_i (the best uniform-price profit the firm can secure by ignoring all signals) and allocating all residual surplus to consumers while maintaining allocative efficiency.

Guarantee profit (Pi*_i): The maximum profit firm i can secure unilaterally by ignoring the designer’s signal and setting an optimal uniform price, computed against the worst case in which all rival firms price at zero; it equals the maximum over p_i of p_i times the sum of f(theta) over all types in Ei for which theta_i minus p_i weakly exceeds every rival valuation.

Polarization of preferences: A stochastic dominance condition under which, relative to a baseline distribution, the mass of consumers who prefer product i and have high valuations for it increases while the mass of consumers who prefer rival products but have high valuations for i decreases; higher polarization weakens the Firm IC constraints and makes the producer-optimal outcome easier to implement (Proposition 1).

Separation and Consistency: Two structural properties any producer-optimal information structure must satisfy: Separation requires that the messages sent to firm i about different consumers in Ei who have distinct valuations for i are disjoint in support; Consistency requires that every message firm i can receive about any consumer type is contained in the union of messages it receives about consumers in Ei, preventing firm i from ever inferring that a consumer prefers a rival’s product.

Misspecified Expectations among Professional Forecasters

Analyzing panel data from the U.S. Survey of Professional Forecasters (SPF, 1992Q1–2019Q4, 77 forecasters, 1,520 forecaster-quarter observations), Julio Ortiz finds that a “misspecified expectations” model, in which forecasters perceive an AR(2) data-generating process to be an AR(1) and so misperceive its underlying persistence, tends to outperform a noisy-information rational benchmark and two leading non-FIRE alternatives (overconfident and diagnostic expectations) when fit to forecast errors and revisions. The models are estimated by maximum likelihood and ranked using forecast-encompassing weights; for the baseline real GDP growth case, misspecified expectations earns the largest encompassing weight (0.539 vs. 0.462 for diagnostic, ~0 for rational and overconfident) and the highest log-likelihood. Across 14 macroeconomic variables, misspecified expectations provides the best fit for most series both in-sample and out-of-sample, though diagnostic expectations fits better for some (e.g., GDP deflator, industrial production, real residential investment) and rational expectations fits the unemployment rate best. The author argues misspecified expectations succeeds in part because its bias enters both the prediction and updating equations, producing overreaction to new information plus overextrapolation across horizons, which makes forecast errors longer-lived. He concludes it can serve as a “suitable approach” and useful benchmark for modeling professional-forecaster expectation formation, while emphasizing that the results are specific to the context of professional forecasting and may not carry over to household or firm expectations.

Summary of a forthcoming paper, AI-assisted and human-reviewed. See the linked original for the authoritative claims and full conditions.

In depth

Q1. What question does the paper address?

The paper undertakes a formal comparison of competing non-FIRE theories of expectation formation to move toward establishing a benchmark non-FIRE model in the context of professional forecasting. Ortiz motivates this with the observation that survey forecast errors are predictably correlated with real-time information — a violation of full-information rational expectations (FIRE) — but that, as noted in Reis (2020), the literature “has not yet settled on a benchmark non-FIRE model.” The paper offers “a partial answer to this question.”

Q2. What models are compared?

Four models are estimated: a noisy-information rational expectations baseline plus three biased non-FIRE models — overconfident expectations (Daniel et al., 1998), diagnostic expectations (Bordalo et al., 2020), and misspecified expectations (in the spirit of Fuster et al., 2010). All are embedded in a common noisy-information environment where the latent variable is unobservable and forecasters update via a Kalman filter from a noisy private signal. Overconfidence has forecasters misperceive their signal noise as smaller than it is; diagnostic expectations introduces a representativeness distortion ϕ > 0 generating overreaction to recent news; misspecified expectations has forecasters treat an AR(2) process as an AR(1).

Q3. What exactly is “misspecified expectations” in this paper?

Misspecified expectations is a model in which the underlying state follows an AR(2) process but forecasters treat it as an AR(1), so they misperceive the true persistence of the data-generating process. The author notes this version is “closest to natural expectations as modeled in Fuster et al. (2010),” with forecasters neglecting longer lags. Importantly, forecasters still understand the information structure. If the perceived persistence loads excessively onto the first lag, forecasters overextrapolate. The author flags three technical differences from Fuster et al. (2010): he does not model an AR(2) in levels with AR(1)-in-growth-rates forecasting; the perceived persistence is estimated from the data rather than defined as a function of the true autocorrelation parameters; and he does not define expectations as a weighted average of rational and naive AR(1) expectations.

Q4. What data and sample are used?

The estimation uses U.S. SPF panel data from 1992Q1 to 2019Q4, yielding 77 unique forecasters and 1,520 forecaster-quarter observations for the baseline. The 1992 start is chosen to avoid spanning different regimes and because the survey redefined output from GNP to GDP in 1992. The procedure requires unbroken observation sequences, so only each forecaster’s longest spell is kept, with a minimum spell length of eight quarters (because entry/exit may be non-random, per Engelberg et al., 2011). Real GDP growth is the baseline variable; 13 other macroeconomic variables are also estimated. Real-time forecast errors (not errors based on revised figures) are used, following the literature.

Q5. How are the models estimated and compared?

The models are estimated via a three-step maximum likelihood procedure, and their relative fit is compared using forecast-encompassing weights (West, 2001; Harvey et al., 1998; West, 2006), supplemented by AIC and a Vuong (1989) non-nested likelihood-ratio test. Step 1 estimates the fundamental process parameters (ρ₁, ρ₂, σ_w) from the macro time series and fixes them across models; step 2 estimates the signal-noise dispersion σ_v from the rational model and calibrates it across the other three; step 3 estimates each bias parameter (α_v, ϕ, ρ̂) by MLE on SPF data. This keeps fundamental and information parameters consistent across biased models so they are evaluated solely on the biases they generate, and makes identification transparent (notably, σ_v and α_v cannot be jointly identified in the overconfidence model). Encompassing weights are obtained from a constrained linear regression of realizations on model-based one-quarter-ahead forecasts, with weights summing to 1.
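
The encompassing regression can be sketched for the simplest case of two competing forecasts, where the sum-to-one constraint collapses the regression to a single slope. This is an illustrative sketch only: the function name and toy series are assumptions, no intercept is included, the paper's regression involves four models, and any further constraint the author imposes (such as nonnegative weights) is not reproduced here.

```python
def encompassing_weight(actual, forecast_a, forecast_b):
    """Least-squares weight w on forecast_a in y = w*a + (1 - w)*b + error."""
    # Substituting the constraint gives (y - b) = w * (a - b) + error.
    num = sum((y - b) * (a - b)
              for y, a, b in zip(actual, forecast_a, forecast_b))
    den = sum((a - b) ** 2 for a, b in zip(forecast_a, forecast_b))
    if den == 0:
        raise ValueError("forecasts are identical; the weight is not identified")
    return num / den

# Toy series (hypothetical): realizations are an equal blend of the forecasts.
forecast_a = [2.0, 4.0, 6.0]
forecast_b = [0.0, 0.0, 0.0]
actual = [1.0, 2.0, 3.0]
w = encompassing_weight(actual, forecast_a, forecast_b)
print(w, 1 - w)
```

A weight near one on a model means its forecasts leave little for the rival to add; a weight near zero means the rival's forecasts encompass it.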

Q6. What are the baseline real GDP growth results?

For real GDP growth, the misspecified expectations model produces the highest log-likelihood and the largest encompassing weight, 0.539, versus 0.462 for diagnostic expectations and approximately 0.000 for both rational and overconfident expectations. The fundamental process estimates imply relatively low persistence (first-order autocorrelation ρ₁ ≈ 0.434, second-order ρ₂ ≈ −0.006). The estimated bias parameters are: overconfidence ≈ 0.72, diagnosticity ≈ 0.23, and perceived persistence ρ̂ ≈ 0.564. Because ρ̂ ≈ 0.56 exceeds the estimated ρ₁ ≈ 0.43, the misspecified model implies forecasters overestimate the first-order autocorrelation and neglect the partial reversal in the second lag, generating overreactions. The signal-to-noise ratio implied by the estimated private noise dispersion is σ_w/σ_v ≈ 1.09. AIC rankings (and BIC) do not change the ordering relative to the maximized likelihoods.
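
The overextrapolation implied by these estimates can be seen by comparing how a one-unit shock decays under the estimated AR(2) process and under the perceived AR(1). This is a simplified sketch, not the paper's procedure: it sets aside the noisy-signal and Kalman-filter layer, uses the rounded point estimates quoted above, and the function names are illustrative.

```python
RHO1, RHO2 = 0.434, -0.006  # estimated AR(2) coefficients (rounded)
RHO_HAT = 0.564             # estimated perceived AR(1) persistence

def true_response(horizons):
    """Response of the AR(2) process to a unit shock at horizon 0."""
    psi = [1.0, RHO1]
    for _ in range(2, horizons + 1):
        psi.append(RHO1 * psi[-1] + RHO2 * psi[-2])
    return psi[: horizons + 1]

def perceived_response(horizons):
    """Response expected by a forecaster who treats the process as AR(1)."""
    return [RHO_HAT ** h for h in range(horizons + 1)]

true_path = true_response(4)
perceived_path = perceived_response(4)
for h, (t, p) in enumerate(zip(true_path, perceived_path)):
    print(f"h={h}: true={t:.3f} perceived={p:.3f} gap={p - t:.3f}")
```

At every horizon after impact the perceived response lies above the true one, which is the sense in which a forecaster with these beliefs carries a shock too far forward.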

Q7. Does the result hold across other macroeconomic variables?

Across the 14 SPF macroeconomic variables, misspecified expectations provides the best in-sample fit for most series, but not all. Diagnostic expectations registers larger encompassing weights for certain series — the GDP deflator (0.771), industrial production (1.000), and real residential investment (0.624). Rational expectations provides the best fit for the unemployment rate (0.745) and housing starts (in-sample). For the bulk of the remaining variables (e.g., CPI 0.859, payroll employment 1.000, real consumption 0.777, real federal spending 1.000, real GDP 0.539, real nonresidential investment 1.000, real state/local spending 1.000, 3-month Treasury bill 0.713, 10-year bond 0.746), misspecified expectations carries the largest weight. Overconfident expectations “does not yield particularly large encompassing weights for any variable.”

Q8. Why does misspecified expectations fit better, and for which variables especially?

The author finds that, among variables exhibiting overreactions, misspecified expectations tends to offer a better fit for less persistent series, because the scope for it to generate overreaction (ρ̂ − ρ₁) is greater when ρ₁ is low. Unlike the alternatives, the persistence bias ρ̂ − ρ₁ can be positive or negative, allowing the model to account for both overreacting and underreacting variables; the alternative models cannot generate forecaster-level underreaction. Figure 2 plots the encompassing weight on misspecified expectations against the sum of autoregressive coefficients and suggests (with some exceptions) that less persistent variables have higher weight on misspecified expectations.

Q9. Does the model perform out of sample?

The misspecified expectations model also provides a better out-of-sample fit for more of the variables, estimated on 1992Q1–2005Q4 and evaluated on the latter half of the sample. However, out of sample diagnostic expectations now outperforms for the GDP deflator (0.987), industrial production (0.959), payroll employment (0.813), and real federal government expenditures (0.591); overconfident expectations outperforms for the 10-year government bond (0.653); and rational expectations outperforms for housing starts (0.502) and the unemployment rate (1.000). The author cautions that these results do not imply forecasters could improve their forecasts in real time, because the MLE observations include contemporaneous individual and consensus forecast errors that are not known to forecasters when they issue forecasts; for the same reason, the results are “not inconsistent with” Eva and Winkler (2023) on the poor out-of-sample performance of error-predictability regressions.

Q10. Could the apparent advantage of misspecified expectations just reflect learning?

The author argues that learning about the data-generating process does not appear to drive the relative model rankings in favor of misspecified expectations, based on two exercises. First, using the full pre-COVID sample (1968Q4–2019Q4) over 25-year rolling windows (three-year roll), the misspecified model outperforms diagnostic expectations in six of ten sub-samples and all models in five of ten, while diagnostic expectations wins four of ten — patterns that “do not indicate that learning over time favors misspecified expectations.” Second, splitting forecasters by “age”/tenure (a proxy for experience), misspecified expectations outperforms the others among experienced (above-median age) forecasters (encompassing weight 0.766, with overconfidence 0.234) and is dominant among inexperienced ones (1.000). The author concedes learning “is likely reflected in professional forecasts” but does not appear to drive the rankings.

Q11. What additional moments does misspecified expectations match?

Beyond overall fit, the author shows in the appendix that misspecified expectations matches five features of the data — overreaction, underreaction, overshooting, persistent disagreement, and updating behavior — and is the only model generating delayed overshooting. All three non-rational models generate individual-level overreaction (Bordalo et al., 2020 errors-on-revisions regression) and aggregate underreaction (Coibion-Gorodnichenko, 2015 consensus regression). But when simulating impulse responses, “only the misspecified expectations model generates a sign switch in the forecast error,” indicating delayed overshooting (Angeletos et al., 2020). The author reports “stronger evidence” favoring misspecified expectations on two further moments: it better generates persistent disagreement across horizons, and it better matches the relative weights forecasters place on priors versus news — because its bias also enters the prediction equation (not just the update equation), producing longer-lived errors.

Q12. What are the scope conditions and limitations the author stresses?

The author emphasizes that the results are specific to the context of professional forecasting and that the relative model rankings “may be different” for household or firm expectations, or for micro-level expectations rather than aggregate forecasts. He notes professional forecasters are arguably the most well-informed agents, so the literature has treated their predictions as informative about a lower bound on economy-wide information frictions and biases. The paper abstracts away from learning in the model setup and from theories that generate only underreaction. Models excluded from the comparison (e.g., imperfect memory, multi-frequency forecasting, asymmetric attention, learning) are set aside mainly because they cannot be flexibly nested into the common setting and would introduce additional parameters posing identification challenges.

Ortiz concludes that misspecified expectations “can serve as a suitable approach” to modeling expectation formation among professional forecasters, and thus as a useful benchmark, for a variety of macroeconomic aggregates, while framing this as only “a partial answer” to the search for a non-FIRE benchmark. He highlights a practical advantage: embedding this form of misspecified expectations into a quantitative model “only requires introducing two parameters into an otherwise standard model.” He also notes misspecification can arise either from a behavioral bias or because adopting parsimonious forecasting models is optimal (Branch and Evans, 2006; Pfajfar, 2013). A promising avenue for future research is whether evidence favors misspecified expectations in other settings.

Key concepts

Full-information rational expectations (FIRE): The benchmark in which forecast errors are uncorrelated with any information in the forecaster’s time-t information set; the orthogonality conditions it implies “tend to be violated in the data,” motivating non-FIRE models.
Misspecified expectations: The paper’s focal bias — the true state follows an AR(2) process, xₜ = ρ₁xₜ₋₁ + ρ₂xₜ₋₂ + wₜ, but forecasters treat it as an AR(1), xₜ = ρ̂xₜ₋₁ + uₜ, misperceiving its persistence; forecasters retain the correct information structure. The bias enters both the predict and update equations.
Persistence bias (ρ̂ − ρ₁): The gap between perceived AR(1) persistence and true first-order autocorrelation; positive values generate overextrapolation/overreaction, negative values generate underreaction, and its overreaction scope is larger when ρ₁ is low.
Overconfident expectations: Forecasters misperceive their private signal noise as smaller (σ̃_v = α_v σ_v, α_v ∈ [0,1]) than it truly is, placing excessive weight on new private information.
Diagnostic expectations: A representativeness-based distortion (Bordalo et al., 2020; Gennaioli-Shleifer, 2010) in which, with diagnosticity ϕ > 0, forecasters overweight outcomes representative relative to a “no news” reference scenario, generating overreaction to recent news.
Encompassing weight: The model-comparison metric, a weight wₖ from a constrained linear regression of realizations on the competing models’ one-quarter-ahead forecasts, with weights summing to one; a larger weight indicates a better-fitting model.
Delayed overshooting: The Angeletos et al. (2020) pattern of initial underreaction followed by later overreaction to a shock; in this paper, only misspecified expectations produces the sign switch in the forecast-error impulse response that signals it.
Overreaction vs. underreaction: Individual-level overreaction is measured via the Bordalo et al. (2020) errors-on-revisions regression; aggregate/consensus-level underreaction via the Coibion-Gorodnichenko (2015) regression — the data exhibit both, and a successful non-FIRE model must reproduce both.
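The overconfidence and diagnostic distortions defined above can be compared in a one-period signal-extraction sketch. Everything here is a simplification under stated assumptions: a scalar prior and one noisy signal rather than the paper's dynamic filter, overconfidence modeled as a gain computed with perceived noise α_v σ_v, and diagnosticity modeled as scaling the rational update by (1 + ϕ). The function names and the unit prior variance and noise are hypothetical.

```python
def kalman_gain(prior_var, noise_sd):
    """Weight placed on the signal surprise in a scalar Gaussian update."""
    return prior_var / (prior_var + noise_sd ** 2)


def updated_forecast(prior_mean, signal, prior_var, noise_sd,
                     alpha_v=1.0, phi=0.0):
    """Posterior mean under optional overconfidence and diagnosticity.

    alpha_v < 1 shrinks perceived signal noise (overconfidence);
    phi > 0 inflates the reaction to the surprise (diagnosticity).
    alpha_v = 1 and phi = 0 give the rational update.
    """
    gain = kalman_gain(prior_var, alpha_v * noise_sd)
    return prior_mean + (1.0 + phi) * gain * (signal - prior_mean)


# Prior mean 0, signal 1, unit prior variance and unit noise (hypothetical),
# with the bias estimates quoted for real GDP growth.
rational = updated_forecast(0.0, 1.0, 1.0, 1.0)
overconfident = updated_forecast(0.0, 1.0, 1.0, 1.0, alpha_v=0.72)
diagnostic = updated_forecast(0.0, 1.0, 1.0, 1.0, phi=0.23)
```

Both biased forecasts move further toward the signal than the rational one, which is why each generates individual-level overreaction in this stylized setting.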

Online Business Models, Digital Ads, and User Welfare

Mon, 01 Jan 0001 00:00:00 +0000

Acemoglu, Huttenlocher, Ozdaglar, and Siderius develop a two-sided platform model to study the welfare consequences of digital advertising as an online business model. The platform intermediates between a firm selling a horizontally differentiated product and a continuum of users who derive utility from both entertaining content and informative signals about product quality embedded in ads. Users have a two-dimensional type: a sophistication dimension (sophisticated with probability lambda, naïve with probability 1-lambda) and a product-quality dimension (high quality with prior probability q). The central departure from the standard informational-advertising literature is that sophisticated users hold the correct model of the ad signal process, while naïve users underestimate the false-positive rate — the probability that a low-quality product generates a positive ad signal (phi_0). Naïve users perceive this false-positive rate to be phi_{0,N} = omega_N * omega_P * phi_0, where omega_N <= 1 captures inherent naïveté and omega_P <= 1 captures failure to understand personalized targeting, so phi_{0,N} < phi_0. The equilibrium concept is Berk-Nash equilibrium (Esponda and Pouzo 2016), meaning all agents are Bayesian given their subjective model.

The platform chooses ad load alpha (Poisson rate of ad displays), subscription fees, and the monetary transfer from the firm; the firm sets product price p after observing the platform’s contract. The central finding (Proposition 2) is that when the objective false-positive rate phi_0 exceeds a threshold phi-hat_0(lambda, phi_1, phi_{0,N}) — which is increasing in lambda and phi_{0,N} and decreasing in the true-positive rate phi_1 — the unique equilibrium is an advertising-based plan that fully segments the market: naïve users receive an ad load that extracts all their surplus, while sophisticated users are excluded entirely. In this regime the firm charges a strictly higher price p-hat* > p-bar*, where p-bar* = (beta*q + c)/2 is the monopoly price without advertising. The ad-based equilibrium emerges precisely when ads are more misleading (larger gap between phi_0 and phi_{0,N}), not when they are more informative — a comparative static the authors describe as paradoxical.
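The no-advertising monopoly price can be checked numerically. The sketch assumes linear demand of the form beta*q - p, which is one demand curve consistent with the stated formula but is not confirmed by the summary; the function name and parameter values are hypothetical, and the sketch requires beta*q > c.

```python
def monopoly_price(beta, q, c, grid_size=100001):
    """Grid-search the price that maximizes (p - c) * (beta*q - p).

    Assumption: linear demand beta*q - p, with beta*q > c.
    """
    choke = beta * q                     # price at which demand hits zero
    best_p, best_profit = c, 0.0
    for i in range(grid_size):
        p = c + (choke - c) * i / (grid_size - 1)
        profit = (p - c) * (choke - p)
        if profit > best_profit:
            best_p, best_profit = p, profit
    return best_p


beta, q, c = 2.0, 0.5, 0.2               # hypothetical values
numerical = monopoly_price(beta, q, c)
closed_form = (beta * q + c) / 2
```

Under this demand assumption the grid search and the closed form agree up to the grid resolution.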

Welfare consequences (Proposition 4) are unambiguous in the advertising regime: both naïve and sophisticated users are strictly worse off than the baseline without any platform. Naïve users over-purchase due to inflated posteriors from misread signals; sophisticated users are harmed through the price channel — the firm’s higher profit-maximizing price p-hat* applies to all buyers. In the fully rational benchmark (phi_{0,N} = phi_0), the unique equilibrium is subscription-based and user welfare equals the no-platform baseline (Proposition 3).

These results extend to richer menus (Proposition 5), mixed subscription-plus-advertising plans (Proposition 7), and to multi-firm and multi-platform competition (Propositions 9-12). Digital ads soften Bertrand competition by generating endogenous horizontal differentiation among otherwise identical firms, so equilibrium prices can exceed marginal cost even with two competing firms. Platform competition similarly fails to restore welfare: platforms compete away subscription fees but both adopt ad-based plans targeting naïfs when phi_1 exceeds a threshold, maintaining the welfare loss.

On policy, the first best (planner observes types) cannot be decentralized because naïve users prefer more ads than is socially optimal, inverting the usual self-selection constraint. The second best (planner subject to incentive-compatibility constraints) is a single pooling plan with an intermediate ad load alpha^{SB} in [alpha^{FB}_N, alpha^{FB}_S] and yields average welfare above the no-platform baseline, though below first best (Proposition 13). This second best can be decentralized with a nonlinear digital ad tax, a per-unit product subsidy, and a platform subscription subsidy (Proposition 14). A simpler flat tax on digital ad revenues — above a threshold gamma-bar < 1 — also improves welfare relative to the ad-based equilibrium, though it does not restore the second best (Proposition 15).

Four robustness extensions are developed: endogenous manipulation (platform always chooses the most manipulative environment, lowest phi_{0,N}); naïve learning dynamics (learning raises the sophisticate share in steady state, making ad-based models less profitable but not overturning the main results); imperfect price discrimination by the firm (naïfs are unambiguously worse off, threshold for advertising equilibrium shifts down); and an added price-sensitivity dimension (the platform runs a 2x2 menu separating by both sophistication and price sensitivity, preserving the result that naïve users tolerate and receive more ads than sophisticates in every stratum).

Q: What is the key asymmetry between naïve and sophisticated users that drives the main results? A: Sophisticated users hold the correct Bayesian model of the ad signal process and thus correctly account for the false-positive rate phi_0 when updating beliefs from positive ad signals. Naïve users perceive the false-positive rate as phi_{0,N} = omega_N * omega_P * phi_0 < phi_0, so they treat positive signals as stronger evidence of high product quality than they actually are. Because naïve users overestimate the informativeness of ads, their (interim) subjective valuation of an ad-based plan is higher, making them more tolerant of ad loads and more willing to join platforms with heavy advertising. This asymmetry is what makes it profitable to target naïfs with high ad loads while excluding or charging subscription fees to sophisticates.
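A short Bayes-rule sketch shows how the perceived false-positive rate changes the posterior after a single positive ad signal. The parameter values, including omega_N and omega_P, are hypothetical and chosen only for illustration.

```python
def posterior_high_quality(q, phi_1, phi_0):
    """P(high quality | positive signal) by Bayes' rule.

    q      prior probability of high quality
    phi_1  true-positive rate
    phi_0  false-positive rate used by the agent when updating
    """
    return q * phi_1 / (q * phi_1 + (1 - q) * phi_0)


q, phi_1, phi_0 = 0.5, 0.8, 0.4           # hypothetical values
omega_n, omega_p = 0.5, 1.0               # hypothetical naivete parameters
phi_0_naive = omega_n * omega_p * phi_0   # perceived false-positive rate

sophisticated = posterior_high_quality(q, phi_1, phi_0)
naive = posterior_high_quality(q, phi_1, phi_0_naive)
```

Because the naïve user plugs in a smaller false-positive rate, the same positive signal produces a higher posterior for her than for the sophisticated user.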

Q: Why does advertising to sophisticated users generate no additional firm profit, while advertising to naïve users does? A: Lemma 1 establishes that with linear-quadratic utility the firm extracts no surplus from advertising to sophisticates: because sophisticated agents are fully Bayesian, their expected posterior equals the prior (E_S[pi_i] = q), so expected demand after advertising is identical to demand before advertising. By contrast, Lemma 2 shows that the firm’s profit from naïve agents is positive and strictly increasing in ad load alpha, because naïve users’ average demand curve drifts upward as alpha rises — their inflated perceived informativeness of ads causes them to over-update on positive signals, systematically raising their willingness to pay. The platform captures this surplus from the firm via the advertising transfer m*.
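The contrast between the two lemmas can be illustrated by averaging posteriors over the true signal distribution. The sketch assumes a single ad with a binary positive-or-negative signal, which simplifies the Poisson ad process in the paper, and the parameter values are hypothetical. It shows one parameterization in which the naïve average posterior exceeds the prior; it is not a general proof.

```python
def posterior(q, p_signal_high, p_signal_low):
    """P(high quality | signal), given the signal's probability in each state."""
    return q * p_signal_high / (q * p_signal_high + (1 - q) * p_signal_low)


def expected_posterior(q, phi_1, phi_0_true, phi_0_perceived):
    """Average posterior when signals are drawn from the true process
    but the agent updates with phi_0_perceived."""
    p_pos = q * phi_1 + (1 - q) * phi_0_true          # true P(positive)
    post_pos = posterior(q, phi_1, phi_0_perceived)
    post_neg = posterior(q, 1 - phi_1, 1 - phi_0_perceived)
    return p_pos * post_pos + (1 - p_pos) * post_neg


q, phi_1, phi_0 = 0.5, 0.8, 0.4                       # hypothetical values
sophisticated_avg = expected_posterior(q, phi_1, phi_0, phi_0)
naive_avg = expected_posterior(q, phi_1, phi_0, 0.2)  # perceives phi_0 as 0.2
```

For the sophisticated user the average posterior equals the prior q, so advertising shifts no demand on average; for the naïve user in this example it lies above q.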

Q: What is the threshold condition determining whether the equilibrium is subscription-based or advertising-based? A: Proposition 2 identifies a threshold phi-hat_0(lambda, phi_1, phi_{0,N}) that is increasing in the sophisticate share lambda and in the naïve false-positive perception phi_{0,N}, and decreasing in the true-positive rate phi_1. When the objective false-positive rate phi_0 is below this threshold, the profit-maximizing business model is subscription-based with price P* = T - v and product price p* = p-bar* = (beta*q + c)/2. When phi_0 exceeds the threshold, the advertising model dominates: the platform sets a high ad load alpha-hat that makes naïve users exactly indifferent between participating and their outside option v, excludes sophisticates, and the firm charges p-hat* > p-bar*. The threshold falls with phi_1, meaning more informative ads expand the range of phi_0 over which the advertising equilibrium obtains.

Q: How does allowing the platform to offer menus change the results relative to the baseline two-plan case? A: Proposition 5 shows that with menus the platform can simultaneously serve both user types: sophisticates receive a subscription plan at P* = T - v and naïve users receive an ad-based plan with the same high load alpha-hat* as in the baseline. The threshold for the advertising equilibrium shifts down to phi*_0(lambda, phi_1, phi_{0,N}) < phi-hat_0, so advertising business models arise for a strictly larger set of parameters. Welfare consequences are unchanged (Corollary 1): when phi_0 > phi*_0, both types have welfare strictly below the no-platform baseline. Proposition 6 further shows consumer welfare is monotonically decreasing in both phi_0 and phi_1: higher phi_1 (more informative true-positive signals) also reduces welfare because any surplus from greater informativeness is fully captured by the platform.

Q: What is the welfare ranking across the three regimes: no platform, advertising equilibrium, and subscription equilibrium? A: In the subscription equilibrium (regime (a) of Proposition 2 or 4), user welfare for both types equals the no-platform base case W_base(tau) — the platform captures all surplus it creates and users are no better or worse off. In the advertising equilibrium (regime (b)), both naïve and sophisticated users are strictly worse off than with no platform: W-hat*(tau) < W_base(tau) for both tau in {S, N}. The first-best, where a planner controls ad loads separately by type, yields W^{FB}(tau) > W_base(tau) for both types because informative ads can genuinely improve sophisticated users’ decisions and a constrained amount improves naïve users’ decisions too.

Q: How does firm-level competition interact with digital advertising to affect prices and welfare? A: Without advertising, two ex ante identical firms compete à la Bertrand and price at marginal cost (p*_1 = p*_2 = c). Proposition 9 establishes that when phi_1 > phi^F_1 and phi_0 >= phi^F_0(phi_1), the platform offers an ad-based plan and equilibrium prices p-hat*_1 and p-hat*_2 are both strictly above p-bar* — the monopoly price without advertising. The mechanism is endogenous horizontal differentiation: users who see positive ad signals for one firm’s product form higher valuations for that product, so the two products become differentiated in the eyes of consumers even though they are ex ante identical, breaking Bertrand logic. Example 1 further illustrates that advertising can be more prevalent with competition than without: a second firm’s entry can push the equilibrium from no-advertising to separating.

Q: Does platform competition protect users from the welfare losses associated with digital advertising? A: Not fully. Proposition 11 shows that with two competing platforms (M=2, N=1) and no advertising, platforms compete away both subscription fees and ad loads, and welfare reaches the fully rational benchmark. However, when phi_1 exceeds threshold phi^P_1, both platforms adopt ad-based plans targeting naïve users, charge no subscription fees, and the product price rises to p-hat*_P > p-bar* (Proposition 12). Competition reduces subscription fees to zero but does not eliminate the incentive to target naïfs with heavy ads, because naïve users’ over-valuation of ads means they remain willing to join ad-heavy plans. The fundamental inefficiency from naïve users’ misspecified model persists under platform competition.

Q: Why is the first-best allocation not implementable as a decentralized equilibrium? A: Proposition 13 explains the obstacle: the social planner would ideally offer naïve users fewer ads (alpha^{FB}_N) than sophisticated users (alpha^{FB}_S), with alpha^{FB}_N <= alpha^{FB}_S. However, naïve users have a higher subjective valuation for ads than sophisticates because they believe ads are more informative. If offered a menu with both options, naïve users would self-select into the plan with the higher ad load alpha^{FB}_S — the exact opposite of what the planner wants. The incentive-compatibility constraints therefore force the planner toward a single pooling plan with an intermediate ad load alpha^{SB} in [alpha^{FB}_N, alpha^{FB}_S]. Average welfare under the second best exceeds the no-platform baseline, confirming that some advertising is socially valuable, but falls short of the first best whenever alpha^{FB}_N > 0.

Q: How does a flat digital ad tax improve welfare, and what are its limitations? A: Proposition 15 establishes that whenever the equilibrium features an ad-based plan, a flat tax on digital ad revenues at a rate gamma above a threshold gamma-bar < 1 improves welfare by discouraging advertising-based business models and inducing the platform to shift toward subscription-based plans. The mechanism is that taxing ad revenue reduces the platform’s marginal gain from increasing ad load, making the subscription plan relatively more profitable. However, the flat tax does not achieve the second best because it operates linearly rather than targeting the nonlinear distortion: the optimal nonlinear tax-subsidy scheme (Proposition 14) requires a threshold-style ad tax at rate mu > mu-bar combined with a per-unit product subsidy delta* and a platform subscription subsidy eta > eta-bar.

Q: What happens when the platform can endogenously choose how manipulative its ads are? A: Proposition 16 shows that a profit-maximizing platform always chooses the lowest feasible phi_{0,N} = phi-bar — the most manipulative environment. Two reinforcing channels drive this: the pricing channel (lower phi_{0,N} amplifies naïve demand shifts per positive signal, so the downstream firm raises price and sales, increasing ad revenues extracted by the platform) and the participation channel (lower phi_{0,N} raises naïve users’ perceived informational value of ads, relaxing their participation constraint and permitting a higher ad load alpha). Platform competition constrains the equilibrium ad load through tighter participation constraints but does not alter the choice of phi_{0,N} = phi-bar, so competition limits ad quantity but not ad manipulativeness.

Q: How do naïve learning dynamics affect the main results? A: Proposition 17 introduces a birth-death environment where exposure to disconfirming evidence gradually converts naïve agents to sophisticates. A unique steady-state sophisticate share lambda*(alpha_N, phi_0) exists; both higher ad load alpha_N and higher phi_0 accelerate the conversion of naïfs, raising future sophisticate share and reducing future ad revenues. This creates a new intertemporal trade-off that constrains the platform’s choice of ad loads relative to the static case. The key result (part ii) is that the main characterization of Proposition 7 carries through under a modified cutoff phi-tilde^{dynamic}_0 >= phi-tilde_0(lambda-tilde, phi_1, phi_{0,N}), so learning dynamics make the ad-based business model less likely but do not overturn the fundamental welfare results.

Q: How does imperfect price discrimination by the firm affect naïve users? A: Proposition 18 considers a firm that observes a user’s sophistication type with probability kappa in [0,1]. With price discrimination, the firm sets type-specific prices satisfying p*_N >= p* >= p*_S, moving toward the type-specific monopoly levels. Naïfs are unambiguously worse off: when identified (with probability kappa), they face the higher price p*_N and a higher equilibrium ad load. The threshold for the advertising equilibrium also shifts down relative to the baseline, meaning advertising business models emerge for a larger parameter range when price discrimination is possible.

Q: How does the paper define and measure user welfare, and why is ex post rather than interim welfare the relevant concept? A: User welfare W(tau_i) is defined as ex post utility, which depends on the actual product quality theta_i realized after consumption, not on interim beliefs formed after viewing ads. Naïve users’ interim assessment inflates expected product quality, but their ex post utility depends on whether the product is genuinely high quality for them (theta_i = 1 with probability q, theta_i = 0 with probability 1-q). Because naïve users over-purchase due to misread signals — consuming more than optimal when theta_i = 0 — their ex post utility is strictly lower than their interim expected utility, and strictly lower than the no-platform baseline in the advertising equilibrium. The ex post welfare concept is the relevant one precisely because it captures the actual material consequences of manipulation, not the subjectively perceived gains from ads.

Naïve vs. Sophisticated Users: The paper’s primary user heterogeneity dimension. Sophisticated users hold the correct model of the ad signal process, setting phi_{0,S} = phi_0 (the true false-positive rate). Naïve users hold a misspecified model with phi_{0,N} = omega_N * omega_P * phi_0 < phi_0, underestimating the probability that a low-quality product generates a positive ad signal, due to inherent naïveté (omega_N) and failure to understand personalized targeting (omega_P).

Ad Load (alpha): The Poisson rate at which ads are displayed to a user per unit time. Total ad displays follow a Poisson(alpha*T) distribution. Higher ad load means less time on entertaining content (expected entertainment time is (1-alpha)*T) and a higher probability, 1 - exp(-alpha*T), that the user sees the ad at least once. The platform chooses alpha as its primary instrument for extracting surplus from naïve users.
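The three quantities in this definition can be computed directly. The function name and the example values of alpha and T are hypothetical; the formulas are the ones stated in the definition.

```python
import math


def ad_exposure(alpha, T):
    """Exposure statistics for ad load alpha over a session of length T."""
    return {
        # Mean of a Poisson(alpha * T) count of ad displays.
        "expected_ads": alpha * T,
        # One minus the Poisson probability of zero displays.
        "prob_at_least_one_ad": 1 - math.exp(-alpha * T),
        # Time left for entertaining content.
        "expected_entertainment_time": (1 - alpha) * T,
    }


stats = ad_exposure(alpha=0.2, T=10.0)
```

Raising alpha increases the first two entries and lowers the third, which is the trade-off the platform manages when it sets the ad load.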

False-Positive Rate (phi_0): The objective probability that a low-quality product (theta_i = 0) generates a positive (“good”) ad signal. The gap between phi_0 (objective) and phi_{0,N} (naïve users’ perceived rate) is the key parameter driving all welfare results: a larger gap implies greater de facto manipulation and a stronger incentive for the platform to adopt an advertising-based model.

Berk-Nash Equilibrium: The solution concept from Esponda and Pouzo (2016), used to model agents with misspecified subjective models. All agents are Bayesian conditional on their own subjective model. Sophisticates’ subjective model equals the objective model (standard Bayesian), while naïfs update using the misspecified phi_{0,N}. Perfection requires sequential rationality at each information set given beliefs.

De Facto Manipulation: The paper’s term for a situation in which the platform and firm exploit naïve users’ misspecified model to boost demand and extract surplus, without requiring any outright deception in the formal sense. It arises because naïve users voluntarily choose high-ad-load plans (believing ads to be highly informative) and voluntarily over-purchase (having updated on what they mistakenly think are strong positive signals). The manipulation is “de facto” because it operates through the users’ own rational (but misspecified) decision-making.

Separating Equilibrium: An equilibrium in which naïve and sophisticated users self-select into distinct platform plans. In the advertising equilibrium, naïve users join an ad-heavy plan (which extracts all their surplus via their inflated willingness to pay for ads) while sophisticated users are either excluded or placed on a subscription plan. This separation is the vehicle through which the platform maximizes revenue from naïf manipulation while limiting the disciplining force of sophisticates.

Second-Best Allocation: The welfare-maximizing allocation subject to the incentive-compatibility constraints that users self-select into plans. Because naïve users prefer more ads than sophisticated users (the inverse of what the planner desires), the second best is a single pooling plan with an intermediate ad load alpha^{SB} in [alpha^{FB}_N, alpha^{FB}_S]. This is strictly worse than the first best but achieves average welfare above the no-platform baseline, and can be decentralized with a nonlinear ad tax, product subsidy, and platform subscription subsidy.

Peer Effects in Consideration and Preferences

Mon, 01 Jan 0001 00:00:00 +0000

This paper develops a general nonparametric model of discrete choice in which peers influence agents through two distinct channels: (1) the set of alternatives an agent considers (consideration set effects) and (2) the agent’s preferences over those alternatives (preference effects). The framework embeds these peer mechanisms in a continuous-time Markov process where agents revise choices at Poisson alarm-clock rates. A peer is classified as a consideration peer, a preference peer, or both, and the network is encoded as two directed edge sets rather than one.

The central identification challenge is recovering network structure, consideration probabilities, and preferences simultaneously, without relying on exogenous variation in covariates or the menu of available options. The paper shows this is achievable using time-series variation in the choices made by connected agents. The key insight is that consideration peers who adopt alternative v change the probability that the focal agent considers v — entering only the “consideration” term of the conditional choice probability (CCP) — while preference peers who adopt alternatives other than v change only the “conditional-on-consideration” selection probability. These cross-alternative patterns in the CCPs allow the researcher to distinguish the two channels. Once consideration-only peers are isolated, their choices serve as exclusion restrictions that mimic artificial menu variation, enabling nonparametric recovery of preferences.

Identification proceeds in stages: (i) recover the full reference group of each agent from changes in CCPs; (ii) separate consideration-only peers from preference-affecting peers using cross-order effects across alternatives; (iii) distinguish preference-only peers from consideration-and-preference peers under an exclusion restriction (Assumption 4) requiring that an agent with a dual-channel peer also has at least one single-channel peer; (iv) recover consideration ratios Q(v|n+1)/Q(v|n) and then the full choice rule. The results allow arbitrary heterogeneity across agents and do not require exogenous menu variation or covariate shifters.

For continuous-time data (Dataset 1), the CCPs and Poisson rates are exactly identified from the observed revision history. For discrete-time panel data (Dataset 2), identification is generic under a mild eigenvalue condition on the transition rate matrix.

The empirical application studies store-opening decisions by China’s two dominant high-end tea chains — Heytea and Nayuki — across prefecture-level cities from their founding through end-2020. By that date, Nayuki had 485 stores in 57 cities and Heytea had 729 stores in 46 cities, in an industry whose total revenue grew from 42.2 to 83.1 billion yuan between 2017 and 2020. Each firm-market pair is modeled as an agent deciding whether to open a new store. The key exclusion restriction is that the cumulative store count of either firm in geographically neighboring markets shifts consideration probabilities but does not enter marginal profitability directly.

Estimation via maximum likelihood yields four substantive findings: (1) Firms exhibit limited consideration — consideration probabilities for markets with no prior presence by either firm are substantially below one. (2) Stores in neighboring markets significantly raise consideration probabilities for a given market, for both own-firm and rival stores; this peer effect in consideration is described as economically large. (3) Own-market store density raises marginal profitability (density economies) while rival presence lowers it (competitive effects). (4) A full-consideration model that omits the attention stage overestimates the negative competitive effect and underestimates positive density effects.

Counterfactual simulations show that removing attention constraints (full consideration) accelerates market penetration substantially: firms enter new markets earlier and achieve broader geographic coverage. Removing peer effects in consideration only — while retaining attention constraints — slows the diffusion of store openings across neighboring markets, because peer effects in consideration function as an informational cascade. Limited consideration also reduces competition by delaying rival entry into high-profitability markets, explaining a significant share of the geographic concentration in first- and second-tier cities during the early expansion phase. The paper’s scope is limited to settings with repeated, non-durable choices; it does not model forward-looking behavior or multiple equilibria, which the authors note as directions for future research.

Q: What are the two peer-effect channels in the model, and how do they differ structurally? A: A consideration peer influences whether an alternative enters the agent’s consideration set — specifically, the probability Q_a(v | n) that alternative v is considered is a function of the number n of consideration peers currently adopting v. A preference peer influences the choice rule R_a(v | y, C) — the probability that v is selected conditional on it being in the consideration set. Importantly, the paper models the two channels as affecting logically separate stages of the decision process, so the observed CCP factors into a consideration term and a conditional-selection term that respond to distinct sets of peers.
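
The factorization described above can be made concrete with a small enumeration. The sketch below is illustrative only: the logit choice rule, the payoff values, the alternative labels, and the function name `ccp` are assumptions introduced here, not the paper's specification. It sums, over every consideration set that contains the default, the probability of that set times the probability of selecting each member from it.

```python
import math
from itertools import combinations

def ccp(q, payoff, default):
    """Choice probabilities under independent consideration.

    q[v]      : probability that non-default alternative v is considered.
    payoff[v] : hypothetical payoff index fed to a logit choice rule.
    default   : alternative that is always in the consideration set.
    """
    others = [v for v in payoff if v != default]
    probs = {v: 0.0 for v in payoff}
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            # Probability that exactly this set (plus the default) is considered.
            p_set = 1.0
            for v in others:
                p_set *= q[v] if v in subset else 1.0 - q[v]
            members = (default,) + subset
            denom = sum(math.exp(payoff[v]) for v in members)
            for v in members:
                # Logit selection conditional on the realized consideration set.
                probs[v] += p_set * math.exp(payoff[v]) / denom
    return probs

payoff = {"stay": 0.0, "a": 1.0, "b": 0.5}
low = ccp({"a": 0.3, "b": 0.4}, payoff, default="stay")
# A consideration peer adopts "a": Q(a) rises while payoffs stay fixed.
high = ccp({"a": 0.6, "b": 0.4}, payoff, default="stay")
```

Only Q(a) differs between the two calls, so any shift in the resulting probabilities comes from the consideration channel rather than from the choice rule.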

Q: Why does the standard identification approach of varying menus fail here, and how does the paper substitute for it? A: Menu variation requires the researcher to observe the same agent facing different sets of available alternatives, which is unavailable in many empirical settings. The paper replaces exogenous menu variation with endogenous variation generated by consideration-only peers: when a consideration-only peer adopts alternative v, the focal agent’s probability of considering v rises, effectively mimicking the removal of other alternatives from her consideration set. This peer-induced variation in consideration is then used to trace out the choice rule R_a over counterfactual menus without any actual menu changes.

Q: How does the paper separate consideration peers from preference peers in the data? A: The decomposition exploits an asymmetry in how the two peer types appear in the log-CCP. When a consideration peer switches to alternative v, the term ln Q_a(v | .) changes but the conditional-selection term ln D_a(v | .) remains unchanged, because the agent already considers v. Conversely, when a preference peer adopts an alternative other than v, only the conditional-selection term shifts. The paper formalizes this via cross-order effects of peers across alternatives in the CCPs (Propositions 3.1–3.3) and invokes Assumption 4 — requiring at least one single-channel peer when a dual-channel peer exists — to complete the separation.

Q: What is Assumption 4 and why is it necessary? A: Assumption 4 states that if agent a has a peer in N_CR_a (a peer affecting both consideration and preferences), then a also has at least one additional peer affecting only consideration or only preferences. Without this exclusion restriction, the consideration and preference effects of a dual-channel peer are not separately identified from each other; the single-channel peer provides the variation needed to pin down each component separately.

Q: What does Proposition 2.1 establish and what does it require? A: Proposition 2.1 establishes existence and uniqueness of an invariant equilibrium distribution mu over choice configurations, with full support. It requires Assumptions 1 (independent consideration), 2(i) (strictly positive consideration probability for every alternative), and 3(i) (strictly positive probability of selecting any non-default alternative from some reachable consideration set). The continuous-time Poisson structure ensures zero probability of simultaneous revisions, which rules out multiple equilibria in the data-generating process.
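
A toy version of the invariant distribution can be computed directly. The sketch below assumes a hypothetical two-agent network with a binary choice, made-up revision rates, and a made-up rule in which each agent picks alternative 1 more often when the peer currently holds it; none of these numbers come from the paper. Because only one agent revises at a time, every transition changes a single coordinate of the configuration.

```python
from itertools import product

def generator(lam, p_one):
    """Rate matrix over configurations y = (y_0, y_1) with y_a in {0, 1}."""
    states = list(product((0, 1), repeat=2))
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    W = [[0.0] * n for _ in range(n)]
    for s in states:
        for a in (0, 1):
            p1 = p_one(s[1 - a])              # P(agent a picks 1 | peer's choice)
            new_choice = 1 - s[a]
            p_new = p1 if new_choice == 1 else 1.0 - p1
            t = list(s)
            t[a] = new_choice
            rate = lam[a] * p_new             # Poisson revision rate times CCP
            W[index[s]][index[tuple(t)]] += rate
            W[index[s]][index[s]] -= rate
    return states, W

def invariant(W, iters=5000):
    """Invariant distribution via power iteration on the uniformized chain."""
    n = len(W)
    big = 1.0 + max(-W[i][i] for i in range(n))   # exceeds every exit rate
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [mu[j] + sum(mu[i] * W[i][j] for i in range(n)) / big
              for j in range(n)]
    return mu

states, W = generator(lam=(1.0, 2.0), p_one=lambda peer: 0.6 if peer == 1 else 0.3)
mu = invariant(W)
```

All four configurations receive positive mass in this example, in line with the full-support property, since every alternative is chosen with strictly positive probability at each revision.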

Q: How does the paper handle discrete-time panel data, where only periodic snapshots of choices are observed? A: The paper invokes results from Blevins (2017, 2026) to show that the transition rate matrix W of the continuous-time process is generically identified from the discrete-time transition matrix observed at interval Delta, provided the eigenvalues of W do not differ by integer multiples of 2πi/Delta. Once W is identified, the CCPs P and Poisson rates lambda_a are recovered. This result is described as generic, meaning it holds except on a measure-zero set of parameter values.
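
With only two states, the mapping from a discrete-time transition matrix back to the rate matrix has a closed form, which gives a feel for the identification argument. The sketch below is a special case chosen for illustration, not the paper's general result: the rates `a` and `b` are hypothetical, and with two states the eigenvalues of W are real (0 and -(a+b)), so the eigenvalue condition holds automatically.

```python
import math

def transition_probs(a, b, delta):
    """Off-diagonal entries of exp(delta * W) for W = [[-a, a], [b, -b]]."""
    s = a + b
    decay = 1.0 - math.exp(-s * delta)
    return a / s * decay, b / s * decay        # (p01, p10)

def recover_rates(p01, p10, delta):
    """Invert the snapshot probabilities back to the continuous-time rates."""
    total = p01 + p10
    if not 0.0 < total < 1.0:
        raise ValueError("snapshots are not consistent with a two-state rate matrix")
    s = -math.log(1.0 - total) / delta         # a + b from the non-zero eigenvalue
    return p01 * s / total, p10 * s / total

p01, p10 = transition_probs(a=0.7, b=0.2, delta=1.5)
a_hat, b_hat = recover_rates(p01, p10, delta=1.5)
```

With more states the inversion requires a matrix logarithm, and that is where complex eigenvalues and the condition on multiples of 2πi/Delta come into play.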

Q: What data does the empirical application use, and what are the key sample statistics? A: The application uses city-level store registration data sourced from the National Enterprise Credit Information Publicity System (via CnOpenData, 2021), supplemented by regional statistics from the China City Statistical Yearbook (2016–2021). The sample ends in 2020 to avoid COVID-19 demand shifts. By end-2020, Nayuki had 485 stores across 57 cities and Heytea had 729 stores across 46 cities. The high-end tea industry’s total revenue grew from 42.2 to 83.1 billion yuan between 2017 and 2020.

Q: What is the key exclusion restriction in the empirical specification, and why is it plausible? A: Stores in geographically neighboring markets (parameterized by distance bins d(m,m’)) enter the attention index pi_tilde but are excluded from the marginal profit index pi_bar. The rationale is that nearby store counts are informative signals that draw managerial attention to a market (an informational spillover) but do not directly alter the profitability of operating in that market — profitability depends on local demand, competition within the market, and own firm density, not on activity in adjacent markets. This restriction identifies the consideration-only peer channel.
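
The exclusion restriction can be illustrated with a two-index sketch. Everything numeric here is assumed: the logistic link, the coefficient names in `theta`, and their values are placeholders rather than the paper's estimates or functional form. The point is only that neighboring-market stores enter the attention index and are absent from the profit index.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def open_probability(own_local, rival_local, neighbor_stores, theta):
    """Return (P(consider), P(open | considered), P(open)) for one firm-market pair."""
    # Attention index: local counts and neighboring-market counts both enter.
    attention = (theta["a0"]
                 + theta["a_local"] * (own_local + rival_local)
                 + theta["a_neighbor"] * neighbor_stores)
    # Profit index: neighboring-market counts are excluded by assumption.
    profit = (theta["b0"]
              + theta["b_own"] * own_local
              + theta["b_rival"] * rival_local)
    consider = logistic(attention)
    open_given_consider = logistic(profit)
    # "Do not open" is the default and is always considered.
    return consider, open_given_consider, consider * open_given_consider

theta = {"a0": -2.0, "a_local": 0.3, "a_neighbor": 0.4,
         "b0": -1.0, "b_own": 0.2, "b_rival": -0.3}
c0, d0, p0 = open_probability(1, 1, neighbor_stores=0, theta=theta)
c1, d1, p1 = open_probability(1, 1, neighbor_stores=5, theta=theta)
```

Moving `neighbor_stores` from 0 to 5 raises the consideration probability and the overall opening probability while leaving the conditional opening probability unchanged, which is the variation the restriction relies on.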

Q: What does the paper find about biases from ignoring limited consideration? A: When the two-stage model (consideration + choice) is replaced by a single-stage full-consideration model, the estimated payoff parameters differ substantially. Specifically, the full-consideration model overestimates the negative effect of competition (rival presence in the same market) and underestimates the positive effect of own-store density. The intuition is that correlated entry patterns driven by shared consideration spillovers are misattributed to payoff interactions when the consideration stage is omitted.

Q: What do the counterfactual simulations show about the role of limited consideration in market dynamics? A: Three counterfactuals are compared against the baseline. Under full consideration (no attention constraints), market penetration is substantially faster — firms enter new markets earlier and achieve broader geographic coverage. Removing peer effects in consideration while retaining attention constraints slows geographic diffusion because the informational cascade that propagates entry to neighboring markets is eliminated. Limited consideration also reduces competition by delaying rival entry into high-profitability markets; markets with high potential demand remain underserved for longer. Collectively, limited consideration explains a significant portion of the geographic concentration of tea chain stores in first- and second-tier cities during the early expansion period.

Q: What forms of heterogeneity does the identification allow, and what does it not require? A: The nonparametric identification results accommodate arbitrary heterogeneity across agents in consideration mechanisms Q_a, choice rules R_a, Poisson revision rates lambda_a, and network positions. The identification requires neither exogenous covariates that shift preferences or consideration, nor variation in the set of available alternatives across observations. It relies solely on time-series variation in the choices made by connected agents, which are endogenous to the model and are themselves identified in the first stage.

Q: How does the paper model history dependence, and does it change the main identification results? A: Section 4.1 extends the model to allow consideration probabilities and choice rules to depend on the agent’s own choice history h_t in addition to the current configuration y. Proposition 4.1 states that under Assumptions 1–4 applied conditional on both y_{at} and h_t, all identification propositions from Section 3.1 remain valid. The extension also allows consideration probabilities to equal one, enabling nontrivial dynamics in consideration sets driven by past choices.

Q: How is the unobservable default handled in the empirical application? A: When the default alternative (e.g., “do not open a store”) is unobserved, the Poisson revision rate lambda_a cannot be separately identified from the CCPs without normalization. The paper normalizes lambda_a = 1 for each agent in the empirical application, treating the revision opportunity rate as fixed and recovering all remaining primitives under this normalization.

Consideration set: The subset C of the full menu Y that agent a actually attends to at the moment of revision; formed before the choice rule is applied. Alternative v enters C independently with probability Q_a(v | n), where n is the number of consideration peers currently adopting v. The default alternative is always in the consideration set.

Conditional choice probability (CCP): P_a(v | y), the ex-ante probability that agent a selects alternative v given choice configuration y; equal to the product of the consideration probability Q_a(v | .) and the conditional-selection probability D_a(v | .), integrated over all possible consideration sets.

Choice configuration: The vector y = (y_a)_{a in A} recording the current alternative selected by every agent in the network simultaneously; the state variable of the continuous-time Markov process.

Consideration-only peer: A peer a’ in N_C_a \ N_R_a whose choices enter the consideration probability Q_a but not the choice rule R_a. Variation in the choices of consideration-only peers serves as an exclusion restriction that mimics artificial menu variation for identifying preferences.

Preference-only peer: A peer a’ in N_R_a \ N_C_a whose choices enter the choice rule R_a but not the consideration probability Q_a.

Cross-order peer effect: The pattern in the CCP by which a consideration peer’s adoption of alternative v changes ln P_a(v | .) but not the conditional-selection component, while a preference peer’s adoption of a different alternative v’ changes the conditional-selection component but not the consideration component; this asymmetry is the key to separating the two channels.

Limited consideration: The situation in which Q_a(v | n) is strictly less than one for at least some alternatives v and peer counts n, so that the agent does not evaluate all available options before choosing; distinct from full rationality in which all alternatives are always considered.

Mean attention index (pi_tilde): The latent index governing the consideration probability in the empirical specification. It depends on own and rival store counts in the same and neighboring markets and on firm fixed effects. The neighboring-market store counts that enter pi_tilde are excluded from the marginal profit index pi_bar, and this is the empirical exclusion restriction that separates the consideration and payoff channels.

Screening and Segmenting: A Consumer Surplus Perspective

Bergemann, Heumann, and Wang study consumer surplus when a monopolist simultaneously engages in second-degree price discrimination (screening consumers within each market segment through quality-differentiated menus) and third-degree price discrimination (offering different menus across segments). The central question is which market segmentation maximizes aggregate consumer surplus, and under what conditions any segmentation benefits consumers at all.

The model features a monopolist selling vertically differentiated goods of quality q at strictly convex cost c(q) to a continuum of buyers with privately known values v drawn from an aggregate market m*. A segmentation is any decomposition of m* into submarkets, each receiving a profit-maximizing screening menu. The seller observes segment identity but not individual values. The problem of finding the consumer-optimal segmentation is, on its face, an optimization over distributions of distributions — an infinite-dimensional object.

The paper’s central methodological contribution is a dramatic dimensional reduction. Theorem 1 establishes that the maximum consumer surplus achievable by any segmentation equals the maximum of the expected local information rent, u(v,h) = h·Q(v−h), over all inverse hazard rate functions h satisfying a majorization constraint h ≺ h* (where h* is the aggregate market’s inverse hazard rate). The local information rent captures both the extensive margin (h measures the mass of higher-value buyers per unit of value-v buyers who earn rent from v’s allocation) and the intensive margin (Q(v−h) is the quality allocated to value v, decreasing in h as distortion increases). The two margins trade off: raising h widens the base of rent-earning buyers but worsens allocative distortion, making u(v,h) hump-shaped in h with an interior maximizer h̄(v).

The consumer-optimal segmentation has a striking structural property: every buyer of a given value v receives the same quality in every segment in which they appear, even though the monopolist could in principle offer different qualities across segments. Prices, however, differ across segments for identical buyers. This holds because the optimal segmentation is always a uniform segmentation — one in which the inverse hazard rate hm(v) is equalized across all segments containing value v.

Under log-concavity of both aggregate demand (equivalently, a non-increasing aggregate inverse hazard rate h*(v), satisfied by uniform, normal, logistic, and exponential distributions) and the supply function Q(v) (equivalent to c’’’(q)q/c’’(q) ≥ −1, satisfied by all power cost functions), the optimal segmentation takes a transparent two-regime form (Proposition 3): for values below a threshold v̂ where h*(v̂) = h̄(v̂), the inverse hazard rate is reduced to h̄(v) by concentrating low-value buyers; for values above v̂, the aggregate market is left unchanged. The resulting segments are nested convex intervals [vm, v̄], all sharing the same upper bound v̄, with pricing differing across segments only by a quality-independent base price Tm that increases with vm (Theorem 2).

Corollary 3 delivers the sharpest policy-relevant finding: under log-concave demand and supply, zero segmentation is optimal (any segmentation harms consumers) if and only if h*(v̲) ≤ h̄(v̲) at the lowest value v̲. For iso-elastic costs c(q) = q^γ/γ (γ > 1), this becomes η*(v̲) ≤ γ/(1−γ), where η*(v̲) is the aggregate demand elasticity at the bottom of the distribution. When demand is sufficiently elastic relative to supply, the monopolist’s screening already provides near-optimal consumer rents and no redistribution of buyers across segments can improve them. More elastic supply (lower γ) shrinks the set of markets where zero segmentation is optimal (Proposition 4, Zγ’ ⊂ Zγ for γ’ < γ); more inelastic supply (higher γ) expands it, and in the limit γ → ∞ zero segmentation is optimal only when the aggregate allocation itself is efficient.

For iso-elastic costs, the optimal segmentation assigns each segment a Pareto distribution below v̂ with shape parameter α = γ/(γ−1), and the aggregate market above v̂ (Corollary 1). Each segment’s demand elasticity equals the constant γ/(1−γ) below v̂ and the aggregate elasticity above (Corollary 2): the supply elasticity 1/(γ−1) determines how elastic demand must be made within segments to counteract monopoly distortions. The paper also extends the framework to adverse selection (where seller cost rises with buyer type), with the full reduction to inverse hazard rate optimization preserved when the rate of increase in adverse selection satisfies τ’’(v)v/τ’(v) ∈ [0,1] (Proposition 5).

Q: What is the local information rent and why is it central? A: The local information rent is u(v,h) = h·Q(v−h), where h is the inverse hazard rate at value v and Q is the inverse marginal cost (supply) function (equation 9). The factor h captures the extensive margin — the mass of higher-value buyers per unit of value-v buyers who earn rent from v’s quality allocation — while Q(v−h) captures the intensive margin — the quality allocated to v via the virtual value v−h, which falls as h rises. Because u is hump-shaped in h, there is an interior rent-maximizing inverse hazard rate h̄(v) for each value. Lemma 2 establishes that in every regular market, total consumer surplus equals the integral of u(v,hm(v))dFm(v), so the entire segmentation problem reduces to choosing h.
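
The hump shape of the local information rent is easy to check numerically in the iso-elastic case. The sketch below assumes c(q) = q^γ/γ, so that Q(x) = x^(1/(γ−1)). The closed form v(γ−1)/γ used for comparison is derived here from the first-order condition of h·Q(v−h) and is not quoted from the paper, and the function names are illustrative.

```python
def supply(x, gamma):
    """Q(x): inverse marginal cost for c(q) = q**gamma / gamma."""
    return max(x, 0.0) ** (1.0 / (gamma - 1.0))

def local_rent(v, h, gamma):
    """u(v, h) = h * Q(v - h)."""
    return h * supply(v - h, gamma)

def rent_maximizer(v, gamma, tol=1e-10):
    """Golden-section search for the peak of u(v, .) on [0, v]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    lo, hi = 0.0, v
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if local_rent(v, x1, gamma) < local_rent(v, x2, gamma):
            lo = x1       # the peak lies to the right of x1
        else:
            hi = x2       # the peak lies to the left of x2
    return 0.5 * (lo + hi)

v, gamma = 2.0, 3.0
numeric = rent_maximizer(v, gamma)
closed_form = v * (gamma - 1.0) / gamma   # from the first-order condition
```

The closed form matches the inverse hazard rate v/α of a Pareto distribution with shape α = γ/(γ−1), which is the shape the iso-elastic corollaries assign to segments below the threshold.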

Q: What is the majorization constraint and what exactly does it characterize? A: The majorization constraint h ≺ h* requires that for all v ∈ V, the integral from v̲ to v of [h*(t) − h(t)]dF*(t) ≥ 0 (equation 18). Proposition 1 shows that for any segmentation σ, the average inverse hazard rate hσ must satisfy hσ ≺ h*. A partial converse holds: given h ≺ h* under regularity conditions, a uniform segmentation implementing h exists. The constraint is strictly weaker than the pointwise bound h ≤ h* available in the binary case because it permits h to exceed h* at some values (dilution) provided it falls sufficiently below h* at higher values (concentration) to maintain the cumulative inequality.

Q: What are concentration and dilution, and how do they interact? A: Concentration gathers buyers of a given value into fewer segments, lowering their inverse hazard rate below h*(v). Dilution raises the inverse hazard rate of value v by placing v in segments where immediately higher values are missing — creating gaps in the support — thereby increasing the support increment Δm(v) and hence hm(v) (equation 12). Dilution at v requires that values just above v have already been concentrated elsewhere to create the gaps; concentration thus enables dilution, linking the two tools. With only binary values, only concentration is available; with a continuum, dilution can strictly expand achievable consumer surplus by permitting h to exceed h* at low values.

Q: What does Theorem 1 establish and why is it a major simplification? A: Theorem 1 states that the maximum consumer surplus over all segmentations of m* equals the maximum of ∫u(v,h(v))dF*(v) over all h satisfying the majorization constraint h ≺ h* (equation 25). The original problem maximizes over distributions on the infinite-dimensional space of probability measures on V; the reduced problem is a standard optimal control problem over a single real-valued function h: V → R+, amenable to Karush-Kuhn-Tucker methods and often yielding closed-form solutions. Furthermore, every optimal segmentation is a uniform segmentation implementing some h solving the reduced problem, so the reduction is exact. The optimal h always satisfies regularity (h’(v) ≤ 1), meaning v − h(v) is non-decreasing, which ensures segments in the optimal uniform segmentation are themselves regular.

Q: What is the structural property of consumer-optimal segmentations regarding quality across segments? A: In any consumer-optimal segmentation, every buyer of value v receives the same quality in every segment in which they appear (the uniform quality property following from Theorem 1). This holds because the optimal inverse hazard rate h(v) is equalized across segments (uniform segmentation), and quality in a regular market is qm(v) = Q(v − hm(v)), which depends on the market only through hm(v). Prices, however, differ across segments for identical buyers: the monopolist does not redesign its product line across segments but adjusts only quality-independent base prices. This is counterintuitive because nothing in the monopolist’s problem requires quality uniformity — it emerges purely from the consumer surplus maximization.

Q: What conditions guarantee the simple two-regime convex segmentation structure? A: Log-concavity of aggregate demand — equivalently, h*(v) non-increasing in v, satisfied by uniform, normal, logistic, and exponential families — and log-concavity of the supply function Q(v), equivalent to c’’’(q)q/c’’(q) ≥ −1, together guarantee the structure of Proposition 3 and Theorem 2. Under these conditions, h̄(v) is strictly increasing in v (log-concave supply) while h*(v) is decreasing (log-concave demand), so they cross exactly once at v̂. The optimal h equals h̄(v) below v̂ and h*(v) above. Only concentration (not dilution) is ever used because log-concave supply makes u concave in h and log-concave demand ensures monotone ordering of marginal local information rents across values, so the binding majorization constraint becomes the pointwise constraint at the bottom.
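
A worked instance of the single crossing helps fix ideas. The sketch below assumes values uniform on [0, 1], so that h*(v) = 1 − v, and iso-elastic cost, for which the rent-maximizing rate is taken to be h̄(v) = v(γ−1)/γ (derived here from the first-order condition of h·Q(v−h), not quoted from the paper). The distribution and the function names are hypothetical choices for illustration.

```python
def h_star(v):
    """Inverse hazard rate (1 - F(v)) / f(v) for values uniform on [0, 1]."""
    return 1.0 - v

def h_bar(v, gamma):
    """Rent-maximizing inverse hazard rate under c(q) = q**gamma / gamma."""
    return v * (gamma - 1.0) / gamma

def threshold(gamma, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the crossing point where h_star(v) = h_bar(v)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h_star(mid) > h_bar(mid, gamma):
            lo = mid      # still below the crossing
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_h(v, gamma):
    """h_bar below the threshold, the aggregate h_star above it."""
    return h_bar(v, gamma) if v < threshold(gamma) else h_star(v)

v_hat = threshold(2.0)   # quadratic cost
```

For quadratic cost the crossing solves 1 − v = v/2, so the bisection should land near 2/3; below that point the inverse hazard rate is lowered to h̄, and above it the aggregate market is left as is.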

Q: What is the structure of convex segmentations and their menus (Theorem 2)? A: Under log-concave demand and supply, the consumer-optimal segmentation consists of segments m with absolutely continuous supports [vm, v̄] for varying lower bounds vm ≤ v̂, all sharing the same upper bound v̄ (Part 1 of Theorem 2). Pricing across these segments differs only by a quality-independent base price Tm that is increasing in vm — more concentrated segments (lower vm) face a lower base price and carry higher information rents — while the quality menu p(q) is uniform across segments (Part 2). Equivalently, the monopolist offers nested menus all sharing the same efficient upper bound quality Q(v̄), differing in how far down the menu is extended and in the price of the lowest offered quality.

Q: What do Corollaries 1 and 2 say for iso-elastic cost functions? A: With iso-elastic cost c(q) = q^γ/γ (γ > 1) and log-concave demand, the consumer-optimal segmentation assigns each segment a Pareto distribution with shape parameter α = γ/(γ−1) below the threshold v̂, and the aggregate distribution above v̂ (Corollary 1). This delivers a constant demand elasticity of γ/(1−γ) within each segment below v̂, matching the aggregate market’s elasticity above v̂ (Corollary 2). The Pareto shape — and thus the degree of demand manipulation — is determined entirely by the supply elasticity 1/(γ−1): more elastic supply (lower γ) mandates a higher shape parameter α and more elastic within-segment demand to counteract larger monopoly distortions.

Q: When is zero segmentation optimal, and what is the precise elasticity condition? A: Under log-concave demand and supply, zero segmentation is optimal if and only if h*(v̲) ≤ h̄(v̲) — the aggregate inverse hazard rate at the lowest value already lies at or below its rent-maximizing level (Corollary 3). Since h* is decreasing under log-concavity, this condition at v̲ implies it holds everywhere, so the designer cannot improve rents at any value. For iso-elastic cost, the condition becomes η*(v̲) ≤ γ/(1−γ): aggregate demand elasticity at the bottom must be at least as large in magnitude as one plus the supply elasticity. For a Pareto aggregate distribution with shape parameter α, zero segmentation is optimal when α ≥ γ/(γ−1).
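
The Pareto version of the condition reduces to a one-line comparison. The sketch below is a minimal check under the stated assumptions (Pareto aggregate demand with shape α, iso-elastic cost with γ > 1); the function names are illustrative. It uses the fact that Pareto demand has constant elasticity −α.

```python
def pareto_threshold(gamma):
    """Smallest Pareto shape for which zero segmentation is optimal."""
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1")
    return gamma / (gamma - 1.0)

def zero_segmentation_optimal(alpha, gamma):
    """Check eta <= gamma / (1 - gamma) with eta = -alpha for Pareto demand."""
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1")
    demand_elasticity = -alpha
    return demand_elasticity <= gamma / (1.0 - gamma)

elastic_market = zero_segmentation_optimal(alpha=3.0, gamma=2.0)
inelastic_market = zero_segmentation_optimal(alpha=1.5, gamma=2.0)
# Same market, more convex cost: the zero-segmentation set widens with gamma.
inelastic_market_high_gamma = zero_segmentation_optimal(alpha=1.5, gamma=4.0)
```

The last call illustrates the nesting stated in Proposition 4: a market outside the zero-segmentation set at γ = 2 can fall inside it at γ = 4.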

Q: How does supply elasticity govern the scope for beneficial segmentation (Proposition 4)? A: Proposition 4 establishes that for iso-elastic cost, the sets of markets Zγ where zero segmentation is optimal are strictly nested and increasing in γ: for any γ’ < γ, Zγ’ ⊂ Zγ. More elastic supply (lower γ) amplifies monopoly distortions and enlarges the set of markets where segmentation benefits consumers; more inelastic supply (higher γ) makes quality provision rigid, reducing segmentation’s scope. In the limit γ → ∞ (approaching unit demand), zero segmentation is optimal only if the aggregate allocation is already efficient. This limit also means very inelastic supply, however, so the potential benefits from segmentation have shrunk toward zero simultaneously.

Q: How does this paper compare to and depart from Haghpanah and Siegel (2023)? A: Haghpanah and Siegel (2023) showed that in generic markets with a finite number of goods, some segmentation always improves consumer surplus relative to the aggregate market. This paper shows that with a continuum of qualities, this universal improvement result fails: Corollary 3 identifies a large, non-degenerate class of markets satisfying Haghpanah and Siegel’s genericity conditions where zero segmentation is optimal for consumers. The discrepancy arises because the log-concave supply condition (equation 27) is violated in finite-good environments — Haghpanah and Siegel explicitly provide a counterexample showing their result fails with a continuum of goods. This paper characterizes exactly when the finite-good gains vanish as the quality space becomes continuous, providing the precise elasticity conditions.

Q: What changes and what is preserved when extending to adverse selection? A: In the adverse selection specification, buyer net value v is private and the seller’s cost per unit is τ(v) − v, increasing in v when τ’(v) > 1. The local information rent becomes w(v,h) = u(v, τ’(v)·h), where adverse selection enters by amplifying the effective inverse hazard rate by τ’(v) (equation 40). Proposition 5 confirms that the full reduction to majorization-constrained optimization over h goes through, and the optimal segmentation features more elastic within-segment demand when adverse selection is more severe. The reduction requires τ’’(v)v/τ’(v) ∈ [0,1] (equation 39), bounding the rate of increase of adverse selection severity; if this fails, the key inequality (35) driving the optimality of uniform segmentations may break down.

Q: What are the policy implications for regulation of price discrimination? A: The results imply that blanket restrictions on market segmentation may harm consumers by preventing welfare-enhancing price discrimination in markets where demand is sufficiently inelastic relative to supply (the region outside the zero-segmentation condition). In markets satisfying η*(v̲) ≤ γ/(1−γ), allowing segmentation yields no consumer benefit, so restrictions are harmless to consumers. The key policy-relevant primitives are demand and supply elasticities, which are in principle measurable. The findings also imply that the welfare effects of data-driven personalized pricing depend critically on the interaction between consumer heterogeneity (demand shape) and cost structure (supply elasticity), rather than on the degree of segmentation per se.

Local information rent: u(v,h) = h·Q(v−h), the total consumer surplus generated per unit mass of buyers at value v as a function of the inverse hazard rate h. The factor h is the extensive margin (mass of higher-value buyers per unit of value-v buyers who earn rent) and Q(v−h) is the intensive margin (quality allocated to v via the virtual value v−h). It is hump-shaped in h with interior maximizer h̄(v), and the segmentation problem reduces entirely to maximizing its expectation.

Inverse hazard rate hm(v): in a continuous market, (1−Fm(v))/fm(v); generalized to accommodate atoms and support gaps (equation 12). It simultaneously determines the virtual value ϕm(v) = v − hm(v) (governing allocative distortion) and the scaled mass of higher-value buyers per unit of value-v buyers (governing the extensive margin of rents). The dual role requires both a continuum of qualities and endogenous segmentation.

Majorization constraint h ≺ h*: for all v, the cumulative integral of [h*(t)−h(t)]dF*(t) from v̲ to v is non-negative (equation 18). It is the exact characterization of inverse hazard rate functions achievable by some segmentation of m*, strictly weaker than the pointwise bound h ≤ h* of the binary case because it permits h to exceed h* at some values (dilution) provided it falls sufficiently below h* at higher values (concentration).

Uniform segmentation: a segmentation in which every buyer of value v faces the same inverse hazard rate hm(v) = hσ(v) in every segment containing v (equation 22). Theorem 1 establishes that every consumer-optimal segmentation is uniform. Within this class, the double integral over segments and values collapses to a single integral against F*, which is what makes the dimensional reduction possible.

Concentration and dilution: the two tools by which segmentation modifies inverse hazard rates. Concentration gathers buyers of a given value into fewer segments, lowering hm(v) below h*(v). Dilution raises hm(v) above h*(v) by placing value v in segments where immediately higher values are absent, creating support gaps. Dilution requires prior concentration of adjacent higher values, so the two tools are linked; under log-concave demand and supply, only concentration is used in the optimal segmentation.

Convex segmentation: a segmentation whose constituent segments have nested convex interval supports [vm, v̄] all sharing the same upper bound v̄, with varying lower bounds vm. This is the consumer-optimal structure under log-concave demand and supply (Theorem 2). For iso-elastic cost, each segment below the threshold v̂ corresponds to a Pareto distribution with shape parameter α = γ/(γ−1) determined by cost convexity γ.

Zero-segmentation condition: the condition under which no segmentation can improve consumer surplus over the aggregate market. Under log-concave demand and supply with iso-elastic cost c(q) = q^γ/γ, it is η*(v̲) ≤ γ/(1−γ): aggregate demand elasticity at the lowest value must be at least as large in magnitude as one plus the supply elasticity (Corollary 3). When this holds, any redistribution of buyers across segments strictly reduces consumer surplus.

Silence to Solidarity: How Communication About a Minority Affects Discrimination

This paper examines how two types of communication about a minority group affect discriminatory behavior: (i) horizontal communication between majority-group members, and (ii) top-down communication from agents of authority such as the legal system. The setting is urban Chennai, India, where the paper measures discrimination against thirunangai — a community of transgender women who are India’s most visible LGBTQ+ group — in a field experiment with 3,397 participants.

Discrimination is measured using incentivized hiring choices. Participants are offered a free grocery delivery and make 10 binary choices over which worker will carry out the delivery, with worker gender (cisgender male, cisgender female, or transgender) varying across options. The stakes are real: one choice is randomly selected and implemented 2–9 weeks later. Participants in the control condition are highly discriminatory: they are 19 percentage points (32%) less likely to hire a transgender worker than a non-transgender worker (p<0.001), and are willing to sacrifice grocery items worth 1.9 times their median daily per capita food expenditure to avoid a 15-minute interaction with a transgender worker.

The first main treatment involves randomly assigning participants to a 3-person group discussion with two neighbors, in which they discuss and make collective hiring choices over the same options. The key outcome is participants’ subsequent private, individual hiring choices. The discussion eliminates anti-transgender discrimination on average: participants in the discussion arm are 17 percentage points (42%) more likely to select a transgender worker in their private post-discussion choices relative to the control group (p<0.001), so that discrimination is no longer statistically distinguishable from zero (p=0.30). The discussion’s effect is partially persistent: approximately one month later, discussion participants are still 4 percentage points more likely to select transgender workers in hypothetical hiring choices (p=0.03), representing roughly 25% of the short-run effect.
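
The percentage-point and percent figures reported above jointly imply baseline hiring rates, which gives a quick consistency check. The sketch below works only from the rounded numbers in this summary, so the implied rates are approximations and are not figures reported by the paper; the helper name is illustrative.

```python
def implied_baseline(pp_change, pct_change):
    """Baseline rate b such that pp_change equals pct_change * b."""
    return pp_change / pct_change

# Control arm: the 19 pp gap is 32% of the non-transgender hiring rate.
non_trans_rate = implied_baseline(0.19, 0.32)
# Discussion arm: the 17 pp gain is 42% of the control transgender hiring rate.
control_trans_rate = implied_baseline(0.17, 0.42)

# The two implied rates should differ by roughly the reported 19 pp gap.
implied_gap = non_trans_rate - control_trans_rate

# One-month persistence as a share of the short-run effect.
persistence_share = 0.04 / 0.17
```

The implied control rates are about 59% for non-transgender workers and about 40% for transgender workers, and their difference reproduces the reported gap to within rounding.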

The second main treatment cross-randomizes a video shown before hiring choices. The legal rights video informs participants of a Supreme Court ruling affirming that transgender people hold the same fundamental constitutional rights as other citizens. This reduces discrimination by 10.3 percentage points (p<0.001). A rights messaging video — which argues that transgender people should have equal rights without invoking legal authority — reduces discrimination by a smaller 5.8 percentage points (p=0.001), and there is some evidence the legal-authority version is more effective (p of difference in [0.01, 0.12]). However, the legal rights video’s effect is only 59% as large as the discussion’s effect (p of difference in [0.002, 0.04]), and it does not persist at the one-month follow-up (p in [0.12, 0.51]).

The paper rules out two candidate mechanisms for the discussion’s effects and supports a third. First, the discussion does not work primarily through correcting misperceived norms: while control-group participants do overestimate peer discrimination by 5 percentage points, the discussion reduces predicted discrimination by 24 percentage points — far more than a corrected misperception could explain (at most 21% of the effect under generous assumptions). Second, the discussion does not work through virtue signaling alone: a “No discussion (public)” arm in which participants make individually-visible choices shows no reduction in discrimination on average (p=0.83). Third, the paper provides affirmative evidence for a persuasion channel: participants in a “listener” arm, who silently observe a 2-person discussion without participating, discriminate 13 percentage points less than the control group (p<0.001), an effect that is highly persistent at the 2–9 week follow-up (11 percentage points, p<0.001). The persuasion mechanism is further supported by the finding that pro-trans participants are more vocal: each additional transgender worker chosen in post-discussion private choices is associated with a 32% higher probability of speaking first (p=0.03) and a 27% higher probability of dominating the discussion (p=0.02). Statements about transgender workers during discussions were 5.7 times more likely to be positive than negative. Listeners who heard moral argumentation about equality, rights, and giving opportunities subsequently discriminated less (p<0.001).

Scope conditions: the study is conducted among urban Chennai residents (85% female), where transgender identity is visually recognizable and socially salient, awareness of the 2014 Supreme Court ruling is low (36% could not identify a single legal right transgender people hold), and a wedge exists between descriptive norms (high actual discrimination) and prescriptive norms (93% of the control group rate explicit discrimination as wrong). The model’s “sweet spot” logic implies these effects may not generalize to settings where discrimination is either near-universal (no privately pro-trans individuals to be vocal) or already minimal (no incentive to persuade).

Q: How is anti-transgender discrimination measured in the experiment? A: Participants make 10 incentive-compatible binary hiring choices over grocery delivery workers, with one choice randomly selected and implemented 2–9 weeks later. Discrimination is defined as the reduction in the probability of selecting the alternative worker when that worker is transgender versus non-transgender, conditional on other option characteristics such as items offered and reliability score. Participants are told they will have a 15-minute conversation with the selected worker, ensuring anticipated social contact. The design is framed as market research to obfuscate the study’s purpose; only 8% correctly guessed the true focus.

Q: How large is baseline discrimination in the control group? A: In the No discussion (private) control condition, participants are 19 percentage points (32%) less likely to hire a transgender worker than a non-transgender worker (p<0.001). In willingness-to-pay terms, participants sacrifice grocery items worth 1.9 times their median daily per capita food expenditure (Rs. 127 on a base of Rs. 67) to avoid selecting a transgender worker. Even when a transgender worker dominates on both items and reliability score, participants in the control group still select the non-transgender worker 47% of the time.
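
As a rough check on how the figures in the answer above fit together, the following sketch recomputes the willingness-to-pay multiple and backs out the hiring rates implied by the 19 percentage point (32%) gap. It assumes the 32% is expressed relative to the non-transgender hiring rate; the implied rates are a back-of-envelope inference, not numbers reported in this summary.

```python
# Figures quoted in the answer above.
wtp_rupees = 127.0         # grocery value sacrificed to avoid a transgender worker
daily_food_rupees = 67.0   # median daily per capita food expenditure

wtp_multiple = wtp_rupees / daily_food_rupees

gap_pp = 19.0              # percentage-point gap in hiring probability
gap_relative = 0.32        # ASSUMPTION: gap as a share of the non-transgender hiring rate

# Implied hiring rates in percent (inferred here, not reported in the summary).
implied_non_trans_rate = gap_pp / gap_relative
implied_trans_rate = implied_non_trans_rate - gap_pp

print(f"WTP multiple: {wtp_multiple:.1f}")
print(f"Implied rates: {implied_non_trans_rate:.1f}% vs {implied_trans_rate:.1f}%")
```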

Q: What is the main effect of the 3-person group discussion on subsequent discrimination? A: Participants who engage in a group discussion with two neighbors are 17 percentage points more likely to select a transgender worker in their subsequent private individual choices (p<0.001). This eliminates average discrimination entirely: in the discussion arm, the probability of selecting a transgender worker is not statistically distinguishable from the probability of selecting a non-transgender worker (p=0.30). The willingness-to-pay to avoid a transgender worker falls from Rs. 127 to Rs. 13 (p of difference < 0.001), and is no longer significantly different from zero (p=0.265).

Q: How persistent are the effects of the group discussion? A: At the 2–9 week follow-up survey (mean 35 days), discussion participants are approximately 4 percentage points more likely to select transgender workers in hypothetical hiring choices (p=0.03). This represents approximately 25% of the short-run 17 percentage point effect, a decay rate comparable to the persistence of US political advertising effects in the political science literature (Hill et al., 2013, estimate 10–15% remaining after 30 days).

Q: What is the effect of the legal rights video, and how does it compare to the discussion? A: The legal rights video — informing participants of the Supreme Court ruling affirming transgender people’s fundamental constitutional rights — increases the probability of selecting a transgender worker by 10.3 percentage points (p<0.001). The rights messaging video, which argues that transgender people should have equal rights without invoking legal authority, increases it by 5.8 percentage points (p=0.001). The legal rights video’s effect is only 59% as large as the discussion’s 17 percentage point effect (p of difference in [0.002, 0.04]), and unlike the discussion, neither video’s effect is detectable at the one-month follow-up (p in [0.12, 0.51]).

Q: Does the legal rights video work through a different channel than the rights messaging video? A: There is evidence that the legal authority of the Supreme Court matters beyond the content of the rights message. The legal rights video is more effective than the rights messaging video at reducing discrimination (p of difference in [0.01, 0.12]), and the legal rights video (but not the rights messaging) affects participants’ beliefs about the legal status of transgender people (as measured by a summary index). Both videos shift perceived descriptive norms — participants predict others will select transgender workers more, by 2–6 percentage points — but neither significantly affects attitudes as measured by a list experiment or disapproval questions.

Q: Does the discussion work through correcting misperceived norms? A: This channel can account for at most a small fraction of the effect. Control-group participants do overestimate peer discrimination by 5 percentage points in incentivized predictions (p<0.001, as measured by predicted probability of selecting a transgender worker). However, the discussion reduces predicted discrimination by 24 percentage points (p<0.001), far exceeding the initial misperception. Even under generous assumptions in which the misperception is precisely corrected, this mechanism could account for no more than 21% of the discussion’s treatment effect (95% CI: [8.9%, 32.5%]).
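
The upper bound quoted above can be approximated with simple arithmetic. This minimal sketch assumes the bound is the ratio of the initial misperception to the discussion's effect on predicted discrimination; the paper's exact calculation (and its confidence interval) may use unrounded estimates and a different procedure.

```python
misperception_pp = 5.0       # control group's overestimate of peer discrimination
discussion_shift_pp = 24.0   # fall in predicted discrimination after the discussion

# ASSUMPTION: fully correcting the misperception can explain at most this share.
max_share = misperception_pp / discussion_shift_pp

print(f"Upper bound on the norm-correction share: {max_share:.1%}")
```

The ratio comes out at roughly one fifth, in line with the "no more than 21%" figure.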

Q: Does the discussion work through virtue signaling? A: The evidence rules out virtue signaling as the primary channel. The “No discussion (public)” treatment arm makes participants’ individual hiring choices visible to their group members, exogenously increasing social image concerns in the absence of a discussion. This has no detectable average effect on discrimination (p=0.83), indicating that social image concerns alone — without the persuasive content of an actual discussion — do not explain the reduction in discrimination generated by the group discussion.

Q: What is the evidence for the persuasion mechanism? A: The “listener” treatment arm provides direct evidence. In this arm, one participant silently observes a 2-person discussion without speaking, then makes private individual choices. Listeners discriminate 13 percentage points less than the control group (p<0.001), an effect statistically indistinguishable from that for full discussion participants. Since listeners changed their behavior based solely on what they heard and saw, this constitutes evidence of persuasion. The listener effect is highly persistent at the 2–9 week follow-up (11 percentage points, p<0.001) and holds on a robustness outcome designed to be completely private. The implied persuasion rate is 29%, described as high relative to values in the literature (DellaVigna & Gentzkow, 2010).

Q: Why do pro-trans participants persuade others — what drives the discussion’s content? A: Pro-trans participants are disproportionately vocal. Each additional transgender worker chosen in post-discussion private choices (a proxy for pro-trans private attitudes) is associated with a 32% higher probability of speaking first (p=0.03) and a 27% higher probability of dominating the discussion (p=0.02), but only when discussing a choice involving a transgender worker. The overall tone of discussions is strongly pro-trans: statements about transgender workers are 5.7 times more likely to be positive than negative. Participants who hear moral argumentation about equality, rights, and giving opportunities subsequently discriminate significantly less (p<0.001).

Q: Does the discussion work by changing statistical (belief-based) discrimination? A: There is no evidence that it does. Baseline discrimination in the control group is partly statistical: despite transgender workers having the same average reliability scores as others, participants rate them as less likely to complete a delivery, and revealing the true reliability score makes participants 2.9 percentage points more likely to select a transgender worker (an effect unique to transgender workers). However, the discussion does not significantly affect beliefs about transgender workers’ reliability, and there is no detected reduction in the belief-based component of discrimination in the discussion arm (though the test is underpowered).

Q: Are the effects of the discussion and the legal rights video additive? A: The two interventions appear to combine approximately linearly for the legal rights video: there are no detected interaction effects (p in [0.83, 0.96]). By contrast, there is weak evidence of a negative interaction between the rights messaging video and the discussion, suggesting these two may be substitutes — consistent with the rights messaging video’s content being similar to the pro-trans moral argumentation already present in discussions.

Q: What alternative explanations are ruled out? A: The paper tests and finds no support for: (i) photo characteristics such as perceived caste driving results; (ii) social image concerns affecting even post-discussion private choices (the “extra private” robustness outcome designed to be unobservable by neighbors yields similar results); (iii) increased contemplation or deliberation about choices; (iv) experimenter demand effects or social desirability bias (treatment effects do not differ for the 8% who guessed the study’s purpose); (v) increased salience of the transgender category; and (vi) cheap talk from low stakes (choices were incentive-compatible and implemented).

Q: What is the study’s theoretical model for why pro-trans participants speak out? A: The paper develops a model combining social signaling (people want to fit in with their group; Bénabou & Tirole, 2006) with direct persuasion (participants can change each other’s preferences through messages). Under the right conditions, only pro-trans participants send persuasive pro-trans messages. This occurs in a “sweet spot” range: when average discrimination is not so strong that no one is privately pro-trans, and not so weak that pro-trans participants lack an incentive to persuade (since they are already in the majority). The context in Chennai — high actual discrimination but strong social norms against it — satisfies this sweet spot condition.

Q: What are the policy implications regarding horizontal versus top-down communication? A: In this context, facilitating horizontal communication between neighbors is a more effective tool for reducing discrimination than top-down communication about legal rights: the discussion’s effect is about 1.7 times as large as the legal rights video’s (17 p.p. vs. 10.3 p.p.) and partially persists at one month, whereas the legal rights video’s effect does not persist. However, the legal rights video reduces discrimination by more than the rights messaging video does, suggesting that communicating the legal authority of the Supreme Court carries independent weight beyond rights advocacy messaging. When combined, the discussion and the legal rights video appear to add up approximately linearly, with no detected interaction between them.

Horizontal communication: Communication between members of the majority group about a minority, as distinct from contact between majority and minority groups or top-down communication from authority. In this paper, operationalized as a group discussion among three neighbors who make collective hiring choices.

Top-down communication: Communication from agents of authority — here, the legal system — about a minority group’s rights. Measured via a video informing participants of a Supreme Court ruling affirming transgender people’s constitutional rights.

Anti-transgender discrimination: In the paper’s own measurement, the reduction in the probability that a worker is chosen because they are transgender (relative to being non-transgender), conditional on other delivery option characteristics. Measured in incentivized, privately-elicited binary hiring choices.

Expressive law hypothesis: The theory that changes in the law affect behavior by changing people’s perception of the prevailing social norm, not (only) through deterrence. The paper tests this by comparing a legal rights video (invoking Supreme Court authority) to a rights messaging video with identical content but no legal backing, finding the legal-authority version more effective.

Persuasion channel: The mechanism by which discussion participants change each other’s preferences through persuasive messages, particularly moral arguments about equality and rights. Distinguished in the paper from virtue signaling (publicly visible pro-trans behavior) and norm correction (updating misperceived beliefs about peer behavior).

Pluralistic ignorance: A setting in which people misperceive how common discriminatory attitudes are among their peers, potentially hiding genuine minority support for the discriminated group. The paper tests this as a candidate mechanism and finds it can account for at most 21% of the discussion effect.

Sweet spot condition: The range of average group discrimination levels in which pro-trans participants have both the motivation and opportunity to speak out persuasively — discrimination is not so universal that no one is privately pro-trans, and not so minimal that the pro-trans participants feel no need to persuade others. The paper argues the Chennai context satisfies this condition.

Soft landing and inflation scares

Layer 1 — Overview

Research Question

Why did the 2021–2023 US inflation surge end in a soft landing — disinflation without a major recession — while the Volcker disinflation of 1979–1987 required substantial output losses? And was the timing and strength of the Federal Reserve’s reaction to the inflation surge decisive in achieving this outcome?

Methodology and Model

The paper develops and estimates a micro-founded Heterogeneous-Expectation New Keynesian (HENK) model in which agents hold idiosyncratic, dispersed beliefs about the long-run (steady-state) level of inflation. The key departure from full-information rational expectations (FIRE) is that information about the long-run value of inflation is dispersed and sticky: agents update their beliefs through pairwise social learning (SL), adopting the forecasting model of the agent whose belief produced lower recent inflation forecast errors. This tournament process — inspired by genetic algorithms — generates a time-varying cross-sectional distribution of subjective inflation beliefs.

The model admits a closed-form solution that retains the entire time-varying distribution of beliefs and can be estimated with standard full-information Bayesian methods using the inversion filter (Cuba-Borda et al. 2019). The FIRE benchmark is nested as the special case in which the average belief deviation from the target is zero at all times.

Estimation uses four US macroeconomic observables (output gap, CPI inflation, one-quarter-ahead average SPF inflation expectation, and the proxy funds rate of Choi et al. 2022 that captures both conventional and unconventional monetary policy) over 1985Q1–2023Q4. A formal model comparison rejects the RE null hypothesis (p < 0.0001) in favor of the HENK specification.

Main Findings With Quantitative Magnitudes

Inflation scares are endogenous: In the model, inflation scares arise whenever repeated above-target inflation outcomes validate and diffuse above-target beliefs through social interactions. Under the historical scenario, the share of agents holding long-run inflation beliefs between 1 and 3 percent (annualized) falls to 40 percent in mid-2022 before recovering above 90 percent by end-2023, indicating a partial but not complete unanchoring of expectations.
Timing dominates strength: Counterfactual simulations show that the timing, not the strength, of the Fed’s reaction to the inflation surge is the key determinant of inflation expectations management and subsequent macroeconomic outcomes. Varying the Taylor-rule inflation coefficient by +/-10 percent (to 1.64 in the dovish case and 2.00 in the hawkish case) produces negligible differences in inflation and output gap dynamics, with welfare ratios relative to benchmark of 1.057 (dovish) and 0.960 (hawkish) under the ad-hoc loss function, and 1.052 and 0.981 under the microfounded measure. By contrast, varying the timing via the interest-rate smoothing parameter by +/-10 percent produces much larger divergences.
The Fed fell behind the curve: Under a scenario in which the Fed had strictly followed its estimated Taylor rule (removing the negative monetary policy shocks observed from mid-2020 to mid-2022), inflation would have peaked approximately 3 percentage points lower on a yearly basis. Inflation expectations would have remained lower for almost a year longer, and the subsequent rise in expectations would have been more gradual and lower-peaking. Crucially, the output gap in this preemptive-tightening scenario would have been only briefly negative (in 2022Q2) and not deep enough to trigger a recession.
Further delays would have been highly costly: A delay of the tightening by one, two, four, or eight quarters would have produced successively worse outcomes. A two-year delay generates runaway inflation and 100 percent loss of target credibility (complete unanchoring). A delay of approximately three quarters would have resulted in a sizable, self-reinforcing entrenchment of above-target inflation expectations. The welfare cost of an eight-quarter delay is 5.76 times the benchmark loss under the ad-hoc measure (1.167 under the microfounded measure).
Early rate cuts would have reignited inflation: A counterfactual 100-basis-point cut as early as 2022Q3 would have pushed annual inflation approximately 2 percentage points above the historical scenario through end-2023, with inflation expectations rebounding by about 1 percentage point (annualized) immediately after the cut. Under no early-cut scenario would inflation or expectations have converged back to target by end-2023.
Expectation heterogeneity amplifies shocks: Greater initial dispersion in beliefs amplifies and prolongs the impact of all shocks (demand, supply, monetary policy, expectation). After a one-standard-deviation cost-push shock, higher initial belief dispersion produces larger and more persistent deviations in inflation, output, and interest rates. The model-implied interquartile range of beliefs is correlated 0.538 with the SPF interquartile range and the cross-sectional standard deviation is correlated 0.483 (both p < 0.001).
Historical decomposition: Over the 2010s, negative expectation shocks account for a substantial fraction of the persistent below-target inflation (“missing inflation”). From approximately mid-2022 onward, positive expectation shocks account for most of the variance of inflation in the model. The recent disinflation is attributed to a combination of: easing supply pressures, normalization of monetary policy, and re-anchoring of inflation expectations.

Scope Conditions

Results are conditional on the estimated HENK model applied to US data, 1985Q1–2023Q4, using a stylized three-equation NK backbone (no labor market dynamics, no financial sector, no capital). The proxy funds rate is more volatile than the federal funds rate, which affects the welfare comparison for large preemptive tightening scenarios. Counterfactual scenarios are implemented through unexpected monetary policy shocks; anticipated shocks would only strengthen the inflationary effects of delays.

Layer 2 — Q&A

Q1: What is the core mechanism by which an inflation scare can develop in the HENK model?

A: When inflation repeatedly exceeds the target — whether due to shocks or delayed policy — agents whose beliefs are already above-target incur lower forecast errors than those anchored at the target. During pairwise social interactions (the tournament step of social learning), above-target beliefs spread through the population because they are selected as the “better” forecasting model. The resulting upward shift in the average belief feeds higher inflation through the New Keynesian Phillips Curve, which validates above-target beliefs further, creating a self-reinforcing loop. This mechanism differs from rational-expectations models, where beliefs mean-revert automatically.
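
The tournament step described above can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions, not the paper's model: each agent's forecast is just its belief about steady-state inflation, fitness is a discounted sum of squared forecast errors, the inflation history is held fixed (no Phillips-curve feedback), and there are no news shocks. The function names, the starting beliefs, and the inflation history are all hypothetical.

```python
import random


def discounted_loss(belief, inflation_history, decay):
    """Discounted sum of squared forecast errors; the latest quarter has weight 1."""
    loss = 0.0
    weight = 1.0
    for pi in reversed(inflation_history):
        loss += weight * (pi - belief) ** 2
        weight *= decay
    return loss


def tournament_step(beliefs, inflation_history, decay, rng):
    """Pair agents at random; both members of a pair keep the lower-loss belief."""
    order = list(range(len(beliefs)))
    rng.shuffle(order)
    updated = list(beliefs)
    for a, b in zip(order[0::2], order[1::2]):
        loss_a = discounted_loss(beliefs[a], inflation_history, decay)
        loss_b = discounted_loss(beliefs[b], inflation_history, decay)
        winner = beliefs[a] if loss_a <= loss_b else beliefs[b]
        updated[a] = winner
        updated[b] = winner
    return updated


rng = random.Random(0)
# Hypothetical beliefs (annualized percent) spread around a 2 percent target.
beliefs = [rng.uniform(0.0, 4.0) for _ in range(200)]
history = [5.0, 6.0, 7.0]  # hypothetical run of above-target inflation outcomes

mean_before = sum(beliefs) / len(beliefs)
for _ in range(5):
    beliefs = tournament_step(beliefs, history, decay=0.775, rng=rng)
mean_after = sum(beliefs) / len(beliefs)

print(f"average belief before: {mean_before:.2f}, after: {mean_after:.2f}")
```

Because every inflation outcome in this example lies above every belief, the higher belief in each pair always has the lower loss, so the average belief drifts upward. In the full model that drift would feed back into inflation itself, which is the self-reinforcing part this sketch leaves out.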

Q2: How does the model retain a closed-form solution despite the nonlinearity of the social-learning process?

A: Two assumptions deliver the closed-form. First, beliefs are private and dispersed (Assumption 1): agents observe only the belief of their matched mate, not the population distribution. Second, a quasi-rational-expectations (quasi-RE) observer treats aggregate beliefs as a random walk in expectations (Assumption 2: a martingale). Under these conditions, the aggregate subjective inflation expectation equals the average subjective belief about steady-state inflation plus the rational-expectations forecast. This augmented minimum-state-variable (MSV) solution can be estimated with full-information methods (the inversion filter) via standard Dynare tooling.

Q3: What data are used and how are observables mapped to model variables?

A: The estimation uses four quarterly US observables from 1985Q1–2023Q4: the output gap (real GDP from FRED, HP-filtered with a one-sided adjusted filter); the CPI inflation rate (CPIAUCSL, FRED); one-quarter-ahead average CPI inflation expectation from the Survey of Professional Forecasters (CPI3); and the proxy funds rate of Choi et al. (2022), which captures both QE and QT so that unconventional monetary policy is reflected in the instrument. Inflation and expectations are demeaned by the sample average to express them as deviations from steady state. The discount factor is calibrated at 0.99; all other parameters are estimated via Bayesian methods with Metropolis-Hastings (8 parallel chains x 100,000 iterations, acceptance rate ~30%).

Q4: What are the key estimated parameter values for the social-learning block?

A: The posterior mean of the decay parameter in the fitness evaluation (discounting of past forecast errors) is 0.775, implying a half-life of past forecast errors of approximately 3 quarters. The frequency of news shocks has a posterior mean of 0.436, meaning roughly 44 percent of agents receive an inflation news shock every quarter. The standard deviations of the aggregate and idiosyncratic news shocks are very small (posterior means of 0.0004 and 0.0006, respectively) but strictly positive. The 95 percent confidence intervals for both exclude zero.
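
The half-life figure can be checked directly. The sketch below assumes the decay parameter acts as a geometric per-quarter weight on past squared forecast errors, so the half-life is the number of quarters after which that weight falls to one half.

```python
import math

decay = 0.775  # posterior mean of the decay parameter quoted above

# ASSUMPTION: weight on an error k quarters old is decay**k; solve decay**k = 0.5.
half_life_quarters = math.log(0.5) / math.log(decay)

print(f"half-life: {half_life_quarters:.2f} quarters")
```

The result is a little under three quarters, consistent with the "approximately 3 quarters" stated above.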

Q5: How does the HENK model outperform the RE benchmark in fitting the data?

A: Formal model comparison rejects the RE null (p < 0.0001) with equal prior model weights (50/50). On second moments, only the HENK model comes close to the empirical autocorrelations: in inflation (0.428 vs. 0.162 for RE, against an empirical interval of [0.239; 0.579]), in inflation expectations (0.824 vs. 0.161, just below the empirical interval of [0.839; 0.927]), and in inflation forecast errors (0.122 vs. -0.145). Additionally, the HENK model reproduces the untargeted cross-sectional dispersion of beliefs over the business cycle, including the increase during the GFC and the COVID-19 era and the low dispersion during the Great Moderation, with correlations of 0.538 and 0.483 between model and SPF dispersion measures.

Q6: What does the historical shock decomposition reveal about the recent inflation surge?

A: The decomposition (Section 3.3) shows that in the initial phase of the COVID-19 shock (2020Q2-Q3), negative demand and monetary policy shocks drove inflation down. Adverse cost-push (supply) shocks dominate from early 2021 into 2022. Expectation shocks — the contribution of dispersed beliefs — are negative throughout the 2010s (explaining part of the “missing inflation”) and remain briefly negative at the pandemic’s onset before turning sharply positive and driving most of the variance of inflation in the final two years of the sample (2022-2023). The loose monetary policy stance (negative monetary policy shocks from mid-2020 to mid-2022, visible in the Taylor-rule residuals) also contributes substantially to the inflation dynamics.

Q7: What does the Taylor-rule counterfactual show, and why doesn’t preemptive tightening cause a recession in the model?

A: Removing the monetary policy shocks after 2020Q4 so that the proxy rate follows the estimated Taylor rule would have reduced the inflation peak by approximately 0.75 percentage points per quarter (equivalent to about 3 percentage points annualized) and kept expectations lower and better anchored for almost a year longer. The output gap under the Taylor-rule scenario is only briefly negative (2022Q2) and does not constitute a recession. This occurs because the preemptive tightening exploits the sluggishness of subjective expectations stemming from information frictions: by raising rates earlier when beliefs are still anchored (or only weakly above target), the central bank prevents the social-learning mechanism from diffusing above-target beliefs, which in turn softens the stabilization trade-off between inflation and output.

Q8: What is the U-shaped relationship between preemptive tightening size and welfare?

A: Both the ad-hoc and microfounded welfare measures show a U-shaped relationship as the size of the front-loaded tightening in 2021Q1 increases from 100 bps to 400 bps to 800 bps. At 100 bps, the welfare ratio is 0.336 (ad-hoc, improvement over benchmark at 1.0); at 400 bps it improves further to 0.304; but at 800 bps (front-loading the entire subsequent tightening cycle) the ratio rises to 0.555, reflecting that the output costs of a very large early rate increase become prohibitive amid the series of supply shocks that hit in 2022. The maximum welfare gain in the microfounded criterion occurs at a slightly larger early increase than in the ad-hoc criterion, attributed to the absence of a financial sector and use of the more volatile proxy funds rate.

Q9: Does increasing the hawkishness of the Taylor rule compensate for falling behind the curve?

A: No. Varying the inflation reaction coefficient by +/-10 percent (to 2.00 for “hawk” and 1.64 for “dove”) from the posterior mean of approximately 1.82 produces negligible differences in inflation and output gaps. The hawkish scenario achieves marginally earlier rate increases but does not reduce the inflation gap relative to the historical benchmark. Welfare ratios are 0.960 (hawkish, slight improvement) and 1.057 (dovish, slight deterioration) under the ad-hoc measure, and 0.981 and 1.052 under the microfounded measure. The joint simulations varying both smoothing (timing) and hawkishness (strength) confirm that timing is the dominant factor: the two “earlier reaction” scenarios are clustered together and well-separated from the two “later reaction” scenarios, regardless of the inflation coefficient.
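
To see mechanically why a change in smoothing can matter more than an equal proportional change in the inflation coefficient, consider a stylized smoothed rule. This is a partial-equilibrium sketch, not the paper's counterfactual: the rule form i_t = rho * i_{t-1} + (1 - rho) * phi_pi * pi_t omits the output-gap term, the smoothing value of 0.80 is hypothetical (the summary does not report the estimate), and the inflation gap is held fixed instead of responding to policy or beliefs.

```python
def rate_path(phi_pi, rho, inflation_gaps, initial_rate=0.0):
    """Policy-rate path under i_t = rho * i_{t-1} + (1 - rho) * phi_pi * pi_t."""
    rate = initial_rate
    path = []
    for gap in inflation_gaps:
        rate = rho * rate + (1.0 - rho) * phi_pi * gap
        path.append(rate)
    return path


PHI_PI = 1.82      # posterior mean of the inflation coefficient quoted above
RHO = 0.80         # HYPOTHETICAL smoothing parameter, not taken from the paper
gaps = [1.0] * 4   # hypothetical constant 1-point inflation gap for four quarters

baseline = rate_path(PHI_PI, RHO, gaps)
hawk = rate_path(PHI_PI * 1.10, RHO, gaps)            # 10 percent stronger reaction
less_smoothing = rate_path(PHI_PI, RHO * 0.90, gaps)  # 10 percent less smoothing

for name, path in [("baseline", baseline), ("hawk", hawk), ("less smoothing", less_smoothing)]:
    print(name, [round(r, 3) for r in path])
```

With smoothing this high, cutting rho by 10 percent raises the first-quarter response by far more than raising phi_pi by 10 percent does, because the impact response scales with (1 - rho). That ranking depends on the assumed rho and would not hold if smoothing were low.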

Q10: How does the model handle the role of initial belief dispersion in monetary policy transmission?

A: Impulse response function exercises varying the initial standard deviation of beliefs (as a share of the maximum model-generated standard deviation under the filtered shocks) show that greater initial dispersion uniformly amplifies and prolongs the macroeconomic response to all shock types (demand, cost-push, monetary policy, expectation). The mechanism is: greater dispersion means the population contains more “extreme” (far-from-target) beliefs; a shock that temporarily moves inflation off target temporarily validates extreme beliefs (lower forecast errors), causing them to spread in social interactions and shift the average belief further from target. This raises nominal rates (through the Taylor rule), deepens output losses, and prolongs the return to steady state.

Q11: What are the implications of early interest rate cuts in the counterfactual scenarios?

A: A 100-basis-point cut in any quarter from 2022Q3 through 2023Q2 would have reignited inflation expectations. The 2022Q3 scenario is most severe: expectations rebound approximately 1 percentage point higher (annualized) immediately post-cut, and annual inflation remains on average 2 percentage points above the historical path through end-2023. Across all early-cut scenarios, neither inflation nor inflation expectations would have returned to target by end-2023; instead, inflation would have been landing approximately 2 percentage points above the 2 percent target. The welfare ratios for early cuts range from 1.200 (cut in 2022Q3) down to 1.079 (cut in 2023Q2) under the ad-hoc measure, all welfare-worsening.

Key Concepts

Inflation scare (Goodfriend 1993, as used in this paper): A situation in which the public’s long-run inflation expectations become unanchored from the central bank’s target, making beliefs about above-target steady-state inflation self-fulfilling via the New Keynesian Phillips Curve. In the HENK model, a scare arises endogenously when above-target inflation outcomes repeatedly validate above-target beliefs, causing them to spread through social interactions. Measured in the paper by the share of idiosyncratic beliefs falling between 1 and 3 percent (annualized); lower share = more severe scare.
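
The scare measure described above is a simple share. A minimal sketch, assuming beliefs are expressed in annualized percent and that the 1 and 3 percent bounds are inclusive (the summary does not say); the function name and sample beliefs are hypothetical.

```python
def anchored_share(beliefs, lower=1.0, upper=3.0):
    """Share of beliefs (annualized percent) lying between lower and upper, inclusive."""
    if not beliefs:
        raise ValueError("beliefs must be non-empty")
    inside = sum(1 for b in beliefs if lower <= b <= upper)
    return inside / len(beliefs)


sample_beliefs = [1.8, 2.0, 2.4, 3.5, 4.1]  # hypothetical cross-section of beliefs
print(anchored_share(sample_beliefs))        # 3 of 5 beliefs are inside the band
```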

Social learning (SL): The belief-updating mechanism in which agents are paired at random each period and compare their inflation forecasting models; the model that produced lower recent forecast errors (measured by the discounted sum of squared forecast errors with half-life approximately 3 quarters) is adopted by both members of the pair. This evolutionary tournament process, analogous to a genetic algorithm, generates a nonlinear, history-dependent distribution of beliefs that can drift persistently away from the target.

Steady-state learning: The restriction that agents’ heterogeneous beliefs concern only the low-frequency (intercept) component of inflation — i.e., their subjective perception of the steady-state inflation rate — while the rest of their inflation forecast (the effects of transitory shocks and lagged variables) coincides with rational expectations. This assumption, combined with internal rationality, permits a closed-form MSV solution of the HENK model.

Internal rationality: The assumption that each agent uses a perceived law of motion that is consistent with the true MSV solution of the HENK economy (including the effect of heterogeneous beliefs on dynamics), even if their intercept differs from the rational-expectations value. Agents internalize how the aggregate deviation of expectations from RE affects inflation, but they disagree about the long-run level.

Quasi-rational-expectations (quasi-RE) observer: An observer (or central bank) who, lacking information about how individual private beliefs are formed and aggregated, treats aggregate beliefs as a martingale — i.e., the expected future aggregate belief equals its current value. This assumption closes the model and permits estimation with full-information (inversion filter) methods, while preserving consistency between subjective beliefs and the law of motion.

Belief dispersion / expectation heterogeneity: The time-varying cross-sectional standard deviation (or interquartile range) of idiosyncratic beliefs in the population. In the model this is an endogenous, history-dependent outcome of the SL process. Greater dispersion amplifies the response of all macroeconomic variables to any shock by providing more “extreme” beliefs that can gain traction in pairwise tournaments when inflation temporarily deviates from target. Measured empirically by the interquartile range and standard deviation of individual SPF forecasts.

Proxy funds rate (Choi et al. 2022): A summary measure of the US monetary policy stance that incorporates both conventional interest rate policy and the effects of unconventional policies (quantitative easing and tightening), used in the paper in place of the federal funds rate to capture the full stance of monetary policy in the estimation and historical decomposition.

Inversion filter (Cuba-Borda et al. 2019): A computationally efficient estimation algorithm that, rather than the Kalman or particle filter, inverts the observation equation analytically to recover the sequence of structural shocks for a given parameter vector. It enables full-information Bayesian estimation of the nonlinear HENK model by separating the linear part of the solution from the nonlinear social-learning residual.
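
The social-learning tournament and the two belief statistics defined above (the share of beliefs inside the 1 to 3 percent band, and cross-sectional dispersion) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each agent's forecast is just her perceived steady-state inflation rate, that the band is inclusive at both ends, and that ties go to the first agent in a pair; the function names and the numbers in `beliefs` and `history` are hypothetical.

```python
import random
import statistics

HALF_LIFE_QUARTERS = 3.0  # from the text: half-life of roughly 3 quarters
DISCOUNT = 0.5 ** (1.0 / HALF_LIFE_QUARTERS)  # per-quarter weight on older errors


def discounted_sse(errors):
    """Discounted sum of squared forecast errors; errors[-1] is the most recent."""
    return sum((DISCOUNT ** age) * err ** 2
               for age, err in enumerate(reversed(errors)))


def tournament_round(beliefs, inflation_history, rng):
    """Pair agents at random; both members adopt the belief with the lower loss.

    Assumption: the forecast error is realized inflation minus the belief.
    """
    order = list(range(len(beliefs)))
    rng.shuffle(order)
    new_beliefs = list(beliefs)
    for i, j in zip(order[0::2], order[1::2]):
        loss_i = discounted_sse([pi - beliefs[i] for pi in inflation_history])
        loss_j = discounted_sse([pi - beliefs[j] for pi in inflation_history])
        winner = beliefs[i] if loss_i <= loss_j else beliefs[j]
        new_beliefs[i] = winner
        new_beliefs[j] = winner
    return new_beliefs


def share_in_band(beliefs, low=1.0, high=3.0):
    """Share of beliefs between `low` and `high` percent (inclusive by assumption)."""
    return sum(low <= b <= high for b in beliefs) / len(beliefs)


def dispersion(beliefs):
    """Cross-sectional standard deviation and interquartile range."""
    q1, _median, q3 = statistics.quantiles(beliefs, n=4)
    return statistics.pstdev(beliefs), q3 - q1


rng = random.Random(0)
beliefs = [1.0, 2.0, 3.0, 4.0]   # hypothetical perceived steady-state inflation
history = [3.9, 4.1, 4.0]        # hypothetical run of above-target inflation
updated = tournament_round(beliefs, history, rng)
print(share_in_band(beliefs), share_in_band(updated))
print(dispersion(beliefs), dispersion(updated))
```

In this toy run, above-target inflation lets the highest belief win its pairing, so the share of beliefs inside the band falls after a single round, which is the direction the scare measure is meant to capture.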

The role of wage expectations in the labor market

This paper develops a Diamond-Mortensen-Pissarides (DMP) search and matching model with internally rational (IR) agents who hold subjective beliefs about wages rather than perfect knowledge of the Nash bargaining outcome. The standard DMP model struggles with two empirical regularities: high volatility of U.S. labor market variables relative to productivity, and a near-zero correlation between labor market tightness and productivity post-1989. The IR model significantly improves alignment with U.S. labor market data relative to the standard rational expectations benchmark by generating a self-referential belief mechanism: shifts in beliefs about the future returns to labor affect current wages, which agents in turn use to update their beliefs. Wage expectations in the model are consistent with data from European Commission professional forecasters, and an econometric test rejects the rational expectations null hypothesis for survey real wage expectations.

Summary based on a working paper version, AI-assisted and human-reviewed. See the linked published article for the authoritative version.

In depth

Q1. What is internal rationality and how does it differ from standard rational expectations?

Internal rationality (IR) means agents know all internal aspects of their optimization problem and maximize their objectives given their knowledge, but lack perfect information about the equilibrium wage function that emerges from Nash bargaining; they therefore hold subjective beliefs about wages. Under standard rational expectations, workers and firms know the exact wage function from Nash bargaining. Under IR, they have limited foresight about the outcome of wage negotiations and use a subjective model to form wage expectations. This is a small but disciplined departure from RE: the paper considers belief systems implying only a small deviation from rational expectations that match aspects of survey wage expectations.

Q2. What is the empirical failure of the standard DMP model that motivates the paper?

The standard DMP model fails on two counts: it cannot reproduce the high observed volatility of unemployment, vacancies, and market tightness relative to productivity, and it cannot generate the near-zero post-1989 correlation between productivity and labor market tightness. The first failure—the Shimer (2005) puzzle—has attracted extensive research, but the near-zero tightness-productivity correlation has been largely neglected. The paper shows that allowing for small deviations from rational expectations in the form of internal rationality resolves both puzzles simultaneously.

Q3. What is the self-referential belief mechanism and how does it generate extra dynamics?

The model has a self-referential mechanism: shifts in beliefs about future returns to labor affect current wages, and agents use realized wages to update their beliefs about future wages. This creates an additional source of dynamics beyond technology shocks, which helps the model match the data. When firms and workers revise beliefs about future wages upward, current wages rise through the Nash bargaining outcome (since the reservation values of both parties shift); the realized wage then feeds back into belief updating, generating wage and employment dynamics not tied to current productivity. This mechanism provides a microfoundation for previous adaptive learning models of unemployment.
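
The feedback loop can be illustrated with a deliberately stylized sketch. This is not the paper's DMP model: the linear wage rule, the constant-gain updating rule, and every parameter value below are hypothetical, chosen only to show how a one-off belief shift can move wages while productivity stays constant.

```python
def simulate_wages(periods, productivity=1.0, belief_weight=0.8,
                   gain=0.1, belief_shock=0.2):
    """Toy self-referential loop with constant productivity.

    Hypothetical rules (not from the paper):
      wage_t     = (1 - belief_weight) * productivity + belief_weight * belief_t
      belief_t+1 = belief_t + gain * (wage_t - belief_t)
    """
    belief = productivity + belief_shock  # one-off upward shift in wage beliefs
    wages = []
    for _ in range(periods):
        wage = (1 - belief_weight) * productivity + belief_weight * belief
        wages.append(wage)
        belief += gain * (wage - belief)  # agents learn from the realized wage
    return wages


wages = simulate_wages(periods=40)
print(wages[0], wages[-1])
```

Because the realized wage partly reflects the belief itself, the belief is only slowly corrected, so the wage stays above its productivity-implied level for many periods even though productivity never changes.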

Q4. What is the empirical validation of the model’s wage expectations?

Wage expectations in the IR model are validated against survey data from European Commission professional forecasters, and an econometric test rejects the rational expectations null hypothesis for real wage expectations from survey data. The consistency between model-implied and surveyed wage expectations provides external validation for the IR departure from RE, showing that the subjective beliefs assumed in the model correspond to beliefs actually held by professional forecasters rather than to arbitrary deviations.

Key concepts

Internal rationality (IR): a bounded rationality concept in which agents fully optimize given their beliefs and knowledge of their own decision problem, but lack perfect knowledge of equilibrium objects (here, the wage function emerging from Nash bargaining); allows small, disciplined deviations from rational expectations.

DMP model: the Diamond-Mortensen-Pissarides search and matching model; the standard theory of equilibrium unemployment; criticized for generating insufficient labor market volatility relative to productivity (the Shimer puzzle) and for a counterfactual positive tightness-productivity correlation.

Belief shock: an exogenous shift in agents’ subjective beliefs about future wages; generates employment and wage dynamics independently of current productivity shocks via the self-referential mechanism; introduced as an additional structural shock in the IR-DMP model.

To Own or to Rent? The Effects of Transaction Taxes on Housing Markets

Layer 1 — Summary

Using sales and leasing transaction records for the Greater Toronto Area (2006–2018), this paper finds three novel effects of a higher property transaction tax: higher buy-to-rent transactions alongside lower buy-to-own transactions despite both being taxed, a lower sales-to-leases ratio, and a lower price-to-rent ratio. The empirical identification exploits the City of Toronto’s introduction of a city-level Land Transfer Tax (LTT) in February 2008, which covered only the city and not the surrounding GTA municipalities, by comparing outcomes on opposite sides of the city border before and after the tax change. A 1.3 percentage-point higher effective LTT rate causes buy-to-rent purchases to rise by 9.3% while owner-occupier purchases fall by 9.6%; the leases-to-sales ratio rises by 26% and the price-to-rent ratio falls by 3.8%. To explain these facts, the paper develops a search model featuring household tenure choice (own vs. rent) subject to heterogeneous credit costs, endogenous homeowner moving decisions, and free entry of buy-to-rent investors. The key mechanism is that the LTT reduces homeowners’ mobility and, because owner-occupiers expect to transact multiple times over their lifetimes and thus bear the tax repeatedly, discourages entry into ownership and raises demand for rentals. This in turn attracts investor entry even though investors too pay the tax, since investors need not re-transact whenever a tenant vacates. The implied deadweight loss is large at 111% of tax revenue, with more than half of this due to distorted decisions to own or rent; the channels that involve the rental market account for losses equal to 73% of tax revenue, about two-thirds of the total loss.

Layer 2 — Q&A

Q: What are the three novel empirical facts documented in this paper?

A: Using MLS data on both sales and leases in the Greater Toronto Area, the paper documents: (1) a 1.3 pp higher effective LTT rate causes buy-to-rent (BTR) investor purchases to increase by 9.3%, in stark contrast to a 9.6% fall in owner-occupier (buy-to-own) purchases — a divergence that is counterintuitive because both types of buyer are subject to the same tax; (2) the ratio of leases to sales rises by 26%, indicating that rental-market activity increases relative to ownership-market activity; and (3) the price-to-rent ratio falls by 3.8%, meaning house prices decline relative to rents.

Q: What is the empirical identification strategy and why is it credible?

A: The paper uses a geographic regression discontinuity approach comparing communities on opposite sides of the Toronto city border, where the new city-level LTT applies on one side but not the other, in a difference-in-differences framework spanning January 2006–January 2008 (pre-policy) and February 2008–February 2012 (post-policy). The sample is restricted to properties within 3 or 5 km of the boundary. The paper verifies that property characteristics do not differ significantly across the border and that cross-border differences do not change after the LTT, supporting the parallel-trends assumption. The effective LTT rate increase is measured at 1.3 percentage points (assuming 40% first-time buyers, who receive a partial exemption). Buy-to-rent transactions are identified in the MLS data by matching properties that appear in both the sales and leases datasets within an 18-month window following sale.
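
The core of the border comparison is a two-by-two difference-in-differences. The sketch below shows only that arithmetic on hypothetical log transaction counts; the paper's actual regressions (distance bands, controls, standard errors) are not reproduced here, and the function name and data are illustrative assumptions.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """(post minus pre change inside the city) minus (the same change outside)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical log sales counts for communities near the city border.
inside_pre, inside_post = [10.0, 10.2], [9.9, 10.1]      # taxed side
outside_pre, outside_post = [10.0, 10.0], [10.0, 10.2]   # untaxed side

effect = did_estimate(inside_pre, inside_post, outside_pre, outside_post)
print(round(effect, 3))
```

Subtracting the change on the untaxed side removes shocks common to both sides of the border, which is why the design relies on the parallel-trends check described above.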

Q: What is the intuition for why the LTT raises buy-to-rent investment even though it taxes investors?

A: The mechanism hinges on the asymmetry in expected future transaction costs between owner-occupiers and investors. Owner-occupiers face idiosyncratic match-quality shocks — they periodically want to move to a different property as their circumstances or preferences change — so choosing homeownership means expecting to pay the LTT on each future move. This makes homeownership less attractive relative to renting, reducing household entry into the ownership market and increasing demand for rental properties. Investors (landlords), by contrast, do not need to re-transact in the ownership market simply because a tenant moves out; they retain the property and find a new tenant. Investors therefore face a lower expected frequency of LTT payments per year of property holding than owner-occupiers. As a result, the LTT’s negative effect on investor returns is smaller in magnitude than the increase in rental demand it generates. In equilibrium, the price-to-rent ratio falls by enough to attract more BTR investors in spite of the direct cost the tax imposes on them, and investor purchases rise.

Q: How does the LTT affect homeowner mobility (the “lock-in” effect) and what are its welfare implications within the ownership market?

A: The LTT makes existing homeowners more tolerant of poor match quality with their current property, since the cost of moving — paying the tax again — has risen. Moving rates therefore decline as households remain in properties for longer on average. To mitigate future tax costs, buyers also become more selective (“picky”) when initially matching with a property, requiring higher match quality before purchasing. This reduces the frequency of moves but increases the cost and duration of search for new buyers. The welfare consequences within the ownership market are: (a) misallocation of properties among owner-occupiers as average match quality falls because households move less often to renew it; partially offset by (b) higher initial match quality for newly matched buyers, but at the cost of longer search. The LTT-induced distortions within the ownership market account for a loss equal to 38% of tax revenue.

Q: What are the model’s quantitative predictions for the four-year post-reform period, and how do they compare to the empirical estimates?

A: The model is calibrated to the City of Toronto for 2006–2008 (homeownership rate ~54%) and simulated for a 1.3 pp LTT increase, with the mobility hazard rate used as the internal calibration target. For the four-year period following the tax change, the model predicts: owner-occupier transactions fall by 14%; buy-to-rent transactions rise by 35%; the leases-to-sales ratio rises by 15%; the price-to-rent ratio falls by 1.6%; and the homeownership rate falls by 0.23 percentage points. These figures are broadly consistent in magnitude with the estimated LTT effects on the variables not directly targeted in calibration (i.e., the transaction-volume and price-to-rent results from the empirical estimation).

Q: What are the long-run (steady-state) effects and why do they differ from the four-year effects?

A: Tenure-choice variables are very slow to adjust because annual flows are small relative to housing stocks. In the new steady state, the homeownership rate falls by 2.4 percentage points and the leases-to-sales ratio rises by 23% — both substantially larger than the four-year effects. By contrast, four-year effects on owner-occupier transactions and the price-to-rent ratio are already close to their new steady states. Buy-to-rent transactions overshoot their steady-state level (the four-year rise of 35% compares to a steady-state rise of 5.1%) because of a one-off surge in investor entry as the rental market absorbs the transition; once the stock of rental properties has adjusted, the flow of new buy-to-rent purchases settles lower.

Q: How are the welfare (deadweight) losses decomposed across distortion channels?

A: The new LTT generates a total welfare loss equivalent to 111% of the extra revenue it raises. The decomposition is: distortions to flows between the rental and ownership markets (i.e., the tenure-choice margin) account for a loss equal to 60% of extra revenue; distortions within the rental market account for 13% of tax revenue; distortions within the ownership market (lock-in and match-quality misallocation) account for 38% of tax revenue. The presence of the rental market in the analysis — encompassing both the across-market and within-rental-market channels — accounts for a loss equivalent to 73% of tax revenue, which is two-thirds of the total loss. The paper characterises this as “large.”
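
The decomposition can be checked with simple arithmetic. The snippet below uses only the percentages quoted in the answer above; the dictionary keys are descriptive labels chosen for this illustration.

```python
# Welfare losses as a percentage of extra tax revenue, as quoted in the text.
losses = {
    "tenure choice (across markets)": 60,
    "within rental market": 13,
    "within ownership market": 38,
}

total_loss = sum(losses.values())
rental_related = (losses["tenure choice (across markets)"]
                  + losses["within rental market"])
rental_share = rental_related / total_loss

print(total_loss, rental_related, round(rental_share, 3))
```

The three channels sum to the 111% total, and the two channels that involve the rental market sum to 73, roughly two-thirds of the total.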

Q: What is the across-market misallocation mechanism behind the 60% welfare loss from tenure distortions?

A: Because owner-occupiers expect to transact more frequently than buy-to-rent investors, the same ad valorem tax falls more heavily on owner-occupiers. In equilibrium, the cost of credit paid by the marginal home-buyer must fall; that is, fewer households, and only those with lower credit costs, enter ownership. This displaces some creditworthy households into the rental market, creating a misallocation: properties are allocated away from owner-occupiers (who value them as a place of residence and benefit from match quality) toward rentals intermediated through investors. The welfare loss arises because creditworthy households who would prefer to own are now renters, and the resource costs of intermediating through investors are incurred unnecessarily.

Q: What policy experiment does the paper consider beyond the baseline LTT analysis?

A: The paper studies an alternative tax structure that imposes a higher LTT rate on buy-to-rent investors relative to owner-occupiers, calibrated to nullify the implicit tax advantage investors enjoy under a uniform rate. By raising barriers to investor entry, this differential tax reduces the across-market welfare losses from lower homeownership. However, the paper notes an important caveat: pushing the investor tax rate ever higher to boost homeownership would ultimately produce large welfare costs in the opposite direction, as households who cannot qualify for mortgage credit (uncreditworthy households) would be displaced into the ownership market by a shortage of rental properties. Investors play a socially valuable role in providing housing access to households who cannot or choose not to bear the costs of credit.

Q: What data source is used and why is it unusually well-suited to this analysis?

A: The paper uses Multiple Listing Service (MLS) records from the Toronto Regional Real Estate Board covering the Greater Toronto Area, 2006–2018. The dataset is distinctive in including both sales transactions and lease transactions, allowing the paper to match the two and construct the novel buy-to-rent identifier. MLS data cover approximately 78% of detached-house transactions in the Toronto Land Registry for 2006–2012, and the rental listings capture over 90% of properties listed on alternative platforms. This combination of sales and lease records is what makes it possible to document the three novel empirical facts and to study both the ownership and rental markets jointly.
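
The buy-to-rent identifier described above (a sold property that appears in the lease records within 18 months of the sale) can be sketched as follows. This is an illustration under stated assumptions, not the paper's matching code: it assumes exact address-string matches, an inclusive 18-month window, and hypothetical record layouts and addresses.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by `months`, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def flag_buy_to_rent(sales, leases, window_months=18):
    """Flag each sale whose address is listed for lease within the window.

    sales:  list of (address, sale_date)
    leases: list of (address, lease_listing_date)
    """
    leases_by_address = {}
    for address, listed in leases:
        leases_by_address.setdefault(address, []).append(listed)

    flags = []
    for address, sold in sales:
        deadline = add_months(sold, window_months)
        listings = leases_by_address.get(address, [])
        flags.append(any(sold <= listed <= deadline for listed in listings))
    return flags


# Hypothetical records.
sales = [("1 Main St", date(2008, 3, 1)),
         ("2 Oak Ave", date(2008, 3, 1)),
         ("3 Elm Rd", date(2008, 3, 1))]
leases = [("1 Main St", date(2009, 1, 15)),   # inside the 18-month window
          ("2 Oak Ave", date(2010, 1, 1))]    # outside the window

flags = flag_buy_to_rent(sales, leases)
print(flags)
```

Sales that are not flagged would then fall into the buy-to-own or buy-to-sell categories, which this sketch does not try to separate.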

Key Concepts

Buy-to-rent (BTR) transaction: In this paper’s definition, a sale in the ownership market where the buyer subsequently lists the same property on the rental market within 18 months. BTR buyers are investors/landlords who supply rental housing by purchasing from the ownership market. Distinct from buy-to-own (owner-occupier purchases) and buy-to-sell (flipping) transactions. Identified in the MLS data by matching address and transaction dates across the sales and leases databases.

Buy-to-own (BTO) transaction: A sale in the ownership market where the buyer occupies the property as a homeowner — the residual category after removing BTR and buy-to-sell transactions from total sales. In the City of Toronto, the fraction of all transactions classified as BTO declined from 89% to 84% between 2006 and 2017.

Effective LTT rate: The mean land transfer tax paid as a percentage of the sales price, combining provincial- and city-level taxes, averaged over detached-house transactions in the City of Toronto and adjusted for first-time buyer exemptions. The introduction of the city-level LTT in February 2008 raised the effective LTT rate by 1.3 percentage points (assuming 40% first-time buyers).

Match quality: In the paper’s search model, the idiosyncratic value a particular household places on a particular property, which evolves stochastically over time. When match quality deteriorates sufficiently, a homeowner wishes to move to a better-matched property. Match quality is the source of the “lock-in” effect: higher transaction taxes raise the threshold quality decline a household is willing to tolerate before moving, reducing mobility. Because investors are not tied to a specific property in the same way (a tenant moving out does not require the investor to transact), this mechanism falls more heavily on owner-occupiers than on BTR investors.

Lock-in effect: The reduction in homeowner mobility caused by a higher transaction tax. Homeowners become more tolerant of deteriorating match quality (stay longer in poorly matched properties) and more selective when initially purchasing (require higher match quality to justify the transaction cost). The paper treats this as operating on the intensive margin of homeownership decisions, contrasted with the extensive margin (the own-vs.-rent choice).

Credit cost / credit friction: Heterogeneous household-level costs of accessing mortgage finance or credit. In the model, a household must pay a credit cost to enter the ownership market. Households with lower credit costs are more likely to choose homeownership; a higher transaction tax effectively raises the total cost of ownership (since it must be paid on each future move), shifting the margin at which the credit cost equals the net benefit of owning, thereby reducing the equilibrium homeownership rate.

Leases-to-sales ratio: The ratio of new lease transactions to sales transactions in the housing market, used as a measure of the relative activity of the rental and ownership markets. A higher ratio indicates more households are being accommodated in the rental market relative to the ownership market. The LTT raises this ratio by 26% in the empirical estimation and 15% in the four-year model simulation, with a steady-state increase of 23%.

Price-to-rent ratio: The ratio of house prices to rents, used as a summary statistic for the relative cost of owning versus renting. In the paper’s model, a fall in the price-to-rent ratio is the price signal that attracts additional buy-to-rent investor entry: as tenure-choice distortions shift more households toward renting, rents rise relative to prices, improving the return to BTR investment until the rental market clears. The LTT lowers the price-to-rent ratio by 3.8% empirically and 1.6% in the four-year model simulation.

Deadweight loss as a fraction of tax revenue: The welfare cost of the LTT measured in units of tax revenue raised, allowing comparison across tax instruments. The paper finds a deadweight loss of 111% of tax revenue for the Toronto LTT. Prior literature, which focused only on the intensive margin (mobility distortions within the ownership market), missed the across-market and within-rental-market channels that together account for 73 percentage points of this total.

Summary based on published open-access version. AI-assisted, human review pending.

What's My Employee Worth? The Effects of Salary Benchmarking

This paper studies how salary benchmarking tools — products that reveal aggregate market pay statistics for specific job titles — affect employee compensation. The research question is whether firms’ access to such tools causally changes how they set salaries, and what this implies about information frictions in labor markets and the policy debate over benchmarking regulation.

The authors collaborated with the largest U.S. payroll processing company (serving 650,000 firms and 20 million workers), exploiting the staggered roll-out of a proprietary Compensation Benchmark Tool. The tool aggregates payroll data into salary benchmarks by standardized job title, with the median base salary as its most prominent statistic. The study draws on three linked administrative datasets: payroll records (January 2017 to July 2021), tool usage logs (September 2019 to August 2021), and historical benchmark snapshots. The main analytical sample covers new hires at 586 treatment firms that gained tool access and 1,419 matched control firms that did not, within a 10-quarter window around each firm’s onboarding date.

The identification strategy is difference-in-differences, exploiting three sources of variation: which firms gain access; the staggered timing of access (driven by the arbitrary order in which sales representatives introduced the tool); and within treatment firms, whether a specific position was actually searched in the tool. New hires are classified into Searched positions (5,266 hires at treatment firms for positions eventually looked up), Non-Searched positions (39,686 hires at treatment firms for positions not looked up), and Non-Searchable positions (156,865 hires at control firms). Event-study analyses confirm flat pre-trends across all groups, supporting causal interpretation.

The primary finding is that benchmark access reduces salary dispersion around the median market benchmark by about 25%. Before onboarding, the average absolute deviation of offered salaries from the median benchmark in Searched positions was 19.8 percentage points (pp). After onboarding, this fell to 14.9 pp. The difference-in-differences estimate of the drop is 5.0 pp using Non-Searched positions as control (p-value < 0.001) and 6.2 pp using Non-Searchable positions as control (p-value < 0.001). Compression runs in both directions: firms previously paying above the benchmark reduce salaries toward the median, and firms previously paying below raise salaries toward the median. The probability of setting a salary within 2.5% of the median benchmark nearly doubled, from 11.6% to 22.1% after onboarding.
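
The two outcome measures in this paragraph, the average absolute deviation from the median benchmark and the share of offers within 2.5% of it, can be computed as in the sketch below. The function names and the salary figures are hypothetical, and the paper's own construction (winsorization, reweighting) is not reproduced.

```python
def abs_pct_deviation(salary, benchmark):
    """Absolute deviation of an offer from the median benchmark, in percent."""
    return abs(salary - benchmark) / benchmark * 100.0


def dispersion_stats(offers, band_pct=2.5):
    """offers: list of (offered_salary, median_benchmark) pairs.

    Returns (mean absolute percent deviation, share of offers within band_pct).
    """
    deviations = [abs_pct_deviation(s, b) for s, b in offers]
    mean_abs_dev = sum(deviations) / len(deviations)
    share_within = sum(d <= band_pct for d in deviations) / len(deviations)
    return mean_abs_dev, share_within


# Hypothetical new-hire offers against a 50,000 median benchmark.
offers = [(50_000, 50_000), (45_000, 50_000), (60_000, 50_000), (51_000, 50_000)]
mean_abs_dev, share_within = dispersion_stats(offers)
print(round(mean_abs_dev, 2), share_within)
```

Because the measure uses the absolute deviation, offers above and below the benchmark both count as dispersion, so compression from either side lowers it.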

Effects are heterogeneous by skill level. For low-skill positions (approximately 42% of the sample, e.g., bank teller, receptionist), dispersion falls from 14.5 pp to 8.7 pp — a 40% reduction. For high-skill positions (e.g., software developer), dispersion falls from 24.0 pp to 20.5 pp — a 14.6% reduction. For low-skill positions, compression from below dominates, producing a net average salary increase of +5.0% to +6.7% (p-values 0.014 and 0.001 depending on control group). For high-skill positions, the average salary effect is small and statistically insignificant overall. Twelve-month retention rates for low-skill workers increase by 6.6 to 6.8 pp after benchmarking, and the implied retention elasticity is consistent with prior literature estimates.
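
The percentage reductions quoted above follow from the reported levels. The short check below uses only numbers stated in the summary; the helper name is an illustrative choice.

```python
def relative_change(before, change):
    """Change expressed as a fraction of the pre-onboarding level."""
    return change / before


headline = relative_change(19.8, -5.0)           # 5.0 pp estimated drop from 19.8 pp
low_skill = relative_change(14.5, 8.7 - 14.5)    # 14.5 pp falling to 8.7 pp
high_skill = relative_change(24.0, 20.5 - 24.0)  # 24.0 pp falling to 20.5 pp
within_band_ratio = 22.1 / 11.6                  # share within 2.5% of the median

print(round(headline, 3), round(low_skill, 3),
      round(high_skill, 3), round(within_band_ratio, 2))
```

These reproduce the roughly 25%, 40%, and 14.6% reductions and the near doubling of offers close to the median.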

The authors propose a theoretical model to rationalize these findings. Firms are assumed uncertain about the wage distribution (aggregate uncertainty), with private information about their own value of filling a position and affiliated valuations across firms. In equilibrium, firms with higher values make higher offers — generating wage dispersion among identical workers without monopsony power, efficiency wages, or amenity differences. When a firm gains benchmark access, it adjusts its offer toward the threshold wage needed to hire, compressing offers from both sides. In the full-information equilibrium where benchmarks are common knowledge, the mean salary is weakly higher than without benchmarks, because the marginal firm had previously underestimated labor market tightness and offered too little, capturing extraordinary profits. Benchmarking eliminates these informational rents, intensifying competition and raising average pay.

The scope of the empirical findings is restricted to new hires at firms in the top quartile of U.S. firm size by employment, across all industries and U.S. states, over 2017–2020. The estimated effect is the incremental causal impact of one additional high-quality benchmarking source, since most firms already had access to some pay information through other channels.

Q: What is the main causal finding of the paper? A: Access to the salary benchmarking tool reduces the absolute deviation of new-hire salaries from the median market benchmark by approximately 25%. Specifically, average dispersion in Searched positions falls from 19.8 pp before onboarding to 14.9 pp after; the difference-in-differences estimate of the drop is 5.0 pp (using Non-Searched controls, p-value < 0.001) or 6.2 pp (using Non-Searchable controls, p-value < 0.001). The two estimates are statistically indistinguishable from each other, and both are robust to a wide range of specification checks.

Q: How does compression operate — does it raise or lower salaries? A: Compression operates in both directions. Firms that would otherwise have paid above the median benchmark reduce salaries toward the median (“compression from above”), and firms that would otherwise have paid below the median benchmark raise salaries toward the median (“compression from below”). The probability of offering a salary within 2.5% of the median benchmark nearly doubled, from 11.6% before onboarding to 22.1% after.

Q: What is the identification strategy, and why is the treatment considered as good as random? A: The authors use a difference-in-differences design with three sources of variation: which firms gain tool access, the staggered timing of access, and whether specific positions were actually searched within a treatment firm. The payroll company introduced the tool through sales representatives contacting clients in an arbitrary order, not in response to firm characteristics or outcomes. This is corroborated by empirical tests: event-study pre-trends for Searched versus Non-Searched (and Non-Searchable) positions are flat and statistically indistinguishable from zero (pre-treatment coefficients of -0.346 and -0.310, p-values 0.749 and 0.604, respectively).

Q: How large are the effects for low-skill versus high-skill positions? A: For low-skill positions (approximately 42% of the sample, e.g., bank teller, receptionist), dispersion drops from 14.5 pp to 8.7 pp — a 40% decline (p-value < 0.001). For high-skill positions (e.g., software developer), dispersion drops from 24.0 pp to 20.5 pp — a 14.6% decline (p-value = 0.021). The larger effect for low-skill positions is consistent with anecdotal accounts from compensation managers, who report treating low-skill candidates as interchangeable and therefore wanting to offer exactly the market rate.

Q: Does benchmarking raise or lower average salaries? A: On average across all skill levels, the effect on mean salary is small and statistically insignificant: -0.2% (p-value = 0.756) using Non-Searched controls and +1.7% (p-value = 0.308) using Non-Searchable controls. For low-skill positions specifically, average salaries increase by +5.0% (p-value = 0.014) using Non-Searched controls and +6.7% (p-value = 0.001) using Non-Searchable controls. This net increase for low-skill workers reflects compression from below dominating compression from above in that subset.

Q: What are the effects on employee retention? A: For low-skill workers, benchmarking increases the probability of remaining employed at the hiring firm 12 months after the hire date by +6.6 pp (p-value = 0.101) using Non-Searched controls and +6.8 pp (p-value = 0.029) using Non-Searchable controls. The implied retention elasticity from the ratio of salary and retention effects is consistent with average estimates in the prior literature (Sokolova and Sorensen, 2021). No retention effects are reported for high-skill positions.

Q: What is the theoretical mechanism through which aggregate uncertainty generates wage dispersion? A: The model assumes a unit mass of firms simultaneously making wage offers to a mass Q < 1 of workers, with only the top Q offers accepted. Firms have private information about their value of filling the position, and values are affiliated (correlated in the sense of Milgrom and Weber, 1982). Because each firm is uncertain about what other firms will offer, higher-value firms rationally form higher beliefs about the prevailing wage distribution and make higher offers. This generates equilibrium wage dispersion among identical workers without monopsony power, efficiency wages, or amenity differences.

Q: What does the model predict about the equilibrium effects of benchmarking when all firms have access? A: When the benchmark is common knowledge, all firms make offers with full information about the wage distribution. The firms with the highest values win workers at a uniform wage that makes the marginal firm indifferent between hiring and not hiring. The model proves that the mean salary is higher in expectation under the benchmark equilibrium than in the no-benchmark equilibrium. The intuition is that without benchmarks, the marginal firm underestimates labor market tightness, offers less than the full-information competitive wage, and thereby captures extraordinary profits; benchmarking eliminates those rents and intensifies competition.
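
The full-information outcome can be illustrated with a finite approximation of the continuum of firms. This is a sketch under assumptions: a finite list of firms stands in for the unit mass, the uniform wage is set equal to the value of the lowest-value firm that still hires (so that firm is indifferent), and the function name and values are hypothetical.

```python
import math


def full_information_outcome(values, worker_mass):
    """Return (uniform wage, values of hiring firms) when the benchmark is common knowledge.

    values:      each firm's value of filling the position
    worker_mass: Q, the mass of workers relative to firms, with 0 < Q < 1
    """
    ranked = sorted(values, reverse=True)
    hires = math.floor(worker_mass * len(ranked))
    if hires < 1:
        raise ValueError("worker_mass too small for this number of firms")
    marginal_value = ranked[hires - 1]  # the marginal firm earns zero surplus
    return marginal_value, ranked[:hires]


wage, hiring_firms = full_information_outcome([10.0, 8.0, 6.0, 4.0], worker_mass=0.5)
print(wage, hiring_firms)
```

In this approximation every hiring firm pays the same wage, so the dispersion that arises under aggregate uncertainty disappears once the threshold wage is known.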

Q: What are the policy implications of the findings regarding antitrust concerns? A: In 2023, the DOJ and FTC rescinded a long-standing antitrust “safety zone” for salary benchmarks due to concerns that they could facilitate wage collusion. A 2021 executive order had mandated that agencies consider procompetitive effects as well. The authors’ model addresses the collusion concern directly: in equilibrium, benchmarking raises (not lowers) average salaries. The empirical evidence is consistent with this — low-skill workers see average salary increases of 5-7% after benchmarking — suggesting a procompetitive justification for the tools.

Q: How robust are the main results? A: The main estimates are robust across a wide range of specification checks, including alternative winsorization levels, log-difference and binary (>10% deviation) dependent variables, heteroskedasticity-robust standard errors, exclusion of controls, inclusion of firm fixed effects, exclusion of tipping positions, restriction to Searched positions only, dropping SOC reweighting, and age restrictions. Two additional pieces of evidence corroborate the quasi-experimental findings: a survey experiment with SHRM HR managers shows that hypothetical benchmarks compress stated salary offers from both above and below; and quasi-random benchmark shocks (when large firms abruptly raise a position’s base salary by 10% or more) cause firms with tool access to converge to the new benchmark faster than firms without access.

Q: What does the survey of HR managers reveal about how firms use benchmarks? A: In a survey of 2,696 HR professionals conducted through SHRM’s research panel, 87.6% of those involved in salary-setting report using salary benchmarks. The vast majority (97.4%) use benchmarks to set pay for new hires. The most popular sources are industry surveys (68.0%) and free online data (58.1%), with payroll data services used by 23.2%. The median salary is ranked the most important benchmark statistic by 56.73% of respondents. Most respondents apply filters by state (84.15%) and industry (87.33%) when using the tool.

Q: What are the main sources of potential attenuation or amplification bias in the estimated effects? A: Attenuation bias may arise because (1) the benchmark tool studied is among the most advanced available, so firms already had some wage information from other sources, meaning the estimates capture only the incremental effect of one additional high-quality source; and (2) not all positions at treatment firms were searched, so the sample is restricted to positions where firms actually engaged with the benchmark. Potential upward bias could arise if firms adopting the tool were also undergoing broader HR system changes, but the flat event-study pre-trends argue against this explanation.

Salary Benchmarking: The practice of using aggregated market pay data — provided by third parties such as payroll processors, consulting firms, or online platforms — to identify typical salaries for specific job titles and set internal pay accordingly. In the paper’s context, this refers specifically to an online tool that allows employers to look up the median and distributional statistics of base salaries for standardized position titles, filtered by industry and state.

Aggregate Uncertainty: The paper’s label for a distinct source of information friction in which firms are uncertain about the distribution of wages offered by other firms in the market — as opposed to uncertainty about individual worker characteristics. This uncertainty is assumed to be the primitive that generates equilibrium wage dispersion in the model, and its resolution through benchmarking is the mechanism driving the empirical results.

Salary Dispersion (around the benchmark): Measured empirically as the average absolute percentage difference between a new hire’s starting base salary and the median market benchmark for that position, expressed in percentage points. This is the paper’s primary outcome variable. Dispersion reflects firms’ deviation from the market rate in either direction.
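
The dispersion measure defined above can be computed as follows. This is a minimal sketch assuming one starting salary and one median benchmark per new hire; the function name and the sample figures are hypothetical.

```python
def dispersion_around_benchmark(salaries, benchmarks):
    """Average absolute percentage gap between starting salary and median benchmark."""
    if len(salaries) != len(benchmarks):
        raise ValueError("salaries and benchmarks must have the same length")
    if not salaries:
        raise ValueError("at least one hire is required")
    # Absolute gap for each hire, as a percentage of that position's benchmark.
    gaps = [100 * abs(s - b) / b for s, b in zip(salaries, benchmarks)]
    return sum(gaps) / len(gaps)


# Two hypothetical hires for a position with a 50,000 median benchmark:
# one paid 10% below it, one paid 10% above it.
print(dispersion_around_benchmark([45_000, 55_000], [50_000, 50_000]))  # 10.0
```

Because the gap is taken in absolute value, overpaying and underpaying relative to the benchmark count equally, which matches the statement that dispersion reflects deviation in either direction.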

Compression from Above / Compression from Below: Compression from above refers to the reduction in salaries at firms that would otherwise have paid more than the median benchmark after gaining benchmark access. Compression from below refers to the increase in salaries at firms that would otherwise have paid less than the median benchmark. Both directions of adjustment are documented empirically and are predicted by the model.

Searched / Non-Searched / Non-Searchable Positions: The paper’s classification of new hires into three groups for identification purposes. Searched positions are those at treatment firms for which the firm actually looked up the benchmark. Non-Searched positions are at treatment firms but were not looked up, serving as a within-firm control. Non-Searchable positions are at control firms with no tool access, serving as a cross-firm control.
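
The three-way classification above reduces to two facts about each new hire: whether the hiring firm has tool access, and whether the firm looked up the position. The sketch below encodes that rule; the function and argument names are hypothetical, and the paper's actual sample construction may involve further conditions not described in this summary.

```python
def classify_position(firm_has_tool_access, position_was_looked_up):
    """Assign a new hire to one of the three identification groups."""
    if not firm_has_tool_access:
        # Control firms cannot search, regardless of the second flag.
        return "Non-Searchable"
    if position_was_looked_up:
        return "Searched"
    return "Non-Searched"


print(classify_position(True, True))    # Searched
print(classify_position(True, False))   # Non-Searched
print(classify_position(False, False))  # Non-Searchable
```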

Affiliation (across firm values): A technical condition borrowed from auction theory (Milgrom and Weber, 1982) used in the paper’s model to characterize the correlation structure of firms’ private valuations of filling a position. Affiliation implies that when one firm has a high value, others are also more likely to have high values, and hence to offer high wages — generating the model’s equilibrium wage dispersion.

Procompetitive Effect of Benchmarking: The paper’s term for the welfare-improving property of salary benchmarks identified in the model: by resolving aggregate uncertainty, benchmarks cause the marginal firm to offer closer to the full-information competitive wage, reducing extraordinary profits that arise from informational rents and raising the mean salary in equilibrium. This is the key concept in the paper’s contribution to the antitrust policy debate.
