Macro Paper Warehouse

Competing under Information Heterogeneity: Evidence from Auto Insurance

This paper studies imperfect competition in selection markets where competing firms have heterogeneous information about consumers, a layer of asymmetry distinct from the classic buyer-seller information gap. The central questions are how inter-firm information asymmetries shape equilibrium pricing, consumer sorting, and market efficiency, and whether a centralized bureau that aggregates and equalizes firms’ risk information can promote competition and improve welfare.

The empirical setting is the Italian mandatory motor vehicle liability insurance market (Responsabilità Civile Auto). The authors use the IPER dataset from IVASS, a nationally representative panel of matched insurer-insuree contracts covering 124,428 liability insurance contracts for new customers in the province of Rome from 2013 to 2021. The panel tracks consumers across insurer switches, enabling construction of individual-specific risk estimates from ex-post claim records using Poisson regressions for claim frequency and log-normal regressions for claim severity. The analysis focuses on the 10 largest firms plus a composite fringe firm, for 11 firms in total.

The paper’s empirical strategy proceeds in three stages. First, individual risk types are estimated from multi-year claim panels. Second, demand parameters — price sensitivity and firm-level unobserved product attributes — are recovered using a novel fixed-point algorithm (extending Berry et al. 1995) that infers the full offered-price distribution from observed transaction prices alone, without parametric restrictions on price distributions across firms. Third, supply-side parameters — pricing coefficients, signal variances, and cost parameters — are identified by exploiting the monotone mapping between offered prices and private signals, borrowing from the nonparametric auction literature.

The model features firms that each draw a private Gaussian signal about a consumer’s true risk type theta, with firm-specific signal standard deviation sigma_j. Lower sigma_j means higher information precision. Firms set prices as a linear function of their posterior risk rating: p_j = alpha_j + beta_j * E(theta | theta_j, D=j). Firms simultaneously choose pricing coefficients to maximize expected profits.
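
The pricing rule can be illustrated with a minimal sketch. It assumes a normal prior theta ~ N(prior_mean, prior_sd^2) and drops the conditioning on D=j (selection), so the posterior mean reduces to textbook Gaussian shrinkage. The prior, the function names, and all numbers are illustrative assumptions, not the paper’s estimates.

```python
def posterior_mean(signal, sigma_j, prior_mean, prior_sd):
    """Posterior mean of theta given one Gaussian signal (selection on D=j ignored)."""
    # The weight on the signal rises as the firm's noise sigma_j falls.
    weight = prior_sd**2 / (prior_sd**2 + sigma_j**2)
    return prior_mean + weight * (signal - prior_mean)


def offered_price(signal, sigma_j, alpha_j, beta_j, prior_mean, prior_sd):
    """Linear pricing rule p_j = alpha_j + beta_j * E(theta | theta_j)."""
    return alpha_j + beta_j * posterior_mean(signal, sigma_j, prior_mean, prior_sd)


# Hypothetical numbers: the same signal read by a precise firm and a noisy firm.
precise = offered_price(300.0, sigma_j=50.0, alpha_j=50.0, beta_j=1.0,
                        prior_mean=200.0, prior_sd=100.0)
noisy = offered_price(300.0, sigma_j=200.0, alpha_j=50.0, beta_j=1.0,
                      prior_mean=200.0, prior_sd=100.0)
```

With identical pricing coefficients, the more precise firm moves its price further toward the signal, which is the sense in which a lower sigma_j makes premiums more responsive to risk.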

Key empirical findings: (1) Firms differ substantially in how sensitively their premiums respond to realized consumer risk — a reduced-form measure of information precision — with Figure 2 showing wide cross-firm variation in premium-to-risk coefficients. (2) Structural estimation confirms substantial heterogeneity in signal standard deviations sigma_j across all 11 firms. Firms with less accurate risk-rating algorithms (higher sigma_j) tend to have more efficient cost structures (lower claim-processing cost parameter k_j), generating distinct comparative advantages. (3) Baseline pricing coefficients alpha_j and risk-sensitivity coefficients beta_j vary dramatically across firms. (4) Senior drivers are less price sensitive; urban drivers are more price sensitive. Lower-risk consumers show stronger preferences for Firms 3 and 5, while higher-risk consumers disproportionately choose Firm 8.

Counterfactual simulations assess three information policies relative to the baseline. Under a centralized risk bureau — which collects each firm’s signal, aggregates them weighted by precision, and distributes the combined signal equally — average premiums fall by 21.6% and consumer surplus rises by 15.7%. The efficiency benchmark (firms observe true risk perfectly) yields a 25.7% premium reduction and a 16.9% consumer surplus gain, so the bureau recovers almost all the efficiency gap. The privacy benchmark (all firms restricted to the coarsest signal in the market) raises surplus for high-risk consumers by 6.9% but harms low-risk consumers.

The bureau’s price reduction operates through two channels: it eliminates the market power that accrues to firms with superior private information, and it aligns firms’ risk evaluations, enabling sharper undercutting. The bureau also reduces average costs by 12 euros per contract by enabling more efficient insurer-insuree matching — cost-efficient claim processors can better target the consumer types they have a comparative advantage in serving.

The analysis is confined to new customers in Rome’s provincial market to avoid complications from dynamic pricing and consumer-firm learning. The model abstracts away from optional contract clauses (treated as observable characteristics) and does not model the specific mechanisms generating information heterogeneity.

Q: What is the paper’s core research question? A: The paper asks how information asymmetries between competing firms (not just between buyers and sellers) shape equilibrium pricing strategies, consumer sorting, and market efficiency in a selection market, and whether a centralized bureau that equalizes firms’ access to aggregated risk information can improve competition and welfare. This extends the classic Akerlof-Rothschild-Stiglitz framework by introducing a second layer of asymmetry — across sellers themselves.

Q: Why is the Italian auto insurance market well suited for this study? A: Italy mandates liability insurance for all drivers and prohibits rejections, so the analysis focuses entirely on how consumers sort across insurers rather than on participation margins. The IPER dataset from IVASS is a nationally representative panel tracking policyholders even across insurer switches, providing both premium and ex-post claim records needed to construct individual risk types. The market has roughly 50 competing firms using demonstrably heterogeneous pricing algorithms, documented through a survey of major insurers and reduced-form regressions.

Q: How do the authors measure firm-level information precision in the reduced-form analysis? A: They estimate individual-specific risk types from a panel of claim records using Poisson regressions (claim frequency) and log-normal regressions (claim severity), then regress each firm’s premiums on those estimated risk measures. Firms whose premiums respond more sensitively to realized risk are inferred to have higher information precision. Figure 2 shows that these premium-to-risk coefficients vary significantly across firms — for example, Firm 7’s premiums are considerably more sensitive to risk than Firm 8’s — providing reduced-form evidence of heterogeneous information precision before any structural estimation.
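
A minimal sketch of a frequency-severity risk measure, under strong simplifying assumptions: the paper’s regressions use covariates, whereas this version is intercept-only, and combining the two parts as a product (expected claims per year times expected claim size) is an assumption of the sketch. The function name and the inputs are hypothetical.

```python
import math


def expected_claim_cost(claim_counts, claim_amounts):
    """Expected yearly claim cost = Poisson rate * log-normal mean severity."""
    # Intercept-only Poisson MLE: the rate is the average number of claims per year.
    rate = sum(claim_counts) / len(claim_counts)
    # Log-normal fit: mean and variance of the log claim amounts.
    logs = [math.log(a) for a in claim_amounts]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    # Mean of a log-normal variable is exp(mu + var / 2).
    mean_severity = math.exp(mu + var / 2)
    return rate * mean_severity


# Hypothetical driver: four policy years, two claims of 100 euros each.
cost = expected_claim_cost([0, 1, 0, 1], [100.0, 100.0])
```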

Q: What is the structural model’s signal structure? A: Each firm j draws a private signal theta_j ~ N(theta, sigma_j^2) about a consumer’s true risk type theta, where sigma_j is the firm-specific signal standard deviation. A smaller sigma_j means higher precision. Signals are independent across firms conditional on theta, analogous to common-value auctions where firms receive noisy estimates of a shared unknown value (expected claim payouts). The parameter sigma_j is the key structural object the paper identifies and estimates.

Q: What is novel about the demand estimation strategy? A: Standard demand estimation assumes the same price is offered to all consumers or that the full price menu is observed. Here, only transaction prices are observed — the prices of unchosen insurers are not in the data. The authors apply the Wu and Xin (2024) fixed-point algorithm, which jointly estimates consumers’ sorting probabilities, offered price distributions, and demand parameters by adding an outer loop over sorting propensities to the Berry (1994) contraction mapping. No parametric restrictions are imposed on the offered price distributions, and they are allowed to vary fully across firms.

Q: How are firms’ signal variances identified separately from pricing coefficients? A: There is a one-to-one mapping between a firm’s offered price and its signal (prices increase monotonically in the signal, analogous to bids in auctions). After recovering the offered price distribution from the demand step, the authors observe price dispersion at a fixed risk level. By focusing on average prices conditional on each risk level, signal noise averages out, identifying the pricing coefficients beta_j. The residual price dispersion at fixed risk then identifies signal variance sigma_j^2.
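
The two-step logic can be sketched on simulated data. This is an illustration under simplifying assumptions, not the authors’ estimator: prices are taken to be linear in the raw signal (no posterior shrinkage, no selection), offered prices are treated as already recovered, and all parameter values are made up.

```python
import random


def simulate_offers(alpha, beta, sigma, risk_levels, draws_per_level, seed=0):
    """Offered prices p = alpha + beta * (theta + noise) at each risk level theta."""
    rng = random.Random(seed)
    return {theta: [alpha + beta * (theta + rng.gauss(0.0, sigma))
                    for _ in range(draws_per_level)]
            for theta in risk_levels}


def recover_beta_sigma(offers):
    levels = sorted(offers)
    means = [sum(offers[t]) / len(offers[t]) for t in levels]
    # Step 1: slope of the mean price on risk; signal noise averages out.
    x_bar = sum(levels) / len(levels)
    y_bar = sum(means) / len(means)
    beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(levels, means))
                / sum((x - x_bar) ** 2 for x in levels))
    # Step 2: price variance at fixed risk equals beta^2 * sigma^2.
    squares = sum((p - m) ** 2 for t, m in zip(levels, means) for p in offers[t])
    count = sum(len(offers[t]) for t in levels)
    sigma_hat = (squares / count) ** 0.5 / abs(beta_hat)
    return beta_hat, sigma_hat


offers = simulate_offers(alpha=100.0, beta=1.5, sigma=2.0,
                         risk_levels=[1.0, 2.0, 3.0, 4.0, 5.0],
                         draws_per_level=20000)
beta_hat, sigma_hat = recover_beta_sigma(offers)
```

In this setup the recovered values should land close to the simulated beta = 1.5 and sigma = 2.0, because averaging at each risk level removes the noise and the leftover dispersion is scaled by beta.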

Q: What does structural estimation reveal about the relationship between information precision and cost efficiency? A: Firms with higher signal standard deviations (less precise risk evaluation) tend to have lower claim-processing cost parameters k_j — they are more efficient at handling claims. This creates distinct comparative advantages: some firms excel at risk identification but face higher processing costs, while others process claims cheaply but evaluate risk less precisely. This heterogeneity means information-equalizing policies have differentiated firm-level impacts.

Q: What are the quantitative effects of the centralized risk bureau on premiums and consumer surplus? A: The bureau reduces average premiums by 21.6% relative to baseline and increases consumer surplus by 15.7%. The efficiency benchmark — where firms observe consumers’ true risk perfectly — produces a 25.7% premium reduction and a 16.9% consumer surplus gain. The bureau therefore closes nearly all of the gap to the first-best allocation in surplus terms (15.7% vs. 16.9%).
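
As a quick check on “nearly all of the gap”, the bureau’s effect can be expressed as a share of the efficiency benchmark’s effect, using only the percentages quoted above:

```latex
\frac{15.7}{16.9} \approx 0.93 \quad\text{(consumer surplus)}, \qquad
\frac{21.6}{25.7} \approx 0.84 \quad\text{(premium reduction)}.
```

On this reading the bureau captures about 93% of the benchmark surplus gain and about 84% of the benchmark premium reduction.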

Q: Through what mechanisms does the bureau reduce prices? A: Two distinct channels are identified. First, equalizing information precision eliminates the informational market power held by firms with superior signals, compelling them to compete more aggressively on price. Second, when all firms share the same risk evaluation of a consumer, they can undercut each other more precisely, which intensifies price competition further. Both channels operate simultaneously under the bureau.

Q: How does the bureau affect consumer surplus distribution across risk types? A: The bureau primarily benefits low-risk consumers because improved information allows firms to price discriminate more accurately on risk type, lowering prices for those who are low risk. High-risk consumers see smaller benefits and may face relatively higher premiums. This contrasts with the privacy benchmark, where restricting all firms to the coarsest signal in the market raises high-risk consumers’ surplus by 6.9% — because it becomes harder for firms to distinguish them from low-risk consumers.

Q: What is the cost efficiency effect of the bureau? A: Under the centralized risk bureau, average costs per contract fall by 12 euros. This reflects more efficient insurer-insuree matching: when firms have equal and better information, those with cost advantages in claims processing can better identify and attract the consumer types they are relatively best equipped to serve. The authors note that given the scale of the Italian auto insurance market (approximately 31 million contracts annually), this per-contract saving implies a substantial aggregate impact.

Q: What happens to firm profits under the bureau, and is the impact uniform? A: Average profits decline overall due to lower prices. However, the impact is heterogeneous across firms. Firms that rely most heavily on superior information precision — often smaller, more specialized firms — experience greater profit losses, since the bureau most directly erodes their competitive advantage.

Q: How does the privacy benchmark differ from the bureau scenario? A: The privacy benchmark simulates a regulation that restricts all firms to using only basic consumer information, setting signal variance to the highest level observed in the market. Unlike the bureau (which improves and equalizes information), this benchmark degrades information uniformly. It produces opposite distributional effects: high-risk consumers gain 6.9% in surplus as cross-subsidization from low-risk to high-risk consumers increases, while low-risk consumers are worse off.

Q: Why does the paper focus on new customers only? A: Focusing on new customers avoids complications from dynamic pricing, where insurers update premiums based on accumulated claim history with a specific consumer, and from consumer-firm learning dynamics. This follows standard practice in the empirical asymmetric information literature, as cited in Chiappori and Salanié (2000) and Crawford et al. (2018).

Q: How does this paper relate to and extend prior work on selection markets? A: Prior empirical work on imperfect competition in selection markets — including Einav et al. (2010), Crawford et al. (2018), and related studies — assumes that competing firms have symmetric information about consumers. This paper is described as introducing the first tractable empirical framework for analyzing selection markets where firms have heterogeneous information. It also incorporates multidimensional cost heterogeneity on the supply side, adding to work by Salanié (2017) and Nelson (2025).

Q: What do the reduced-form regressions reveal about pricing heterogeneity across insurers? A: Firm-level regressions of premiums on observable risk factors show R-squared values ranging from 0.39 to 0.59. Estimated coefficients on key risk factors vary dramatically: being one year older reduces premiums by 0.25 to 1.68 euros depending on the firm; a higher bonus-malus class increases premiums by 12 to 32 euros; one additional accident in the previous five years raises premiums by 74 to 181 euros. These ranges reflect genuine differences in actuarial algorithms, not just sampling variation.

Q: What is the bonus-malus system and why does its saturation matter for the paper’s setting? A: Italy’s bonus-malus (BM) system assigns drivers to one of 18 risk classes based on accident history. Because approximately 80% of policyholders are in the best class (BM class 1), the public BM system provides limited granularity for risk evaluation. This saturation creates strong incentives for firms to develop proprietary risk-rating algorithms, which is the institutional basis for the substantial information heterogeneity that the paper documents and models.

Information Precision (sigma_j): In the paper’s model, the firm-specific parameter measuring the dispersion of a firm’s private signal about a consumer’s true risk type. Firm j draws signal theta_j ~ N(theta, sigma_j^2); 1/sigma_j is information precision. A smaller sigma_j means the firm more accurately identifies consumer risk. This is not merely a theoretical construct — the paper identifies and estimates sigma_j structurally for each of the 11 firms.

Heterogeneous Information: The condition where competing firms hold signals of different precision about the same consumer’s unobserved risk type, introducing asymmetry not just between buyers and sellers (as in Akerlof 1970) but among sellers themselves. This is the paper’s central departure from prior literature on selection markets, which assumed symmetric information among firms.

Centralized Risk Bureau: A policy institution that collects each firm’s analyzed risk signal, aggregates them weighted by each firm’s information precision (producing a combined signal more precise than any individual firm’s signal), and makes the aggregated information equally accessible to all firms. The bureau is the paper’s primary policy counterfactual, and it is modeled as equalizing both the level and heterogeneity of information precision across competitors.
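
A minimal sketch of precision-weighted aggregation, assuming that “weighted by precision” means inverse-variance weights and that signals are independent Gaussian draws around the true risk (as in the signal structure described earlier). The function name and the numbers are hypothetical.

```python
def aggregate_signals(signals, sigmas):
    """Combine independent Gaussian signals with inverse-variance weights."""
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    combined = sum(w * x for w, x in zip(weights, signals)) / total
    # The combined signal's standard deviation is below every input sigma.
    combined_sd = total ** -0.5
    return combined, combined_sd


# Hypothetical example: three firms with different precision.
signal, sd = aggregate_signals([300.0, 360.0, 240.0], [20.0, 40.0, 40.0])
```

Under these assumptions the pooled standard deviation is always smaller than the most precise firm’s sigma_j, which matches the statement that the combined signal is more precise than any individual firm’s signal.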

Offered vs. Accepted Price Distribution: A distinction central to the paper’s identification strategy. The accepted price distribution is what is observed in transaction data — prices conditional on the consumer having chosen that firm. The offered price distribution is the full set of prices the firm would charge across all consumers, including those who did not select it. The paper recovers the offered distribution from the accepted distribution using a fixed-point algorithm, without imposing parametric restrictions.

Selection Loop: The paper’s methodological extension of the Berry (1994) BLP contraction mapping for mean utilities. An outer loop iterates over consumers’ sorting propensities to jointly recover offered price distributions, sorting probabilities, and demand parameters when only transaction prices are observed. This technique handles the endogeneity of which prices are accepted.

Risk Rating: The firm’s posterior assessment of a consumer’s expected cost, computed as the posterior mean E(theta | theta_j, D=j) — the expected true risk type conditional on the firm’s private signal and the consumer selecting that firm. Firms set prices as a linear function of their risk rating: p_j = alpha_j + beta_j * E(theta | theta_j, D=j).

Comparative Advantage (information vs. cost): The paper’s finding that firms with lower information precision (higher sigma_j) tend to have more efficient cost structures (lower k_j), and vice versa. This cross-sectional negative correlation between information advantage and cost advantage means that policy interventions that equalize information precision shift the basis of competition from information asymmetry to cost specialization.

Competitive Advertising and Pricing

Hwang, Kim, and Boleslavsky study how firms in an oligopoly simultaneously choose prices and advertising strategies, where advertising is modeled as the choice of how much product information to disclose to consumers. The paper extends the canonical Perloff-Salop (1985) random-utility discrete-choice framework, in which n firms engage in Bertrand competition for a consumer whose value for each product is independently drawn from a common distribution F, by endogenizing the information environment: each firm may choose any mean-preserving contraction (MPC) of F as its advertising strategy, with no structural restriction on feasible content. This full flexibility, drawn from the information design literature, allows each firm to choose the consumer’s effective value distribution, ranging from full information (choosing F itself) to complete concealment (a degenerate distribution at the mean). The model does not include advertising costs; they are assumed to be zero throughout.

The central result is that intense competition forces firms to provide precise product information. Formally, the full information equilibrium — in which every firm chooses F — exists in the advertising game (the subgame in which prices are fixed symmetrically) if and only if F^(n-1) is convex over its support. Because F^(n-1) represents the distribution of the consumer’s best outside option, convexity means the consumer likely faces an attractive alternative, incentivizing each firm to maximize the chance of offering the highest possible value. Crucially, this convexity condition is guaranteed to hold when n is sufficiently large, regardless of the shape of F, because the power function x^(n-1) becomes more convex as n rises. This establishes that under sufficiently intense competition, full information disclosure is the unique symmetric equilibrium.
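
The convexity condition can be checked numerically for a given F. The sketch below is illustrative: it picks an exponential distribution truncated to [0, 1] with rate 1 (an assumed example, not a parameterization taken from the paper) and tests convexity of F^(n-1) through second differences on a grid.

```python
import math


def truncated_exp_cdf(v, lam=1.0):
    """CDF of an exponential(lam) truncated to [0, 1] (illustrative choice of F)."""
    return (1.0 - math.exp(-lam * v)) / (1.0 - math.exp(-lam))


def is_power_convex(cdf, n, grid_size=1000, tol=1e-12):
    """Check convexity of F^(n-1) on [0, 1] via second differences on a grid."""
    h = 1.0 / grid_size
    vals = [cdf(i * h) ** (n - 1) for i in range(grid_size + 1)]
    return all(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] >= -tol
               for i in range(1, grid_size))


def smallest_n_with_full_information(cdf, n_max=50):
    """Smallest number of firms for which F^(n-1) is convex, or None."""
    for n in range(2, n_max + 1):
        if is_power_convex(cdf, n):
            return n
    return None


n_star = smallest_n_with_full_information(truncated_exp_cdf)
```

For this particular F the condition fails with two or three firms and first holds with four, which illustrates how a concave F can still support full disclosure once competition is intense enough.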

The general equilibrium advertising strategy G* — which governs cases where full information is not an equilibrium — satisfies two necessary and sufficient conditions: (i) (G*)^(n-1) is convex over the support of G*, and (ii) for almost all values in the support, G* either coincides with F (where the MPC constraint binds, preventing further dispersion) or (G*)^(n-1) is locally linear (where the firm is locally risk-neutral and has no incentive to alter its distribution). The paper proves existence and uniqueness of G* for any F satisfying the stated regularity conditions (density positive, continuously differentiable, bounded, with finitely many peaks). When F has log-concave density, a unique symmetric pure-price equilibrium (p*, G*) exists in the full game.

The paper demonstrates that strategic advertising has ambiguous implications for prices and consumer welfare. Strategic advertising necessarily reduces social surplus through information loss, since consumers select suboptimal products with positive probability when G* differs from F. However, it compresses the support of the value distribution relative to F, which — by a new result (Proposition 3) — tends to lower the equilibrium price. Offsetting this, strategic advertising also redistributes marginal consumers in ways that may raise or lower the price. In the duopoly case with power distributions F(v) = v^alpha on [0,1], strategic advertising lowers the market price if and only if alpha > 1/sqrt(2) (approximately 0.7071), and raises consumer surplus if and only if alpha > 0.7928.

The paper examines three extensions: (1) a binding consumer outside option, (2) multi-unit (k-out-of-n) demand, and (3) asymmetric firms with two types. In all three cases, full information cannot be a strict equilibrium for any finite n under the relevant structural condition, yet the equilibrium distribution G* converges pointwise to F as n tends to infinity, preserving the paper’s core asymptotic insight.

Q: What is the main research question? A: The paper asks how much product information firms will voluntarily disclose when they compete both on price and advertising content in an oligopoly. Unlike the monopoly literature, the oligopoly context creates strategic interdependencies — each firm’s optimal disclosure depends on rivals’ disclosure choices — that the paper characterizes fully.

Q: How is advertising modeled, and why use mean-preserving contractions? A: Each firm’s advertising strategy is modeled as a choice of any mean-preserving contraction (MPC) of the true value distribution F. An MPC preserves the expected value but reduces dispersion, capturing the idea that a firm can selectively conceal information (moving toward a degenerate distribution) but cannot fabricate value dispersion beyond what F allows. Because consumers are risk-neutral and buy based on expected values net of prices, this MPC formulation captures full flexibility in information design without loss of generality.

Q: What is the precise necessary and sufficient condition for the full information equilibrium in the advertising game? A: The full information equilibrium, in which every firm chooses F, exists if and only if F^(n-1) is convex over its support [v̲, v̄]. The “only if” direction follows from Lemma 1: in any equilibrium, (G*)^(n-1) must be convex, so if F^(n-1) is not convex, F is not an equilibrium. The “if” direction follows because a convex F^(n-1) makes each firm locally risk-loving, so no MPC of F yields a higher payoff than F itself.

Q: Why does sufficiently intense competition force full information disclosure? A: For any distribution F with positive, continuously differentiable, bounded density f with bounded derivative f’, the second derivative of F^(n-1) satisfies (F(v)^(n-1))’’ >= (n-1)F(v)^(n-3)[(n-2)epsilon^2 - M], where epsilon = min f(v) > 0 and M = max |f’(v)| < infinity. The bracketed term is strictly positive for n sufficiently large, so F^(n-1) is convex and the full information equilibrium exists. Economically, with many competitors each firm wins the consumer only when it offers the highest possible value, so providing full information is optimal.
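
The bound follows from differentiating F^(n-1) twice. A short derivation, writing epsilon for the minimum of the density f and M for the maximum of |f’| over the support, and using F <= 1:

```latex
\begin{aligned}
\bigl(F^{n-1}\bigr)''(v)
  &= (n-1)(n-2)\,F(v)^{n-3} f(v)^2 + (n-1)\,F(v)^{n-2} f'(v) \\
  &= (n-1)\,F(v)^{n-3}\bigl[(n-2)\,f(v)^2 + F(v)\,f'(v)\bigr] \\
  &\ge (n-1)\,F(v)^{n-3}\bigl[(n-2)\,\varepsilon^2 - M\bigr],
\end{aligned}
\qquad \varepsilon = \min_v f(v), \quad M = \max_v |f'(v)| .
```

The last line uses f(v)^2 >= epsilon^2 and F(v) f’(v) >= -M. The bracket is positive as soon as n > 2 + M / epsilon^2, so convexity holds for every n beyond that threshold.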

Q: What are the two necessary and sufficient properties characterizing the general equilibrium advertising strategy G*? A: First (Lemma 1), (G*)^(n-1) must be convex over the support of G*; this prevents any firm from profitably concentrating mass to reduce dispersion. Second (Lemma 2), for almost all values in the support, either G* = F locally (the MPC constraint binds, preventing further dispersion) or (G*)^(n-1) is locally linear (the firm is locally risk-neutral and indifferent over distributions with the same local mean). Theorem 1 proves these two conditions are both necessary and sufficient, and that G* is unique for any F satisfying the stated regularity conditions.

Q: What structure does G* take when F^(n-1) has strictly quasi-concave density? A: By Corollary 2(1), there exists a cutoff v* in [v̲, v̄] such that G*(v) = F(v) for v <= v* (full information below the cutoff) and (G*)^(n-1) is linear above v*. As n increases, v* rises, meaning the region of full disclosure expands, and G* increases in convex order, so consumers receive strictly more information. One immediate implication is that consumer surplus strictly increases in n: consumers benefit both from more options and from more accurate information about each product.

Q: What happens when F^(n-1) is concave? A: By Corollary 3, when F^(n-1) is concave, (G*)^(n-1) is linear over the entire support, with lower bound v̲. In the illustrative Example 1 (truncated exponential with n=2), this yields G* = U[0, 2*mu_F], a uniform distribution on an interval whose upper bound is twice the mean of F.

Q: Does strategic advertising raise or lower equilibrium prices, and consumer surplus? A: Both effects are ambiguous and depend on the shape of F. Strategic advertising compresses the support of the value distribution (since G* is an MPC of F), which by Proposition 3(1) tends to lower equilibrium prices. But it also reshapes the distribution of marginal consumers, which may raise or lower prices. In the power distribution example (n=2, F(v) = v^alpha on [0,1]), strategic advertising lowers the market price if and only if alpha > 1/sqrt(2) ≈ 0.7071, and raises consumer surplus if and only if alpha > 0.7928. Thus even with deadweight loss from information suppression, consumers can be better off under strategic advertising than under forced full disclosure.
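
Both thresholds can be reproduced from formulas stated elsewhere in this summary. The sketch below is a reconstruction, not the paper’s derivation: it assumes n = 2, alpha in (1/2, 1) so that F is concave, G* = U[0, 2*mu_F] as in the concave case, the price formula p* = (1/n) / integral (G^(n-1))’ dG, and consumer surplus equal to the expected best value minus the price. The closed forms and function names are this sketch’s own.

```python
import math


def price_full_info(alpha):
    """Price when both firms use F(v) = v^alpha: (1/2) / integral of f(v)^2."""
    # integral of (alpha * v^(alpha-1))^2 over [0, 1] = alpha^2 / (2*alpha - 1)
    return (2 * alpha - 1) / (2 * alpha**2)


def price_strategic(alpha):
    """Price when both firms use G* = U[0, 2*mu], with mu = alpha / (alpha + 1)."""
    # integral of g(v)^2 = 1 / (2*mu), so the price equals mu.
    return alpha / (alpha + 1)


def surplus_full_info(alpha):
    # E[max of two draws from F] = 2*alpha / (2*alpha + 1)
    return 2 * alpha / (2 * alpha + 1) - price_full_info(alpha)


def surplus_strategic(alpha):
    mu = alpha / (alpha + 1)
    # E[max of two draws from U[0, 2*mu]] = (2/3) * 2*mu
    return 4 * mu / 3 - price_strategic(alpha)


def bisect(fn, lo, hi, tol=1e-12):
    """Root of fn on [lo, hi], assuming a sign change between the endpoints."""
    f_lo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (fn(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


price_threshold = bisect(lambda a: price_strategic(a) - price_full_info(a), 0.6, 0.9)
surplus_threshold = bisect(lambda a: surplus_strategic(a) - surplus_full_info(a), 0.75, 0.9)
```

Under these assumptions the price comparison flips at 1/sqrt(2) and the surplus comparison flips near 0.7928, in line with the figures quoted above. For alpha >= 1 the function F is convex, G* coincides with F, and the two regimes are identical.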

Q: What does Proposition 3 contribute about equilibrium prices in the Perloff-Salop model? A: Proposition 3 delivers two results about how the distribution of marginal consumers (integral (F^(n-1))’ dF) determines equilibrium prices. First, the measure of marginal consumers decreases if F is proportionally stretched over a larger support, confirming that longer support raises equilibrium prices. Second (presented as novel), among all distributions with support in [v̲, v̄], the power distribution F(v) = ((v - v̲)/(v̄ - v̲))^(2/n) minimizes the measure of marginal consumers, corresponding to the maximum equilibrium price. The key property is that marginal consumers are uniformly distributed under this power distribution, and any deviation from uniformity admits a “flattening” adjustment that reduces the measure of marginal consumers and raises the price.
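
A short check of the exponent 2/n within the power family, with the support normalized to [0, 1] (a simplification made here). It covers only power distributions F(x) = x^a with a > 1/n, so it illustrates the result rather than proving it over all distributions.

```latex
\begin{aligned}
m(a) &= \int_0^1 \bigl(F^{n-1}\bigr)'(x)\, f(x)\, dx
      = \int_0^1 a^2 (n-1)\, x^{an-2}\, dx
      = \frac{a^2 (n-1)}{an-1}, \\
m'(a) &= \frac{(n-1)\, a\, (an-2)}{(an-1)^2} = 0
      \iff a = \frac{2}{n}, \qquad
m\!\left(\tfrac{2}{n}\right) = \frac{4(n-1)}{n^2}.
\end{aligned}
```

The integrand a^2 (n-1) x^(an-2) is constant in x exactly when a = 2/n, which is the uniform distribution of marginal consumers described in the answer. Since p* = (1/n) / m, the minimum of m corresponds to the highest price within this family.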

Q: Under what condition does the full game (price plus advertising) have a unique symmetric pure-price equilibrium? A: Theorem 2 states that log-concavity of the density f is sufficient for existence and uniqueness of a symmetric pure-price equilibrium (p*, G*) as characterized in Theorems 1 and 2. Log-concavity ensures that the equilibrium distribution G* has a convex-linear structure (as in Corollary 2), which preserves log-concavity of each firm’s profit function even under compound deviations (simultaneous changes to both price and advertising strategy), making the first-order conditions sufficient for global optimality.

Q: Can strategic advertising create or destroy pure-price equilibria relative to the Perloff-Salop benchmark? A: Yes, both directions are possible. When F^(n-1) is convex (so G* = F), equilibrium existence in the Perloff-Salop (PS) model is necessary but not sufficient for existence in the full model, because compound deviations (changing both price and advertising) may be profitable even when pure price deviations are not. Conversely, when G* differs from F, the changed distribution of marginal consumers can sustain an equilibrium in the full model even when none exists in PS. Appendix E of the paper provides a specific example of the latter phenomenon.

Q: What happens with a binding consumer outside option? A: Proposition 4 shows that a full information equilibrium never exists in the advertising game when the consumer has a binding outside option (p* in (v̲, v̄)). The firm’s value function acquires a discrete jump at p* due to the indicator 1_{v >= p*}, making it optimal to pool mass around p* rather than disclose fully. Nevertheless, Proposition 5 proves that G* converges pointwise to F as n tends to infinity, because the jump of size F(p*)^(n-1) vanishes exponentially fast as n grows.

Q: Does the full information result survive multi-unit demand? A: No. Proposition 6 shows that with k > 1 units demanded (out of n products), the full information equilibrium never exists for any finite n or F. The reason is that phi’(v; F) — the firm’s marginal value of offering value v — is zero at v̄ when k > 1, so the firm can profitably pool values near the top of the support. However, Proposition 7 shows that G* converges pointwise to F as n tends to infinity (with k fixed), preserving the asymptotic full information result.

Q: What happens with asymmetric firms differing in their value distribution supports? A: Proposition 8 shows a sharp dichotomy. If both firm types share the same upper bound of their value supports (v̄_1 = v̄_2), the full information equilibrium exists whenever both F_1^(n1-1) and F_2^(n2-1) are convex. If the supports have different upper bounds (v̄_1 < v̄_2), the full information equilibrium never exists regardless of n_1 and n_2, because type-2 firms face a downward kink in their winning probability at v̄_1 and always have an incentive to pool mass there. The authors conjecture that G*_1 and G*_2 still converge to F_1 and F_2 asymptotically but do not prove this due to technical complexity.

Q: How does this paper relate to Ivanov (2013)? A: Ivanov (2013) also uses the Perloff-Salop framework and shows that full information is an equilibrium when n is sufficiently large, but restricts advertising to rotation-ordered strategies (in the sense of Johnson and Myatt, 2006). The present paper imposes no structural restriction and strengthens Ivanov’s result by: (a) providing a necessary and sufficient condition for the full information equilibrium (not just a sufficient condition for large n); (b) fully characterizing G* when full information is not an equilibrium; and (c) demonstrating robustness across multiple model variants.

Q: What policy implication does the ambiguity result carry? A: The paper warns against assuming that mandating full information disclosure is unambiguously consumer-beneficial. While strategic advertising creates deadweight loss through information suppression, it can simultaneously compress support and alter the marginal consumer distribution in ways that lower equilibrium prices significantly. The power distribution example (alpha > 0.7928) shows consumers can be strictly better off under strategic advertising than under forced full disclosure. This ambiguity is a cautionary tale for disclosure regulation.

Mean-Preserving Contraction (MPC): A distribution G_i is an MPC of F if it has the same mean as F but less dispersion (in the sense of second-order stochastic dominance). In the paper, each firm’s feasible advertising strategies are exactly the set MPC(F) — this captures all informationally feasible disclosures without structural restriction on content.
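
A numerical check of the MPC relation, as a minimal sketch. It uses the standard integrated-CDF characterization: G is an MPC of F when the two have the same mean and the running integral of G never exceeds that of F. The grid size, tolerance, and example distributions (F(v) = v^0.5 on [0, 1] and G = U[0, 2/3]) are illustrative choices.

```python
def integrated_cdf(cdf, lo, hi, steps):
    """Running trapezoid integral of a CDF on a uniform grid over [lo, hi]."""
    h = (hi - lo) / steps
    total, out = 0.0, [0.0]
    prev = cdf(lo)
    for i in range(1, steps + 1):
        cur = cdf(lo + i * h)
        total += 0.5 * (prev + cur) * h
        out.append(total)
        prev = cur
    return out


def is_mpc(g_cdf, f_cdf, lo, hi, steps=2000, tol=1e-4):
    """True if G is (numerically) a mean-preserving contraction of F on [lo, hi]."""
    int_g = integrated_cdf(g_cdf, lo, hi, steps)
    int_f = integrated_cdf(f_cdf, lo, hi, steps)
    # Equal means: mean = hi - (integral of the CDF over [lo, hi]).
    same_mean = abs(int_g[-1] - int_f[-1]) <= tol
    # Less dispersion: the integrated CDF of G lies below that of F everywhere.
    less_spread = all(a <= b + tol for a, b in zip(int_g, int_f))
    return same_mean and less_spread


def f_cdf(v):
    return v ** 0.5            # power distribution with alpha = 0.5, mean 1/3


def g_cdf(v):
    return min(1.0, 1.5 * v)   # uniform on [0, 2/3], also mean 1/3
```

Here `is_mpc(g_cdf, f_cdf, 0.0, 1.0)` holds while the reverse direction does not, reflecting that a firm can conceal information but cannot add dispersion beyond F.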

Advertising Game: A restricted subgame of the full market game in which firms choose their advertising strategies G_i taking the symmetric price as given. An equilibrium in the advertising game is a necessary condition for equilibrium in the full game. The advertising game’s equilibrium uniquely pins down G* independently of the price level (under the baseline model without binding outside option).

Full Information Equilibrium: An equilibrium of the advertising game in which every firm chooses the true underlying distribution F as its advertising strategy. This corresponds to complete, unobstructed product disclosure. The paper’s central result is that this equilibrium exists if and only if F^(n-1) is convex over its support.

Convexity of F^(n-1): The key distributional condition governing advertising equilibria. F^(n-1) is the distribution of the consumer’s best alternative among (n-1) rivals’ products. Convexity of F^(n-1) means its density is increasing, signaling a likely attractive outside option, which makes each firm risk-loving and induces full disclosure. This convexity is guaranteed for n sufficiently large.

Locally Linear (G*)^(n-1): A region of the equilibrium distribution where (G*)^(n-1) has constant slope, making the firm locally risk-neutral. Over such a region, the firm is indifferent among all distributions with the same local mean, and the equilibrium G* need not coincide with F; it is only required to be an MPC of F on that interval. This alternating structure (coinciding with F on strictly convex regions; linear elsewhere) fully characterizes G*.

Marginal Consumers: In the Perloff-Salop pricing formula, the equilibrium price is p* = (1/n) / integral [(G*(v)^(n-1))’ dG*(v)]. The integrand (G*(v)^(n-1))’ * g*(v) is the density of consumers who are indifferent between a given firm’s product and their best alternative at value v. A larger measure of marginal consumers implies lower equilibrium prices through greater competitive pressure.
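The pricing formula above can be evaluated numerically. The sketch below assumes the formula exactly as stated in this entry, expands (G^(n-1))’ as (n-1) G^(n-2) g, and uses a uniform distribution on [0, 1] purely as an illustrative choice; the function name and arguments are hypothetical.

```python
def perloff_salop_price(cdf, pdf, n, lo, hi, steps=20_000):
    """p* = (1/n) / integral over [lo, hi] of (G^(n-1))'(v) g(v) dv.

    Uses (G^(n-1))' = (n - 1) * G^(n-2) * g and a midpoint Riemann sum.
    """
    width = (hi - lo) / steps
    marginal_mass = 0.0
    for k in range(steps):
        v = lo + (k + 0.5) * width
        marginal_mass += (n - 1) * cdf(v) ** (n - 2) * pdf(v) ** 2 * width
    return (1.0 / n) / marginal_mass


# Illustrative case: valuations uniform on [0, 1], so G(v) = v and g(v) = 1.
for n in (2, 3, 5):
    price = perloff_salop_price(lambda v: v, lambda v: 1.0, n, 0.0, 1.0)
    print(n, price)
```

In the uniform case the integral equals 1 for every n, so the price is 1/n; a distribution with more mass of marginal consumers raises the integral and lowers the price.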

Compound Deviation: In the full game, a deviation by a firm that changes both its price p_i and its advertising strategy G_i simultaneously, rather than varying only one dimension. The possibility of compound deviations is what distinguishes equilibrium existence conditions in the full model from those in the standard Perloff-Salop model, even when G* = F.

Diversification, Market Entry, and the Global Internet Backbone

This paper investigates how buyer demand for supplier diversification shapes entry incentives and market structure, using the global undersea fiber-optic cable industry as the empirical setting. The research question has two parts: first, how much of observed cable entry and surplus generation is attributable to buyers’ diversification motives rather than standard price competition; and second, whether market forces produce too much or too little diversification relative to the social optimum.

The empirical setting spans 2005–2021 and covers the worldwide network of undersea cables that carries more than 98% of all international internet traffic. Cables fail frequently — hundreds of faults per year — and industry professionals confirm that “no customer would buy capacity on a single cable.” The median monthly price for a 10Gbps lease fell from $55,500 in 2005 to $2,200 in 2021, and the number of active cables roughly doubled over the sample period.

The authors use proprietary data from TeleGeography covering cable characteristics (construction costs, capacity, landing points, entry dates), quarterly bandwidth prices at the city-pair level, annual used bandwidth at the country-pair level, and 168 documented cable faults. Markets are defined as country-pairs in calendar quarters.

The theoretical model begins with a representative buyer who splits bandwidth purchases equally across n symmetric cable operators to minimize expected disruption costs. Because disruption shocks are i.i.d. across cables, adding suppliers reduces the variance of realized bandwidth delivery, lowering the required over-provisioning buffer. This generates a “market expansion” channel: entry raises aggregate demand even when prices are held fixed, in addition to its effect through price competition. The aggregate demand equation takes log-linear form with cable count indicators alongside price and demand shifters.

The structural model adds a dynamic oligopoly game where firms make entry and exit decisions as a non-stationary Markov Perfect Equilibrium, with Cournot competition in each period. The three-step estimation procedure recovers: (1) price elasticities and diversification parameters from an IV demand regression using electricity generation cost shares as instruments; (2) marginal costs from firms’ first-order conditions; (3) entry and fixed costs from a nested pseudo-likelihood (NPL) estimator, supplemented by construction cost data to separately identify entry costs given the near-absence of observed exits.

Key demand results: the IV price elasticity is −1.36. The market expansion effect is large and exhibits decreasing marginal returns — entry of a second cable expands demand by as much as a 28.3% price decrease; a third cable is equivalent to a 19.3% price decrease; an eighth cable is equivalent to a 7.5% price decrease. The demand model achieves R² = 95%.

The first counterfactual removes the diversification channel entirely (entry raises competition only). Without diversification, cable investment falls by 12%. The net present value of total surplus per market over the sample period averages $1.11 billion under the observed equilibrium; supplier diversification accounts for 11% of total surplus and 27% of consumer surplus.

The second counterfactual quantifies two opposing distortions relative to the social optimum. Business-stealing creates excessive entry (entrants reduce incumbents’ output), while diversity effects create insufficient entry (marginal entrants generate surplus through diversification they cannot fully capture). At end-of-sample (2021-Q4), diversity distortions in terms of number of entrants range from 54% to 125% of the business-stealing distortion. Business-stealing tends to dominate for most markets, producing moderately excessive entry. Relative to the market outcome, total surplus under the social planner’s solution is on average 10% higher: 53% of this welfare gap is attributable to diversity effects and 47% to business-stealing effects. These findings hold across market heterogeneity in entry costs, market size, and demand growth.

The paper concludes that profit-maximizing suppliers fail to fully internalize diversification-related social benefits, and that targeted entry subsidies would pass cost-benefit tests in settings where diversity distortions dominate.

Q: What is the core mechanism by which supplier diversification expands demand? A: When buyers split purchases across n cable operators whose disruption shocks are i.i.d., adding a supplier reduces the variance of realized delivered bandwidth. The buyer therefore needs to hold a smaller over-provisioning buffer to achieve the same expected level of used bandwidth B. This lowers the effective cost of a given quantity of used bandwidth, shifting the aggregate demand curve outward. As the number of suppliers grows to infinity, the expected disruption cost converges to zero.
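The variance-reduction step can be illustrated with a stylized calculation. This sketch assumes each cable is independently down with the same probability and that purchases are split equally, a simple Bernoulli stand-in for the paper's disruption shocks; the function name and the 10% outage probability are hypothetical.

```python
def delivered_share_moments(n, fail_prob):
    """Mean and variance of the share of purchased bandwidth delivered.

    With an equal split over n cables that fail independently with
    probability fail_prob, the delivered share is the average of n
    i.i.d. Bernoulli(1 - fail_prob) draws.
    """
    mean = 1.0 - fail_prob
    variance = fail_prob * (1.0 - fail_prob) / n
    return mean, variance


for n in (1, 2, 3, 8):
    mean, variance = delivered_share_moments(n, fail_prob=0.1)
    print(n, mean, variance)
```

The mean delivered share does not depend on n, while the variance falls at rate 1/n, which is why each additional supplier helps but by less than the previous one.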

Q: How large is the market-expansion effect of diversification empirically? A: The effect is large but exhibits decreasing marginal returns. Entry of a second cable expands demand by as much as a 28.3% price reduction holding prices fixed; the third cable is equivalent to a 19.3% price reduction; and the eighth cable is equivalent to a 7.5% price reduction. All cable-count coefficients are positive and statistically significant in the IV demand model.

Q: How is price endogeneity addressed in the demand estimation? A: Bandwidth prices are instrumented using the marginal cost of electricity generation: country-level electricity generation shares for coal, gas, and oil, interacted with quarterly commodity price series (Australian coal price, EU natural gas price, Brent crude). The first-stage results indicate electricity costs are strong predictors of bandwidth prices. Accounting for endogeneity raises the absolute value of the price elasticity from its OLS level to 1.36 (an IV elasticity of −1.36), consistent with the expected direction of OLS bias.

Q: What share of cable investment and surplus is attributable to diversification motives? A: In the counterfactual where the diversification channel is eliminated — entry raises competition and lowers prices but provides no diversification benefit — cable investment falls by 12%. Under the observed equilibrium, the net present value of total surplus per market over 2005–2021 averages $1.11 billion; supplier diversification accounts for 11% of this total surplus and 27% of consumer surplus.

Q: How are the two distortions — business-stealing and diversity — defined and separated? A: Business-stealing distortion arises because entrants reduce incumbents’ outputs and revenues, so private entry benefits exceed social benefits, leading to excessive entry. Diversity distortion arises because entrants create surplus for buyers through diversification but cannot fully capture it without perfect price discrimination (following Spence (1976) and Mankiw and Whinston (1986)), leading to insufficient entry. The authors disentangle these by comparing: (i) the social planner’s solution (eliminates both distortions), and (ii) a coordinated entry solution maximizing producer surplus (eliminates only business-stealing). The residual gap between the two identifies the diversity distortion.

Q: What is the net direction and magnitude of distortion in equilibrium market structure? A: At 2021-Q4, for most markets, business-stealing dominates, leading to moderately excessive entry. Diversity distortions in number of entrants range from 54% to 125% of the business-stealing distortion across markets. Relative to the market outcome, the social planner’s solution yields average total surplus that is 10% higher. Of that welfare gap, 53% is attributable to diversity effects and 47% to business-stealing effects.

Q: How do market characteristics affect which distortion dominates? A: The paper analyzes cross-market heterogeneity and identifies market features — including the size of entry costs, market size, and the rate of demand growth over time — as determinants of whether insufficient diversification or excessive entry is the binding distortion. Markets with higher entry costs or slower demand growth are more likely to exhibit insufficient diversification.

Q: How are entry costs identified given the near-absence of cable exits in the data? A: Because exit events are rare in a nascent industry — only a handful of exits observed, mostly after 2020 — entry and fixed costs cannot be separated by exit decisions alone. The authors address this by using cable-level construction cost data from TeleGeography to estimate entry costs outside the dynamic model. With entry costs in hand, firms’ optimal entry decisions identify fixed costs. Scrap values are normalized to zero, consistent with industry reports that retired cables are typically abandoned on the seabed.

Q: What role does the non-stationarity of the market environment play in the model? A: The data covers the industry’s earliest growth phase, with demand growing by roughly three orders of magnitude (used bandwidth from 5 Tbps in 2005 to 2,886 Tbps in 2021) and prices falling by a factor of roughly 25. The authors use a non-stationary Markov Perfect Equilibrium concept in which strategies and transition functions are indexed by time, aligning with the treatment of high-tech commodities in Igami (2017).

Q: What are the policy implications of the findings? A: Because profit-maximizing suppliers do not fully internalize the diversification-related social benefits of entry, entry rates can be sub-optimal from a welfare perspective when diversity distortions dominate. The authors suggest targeted entry subsidies would pass cost-benefit tests in such cases. For antitrust analysis, regulators who ignore the demand-expansion effect of incremental suppliers may incorrectly judge a market as sufficiently competitive. In merger review, authorities must account for firms’ private incentives to provide diversification to reach accurate welfare conclusions.

Q: How does the paper verify that diversification demand is not a spurious empirical artifact? A: Several checks support the causal interpretation. The estimated demand parameters are consistent with the predictions of the consumer-level utility maximization problem derived analytically: decreasing marginal returns to diversification and a positive relationship between the number of suppliers and demand. The demand model achieves R² = 95%, suggesting limited unobserved confounders. Additionally, 78% of cable faults involve only a single cable, confirming that disruptions are geographically isolated and that cross-cable diversification provides genuine insurance value.

Q: What are the main data limitations acknowledged by the authors? A: The authors cannot observe cable-level revenue or market shares, nor contracts between buyers and sellers; only aggregate country-pair used bandwidth is observed. Price coverage is not comprehensive — TeleGeography collects prices on a voluntary basis from dozens of providers. The cable faults dataset (168 faults) represents only a subset of total faults, as collection focuses on publicly disclosed events. The demand model also does not explicitly account for substitution patterns across firms due to lack of firm-level market share data, though the high R² partly mitigates this concern.

Diversification (in this paper’s sense): Buyers’ practice of splitting bandwidth purchases across multiple cable operators to reduce exposure to idiosyncratic disruption risk. Diversification across n cables with i.i.d. disruption shocks reduces the variance of realized delivered bandwidth and lowers the required over-provisioning buffer, making the effective cost of a given usage level B a decreasing function of n.

Market Expansion Effect: The channel through which entry of additional cable suppliers raises aggregate demand holding prices fixed. This occurs because each additional supplier reduces disruption risk, allowing buyers to demand more used bandwidth for the same price. It is distinct from the conventional competition channel (entry lowering prices).

Diversity Distortion: The tendency toward insufficient entry arising because marginal entrants generate consumer surplus through diversification benefits but cannot fully capture this surplus absent price discrimination. Follows Spence (1976) and Mankiw and Whinston (1986).

Business-Stealing Distortion: The tendency toward excessive entry arising because entrants reduce incumbents’ output and revenues, creating a gap between private and social returns to entry.

Non-Stationary Markov Perfect Equilibrium: The equilibrium concept used for the dynamic entry game, in which strategies and equilibrium selection rules are indexed by calendar time to accommodate substantial secular trends in demand and costs — as opposed to a stationary MPE which assumes a stable long-run distribution.

Used Bandwidth vs. Purchased Bandwidth: Used bandwidth B is the amount the buyer is committed to delivering (to downstream customers or for internal use). Purchased bandwidth Q is what the buyer actually contracts for across all cables; Q > B because the buyer holds an over-provisioning buffer against disruption risk. The ratio B/Q is a decreasing function of the disruption cost parameter gamma and an increasing function of the number of suppliers n.

Nested Pseudo-Likelihood (NPL) Algorithm: The baseline estimator for the dynamic game, following Aguirregabiria and Mira (2007). It iterates on the best-response mapping to impose equilibrium restrictions. The authors supplement NPL with two-step estimators (1-PML, 1-MD) and the spectral algorithm of Aguirregabiria and Marcoux (2021), which solves for the root of a nonlinear system using a quasi-Newton method and is robust to fixed-point instability.

Environmental Consequences of Hydrocarbon Infrastructure Policy

Covert and Kellogg study policies that aim to “keep carbon in the ground” by blocking fossil fuel infrastructure investment, with the Dakota Access Pipeline (DAPL) as their empirical application. DAPL moves more than 500,000 barrels per day of oil from the Bakken Shale of North Dakota to the U.S. Gulf Coast and was completed in June 2017 amid substantial opposition. The central research question is whether blocking pipeline construction actually keeps oil in the ground or merely shifts transport to alternative modes — specifically crude-by-rail — and what the net environmental and economic consequences are.

The paper develops a two-period model of crude oil production and transportation mode choice. In the model, oil shippers decide in period 1 whether to commit to pipeline capacity under ship-or-pay contracts, then in period 2 allocate flows between the committed pipeline and the more flexible but costlier railroad alternative. Pipeline construction is an irreversible sunk cost with zero ongoing marginal cost; rail involves no sunk cost but substantial ongoing marginal costs including quadratic adjustment costs that capture capital investment in rail cars and loading/unloading facilities. Equilibrium pipeline capacity is determined by a shippers’ indifference condition: expected per-barrel returns from pipeline access equal the FERC-regulated tariff.

The empirical model is estimated using monthly Bakken oil production and transportation data, price differentials across three coastal destinations (Gulf, East, West), and drilling productivity data. Crude-by-rail marginal costs are estimated via 2SLS, yielding static marginal cost intercepts of $9.49/bbl to the East Coast, $12.64/bbl to the Gulf Coast, and $8.69/bbl to the West Coast, plus a dynamic adjustment cost of $1.28/bbl per mbbl/d of flow change. The upstream supply model follows Anderson, Kellogg, and Salant (2018), with old-well production following exponential decline (estimated decay parameter β = 0.955) and new-well drilling responding to current and lagged prices with a total long-run elasticity of 1.32. Shippers’ beliefs about future oil prices are calibrated to an AR(1) process fit to historical price volatility (persistence φ₁ = 0.9925, volatility σ_G = 0.098). Model validation confirms a predicted expected return to pipeline commitment of $6.17/bbl against DAPL’s actual tariff of $5.50–$6.25/bbl.

The main counterfactual asks what would have happened had DAPL’s construction been enjoined. In expectation, blocking DAPL reduces pipeline flows by 306 mbbl/d. Expected crude-by-rail flows increase by 248 mbbl/d, offsetting 81% of the pipeline reduction. Bakken oil production falls by only 58 mbbl/d, a 4% reduction. The modal shift from pipeline to rail worsens local environmental outcomes: per-barrel local pollution damages from rail transport substantially exceed those from pipelines, dominated by locomotive NOx emissions in populated areas. Foreclosing DAPL increases net local pollution damages by $444,000 per day (the decrease in pipeline-related harm of $144,000/day is more than offset by the increase from rail of $588,000/day). The total cost of blocking DAPL is $45/tonne of CO2 abated — $28/tonne from lost producer surplus and $17/tonne from increased local pollution damages — a figure comparable to the contemporaneous U.S. government social cost of carbon estimate of $42/tonne.

An upstream production tax achieving the same CO2 reduction costs only $1.01–$2.68/tonne CO2 abated, an order of magnitude less, because it does not induce the distortionary modal shift to rail. Two caveats apply: if 57% of Bakken production reductions leak to other basins, the cost of blocking DAPL rises from $45/tonne to $104/tonne; and if reductions represent production delays rather than permanent reductions, effective abatement is further diminished. The analysis is scoped to Bakken crude oil and land transportation alternatives. The finding that blocking infrastructure increases local pollution is atypical of CO2 abatement policies, which usually generate local pollution co-benefits.

Q: What is the core economic mechanism by which blocking a pipeline can keep oil in the ground? A: When a pipeline is foreclosed, crude oil can still move by railroad, but rail transport involves substantial ongoing marginal costs. These costs create a wedge between upstream (Bakken) and downstream (Gulf Coast) prices that depresses upstream supply. Only when downstream prices are high enough to cover both rail marginal cost and this wedge will rail fully substitute for the pipeline; at lower prices, some production is uneconomical and stays in the ground. In the model, this price-depressing wedge is the mechanism that reduces production — but it operates only partially, since rail can substitute for much of the pipeline’s flow.

Q: How much of the blocked pipeline flow substitutes to rail versus stays in the ground? A: In expectation, blocking DAPL reduces pipeline flows by 306 mbbl/d. Expected crude-by-rail flows increase by 248 mbbl/d, offsetting 81% of the pipeline reduction. Bakken oil production falls by only 58 mbbl/d, or approximately 4%. In a specific simulated month (December 2019), 348 mbbl/d (67%) of the 520 mbbl/d of foregone pipeline flows would still move by rail.
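The substitution shares quoted above follow from simple arithmetic on the reported flows. A minimal sketch, with a hypothetical helper name and the figures taken from this answer:

```python
def substitution_summary(pipeline_drop, rail_increase):
    """Share of lost pipeline flow picked up by rail, and the residual
    fall in production (all flows in mbbl/d)."""
    return {
        "rail_offset_share": rail_increase / pipeline_drop,
        "production_drop": pipeline_drop - rail_increase,
    }


expected = substitution_summary(pipeline_drop=306, rail_increase=248)
december_2019_share = 348 / 520

print(expected)
print(december_2019_share)
```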

Q: How are crude-by-rail costs estimated, and what is the role of adjustment costs? A: The authors estimate a 2SLS model of rail flows on price differentials, allowing for quadratic adjustment costs to capture investments and disinvestments in rail cars and loading facilities. Static marginal costs are $9.49/bbl (East Coast), $12.64/bbl (Gulf Coast), and $8.69/bbl (West Coast). The adjustment cost parameter γ is estimated at $1.28/bbl per mbbl/d, meaning a 10 mbbl/d monthly increase in rail flows raises marginal shipping cost by $12.76/bbl — a substantial share of total rail costs. Adjustment costs are necessary to reconcile the model with the sluggish observed response of rail flows to price differentials.
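A static illustration of how the adjustment-cost term enters marginal cost is sketched below. It assumes marginal cost to a destination equals the static intercept plus γ times the monthly change in flows and ignores any forward-looking terms a dynamic model would add; the names are hypothetical. With the rounded γ = 1.28, a 10 mbbl/d increase adds $12.80/bbl, so the $12.76/bbl quoted above presumably reflects the unrounded estimate.

```python
# Static marginal cost intercepts in $/bbl, as reported above.
STATIC_COST = {"East Coast": 9.49, "Gulf Coast": 12.64, "West Coast": 8.69}
GAMMA = 1.28  # $/bbl per mbbl/d of monthly flow change (rounded)


def rail_marginal_cost(destination, flow_change):
    """Static intercept plus gamma times the monthly change in rail flows.

    A simplification for illustration: forward-looking adjustment terms
    are left out.
    """
    return STATIC_COST[destination] + GAMMA * flow_change


print(rail_marginal_cost("Gulf Coast", 0))
print(rail_marginal_cost("Gulf Coast", 10))
```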

Q: What is the structure of the upstream oil supply model and what are its key parameter estimates? A: The model distinguishes “old” production from pre-existing wells, which follows exponential decline with estimated decay parameter β = 0.955, and “new” production from newly drilled wells, which is price-responsive with a total long-run elasticity of 1.32 — comparable to the 1.1–1.2 estimated by Newell and Prest (2019) across major U.S. shale plays. This structure implies that total production is highly inelastic in the short run (dominated by old wells) but responds to persistent price shocks over the long run through changes in drilling rates.
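To get a feel for the old-well decline rate, the sketch below assumes β = 0.955 is the fraction of old-well output that survives from one period to the next (the period length is whatever the estimation uses and is not restated here). The function names are hypothetical.

```python
import math

BETA = 0.955  # assumed per-period survival factor for old-well output


def remaining_share(periods, beta=BETA):
    """Share of initial old-well production left after `periods` periods."""
    return beta ** periods


def half_life(beta=BETA):
    """Number of periods for old-well production to fall by half."""
    return math.log(0.5) / math.log(beta)


print(remaining_share(12))
print(half_life())
```

Under this reading, old-well output halves in roughly fifteen periods, which is why sustained production depends on the price-responsive drilling of new wells.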

Q: How do the local pollution damages of rail compare to those of pipeline transport? A: Local air pollution damages from rail transport to the Gulf Coast are $1.66/bbl (plus $0.73/bbl in spill/accident costs), versus only $0.35/bbl in local pollution (plus $0.11/bbl in spills) for pipelines. Locomotive NOx emissions are the dominant factor, both because locomotives have high NOx emission factors and because these emissions often occur in densely populated areas. CO2 damages at a social cost of carbon (SCC) of $100/tonne are roughly similar across modes ($0.79–0.83/bbl), so local pollution is the key differentiator.

Q: What is the net welfare impact of foreclosing DAPL, and how is it decomposed? A: Foreclosing DAPL reduces producer surplus by $716,000/day, increases net local pollution damages by $444,000/day (the $588,000/day increase from rail more than offsets the $144,000/day decrease from pipeline), and reduces CO2 emissions by 25.2 mtonnes/day (thousand tonnes per day) from the 58 mbbl/d production reduction. The cost per tonne of CO2 abated is $28/tonne from lost producer surplus and $17/tonne from increased local pollution damages, totaling $45/tonne, broadly comparable to the U.S. government’s contemporaneous SCC estimate of $42/tonne. This means the policy’s abatement cost is approximately equal to the social value of each tonne abated, leaving little or no net social gain even before accounting for leakage.

Q: How does the model validate against observed data and institutional parameters? A: The model predicts an expected return to committed DAPL pipeline shipment of $6.17/bbl, which closely matches the actual DAPL tariff for committed shippers of $5.50–$6.25/bbl. The authors also validate simulated crude-by-rail flows against actual flows across destinations. The close match on the tariff is particularly meaningful because it tests the model’s equilibrium condition for pipeline capacity investment rather than a within-sample fit.

Q: How does an upstream production tax compare to blocking DAPL as a policy instrument? A: A production tax normalized to achieve the same CO2 reduction requires only $3.68/bbl if imposed after shippers have committed to DAPL (holding capacity fixed), or $3.24/bbl if announced before commitments are made (reducing pipeline capacity to 443 mbbl/d). The production tax reduces combined producer surplus and government revenue by only $96,000–$109,000/day versus $716,000/day under the DAPL ban, and reduces local pollution damages by $82,000/day rather than increasing them. The resulting cost per tonne CO2 abated is $1.01–$2.68 — an order of magnitude smaller than the $44.63/tonne for blocking DAPL.

Q: What is the production leakage caveat and how large is its effect? A: If blocking DAPL causes Bakken production to fall, production from other U.S. or global oil basins may increase, partially or fully offsetting the CO2 reduction. Following Prest (2022) and Prest et al. (2023), the authors note that if 57% of the Bakken production reduction leaks to other basins, the cost of blocking DAPL rises from $45/tonne to $104/tonne. Leakage would increase the cost per tonne for the upstream tax as well, but the relative advantage of the tax over the pipeline ban is unaffected by this caveat.
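The leakage adjustment can be reproduced with a one-line calculation. This sketch assumes that leakage scales effective abatement by (1 - leakage share) while leaving policy costs unchanged, which matches the figures quoted above when the unrounded $44.63/tonne is used; the function name is hypothetical.

```python
def cost_with_leakage(cost_per_tonne, leakage_share):
    """Cost per tonne abated once a share of the production cut leaks
    to other basins and so abates no CO2."""
    if not 0.0 <= leakage_share < 1.0:
        raise ValueError("leakage_share must lie in [0, 1)")
    return cost_per_tonne / (1.0 - leakage_share)


ban = cost_with_leakage(44.63, 0.57)
tax_low = cost_with_leakage(1.01, 0.57)
tax_high = cost_with_leakage(2.68, 0.57)

print(ban)
print(tax_low, tax_high)
```

Because both policies are scaled by the same factor, the ratio of their costs per tonne is unchanged, which is the sense in which the tax's relative advantage survives the caveat.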

Q: What is the production delay caveat? A: Even absent leakage, the paper cautions that production reductions from either policy may represent production delays rather than permanent reductions — oil not extracted today may be extracted later as prices rise or technology improves. To the extent that reductions are temporary, the effective carbon abatement is smaller than the authors compute, and the cost per tonne of CO2 abated is correspondingly higher. The paper does not quantify this effect but flags it as a material caveat.

Q: What institutional features drive pipeline capacity investment and risk allocation? A: Pipelines are irreversible investments subject to ex-post holdup, so construction financing requires firm ship-or-pay commitments from shippers before construction and before future prices are known, meaning oil price risk is borne primarily by shippers rather than the pipeline owner. Pipeline tariffs are regulated by FERC on a cost-of-service basis. In the DAPL case, shippers executed binding ten-year ship-or-pay contracts in June 2014, and shippers’ beliefs about future oil prices at that date — calibrated to historical price volatility using an AR(1) process with estimated persistence φ₁ = 0.9925 and volatility σ_G = 0.098 — determine equilibrium capacity investment.
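The calibrated belief process implies a great deal of price uncertainty at long horizons. The sketch below assumes a demeaned AR(1) of the form x_t = φ₁ x_{t-1} + σ_G ε_t with unit-variance innovations, in whatever units and period length the calibration uses (not restated here); the function names are hypothetical.

```python
import math

PHI = 0.9925   # persistence, as reported above
SIGMA = 0.098  # innovation volatility, as reported above


def forecast_std(horizon, phi=PHI, sigma=SIGMA):
    """Standard deviation of the h-step-ahead forecast error of an AR(1)."""
    return sigma * math.sqrt((1.0 - phi ** (2 * horizon)) / (1.0 - phi ** 2))


def shock_half_life(phi=PHI):
    """Periods needed for a shock's effect to decay by half."""
    return math.log(0.5) / math.log(phi)


for horizon in (1, 12, 120):
    print(horizon, forecast_std(horizon))
print(shock_half_life())
```

With persistence this close to one, forecast uncertainty keeps growing over many periods, which is the risk shippers take on when they sign ten-year commitments.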

Q: How does the paper’s finding relate to the typical co-benefit structure of climate policies? A: Most CO2 abatement policies generate local pollution co-benefits (reduced NOx, SOx, particulates), so the abatement cost is partially offset by local pollution gains. Blocking DAPL reverses this: the pipeline-to-rail modal shift increases local pollution damages, making local pollution a cost rather than a co-benefit of the policy. The authors note this is atypical but not unprecedented — urban densification and post-combustion emissions controls in fossil fuel boilers also present CO2–local pollution trade-offs.

Infrastructure foreclosure policy: A “keep it in the ground” strategy that blocks construction of specialized fossil fuel transportation infrastructure (pipelines) with the aim of inhibiting production of the fuels that would have been transported, without requiring direct acquisition or buyout of mineral rights.

Ship-or-pay agreement: A firm, up-front capacity commitment in which a pipeline shipper agrees to pay for reserved pipeline capacity whether or not they ultimately use it, made before construction and before future prices are realized; the institutional mechanism by which oil price risk is transferred from pipeline owners to shippers.

Crude-by-rail adjustment costs: Costs that are quadratic in the period-to-period change in rail volumes to a given destination, so that the marginal adjustment cost is linear in that change; they capture capital investments and disinvestments in rail cars, loading facilities, and unloading terminals needed to expand or contract crude-by-rail capacity; estimated at $1.28/bbl per mbbl/d of monthly flow change.

Production leakage: The partial or full offset of production reductions in one oil basin (Bakken) by production increases in other U.S. or global basins in response to the same price signals; at 57% leakage, the cost of blocking DAPL rises from $45/tonne to $104/tonne of CO2 abated.

Old-well vs. new-well production dynamics: The distinction between production from pre-existing wells (which follows an exponential decline path insensitive to current prices, β = 0.955) and production from newly drilled wells (which responds to current and lagged upstream prices with long-run elasticity 1.32); this structure makes total short-run supply highly inelastic while allowing substantial long-run price responsiveness through drilling adjustments.

Local pollution damages from NOx: The dominant component of environmental harm from crude-by-rail transport, arising from locomotive NOx emissions that are both large in magnitude and concentrated in densely populated areas along rail corridors; at $100/tonne SCC, monetized local pollution damages from rail exceed CO2 damages for all three coastal destinations, whereas for pipelines CO2 damages exceed local pollution costs.

Cost per tonne of CO2 abated: The authors’ metric for comparing infrastructure foreclosure to alternative policies; computed as the sum of lost producer surplus and net change in local pollution damages divided by the quantity of CO2 emissions avoided from reduced oil production and consumption; equals $45/tonne for blocking DAPL versus $1.01–$2.68/tonne for an equivalent upstream production tax.

Market Segmentation through Information

This paper asks what market outcomes an information designer — modeled as an internet platform that knows consumers’ preferences — can achieve by choosing what information to disclose to competing oligopolistic firms who then make personalized price offers. The model features n firms each producing a single differentiated product at zero cost, a continuum of consumers with unit demand and multidimensional valuations (one per product), and a designer who commits to a mapping from consumer types to joint distributions over messages sent to firms before they play a simultaneous pricing game. The designer’s objective spans the full range from maximizing producer surplus to maximizing consumer surplus.

The paper establishes two main results. First, under a necessary and sufficient condition called Aggregate Incentive Compatibility (AIC), the designer can implement full surplus extraction by firms (the producer-optimal outcome), in which every consumer buys her most preferred product at a price exactly equal to her valuation for it, capturing 100% of available surplus for producers. The AIC condition requires, for each firm i and each candidate deviation price p_hat_i, that the infra-marginal losses firm i would bear on its natural customers (those in E_i, who value i most) from lowering its price to p_hat_i be weakly greater than the maximum business-stealing profit available from consumers who prefer other products but whose valuation for i exceeds p_hat_i. The condition is easier to satisfy when consumer preferences are more polarized, i.e., when consumers have stronger relative preferences for their most-preferred product. When firms offer homogeneous products the condition fails everywhere and no information structure can generate any producer surplus: Bertrand competition drives all profits to zero under any signal structure.

Second, the paper characterizes the consumer-optimal information structure, which achieves the maximum possible consumer surplus across all equilibria induced by any information structure. The upper bound on consumer surplus is CS* = (total surplus) minus sum_i Pi*_i, where Pi*_i is the profit firm i can guarantee itself by ignoring the designer’s signal and setting the best uniform price assuming all rivals price at zero. This bound is tight: the designer can implement it by publicly partitioning consumers into groups by most-preferred product, inducing rival firms to price at marginal cost (zero) for consumers who prefer another firm’s product, and then applying the Bergemann-Brooks-Morris (2015) extremal segmentation within each firm’s natural customer set to preserve each firm’s guarantee profit while achieving efficiency.
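The bound CS* can be computed directly for a small discrete example. The sketch below assumes a finite list of consumer types with nonnegative valuations, zero costs, and ties broken in favor of the firm under consideration; the type list and function names are hypothetical and are not the paper's two-firm example.

```python
def guarantee_profit(types, i):
    """Best uniform-price profit for firm i when all rivals price at zero.

    `types` is a list of (weight, valuations) pairs. A consumer buys from
    firm i at price p when v_i - p >= max over rivals of v_j (ties go to
    firm i, an assumption of this sketch).
    """
    advantages = []
    for weight, vals in types:
        best_rival = max(v for j, v in enumerate(vals) if j != i)
        advantages.append((vals[i] - best_rival, weight))
    best = 0.0
    # An optimal uniform price can be found among the positive advantages.
    for price, _ in advantages:
        if price <= 0:
            continue
        demand = sum(w for adv, w in advantages if adv >= price)
        best = max(best, price * demand)
    return best


def consumer_surplus_bound(types, n_firms):
    """CS* = total surplus minus the sum of firms' guarantee profits."""
    total_surplus = sum(w * max(vals) for w, vals in types)
    guarantees = sum(guarantee_profit(types, i) for i in range(n_firms))
    return total_surplus - guarantees


# Hypothetical two-firm market with four equally likely consumer types.
toy_types = [
    (0.25, (3.0, 1.0)),
    (0.25, (2.0, 1.0)),
    (0.25, (1.0, 3.0)),
    (0.25, (1.0, 2.0)),
]
print(guarantee_profit(toy_types, 0))
print(consumer_surplus_bound(toy_types, 2))
```

In this toy market each firm can guarantee itself 0.5 and total surplus is 2.5, so at most 1.5 is available to consumers under any information structure.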

The illustrative two-firm example shows the quantitative stakes concretely. With no information disclosure, firms charge 4/5; total producer surplus is about 76% of total surplus S*, consumer surplus is just under 10% of S*, and some consumers are excluded. With full disclosure, producer surplus rises to about 81% of S* and consumer surplus to 19%. The producer-optimal information structure (Case 3) achieves 100% of S* as producer surplus by pooling consumers who prefer different products into the same message submarket, giving each firm an incentive to price for its highest-valuing customers and ignore the others. The consumer-optimal information structure (Case 4) brings producer surplus down to about 57% of S*, its guaranteed lower bound, and delivers roughly 43% of S* to consumers, an outcome unattainable by full disclosure alone.

Both producer-optimal and consumer-optimal outcomes are efficient: all consumers buy their most-preferred product in both cases. The paper further characterizes the full efficient frontier between consumer- and producer-optimal outcomes, showing that mixing the consumer-optimal and full-information structures (or consumer-optimal, full-information, and producer-optimal structures when the latter is implementable) spans every point on the frontier.

The model assumes firms will price-discriminate if they can, that the designer has full knowledge of consumer types, and that the game is played once. The core results extend to continuous type distributions as shown in Online Appendix B.2. The analysis is restricted to a monopoly platform; competition among platforms is left for future work.

Q: What is the central research question and why does the two-benchmark comparison used by antitrust authorities miss important possibilities?

A: The paper asks what market outcomes — combinations of consumer and producer surplus — an information designer (a platform) can achieve by choosing among all possible information structures, not just the two benchmarks of no-information and full-information. Antitrust analysis that compares only those two cases misses a vast middle ground: an intermediary can package information in ways that, for instance, implement perfect collusion (extracting all surplus as producer surplus) while appearing to use privacy-protective technologies, or can intensify competition well beyond the full-information benchmark to benefit consumers.

Q: What is the producer-optimal information structure and when does it exist?

A: A producer-optimal information structure is one that induces an equilibrium in which every consumer buys her most-preferred product at a price exactly equal to her valuation — full surplus extraction. It exists if and only if, for every firm i and every candidate deviation price p_hat_i, the Aggregate Incentive Compatibility (AIC) condition holds: the aggregate infra-marginal losses firm i would suffer on its natural customers Ei from lowering price to p_hat_i must be at least as large as the maximum business-stealing profit from consumers outside Ei who have valuation for i weakly above p_hat_i. This is a condition on the distribution of consumer valuations, not on the information structure per se.

Q: What is the economic mechanism behind the producer-optimal structure — how does pooling consumers implement full surplus extraction?

A: The designer assigns consumers who prefer product A to the same message submarket as consumers who prefer another product but have a lower valuation for A. Firm A is then recommended a price equal to the willingness to pay of its highest-valuing customers. The presence of the “outside” consumers in the same message makes it unprofitable for firm A to deviate downward to capture them, because the infra-marginal loss on the natural customers exceeds the additional revenue. Simultaneously, the rival firm cannot identify A’s natural customers and undercut to win them, because the messages do not allow it to distinguish them. The result is that each firm plays a niche strategy, setting price equal to the valuation of its highest-type natural customers and excluding the others from its offer.

Q: When does polarization of consumer preferences help achieve the producer-optimal outcome?

A: Proposition 1 states that if a producer-optimal information structure exists under distribution f, it also exists under any distribution f_tilde that is more polarized than f — where more polarized means the mass of consumers who prefer i and have valuation above any threshold for i increases, and the mass of consumers who prefer j but have valuation above that threshold for i decreases. Intuitively, polarization slackens the Firm IC constraints because it reduces the business-stealing temptation: fewer consumers with high cross-product valuations are available for firm i to capture by undercutting. Concrete continuous-distribution examples include: uniform over the unit square (producer-optimal always exists), Hotelling anti-correlated values (exists everywhere), and truncated normal with mean 1/2 — producer-optimal is feasible for all standard deviations sigma > 0.15.

Q: Why does the producer-optimal outcome fail entirely when products are homogeneous?

A: Proposition 2 states that when all consumer types have equal valuations across products (the support of f lies on the diagonal of V^n), then for any information structure and any induced equilibrium, every consumer buys at price zero and all firms earn zero profit. The logic extends the standard Bertrand undercutting argument: with homogeneous products, any positive price a firm charges is undercut by a rival who can always profitably steal demand, and this applies to any posterior distribution induced by any signal realization. Even private signals cannot prevent this outcome because no signal realization can give a firm a non-contestable position.

Q: How is the consumer-optimal information structure constructed, and what is its key economic logic?

A: Theorem 2 shows the consumer-optimal structure has three layers. First, consumers are partitioned into n groups by most-preferred product (Ei). Second, firms j not equal to i are induced — by publicly revealing which group a consumer belongs to — to set price zero for consumers outside their group, because competing for those consumers is hopeless when their preferred firm is identified. Third, within each Ei, consumers are further partitioned into submarkets using the Bergemann-Brooks-Morris (2015) extremal segmentation applied to residual valuations (theta_i minus the maximum of competing valuations), ensuring firm i earns exactly its guarantee profit Pi*_i. By holding each firm down to its guarantee profit, the residual goes to consumers, maximizing CS.

Q: What is the guarantee profit Pi*_i and how does it bound consumer surplus?

A: Pi*_i is the maximum profit firm i can achieve by ignoring all designer signals and setting a single uniform price to all consumers, against the worst-case scenario in which all other firms price at zero. Formally, Pi*_i = max_{p_i} sum_{theta in Ei: theta_i - p_i >= max_{j not equal i} theta_j} p_i * f(theta). Since firm i can always achieve Pi*_i regardless of the information structure (by simply ignoring signals), no information structure can push firm i’s profit below Pi*_i. The sum of these guarantee profits across all firms provides a lower bound on total producer surplus, and therefore an upper bound on consumer surplus, achievable by any information structure.
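The guarantee-profit formula can be illustrated on a small discrete example. The sketch below is only a minimal illustration: the function names and the four-type distribution are hypothetical and not taken from the paper. It computes each firm's best uniform price when rivals charge zero, then the implied upper bound on consumer surplus (total surplus minus the sum of guarantee profits, with zero production cost as in the model).

```python
from typing import Dict, Tuple

Valuations = Tuple[float, ...]  # one valuation per product


def guarantee_profit(f: Dict[Valuations, float], i: int) -> Tuple[float, float]:
    """Return (best uniform price, guarantee profit) for firm i when rivals price at zero."""
    # Residual valuation for product i: theta_i minus the best rival valuation.
    residual = {
        theta: theta[i] - max(v for j, v in enumerate(theta) if j != i)
        for theta in f
    }
    best_price, best_profit = 0.0, 0.0
    # On a finite type space an optimal uniform price can be taken from the positive residuals.
    for p in sorted({r for r in residual.values() if r > 0}):
        # A type buys from i at price p if theta_i - p >= every rival valuation.
        mass = sum(m for theta, m in f.items() if residual[theta] >= p)
        if p * mass > best_profit:
            best_price, best_profit = p, p * mass
    return best_price, best_profit


def consumer_surplus_bound(f: Dict[Valuations, float]) -> float:
    """Total efficient surplus minus the sum of guarantee profits."""
    n = len(next(iter(f)))
    total_surplus = sum(max(theta) * m for theta, m in f.items())
    return total_surplus - sum(guarantee_profit(f, i)[1] for i in range(n))


# Hypothetical two-firm distribution (illustration only, not the paper's example).
f_example = {
    (1.0, 0.0): 0.25,
    (0.5, 0.25): 0.25,
    (0.0, 1.0): 0.25,
    (0.25, 0.5): 0.25,
}
price_0, profit_0 = guarantee_profit(f_example, 0)
cs_bound = consumer_surplus_bound(f_example)
```

In this hypothetical distribution each firm does best by pricing for its strongest fans only, so the bound leaves the remaining surplus to consumers.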

Q: In the two-firm numerical example, what is the quantitative comparison across the four cases?

A: Total available surplus S* = 0.84. Under no information (Case 1): producer surplus approximately 76% of S*, consumer surplus just under 10% of S*, and consumers of types (3/5, 2/5) and (2/5, 3/5) do not trade. Under full disclosure (Case 2): producer surplus approximately 81% of S*, consumer surplus 19% of S*, efficient. Under the producer-optimal structure (Case 3): producer surplus = 100% of S* (all surplus extracted), consumer surplus = 0%, efficient. Under the consumer-optimal structure (Case 4): producer surplus approximately 57% of S*, consumer surplus approximately 43% of S*, efficient. All cases except Case 1 are efficient; the no-information case excludes some consumers from trading.

Q: Is the full-information disclosure structure consumer-optimal?

A: Not in general. Proposition 3 states that full information is consumer-optimal if and only if all consumers in Ei have identical residual valuations (theta_i minus their second-best alternative) — a condition that generically fails. When residual valuations within Ei are heterogeneous, the designer can do strictly better for consumers by applying the extremal segmentation within each Ei rather than revealing full information, which would allow firms to price-discriminate on individual residual valuations and extract more surplus.

Q: Can the designer trace out the entire efficient frontier between consumer- and producer-optimal outcomes?

A: Yes, in two steps. First, by applying the consumer-optimal structure (point A) to a fraction lambda of the consumer population and the full-information structure (point B) to the remaining 1-lambda, the designer can implement any point on the efficient frontier between A and B. Second, when the producer-optimal outcome (point C) is also implementable, applying the full-information structure to a fraction lambda and the producer-optimal structure to the remaining 1-lambda spans every point between B and C. The key insight is that the AIC condition, if it holds for f, also holds for any rescaled sub-distribution of f (it is scale-invariant), so the producer-optimal sub-problem remains feasible.
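The mixing argument amounts to taking convex combinations of surplus pairs. As a rough sketch, the snippet below plugs in the approximate surplus shares from the two-firm example (rounded percentages of S*) and assumes surplus aggregates linearly across the population fractions receiving each structure. The helper name `mix` is hypothetical.

```python
def mix(point_a, point_b, lam):
    """(CS share, PS share) when a fraction lam of consumers gets structure a and 1 - lam gets b."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(point_a, point_b))


# Approximate (consumer surplus, producer surplus) shares of S* from the two-firm example.
A = (0.43, 0.57)  # consumer-optimal
B = (0.19, 0.81)  # full information
C = (0.00, 1.00)  # producer-optimal, when implementable

halfway_ab = mix(A, B, 0.5)  # a point on the frontier between A and B
halfway_bc = mix(B, C, 0.5)  # a point on the frontier between B and C
```

Every mixture keeps the two shares summing to one, which reflects that all three structures are efficient.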

Q: What are the regulatory implications of the analysis?

A: The paper identifies a fundamental tension: banning information use sacrifices efficiency (some consumers excluded, wrong products purchased), but unrestricted use permits platforms to implement perfect collusion through information design. Critically, the paper shows that privacy-enhancing technologies that pool consumers into cohorts — like Google’s Privacy Sandbox — are equally consistent with the producer-optimal (collusive) and consumer-optimal (competitive) structures; the two differ only in the principle by which consumers are grouped. The paper suggests regulators could mandate that consumers in the same cohort share the same most-preferred product and that information be disclosed symmetrically across firms — the defining features of the consumer-optimal structure. This would block the producer-optimal grouping (which mixes consumers with different most-preferred products) while preserving efficiency.

Q: How does this paper relate to and extend Bergemann, Brooks, and Morris (2015)?

A: Bergemann, Brooks, and Morris (2015) characterize achievable consumer and producer surplus outcomes when a designer discloses information to a single monopolist who can price-discriminate. The present paper extends this to oligopoly, where competition between firms creates both additional constraints (firms may undercut each other) and additional instruments (the designer can play firms against each other). The consumer-optimal construction directly applies the BBM (2015) extremal segmentation within each firm’s natural customer set Ei, but the outer layer — using public revelation of group membership to induce rival firms to price at zero — is new and arises specifically from the oligopoly setting.

Information designer: An entity (modeled as a platform) that observes the full joint distribution of consumer valuations over all products and commits, before firms price, to a mapping from consumer types to joint distributions over messages sent to competing firms; the designer can be interpreted as an internet intermediary choosing how to package and share consumer data.

Aggregate Incentive Compatibility (AIC): The necessary and sufficient condition on the distribution of consumer valuations for the existence of a producer-optimal information structure; for each firm i and each candidate deviation price p_hat_i, the aggregate infra-marginal losses firm i would incur on its natural customers by lowering price to p_hat_i must weakly exceed the maximum revenue firm i could gain by attracting consumers who prefer rival products but have valuation for i above p_hat_i.

Producer-optimal information structure: An information structure that induces an equilibrium in which every consumer buys her most-preferred product at a price exactly equal to her full valuation for it, extracting 100% of available surplus as producer surplus — the outcome equivalent to the firms’ fully collusive joint surplus maximum.

Consumer-optimal information structure: An information structure that achieves the maximum consumer surplus attainable across all equilibria induced by any information structure, holding each firm to its guarantee profit Pi*_i (the best uniform-price profit the firm can secure by ignoring all signals) and allocating all residual surplus to consumers while maintaining allocative efficiency.

Guarantee profit (Pi*_i): The maximum profit firm i can secure unilaterally by ignoring the designer’s signal and setting an optimal uniform price, computed against the worst case in which all rival firms price at zero; it equals the maximum over p_i of p_i times the sum of f(theta) over all types in Ei for which theta_i minus p_i weakly exceeds all rival valuations.

Polarization of preferences: A stochastic dominance condition under which, relative to a baseline distribution, the mass of consumers who prefer product i and have high valuations for it increases while the mass of consumers who prefer rival products but have high valuations for i decreases; higher polarization weakens the Firm IC constraints and makes the producer-optimal outcome easier to implement (Proposition 1).

Separation and Consistency: Two structural properties any producer-optimal information structure must satisfy: Separation requires that the messages firm i receives about consumers in Ei with distinct valuations for i have disjoint supports; Consistency requires that every message firm i can receive about any consumer type is contained in the union of messages firm i receives about consumers in Ei, preventing firm i from ever inferring that a consumer prefers a rival’s product.

Merger Effects and Antitrust Enforcement: Evidence from US Consumer Packaged Goods

This paper by Bhattacharya, Illanes, and Stillerman makes two contributions to the debate over US antitrust enforcement stringency. First, it documents the price, quantity, and assortment effects of a comprehensive set of consummated mergers in US consumer packaged goods (CPG). Second, it develops and estimates a model of agency enforcement decisions to quantify antitrust stringency and simulate counterfactual outcomes under stricter regimes.

Data and scope. The analysis covers 129 product markets across 47 transactions in US CPG from 2006 to 2017, using the NielsenIQ Retail Scanner Dataset (covering 35,000–50,000 stores and 2.6–4.5 million UPCs). The sample is restricted to all deals valued at $280 million or more where both the acquirer and target sold products in at least one overlapping product market-DMA. Geographic markets are NielsenIQ designated market areas (DMAs). The sample is defined to avoid selection bias from studying only mergers that attracted press attention or were litigation targets.

Identification strategy. The empirical approach is a before-after event study within geography and product. For each merger, a brand-specific linear time trend is estimated from the 36 months prior to the merger announcement, controlling for UPC-DMA fixed effects, month-of-year fixed effects, input cost indices, and log median household income. Post-merger outcomes (24 months after completion) are measured as deviations from the extrapolated pre-merger trend. The identifying assumption is that secular demand and cost trends are gradual and well-captured by a linear trend. Pre-trend placebo tests show no significant departures from trend in the pre-period, and randomized-date placebos confirm that the linear trend is a better predictor of post-period outcomes under random merger dates than under actual merger dates, supporting the interpretation that observed post-period departures reflect merger effects.
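The core of the before-after design can be sketched for a single price series. This is a deliberately stripped-down illustration: it fits one linear trend on the 36 pre-period months and averages the deviation over the 24 post-period months, and it omits the fixed effects, cost indices, and income controls described above, as well as any gap between announcement and completion. The function name and the synthetic series are assumptions made for the example.

```python
import numpy as np


def trend_deviation(log_price, pre_months=36, post_months=24):
    """Average post-period deviation of log_price from the extrapolated pre-period linear trend."""
    log_price = np.asarray(log_price, dtype=float)
    t = np.arange(len(log_price))
    pre = slice(0, pre_months)
    post = slice(pre_months, pre_months + post_months)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(t[pre], log_price[pre], deg=1)
    predicted = intercept + slope * t[post]
    return float(np.mean(log_price[post] - predicted))


# Synthetic series: a steady 0.1% monthly trend plus a 3 log-point jump after month 36.
months = np.arange(60)
series = 0.001 * months
series[36:] += 0.03
effect = trend_deviation(series)
```

On this synthetic series the estimated effect recovers the 0.03 jump, because the pre-period is exactly linear.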

Price effects. The average price effect of consummated CPG mergers is small: across specifications, estimates range from -0.6% to 1.0%, with a baseline mean of 0.3%. However, heterogeneity is substantial. The standard deviation of merger-level price effects is 4.0–7.5 percentage points. In the baseline specification, the first quartile of price effects is -2.1% and the third quartile is 3.7%. Merging and non-merging party price changes are positively correlated (correlation = 0.49), consistent with strategic complementarity. Thirty-six percent of mergers lead both groups to lower prices; 36% lead both groups to raise prices.

Quantity and assortment effects. Total quantities fall on average by 0.4–1.0% across specifications, with 60% of mergers producing quantity reductions. Merging parties exhibit a larger average quantity decline of 6.4%. Mergers also lead to a 2.7% average reduction in the number of stores served by merging parties, a 2.2% reduction in the number of brands sold in a DMA by merging parties, and a 3.2% reduction for non-merging parties. Brands with less than 5% of the merged entity’s sales are 6 percentage points more likely to be dropped post-merger.

Enforcement model. To interpret these outcomes relative to enforcement, the authors develop a model in which the agency receives a noisy signal of a merger’s price effect and challenges the merger if the posterior mean exceeds a threshold that is decreasing in deal size. They estimate the model by maximum likelihood using data on enforcement actions (6 mergers receiving remedies, 4 withdrawn under antitrust pressure) and realized price changes. The estimated sales-weighted average threshold is 4.8–6.3%: agencies act as if they challenge CPG mergers only when they expect a price increase exceeding this level. The posterior standard deviation of the agency’s assessment is 2.5–3.2 pp (aggregate prices) to 4.1–4.8 pp (merging-party prices).

Counterfactual stringency. Tightening the threshold from approximately 6.1% to 2.5% would roughly quadruple the challenge probability (from 0.075 to 0.30), reduce aggregate price changes of consummated mergers by approximately 1.4 pp, and lower the share of allowed anti-competitive mergers from roughly 50% to 35%. Critically, type I errors (blocking pro-competitive mergers) remain negligible at thresholds down to approximately 3%; at 0% threshold only 10% of blocked mergers would be type I errors. The primary cost of tighter enforcement is a significantly larger agency workload, not an increase in blocked pro-competitive mergers.

Scope conditions. Results pertain specifically to large mergers (deal size ≥ $280 million) between makers of CPG products sold through US retail outlets, 2006–2017. Findings on structural presumptions show DHHI and merging share have predictive value for price changes, but structural metrics alone explain less than 10% of the variance in price effects (adjusted R-squared never exceeds 10% even with third-order interactions).

Q: What is the average price effect of consummated CPG mergers and how should it be interpreted? A: Across specifications, the average price effect is between -0.6% and 1.0%, with a baseline mean of 0.3%. This small average does not imply that enforcement is strict: Carlton (2009) shows that with perfect foresight, the largest observed price change — not the average — would indicate stringency. Because agencies face uncertainty, the distribution of realized price changes reflects both inframarginal approved mergers and the noise in agency forecasts.

Q: How large is the heterogeneity in merger price effects? A: The standard deviation of merger-level price effects is 4.0–7.5 percentage points across specifications. In the baseline, the first quartile of price effects is -2.1% and the third quartile is 3.7% for all parties combined. Merging parties specifically show a first quartile of -3.2% and third quartile of 3.7%, meaning a full quarter of mergers raise merging-party prices by more than 3.7%.

Q: How do merging and non-merging party prices co-move? A: Price changes for merging and non-merging parties are positively correlated (correlation = 0.49, s.e. = 0.08), consistent with strategic complementarity in pricing. Thirty-six percent of mergers lead both groups to lower prices, 36% lead both to raise prices, 13% cause merging parties to lower while non-merging parties raise, and 15% cause the reverse. The timing evidence shows merging-party prices begin changing upon merger completion, with rivals following suit.

Q: What happens to quantities following mergers? A: Total quantities fall on average between 0.4% and 1.0% across specifications, with 60% of mergers producing quantity reductions. Merging parties bear the bulk of quantity adjustment, with an average quantity decline of 6.4% and a standard deviation and interquartile range both around 30 pp. Non-merging party quantity changes are much less variable. The correlation between merging and non-merging party quantity changes is 0.36 (s.e. 0.08), which is positive — at odds with theoretical predictions from demand systems with the “type aggregation property” (Nocke and Schutz, 2018, 2024), where mergers should produce negatively correlated quantity changes.

Q: What non-price competitive responses do mergers trigger? A: Merging parties reduce the number of stores they serve by 2.7% on average, though in 38% of mergers store networks expand. Both merging and non-merging parties reduce product portfolios: merging parties drop the number of brands in a DMA by 2.2% on average and non-merging parties by 3.2%. Brands most likely to be dropped are those with less than 5% of the merged entity’s sales (6 pp more likely to be dropped), brands in small DMAs, and brands with small DMA shares.

Q: Do the Merger Guidelines’ structural presumptions (HHI, DHHI, merging share) predict price effects? A: DHHI and merging share have statistically significant but quantitatively modest predictive power. A 100-point increase in average DHHI is associated with a 0.2 pp increase in merging-party price changes and 0.3 pp for non-merging parties. Price effects are significantly larger when merging share exceeds 30%. However, structural metrics alone explain very little variance: adjusted R-squared never exceeds 10% even with third-order interactions of HHI, DHHI, merging share, private label share, and market size. Within-merger, DHHI is positively correlated with local price changes, and markets with DHHI above 200 exhibit significantly higher price effects than those below.
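For readers less familiar with the structural metrics, the following sketch computes HHI and DHHI using the standard definitions (sum of squared percentage shares, and twice the product of the merging parties' shares). The market shares are made up for illustration and do not come from the paper's data.

```python
def hhi(shares_pct):
    """Herfindahl-Hirschman index from market shares expressed in percent."""
    return sum(s ** 2 for s in shares_pct)


def delta_hhi(share_a_pct, share_b_pct):
    """Change in HHI from merging two firms: (a + b)^2 - a^2 - b^2 = 2ab."""
    return 2 * share_a_pct * share_b_pct


# Hypothetical market: four firms, the first two merge.
pre_merger = [30, 20, 25, 25]
post_merger = [50, 25, 25]

pre_hhi = hhi(pre_merger)      # 900 + 400 + 625 + 625
post_hhi = hhi(post_merger)    # 2500 + 625 + 625
dhhi = delta_hhi(30, 20)       # equals post_hhi - pre_hhi
```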

Q: How do the authors model antitrust enforcement and identify its stringency? A: The agency observes a noisy signal of a merger’s price effect, forms a posterior distribution combining a normally distributed prior (mean X’beta, standard deviation sigma_p*) with a normally distributed signal error (standard deviation sigma_epsilon), and challenges the merger if the posterior mean exceeds a threshold that is decreasing in deal size. The model is estimated by maximum likelihood: for approved mergers, the realized price change is observed; for withdrawn/remedied mergers, the posterior mean must have exceeded the threshold. Six mergers (from four deals) received remedies for horizontal market power concerns and four mergers (from two deals) were withdrawn under antitrust pressure, forming the challenged set.
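The agency's updating step is the textbook normal-normal case, which can be sketched as follows. The parameter values are hypothetical and chosen only to make the arithmetic easy to follow. They are not the paper's estimates, and the threshold is passed in as a number rather than modeled as a function of deal size.

```python
import math


def posterior(prior_mean, prior_sd, signal, signal_sd):
    """Posterior mean and standard deviation for a normal prior and a normal signal error."""
    weight_on_signal = prior_sd ** 2 / (prior_sd ** 2 + signal_sd ** 2)
    mean = (1 - weight_on_signal) * prior_mean + weight_on_signal * signal
    sd = math.sqrt(prior_sd ** 2 * signal_sd ** 2 / (prior_sd ** 2 + signal_sd ** 2))
    return mean, sd


def challenges(prior_mean, prior_sd, signal, signal_sd, threshold):
    """Decision rule: challenge when the posterior mean exceeds the threshold."""
    mean, _ = posterior(prior_mean, prior_sd, signal, signal_sd)
    return mean > threshold


# Hypothetical inputs, in percentage points of price change.
post_mean, post_sd = posterior(prior_mean=1.0, prior_sd=4.0, signal=9.0, signal_sd=3.0)
decision = challenges(1.0, 4.0, 9.0, 3.0, threshold=6.1)
```

A more precise signal (smaller signal_sd) pulls the posterior mean further toward the signal and away from the structural prior.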

Q: What is the estimated enforcement threshold and how does it vary across mergers? A: The sales-weighted average threshold is 4.8–6.3% using aggregate price changes and 6.6–7.8% using merging-party price changes. The threshold is lower for larger mergers: a 10% increase in merging-party sales is associated with an approximately 0.06 pp decrease in the threshold. The first quartile of thresholds across mergers is 4.5–5.6% and the third quartile is 5.6–6.9%, reflecting that the agencies apply stricter standards to larger deals.

Q: How accurate are the agencies’ forecasts of merger price effects? A: Using only the prior (structural characteristics), the agency’s accuracy in classifying mergers as anti-competitive versus pro-competitive is 56% (s.e. 3 pp). Adding the signal increases accuracy to 83% (s.e. 9 pp). The correlation between the prior mean and the true price change is 0.29 (s.e. 0.08); the correlation between the posterior mean and the true price change is 0.85 (s.e. 0.15). The posterior standard deviation is 2.5–3.2 pp for aggregate price changes and 4.1–4.8 pp for merging-party price changes.

Q: What would happen under stricter antitrust enforcement? A: Tightening the average threshold from 6.1% to 2.5% would raise the challenge probability from approximately 0.075 to 0.30 — roughly quadrupling it — and would reduce aggregate price changes of consummated mergers by approximately 1.4 pp (from roughly 0.2% to -1.2%). Moving to a 0% threshold would result in challenges to 57% of mergers, with 60–70% of consummated mergers then causing price decreases.

Q: How large are type I and type II errors at the current and counterfactual thresholds? A: At the current threshold (~6.1%), approximately 50% of allowed mergers are type II errors (anti-competitive mergers that should have been challenged). Type I errors (pro-competitive mergers wrongly blocked) are negligible at the current threshold and only become non-trivial starting around a 3% threshold. At a 2.5% threshold, the type II error share falls to 35%; at a 0% threshold, to 16%, while type I errors reach 10% of blocked mergers. The primary trade-off of stricter enforcement is therefore a larger agency workload, not an increase in blocking pro-competitive mergers.
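The trade-off between challenge rates and the two error types can be explored with a small Monte Carlo sketch of the same signal model. All parameter values here are hypothetical, so the output will not reproduce the paper's estimates. The code only illustrates how lowering the threshold mechanically raises the challenge probability and changes the error shares, using the definitions above (type II: allowed mergers with a positive true price effect; type I: challenged mergers with a negative one).

```python
import numpy as np


def simulate(threshold, prior_mean=0.5, prior_sd=4.0, signal_sd=3.0, n=200_000, seed=0):
    """Simulate challenge decisions under a posterior-mean threshold rule (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    true_effect = rng.normal(prior_mean, prior_sd, n)
    signal = true_effect + rng.normal(0.0, signal_sd, n)
    weight = prior_sd ** 2 / (prior_sd ** 2 + signal_sd ** 2)
    posterior_mean = (1 - weight) * prior_mean + weight * signal
    challenged = posterior_mean > threshold
    allowed = ~challenged
    return {
        "challenge_prob": float(challenged.mean()),
        "type2_share_of_allowed": float((true_effect[allowed] > 0).mean()),
        "type1_share_of_challenged": float((true_effect[challenged] < 0).mean()),
    }


lenient = simulate(threshold=6.1)
strict = simulate(threshold=2.5)
zero = simulate(threshold=0.0)
```

Because the same seed is reused, the three runs share identical draws, so differences across them reflect the threshold alone.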

Q: What identification strategy is used and how is it validated? A: The strategy is a within-product, within-geography before-after comparison using a brand-specific linear pre-merger trend as the counterfactual. Validation proceeds through three checks: (1) coefficient plots from an extended event study show no significant pre-trends after controlling for the linear trend; (2) a plot of brand trends against estimated price effects shows little explanatory power (statistically significant negative correlation but small magnitude, not consistent with results being driven by trend extrapolation); (3) placebo tests randomizing merger dates within the same markets yield a distribution of estimated effects that is centered at zero and narrower than the true distribution, and the post-period mean squared prediction error is significantly higher under the true dates, confirming that the linear trend is a better predictor under randomly assigned merger dates than under true dates.

Q: Why do the authors not use alternative control group approaches? A: Non-merging firms in the same market are rejected as controls because they may strategically respond to the merger. Synthetic controls using similar-industry untreated markets are rejected because deals often treat multiple similar markets (ruling out natural donors) and estimates prove sensitive to individual donors. Geographic controls (markets where merging parties have small shares) are rejected because they omit all 39 national mergers, untreated markets are not randomly selected, and regional pricing by non-merging parties could propagate effects into untreated regions, biasing estimates toward zero.

Merger retrospective. In this paper’s usage, an ex-post empirical study of the price, quantity, and assortment effects of a consummated merger, using pre-merger trends as the counterfactual, as opposed to forward-looking merger simulation.

Enforcement stringency. The marginal price increase at which the antitrust agency would expect to challenge a merger. Measured here as the sales-weighted average posterior-mean threshold: the value above which the agency acts as if it would propose a remedy, estimated at 4.8–6.3% for US CPG mergers.

Type I error (antitrust). The mistake of challenging (blocking) a merger that would have reduced prices (a pro-competitive merger). In the model, this occurs when an adverse signal causes the agency to block a merger whose true price effect is below the threshold.

Type II error (antitrust). The mistake of allowing a merger that increases prices (an anti-competitive merger). In the model, this occurs when a favorable signal causes the agency to approve a merger whose true price effect is above the threshold. Estimated at approximately 50% of allowed mergers at the current enforcement threshold.

Structural presumptions. The HHI-based rules in the 2010 and 2023 Merger Guidelines that create a presumption of competitive harm when DHHI exceeds specified thresholds (e.g., DHHI > 200 and post-merger HHI > 2,500 for the “red zone”). The paper finds DHHI and merging share have statistically significant but low explanatory power (adjusted R-squared below 10%) for actual price changes.

Prior and signal (in the enforcement model). The agency’s prior is a normal distribution over the merger’s true price effect, parameterized by structural characteristics (HHI, DHHI). The signal is a noisy draw centered on the true price effect, capturing information gathered through due diligence (e.g., evidence of efficiencies). The posterior mean — combining prior and signal — determines whether the agency challenges the merger.

Product market-deal pair (merger). The unit of observation in the empirical analysis: a specific NielsenIQ product module (e.g., soluble coffee) within a specific acquisition transaction (e.g., a food conglomerate merger). The sample contains 129 such pairs across 47 deals.

Online Business Models, Digital Ads, and User Welfare

Acemoglu, Huttenlocher, Ozdaglar, and Siderius develop a two-sided platform model to study the welfare consequences of digital advertising as an online business model. The platform intermediates between a firm selling a horizontally differentiated product and a continuum of users who derive utility from both entertaining content and informative signals about product quality embedded in ads. Users have a two-dimensional type: a sophistication dimension (sophisticated with probability lambda, naïve with probability 1-lambda) and a product-quality dimension (high quality with prior probability q). The central departure from the standard informational-advertising literature is that sophisticated users hold the correct model of the ad signal process, while naïve users underestimate the false-positive rate — the probability that a low-quality product generates a positive ad signal (phi_0). Naïve users perceive this false-positive rate to be phi_{0,N} = omega_N * omega_P * phi_0, where omega_N <= 1 captures inherent naïveté and omega_P <= 1 captures failure to understand personalized targeting, so phi_{0,N} < phi_0. The equilibrium concept is Berk-Nash equilibrium (Esponda and Pouzo 2016), meaning all agents are Bayesian given their subjective model.
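The gap between naïve and sophisticated beliefs can be made concrete with Bayes' rule. The sketch below assumes a single binary ad signal that is positive with probability phi_1 for a high-quality product and phi_0 for a low-quality one; this is a simplification of the model's Poisson ad arrivals. The numerical values are hypothetical.

```python
def posterior_high_quality(q, phi_1, phi_0):
    """P(high quality | positive ad signal) under a perceived false-positive rate phi_0."""
    return q * phi_1 / (q * phi_1 + (1 - q) * phi_0)


def naive_false_positive(phi_0, omega_n, omega_p):
    """Perceived false-positive rate of naive users: omega_N * omega_P * phi_0."""
    return omega_n * omega_p * phi_0


# Hypothetical parameters for illustration.
q, phi_1, phi_0 = 0.5, 0.8, 0.4
phi_0_naive = naive_false_positive(phi_0, omega_n=0.5, omega_p=0.5)

sophisticated_belief = posterior_high_quality(q, phi_1, phi_0)
naive_belief = posterior_high_quality(q, phi_1, phi_0_naive)
```

Because the naïve user understates the false-positive rate, the same positive signal moves her belief further toward high quality, which is the source of the over-purchasing discussed in the welfare results.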

The platform chooses ad load alpha (Poisson rate of ad displays), subscription fees, and the monetary transfer from the firm; the firm sets product price p after observing the platform’s contract. The central finding (Proposition 2) is that when the objective false-positive rate phi_0 exceeds a threshold phi-hat_0(lambda, phi_1, phi_{0,N}) — which is increasing in lambda and phi_{0,N} and decreasing in the true-positive rate phi_1 — the unique equilibrium is an advertising-based plan that fully segments the market: naïve users receive an ad load that extracts all their surplus, while sophisticated users are excluded entirely. In this regime the firm charges a strictly higher price p-hat* > p-bar*, where p-bar* = (beta*q + c)/2 is the monopoly price without advertising. The ad-based equilibrium emerges precisely when ads are more misleading (larger gap between phi_0 and phi_{0,N}), not when they are more informative — a comparative static the authors describe as paradoxical.

Welfare consequences (Proposition 4) are unambiguous in the advertising regime: both naïve and sophisticated users are strictly worse off than the baseline without any platform. Naïve users over-purchase due to inflated posteriors from misread signals; sophisticated users are harmed through the price channel — the firm’s higher profit-maximizing price p-hat* applies to all buyers. In the fully rational benchmark (phi_{0,N} = phi_0), the unique equilibrium is subscription-based and user welfare equals the no-platform baseline (Proposition 3).

These results extend to richer menus (Proposition 5), mixed subscription-plus-advertising plans (Proposition 7), and to multi-firm and multi-platform competition (Propositions 9-12). Digital ads soften Bertrand competition by generating endogenous horizontal differentiation among otherwise identical firms, so equilibrium prices can exceed marginal cost even with two competing firms. Platform competition similarly fails to restore welfare: platforms compete away subscription fees but both adopt ad-based plans targeting naïfs when phi_1 exceeds a threshold, maintaining the welfare loss.

On policy, the first best (planner observes types) cannot be decentralized because naïve users prefer more ads than is socially optimal, inverting the usual self-selection constraint. The second best (planner subject to incentive-compatibility constraints) is a single pooling plan with an intermediate ad load alpha^{SB} in [alpha^{FB}_N, alpha^{FB}_S] and yields average welfare above the no-platform baseline, though below first best (Proposition 13). This second best can be decentralized with a nonlinear digital ad tax, a per-unit product subsidy, and a platform subscription subsidy (Proposition 14). A simpler flat tax on digital ad revenues — above a threshold gamma-bar < 1 — also improves welfare relative to the ad-based equilibrium, though it does not restore the second best (Proposition 15).

Four robustness extensions are developed: endogenous manipulation (platform always chooses the most manipulative environment, lowest phi_{0,N}); naïve learning dynamics (learning raises the sophisticate share in steady state, making ad-based models less profitable but not overturning the main results); imperfect price discrimination by the firm (naïfs are unambiguously worse off, threshold for advertising equilibrium shifts down); and an added price-sensitivity dimension (the platform runs a 2x2 menu separating by both sophistication and price sensitivity, preserving the result that naïve users tolerate and receive more ads than sophisticates in every stratum).

Q: What is the key asymmetry between naïve and sophisticated users that drives the main results? A: Sophisticated users hold the correct Bayesian model of the ad signal process and thus correctly account for the false-positive rate phi_0 when updating beliefs from positive ad signals. Naïve users perceive the false-positive rate as phi_{0,N} = omega_N * omega_P * phi_0 < phi_0, so they treat positive signals as stronger evidence of high product quality than they actually are. Because naïve users overestimate the informativeness of ads, their (interim) subjective valuation of an ad-based plan is higher, making them more tolerant of ad loads and more willing to join platforms with heavy advertising. This asymmetry is what makes it profitable to target naïfs with high ad loads while excluding or charging subscription fees to sophisticates.

Q: Why does advertising to sophisticated users generate no additional firm profit, while advertising to naïve users does? A: Lemma 1 establishes that with linear-quadratic utility the firm extracts no surplus from advertising to sophisticates: because sophisticated agents are fully Bayesian, their expected posterior equals the prior (E_S[pi_i] = q), so expected demand after advertising is identical to demand before advertising. By contrast, Lemma 2 shows that the firm’s profit from naïve agents is positive and strictly increasing in ad load alpha, because naïve users’ average demand curve drifts upward as alpha rises — their inflated perceived informativeness of ads causes them to over-update on positive signals, systematically raising their willingness to pay. The platform captures this surplus from the firm via the advertising transfer m*.
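The contrast between Lemma 1 and Lemma 2 can be checked numerically. The sketch below is not the paper's derivation: it assumes a single binary ad signal (positive or not), and the function names and parameter values are illustrative assumptions. It averages each type's posterior over signals drawn from the true process.

```python
def posterior(prior, tpr, fpr, positive):
    """Bayes posterior that quality is high, given a perceived (tpr, fpr)."""
    if positive:
        num = prior * tpr
        den = prior * tpr + (1 - prior) * fpr
    else:
        num = prior * (1 - tpr)
        den = prior * (1 - tpr) + (1 - prior) * (1 - fpr)
    return num / den


def expected_posterior(prior, tpr, true_fpr, perceived_fpr):
    """Average posterior when signals come from the TRUE process but
    beliefs are updated with the PERCEIVED false-positive rate."""
    p_pos = prior * tpr + (1 - prior) * true_fpr  # true P(positive signal)
    return (p_pos * posterior(prior, tpr, perceived_fpr, True)
            + (1 - p_pos) * posterior(prior, tpr, perceived_fpr, False))


# Illustrative values (assumptions, not taken from the paper).
q, phi_1, phi_0 = 0.5, 0.8, 0.4
omega_N, omega_P = 0.5, 1.0
phi_0N = omega_N * omega_P * phi_0  # naive perceived false-positive rate

soph_mean = expected_posterior(q, phi_1, phi_0, phi_0)
naive_mean = expected_posterior(q, phi_1, phi_0, phi_0N)
print(f"sophisticated mean posterior: {soph_mean:.3f}")
print(f"naive mean posterior:         {naive_mean:.3f}")
```

With these values the sophisticated mean posterior equals the prior q = 0.5, while the naive mean posterior is about 0.56. That gap is the upward drift in average naive demand that the platform and firm monetize.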

Q: What is the threshold condition determining whether the equilibrium is subscription-based or advertising-based? A: Proposition 2 identifies a threshold phi-hat_0(lambda, phi_1, phi_{0,N}) that is increasing in the sophisticate share lambda and in the naïve false-positive perception phi_{0,N}, and decreasing in the true-positive rate phi_1. When the objective false-positive rate phi_0 is below this threshold, the profit-maximizing business model is subscription-based with price P* = T - v and product price p* = p-bar* = (beta*q + c)/2. When phi_0 exceeds the threshold, the advertising model dominates: the platform sets a high ad load alpha-hat that makes naïve users exactly indifferent between participating and their outside option v, excludes sophisticates, and the firm charges p-hat* > p-bar*. The threshold falls with phi_1, meaning more informative ads expand the range of phi_0 over which the advertising equilibrium obtains.
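The no-advertising monopoly price can be recovered in one step. This is a sketch under stated assumptions, not the paper's own derivation: it assumes expected demand of the linear form beta*q - p (what a linear-quadratic utility with unit curvature would imply) and a constant marginal cost c.

```latex
\max_{p}\ (p - c)\,(\beta q - p)
\quad\Longrightarrow\quad
\beta q - 2p + c = 0
\quad\Longrightarrow\quad
\bar{p}^{*} = \frac{\beta q + c}{2}.
```

Under the same assumed demand, anything that raises buyers' average perceived quality above q raises the demand intercept and therefore the profit-maximizing price, which is consistent with p-hat* > p-bar* in the advertising regime.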

Q: How does allowing the platform to offer menus change the results relative to the baseline two-plan case? A: Proposition 5 shows that with menus the platform can simultaneously serve both user types: sophisticates receive a subscription plan at P* = T - v and naïve users receive an ad-based plan with the same high load alpha-hat* as in the baseline. The threshold for the advertising equilibrium shifts down to phi*_0(lambda, phi_1, phi_{0,N}) < phi-hat_0, so advertising business models arise for a strictly larger set of parameters. Welfare consequences are unchanged (Corollary 1): when phi_0 > phi*_0, both types have welfare strictly below the no-platform baseline. Proposition 6 further shows consumer welfare is monotonically decreasing in both phi_0 and phi_1: higher phi_1 (more informative true-positive signals) also reduces welfare because any surplus from greater informativeness is fully captured by the platform.

Q: What is the welfare ranking across the three regimes: no platform, advertising equilibrium, and subscription equilibrium? A: In the subscription equilibrium (regime (a) of Proposition 2 or 4), user welfare for both types equals the no-platform base case W_base(tau) — the platform captures all surplus it creates and users are no better or worse off. In the advertising equilibrium (regime (b)), both naïve and sophisticated users are strictly worse off than with no platform: W-hat*(tau) < W_base(tau) for both tau in {S, N}. The first-best, where a planner controls ad loads separately by type, yields W^{FB}(tau) > W_base(tau) for both types because informative ads can genuinely improve sophisticated users’ decisions and a constrained amount improves naïve users’ decisions too.

Q: How does firm-level competition interact with digital advertising to affect prices and welfare? A: Without advertising, two ex ante identical firms compete à la Bertrand and price at marginal cost (p*_1 = p*_2 = c). Proposition 9 establishes that when phi_1 > phi^F_1 and phi_0 >= phi^F_0(phi_1), the platform offers an ad-based plan and equilibrium prices p-hat*_1 and p-hat*_2 are both strictly above p-bar* — the monopoly price without advertising. The mechanism is endogenous horizontal differentiation: users who see positive ad signals for one firm’s product form higher valuations for that product, so the two products become differentiated in the eyes of consumers even though they are ex ante identical, breaking Bertrand logic. Example 1 further illustrates that advertising can be more prevalent with competition than without: a second firm’s entry can push the equilibrium from no-advertising to separating.

Q: Does platform competition protect users from the welfare losses associated with digital advertising? A: Not fully. Proposition 11 shows that with two competing platforms (M=2, N=1) and no advertising, platforms compete away both subscription fees and ad loads, and welfare reaches the fully rational benchmark. However, when phi_1 exceeds threshold phi^P_1, both platforms adopt ad-based plans targeting naïve users, charge no subscription fees, and the product price rises to p-hat*_P > p-bar* (Proposition 12). Competition reduces subscription fees to zero but does not eliminate the incentive to target naïfs with heavy ads, because naïve users’ over-valuation of ads means they remain willing to join ad-heavy plans. The fundamental inefficiency from naïve users’ misspecified model persists under platform competition.

Q: Why is the first-best allocation not implementable as a decentralized equilibrium? A: Proposition 13 explains the obstacle: the social planner would ideally offer naïve users fewer ads (alpha^{FB}_N) than sophisticated users (alpha^{FB}_S), with alpha^{FB}_N <= alpha^{FB}_S. However, naïve users have a higher subjective valuation for ads than sophisticates because they believe ads are more informative. If offered a menu with both options, naïve users would self-select into the plan with the higher ad load alpha^{FB}_S — the exact opposite of what the planner wants. The incentive-compatibility constraints therefore force the planner toward a single pooling plan with an intermediate ad load alpha^{SB} in [alpha^{FB}_N, alpha^{FB}_S]. Average welfare under the second best exceeds the no-platform baseline, confirming that some advertising is socially valuable, but falls short of the first best whenever alpha^{FB}_N > 0.

Q: How does a flat digital ad tax improve welfare, and what are its limitations? A: Proposition 15 establishes that whenever the equilibrium features an ad-based plan, a flat tax on digital ad revenues at any rate gamma above a threshold gamma-bar, where gamma-bar < 1, improves welfare by discouraging advertising-based business models and inducing the platform to shift toward subscription-based plans. The mechanism is that taxing ad revenue reduces the platform’s marginal gain from increasing ad load, making the subscription plan relatively more profitable. However, the flat tax does not achieve the second best because it operates linearly rather than targeting the nonlinear distortion: the optimal nonlinear tax-subsidy scheme (Proposition 14) requires a threshold-style ad tax at rate mu > mu-bar combined with a per-unit product subsidy delta* and a platform subscription subsidy eta > eta-bar.

Q: What happens when the platform can endogenously choose how manipulative its ads are? A: Proposition 16 shows that a profit-maximizing platform always chooses the lowest feasible phi_{0,N} = phi-bar — the most manipulative environment. Two reinforcing channels drive this: the pricing channel (lower phi_{0,N} amplifies naïve demand shifts per positive signal, so the downstream firm raises price and sales, increasing ad revenues extracted by the platform) and the participation channel (lower phi_{0,N} raises naïve users’ perceived informational value of ads, relaxing their participation constraint and permitting a higher ad load alpha). Platform competition constrains the equilibrium ad load through tighter participation constraints but does not alter the choice of phi_{0,N} = phi-bar, so competition limits ad quantity but not ad manipulativeness.

Q: How do naïve learning dynamics affect the main results? A: Proposition 17 introduces a birth-death environment where exposure to disconfirming evidence gradually converts naïve agents to sophisticates. A unique steady-state sophisticate share lambda*(alpha_N, phi_0) exists; both higher ad load alpha_N and higher phi_0 accelerate the conversion of naïfs, raising future sophisticate share and reducing future ad revenues. This creates a new intertemporal trade-off that constrains the platform’s choice of ad loads relative to the static case. The key result (part ii) is that the main characterization of Proposition 7 carries through under a modified cutoff phi-tilde^{dynamic}_0 >= phi-tilde_0(lambda-tilde, phi_1, phi_{0,N}), so learning dynamics make the ad-based business model less likely but do not overturn the fundamental welfare results.

Q: How does imperfect price discrimination by the firm affect naïve users? A: Proposition 18 considers a firm that observes a user’s sophistication type with probability kappa in [0,1]. With price discrimination, the firm sets type-specific prices satisfying p*_N >= p* >= p*_S, moving toward the type-specific monopoly levels. Naïfs are unambiguously worse off: when identified (with probability kappa), they face the higher price p*_N and a higher equilibrium ad load. The threshold for the advertising equilibrium also shifts down relative to the baseline, meaning advertising business models emerge for a larger parameter range when price discrimination is possible.

Q: How does the paper define and measure user welfare, and why is ex post rather than interim welfare the relevant concept? A: User welfare W(tau_i) is defined as ex post utility, which depends on the actual product quality theta_i realized after consumption, not on interim beliefs formed after viewing ads. Naïve users’ interim assessment inflates expected product quality, but their ex post utility depends on whether the product is genuinely high quality for them (theta_i = 1 with probability q, theta_i = 0 with probability 1-q). Because naïve users over-purchase due to misread signals — consuming more than optimal when theta_i = 0 — their ex post utility is strictly lower than their interim expected utility, and strictly lower than the no-platform baseline in the advertising equilibrium. The ex post welfare concept is the relevant one precisely because it captures the actual material consequences of manipulation, not the subjectively perceived gains from ads.

Naïve vs. Sophisticated Users: The paper’s primary user heterogeneity dimension. Sophisticated users hold the correct model of the ad signal process, setting phi_{0,S} = phi_0 (the true false-positive rate). Naïve users hold a misspecified model with phi_{0,N} = omega_N * omega_P * phi_0 < phi_0, underestimating the probability that a low-quality product generates a positive ad signal, due to inherent naïveté (omega_N) and failure to understand personalized targeting (omega_P).

Ad Load (alpha): The Poisson rate at which ads are displayed to a user per unit time. Total ad displays follow a Poisson(alpha*T) distribution. Higher ad load means less time on entertaining content (expected entertainment time is (1-alpha)*T) and a higher probability, 1 - exp(-alpha*T), that the user sees the ad at least once. The platform chooses alpha as its primary instrument for extracting surplus from naïve users.
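The trade-off built into the ad load can be tabulated directly from the two formulas in this definition. A minimal sketch follows; the function name and the values of alpha and T are illustrative assumptions.

```python
import math


def ad_exposure(alpha, T):
    """Return (expected ad count, P(at least one ad), expected entertainment time)
    for Poisson ad arrivals at rate alpha over a session of length T."""
    expected_ads = alpha * T
    p_at_least_one = 1.0 - math.exp(-alpha * T)
    entertainment_time = (1.0 - alpha) * T
    return expected_ads, p_at_least_one, entertainment_time


# Illustrative ad loads for a session of length T = 10 (assumed values).
for alpha in (0.05, 0.2, 0.5):
    ads, p_seen, fun = ad_exposure(alpha, T=10.0)
    print(f"alpha={alpha:.2f}  E[ads]={ads:.1f}  "
          f"P(seen)={p_seen:.3f}  entertainment={fun:.1f}")
```

Exposure probability saturates quickly in alpha while entertainment time falls linearly, so a high ad load mostly costs users content time once the ad has almost surely been seen.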

False-Positive Rate (phi_0): The objective probability that a low-quality product (theta_i = 0) generates a positive (“good”) ad signal. The gap between phi_0 (objective) and phi_{0,N} (naïve users’ perceived rate) is the key parameter driving all welfare results: a larger gap implies greater de facto manipulation and a stronger incentive for the platform to adopt an advertising-based model.

Berk-Nash Equilibrium: The solution concept from Esponda and Pouzo (2016), used to model agents with misspecified subjective models. All agents are Bayesian conditional on their own subjective model. Sophisticates’ subjective model equals the objective model (standard Bayesian), while naïfs update using the misspecified phi_{0,N}. Perfection requires sequential rationality at each information set given beliefs.

De Facto Manipulation: The paper’s term for a situation in which the platform and firm exploit naïve users’ misspecified model to boost demand and extract surplus, without requiring any outright deception in the formal sense. It arises because naïve users voluntarily choose high-ad-load plans (believing ads to be highly informative) and voluntarily over-purchase (having updated on what they mistakenly think are strong positive signals). The manipulation is “de facto” because it operates through the users’ own rational (but misspecified) decision-making.

Separating Equilibrium: An equilibrium in which naïve and sophisticated users self-select into distinct platform plans. In the advertising equilibrium, naïve users join an ad-heavy plan (extracting all their surplus via inflated willingness to pay for ads) while sophisticated users are either excluded or placed on a subscription plan. This separation is the vehicle through which the platform maximizes revenue from naïf manipulation while limiting the disciplining force of sophisticates.

Second-Best Allocation: The welfare-maximizing allocation subject to the incentive-compatibility constraints that users self-select into plans. Because naïve users prefer more ads than sophisticated users (the inverse of what the planner desires), the second best is a single pooling plan with an intermediate ad load alpha^{SB} in [alpha^{FB}_N, alpha^{FB}_S]. This is strictly worse than the first best but achieves average welfare above the no-platform baseline, and can be decentralized with a nonlinear ad tax, product subsidy, and platform subscription subsidy.

Peer Effects in Consideration and Preferences

This paper develops a general nonparametric model of discrete choice in which peers influence agents through two distinct channels: (1) the set of alternatives an agent considers (consideration set effects) and (2) the agent’s preferences over those alternatives (preference effects). The framework embeds these peer mechanisms in a continuous-time Markov process where agents revise choices at Poisson alarm-clock rates. A peer is classified as a consideration peer, a preference peer, or both, and the network is encoded as two directed edge sets rather than one.

The central identification challenge is recovering network structure, consideration probabilities, and preferences simultaneously, without relying on exogenous variation in covariates or the menu of available options. The paper shows this is achievable using time-series variation in the choices made by connected agents. The key insight is that consideration peers who adopt alternative v change the probability that the focal agent considers v — entering only the “consideration” term of the conditional choice probability (CCP) — while preference peers who adopt alternatives other than v change only the “conditional-on-consideration” selection probability. These cross-alternative patterns in the CCPs allow the researcher to distinguish the two channels. Once consideration-only peers are isolated, their choices serve as exclusion restrictions that mimic artificial menu variation, enabling nonparametric recovery of preferences.

Identification proceeds in stages: (i) recover the full reference group of each agent from changes in CCPs; (ii) separate consideration-only peers from preference-affecting peers using cross-order effects across alternatives; (iii) distinguish preference-only peers from consideration-and-preference peers under an exclusion restriction (Assumption 4) requiring that an agent with a dual-channel peer also has at least one single-channel peer; (iv) recover consideration ratios Q(v|n+1)/Q(v|n) and then the full choice rule. The results allow arbitrary heterogeneity across agents and do not require exogenous menu variation or covariate shifters.

For continuous-time data (Dataset 1), the CCPs and Poisson rates are exactly identified from the observed revision history. For discrete-time panel data (Dataset 2), identification is generic under a mild eigenvalue condition on the transition rate matrix.

The empirical application studies store-opening decisions by China’s two dominant high-end tea chains — Heytea and Nayuki — across prefecture-level cities from their founding through end-2020. By that date, Nayuki had 485 stores in 57 cities and Heytea had 729 stores in 46 cities, in an industry whose total revenue grew from 42.2 to 83.1 billion yuan between 2017 and 2020. Each firm-market pair is modeled as an agent deciding whether to open a new store. The key exclusion restriction is that the cumulative store count of either firm in geographically neighboring markets shifts consideration probabilities but does not enter marginal profitability directly.
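The sample statistics above imply some simple derived magnitudes. The short calculation below uses only the figures quoted in the text; the derived quantities (stores per city, compound annual growth) are illustrative arithmetic, not numbers reported by the paper.

```python
# Figures quoted in the text (end-2020 store counts; revenue in billion yuan).
nayuki_stores, nayuki_cities = 485, 57
heytea_stores, heytea_cities = 729, 46
rev_2017, rev_2020 = 42.2, 83.1

# Average number of stores per city in which each chain is present.
nayuki_per_city = nayuki_stores / nayuki_cities
heytea_per_city = heytea_stores / heytea_cities

# Compound annual growth rate of industry revenue over the three years 2017 to 2020.
cagr = (rev_2020 / rev_2017) ** (1 / 3) - 1

print(f"Nayuki stores per city: {nayuki_per_city:.1f}")
print(f"Heytea stores per city: {heytea_per_city:.1f}")
print(f"Industry revenue CAGR:  {cagr:.1%}")
```

Heytea operates in fewer cities but with roughly twice as many stores per city, the kind of within-market density that the profitability index in the model is meant to capture.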

Estimation via maximum likelihood yields four substantive findings: (1) Firms exhibit limited consideration — consideration probabilities for markets with no prior presence by either firm are substantially below one. (2) Stores in neighboring markets significantly raise consideration probabilities for a given market, for both own-firm and rival stores; this peer effect in consideration is described as economically large. (3) Own-market store density raises marginal profitability (density economies) while rival presence lowers it (competitive effects). (4) A full-consideration model that omits the attention stage overestimates the negative competitive effect and underestimates positive density effects.

Counterfactual simulations show that removing attention constraints (full consideration) accelerates market penetration substantially: firms enter new markets earlier and achieve broader geographic coverage. Removing peer effects in consideration only — while retaining attention constraints — slows the diffusion of store openings across neighboring markets, because peer effects in consideration function as an informational cascade. Limited consideration also reduces competition by delaying rival entry into high-profitability markets, explaining a significant share of the geographic concentration in first- and second-tier cities during the early expansion phase. The paper’s scope is limited to settings with repeated, non-durable choices; it does not model forward-looking behavior or multiple equilibria, which the authors note as directions for future research.

Q: What are the two peer-effect channels in the model, and how do they differ structurally? A: A consideration peer influences whether an alternative enters the agent’s consideration set — specifically, the probability Q_a(v | n) that alternative v is considered is a function of the number n of consideration peers currently adopting v. A preference peer influences the choice rule R_a(v | y, C) — the probability that v is selected conditional on it being in the consideration set. Importantly, the paper models the two channels as affecting logically separate stages of the decision process, so the observed CCP factors into a consideration term and a conditional-selection term that respond to distinct sets of peers.
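The two-stage structure can be made concrete by enumerating consideration sets. The sketch below is an illustration under stated assumptions: alternatives enter the consideration set independently, the default is always considered, and the choice rule is a logit over the considered alternatives. The logit form, the function name `ccp`, and all numbers are hypothetical and not taken from the paper, which leaves the choice rule nonparametric.

```python
from itertools import combinations
import math


def ccp(q, utility, default="stay"):
    """Choice probabilities implied by independent consideration plus a choice rule.

    q:       dict mapping each non-default alternative to its consideration probability
    utility: dict mapping every alternative (including the default) to a utility index
    """
    alts = list(q)
    probs = {a: 0.0 for a in alts}
    probs[default] = 0.0
    for r in range(len(alts) + 1):
        for subset in combinations(alts, r):
            # Probability of drawing exactly this consideration set.
            p_set = 1.0
            for a in alts:
                p_set *= q[a] if a in subset else 1.0 - q[a]
            considered = subset + (default,)  # the default is always considered
            denom = sum(math.exp(utility[a]) for a in considered)
            for a in considered:
                # Assumed logit choice rule over the considered alternatives.
                probs[a] += p_set * math.exp(utility[a]) / denom
    return probs


utility = {"stay": 0.0, "A": 1.0, "B": 0.5}
low = ccp({"A": 0.3, "B": 0.6}, utility)   # no consideration peer has adopted A
high = ccp({"A": 0.7, "B": 0.6}, utility)  # a consideration peer adopts A
print(low)
print(high)
```

Raising only the consideration probability of A changes the choice probabilities of every alternative even though utilities are untouched, which is why consideration-only peers can act like menu variation.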

Q: Why does the standard identification approach of varying menus fail here, and how does the paper substitute for it? A: Menu variation requires the researcher to observe the same agent facing different sets of available alternatives, which is unavailable in many empirical settings. The paper replaces exogenous menu variation with endogenous variation generated by consideration-only peers: when a consideration-only peer adopts alternative v, the focal agent’s probability of considering v rises, effectively mimicking the removal of other alternatives from her consideration set. This peer-induced variation in consideration is then used to trace out the choice rule R_a over counterfactual menus without any actual menu changes.

Q: How does the paper separate consideration peers from preference peers in the data? A: The decomposition exploits an asymmetry in how the two peer types appear in the log-CCP. When a consideration peer switches to alternative v, the term ln Q_a(v | .) changes but the conditional-selection term ln D_a(v | .) remains unchanged, because the agent already considers v. Conversely, when a preference peer adopts an alternative other than v, only the conditional-selection term shifts. The paper formalizes this via cross-order effects of peers across alternatives in the CCPs (Propositions 3.1–3.3) and invokes Assumption 4 — requiring at least one single-channel peer when a dual-channel peer exists — to complete the separation.

Q: What is Assumption 4 and why is it necessary? A: Assumption 4 states that if agent a has a peer in N_CR_a (a peer affecting both consideration and preferences), then a also has at least one additional peer affecting only consideration or only preferences. Without this exclusion restriction, the consideration and preference effects of a dual-channel peer are not separately identified from each other; the single-channel peer provides the variation needed to pin down each component separately.

Q: What does Proposition 2.1 establish and what does it require? A: Proposition 2.1 establishes existence and uniqueness of an invariant equilibrium distribution mu over choice configurations, with full support. It requires Assumptions 1 (independent consideration), 2(i) (strictly positive consideration probability for every alternative), and 3(i) (strictly positive probability of selecting any non-default alternative from some reachable consideration set). The continuous-time Poisson structure ensures zero probability of simultaneous revisions, which rules out multiple equilibria in the data-generating process.

Q: How does the paper handle discrete-time panel data, where only periodic snapshots of choices are observed? A: The paper invokes results from Blevins (2017, 2026) to show that the transition rate matrix W of the continuous-time process is generically identified from the discrete-time transition matrix observed at interval Delta, provided the eigenvalues of W do not differ by integer multiples of 2*pi*i/Delta. Once W is identified, the CCPs P and Poisson rates lambda_a are recovered. This result is described as generic, meaning it holds except on a measure-zero set of parameter values.
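The logic of recovering continuous-time rates from snapshot data is easiest to see in the smallest case. The sketch below uses a two-state chain, for which the map from the rate matrix to the discrete-time transition matrix has a closed form that can be inverted by hand. It is only an illustration of the idea, not the paper's general argument; the function names and rate values are assumptions.

```python
import math


def transition_matrix(a, b, delta):
    """Snapshot transition matrix of a two-state chain observed every `delta`,
    with continuous-time rates a (state 0 to 1) and b (state 1 to 0)."""
    s = a + b
    decay = math.exp(-s * delta)
    p01 = a / s * (1.0 - decay)
    p10 = b / s * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def recover_rates(P, delta):
    """Invert the map above: recover (a, b) from the snapshot matrix P."""
    p01, p10 = P[0][1], P[1][0]
    total = p01 + p10
    if not 0.0 < total < 1.0:
        raise ValueError("no valid two-state rate matrix generates P")
    s = -math.log(1.0 - total) / delta  # total switching rate a + b
    return s * p01 / total, s * p10 / total


P = transition_matrix(a=0.4, b=0.1, delta=2.0)
a_hat, b_hat = recover_rates(P, delta=2.0)
print(a_hat, b_hat)
```

In the two-state case the eigenvalues of the rate matrix are real (0 and -(a+b)), so the inversion is unique. With more states the eigenvalues can be complex and the matrix logarithm is multi-valued, which is where the condition on multiples of 2*pi*i/Delta comes in.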

Q: What data does the empirical application use, and what are the key sample statistics? A: The application uses city-level store registration data sourced from the National Enterprise Credit Information Publicity System (via CnOpenData, 2021), supplemented by regional statistics from the China City Statistical Yearbook (2016–2021). The sample ends in 2020 to avoid COVID-19 demand shifts. By end-2020, Nayuki had 485 stores across 57 cities and Heytea had 729 stores across 46 cities. The high-end tea industry’s total revenue grew from 42.2 to 83.1 billion yuan between 2017 and 2020.

Q: What is the key exclusion restriction in the empirical specification, and why is it plausible? A: Stores in geographically neighboring markets (parameterized by distance bins d(m,m’)) enter the attention index pi_tilde but are excluded from the marginal profit index pi_bar. The rationale is that nearby store counts are informative signals that draw managerial attention to a market (an informational spillover) but do not directly alter the profitability of operating in that market — profitability depends on local demand, competition within the market, and own firm density, not on activity in adjacent markets. This restriction identifies the consideration-only peer channel.

Q: What does the paper find about biases from ignoring limited consideration? A: When the two-stage model (consideration + choice) is replaced by a single-stage full-consideration model, the estimated payoff parameters differ substantially. Specifically, the full-consideration model overestimates the negative effect of competition (rival presence in the same market) and underestimates the positive effect of own-store density. The intuition is that correlated entry patterns driven by shared consideration spillovers are misattributed to payoff interactions when the consideration stage is omitted.

Q: What do the counterfactual simulations show about the role of limited consideration in market dynamics? A: Three counterfactuals are compared against the baseline. Under full consideration (no attention constraints), market penetration is substantially faster — firms enter new markets earlier and achieve broader geographic coverage. Removing peer effects in consideration while retaining attention constraints slows geographic diffusion because the informational cascade that propagates entry to neighboring markets is eliminated. Limited consideration also reduces competition by delaying rival entry into high-profitability markets; markets with high potential demand remain underserved for longer. Collectively, limited consideration explains a significant portion of the geographic concentration of tea chain stores in first- and second-tier cities during the early expansion period.

Q: What forms of heterogeneity does the identification allow, and what does it not require? A: The nonparametric identification results accommodate arbitrary heterogeneity across agents in consideration mechanisms Q_a, choice rules R_a, Poisson revision rates lambda_a, and network positions. The identification requires neither exogenous covariates that shift preferences or consideration, nor variation in the set of available alternatives across observations. It relies solely on time-series variation in the choices made by connected agents, which are endogenous to the model and are themselves identified in the first stage.

Q: How does the paper model history dependence, and does it change the main identification results? A: Section 4.1 extends the model to allow consideration probabilities and choice rules to depend on the agent’s own choice history h_t in addition to the current configuration y. Proposition 4.1 states that under Assumptions 1–4 applied conditional on both y_{at} and h_t, all identification propositions from Section 3.1 remain valid. The extension also allows consideration probabilities to equal one, enabling nontrivial dynamics in consideration sets driven by past choices.

Q: How is the unobservable default handled in the empirical application? A: When the default alternative (e.g., “do not open a store”) is unobserved, the Poisson revision rate lambda_a cannot be separately identified from the CCPs without normalization. The paper normalizes lambda_a = 1 for each agent in the empirical application, treating the revision opportunity rate as fixed and recovering all remaining primitives under this normalization.

Consideration set: The subset C of the full menu Y that agent a actually attends to at the moment of revision; formed before the choice rule is applied. Alternative v enters C independently with probability Q_a(v | n), where n is the number of consideration peers currently adopting v. The default alternative is always in the consideration set.

Conditional choice probability (CCP): P_a(v | y), the ex-ante probability that agent a selects alternative v given choice configuration y; equal to the product of the consideration probability Q_a(v | .) and the conditional-selection probability D_a(v | .), integrated over all possible consideration sets.

Choice configuration: The vector y = (y_a)_{a in A} recording the current alternative selected by every agent in the network simultaneously; the state variable of the continuous-time Markov process.

Consideration-only peer: A peer a’ in N_C_a \ N_R_a whose choices enter the consideration probability Q_a but not the choice rule R_a. Variation in the choices of consideration-only peers serves as an exclusion restriction that mimics artificial menu variation for identifying preferences.

Preference-only peer: A peer a’ in N_R_a \ N_C_a whose choices enter the choice rule R_a but not the consideration probability Q_a.

Cross-order peer effect: The pattern in the CCP by which a consideration peer’s adoption of alternative v changes ln P_a(v | .) but not the conditional-selection component, while a preference peer’s adoption of a different alternative v’ changes the conditional-selection component but not the consideration component; this asymmetry is the key to separating the two channels.

Limited consideration: The situation in which Q_a(v | n) is strictly less than one for at least some alternatives v and peer counts n, so that the agent does not evaluate all available options before choosing; distinct from full rationality in which all alternatives are always considered.

Mean attention index (pi_tilde): The latent index governing the consideration probability in the empirical specification; it depends on own and rival store counts in the same and neighboring markets and on firm fixed effects, but is excluded from the marginal profit index — constituting the empirical exclusion restriction that separates the consideration and payoff channels.

The Illiquidity of Water Markets

Donna and Espín-Sánchez investigate whether a market (sequential English auction) or a non-market institution (fixed quota) more efficiently allocates an intermediate good, irrigation water, when some buyers are liquidity constrained. The setting is Mula, a city in southeastern Spain, where farmers used an unregulated water auction continuously from 1244 until August 1, 1966, when the institution was replaced by a fixed quota system. The 1966 switch, after more than 700 years of auctions, serves as a natural experiment. Combined with the fact that water demand for a given crop is pinned down by the crop’s production function rather than by farmer wealth, it allows the authors to identify liquidity constraints separately from unobserved heterogeneity in productivity.

The empirical context has four features the authors exploit. First, the pre-1966 auction was entirely unregulated, so price differences directly reflect valuations without the confounds of regulatory changes. Second, water is an intermediate good for apricot production; conditional on plot area, tree count, and crop type, demand is determined by the apricot tree’s biological water requirements — not by the farmer’s wealth — so wealthy and poor farmers growing the same bulida apricot variety share the same underlying demand up to an idiosyncratic productivity shock. Third, farmers are classified as wealthy if they held positive urban real estate (non-agricultural wealth) in 1955 tax records; wealthy farmers’ average annual urban rental income (5,702 pesetas) far exceeded their average annual water expenditure (500 pesetas, rising to 1,619 in the highest-expenditure year, 1963), supporting the assumption that wealthy farmers were never liquidity constrained. Fourth, the 1966 institutional shift to quotas — under which each farmer received a fixed water allotment (tanda) every three weeks proportional to plot size, paying only a small annual maintenance fee after the critical season — provides the counterfactual.

The authors build a structural dynamic demand model with three key features: storability (irrigation raises soil moisture, which carries over to later weeks and so creates intertemporal substitution, though imperfectly because part of the water evaporates), liquidity constraints (poor farmers cannot always afford water during the critical season when prices peak), and weather seasonality (the critical season, corresponding to apricot fruit growth stages II–III and the Early Post-Harvest period, spans roughly weeks 18–32 and is when trees most need water). Farmers are forward-looking and form expectations about future prices and rainfall. The model’s production function, drawn from the agricultural engineering literature (Torrecillas et al., 2000; Allen et al., 2006), transforms soil moisture into apricot output via a transformation rate parameter gamma, a hydric stress coefficient, and a seasonal dummy.

Demand parameters are estimated using a two-step conditional choice probability (CCP) estimator (Hotz et al., 1994) on wealthy farmers only; the estimates are then applied to poor farmers in the welfare calculations. The sample consists of 24 single-crop apricot farmers observed in weekly auction records from January 1955 to July 1966, embedded in a market with over 500 total participants.

The main finding is that the institutional change from auction to quota increased total efficiency. Welfare increased by 23.4 real pesetas per farmer per tree, a 6 percent increase in total apricot production relative to the market. This gain arises because: (1) farmers were relatively homogeneous in productivity (small idiosyncratic shocks), so the primary source of misallocation was not productivity heterogeneity but wealth heterogeneity; (2) liquidity constraints prevented poor farmers from purchasing water during the critical season when their valuation was high, causing them instead to buy earlier (at lower prices but with partial evaporation loss) or later (when their trees had already experienced hydric stress); and (3) the apricot production function is concave in water, so uniform quota allocation is more efficient than market allocation when farmers are approximately homogeneous. The paper provides the first empirical demonstration that liquidity constraints can reverse the standard efficiency ranking of markets over quotas.

Q: What is the core research question? A: The paper asks whether a free market (water auction) or a non-market institution (fixed quota) more efficiently allocates an intermediate good when some buyers are liquidity constrained. The theoretical ranking is ambiguous when agents are heterogeneous in both productivity and wealth, making this an empirical question. The authors find that quotas dominated the auction in the specific Mula setting.

Q: What was the historical water market in Mula and when did it end? A: From 1244 to 1966 — over 700 years — Mula farmers used a sequential ascending-price (English) auction to allocate river water. The auctioneer sold water in discrete units called cuartas (each representing 3 hours of canal flow, or approximately 432,000 liters), holding 40 units per weekly Friday session. Farmers paid in cash on auction day. On August 1, 1966, the farmers’ union (Sindicato de Regantes) replaced the auction with a fixed quota system, having secured a credit line to purchase water property rights share by share.

Q: How did the quota system work, and how did it eliminate liquidity constraints? A: Under the quota, each plot of land received a fixed water allotment (tanda) every three weeks, proportional to plot size. Farmers paid only a small annual maintenance fee to the Sindicato at year-end, after the critical season harvest. Because payment occurred after farmers collected harvest revenue, no farmer was liquidity constrained under the quota. The fee was substantially lower than the per-unit average price under the market.

Q: How do the authors identify liquidity constraints separately from unobserved heterogeneity in productivity? A: The key insight is that water is an intermediate good whose demand is determined by the apricot tree’s biological production function, not by farmer wealth. Two farmers growing the same bulida apricot variety with the same number of trees should have the same water demand up to an idiosyncratic shock. The authors use wealthy farmers (those with positive urban real estate in 1955 tax records) to estimate preferences, under the assumption that wealthy farmers are never liquidity constrained. They then verify that outside the critical season, wealthy and poor farmers purchase similar amounts of water; the purchasing divergence appears only during the high-price critical season, consistent with a cash constraint rather than a preference difference.

Q: What empirical evidence shows poor farmers were liquidity constrained rather than simply less interested in water? A: Poor farmers display a bimodal purchasing pattern inconsistent with the apricot tree’s biological water needs: they buy water before the critical season (when prices are low) in anticipation of not being able to afford it during the critical season, and again after the critical season (when prices fall) to prevent their trees from withering from dehydration. Wealthy farmers, by contrast, delay purchases strategically to the critical season when trees most need water (weeks 18–32). Regression analysis confirms that wealthy farmers purchase significantly more water per tree during the critical season than poor farmers growing identical bulida apricots, while the difference outside the critical season is not statistically significant.

Q: How were wealthy farmers defined and why does their wealth validate the non-constrained assumption? A: A farmer is defined as wealthy if the value of their urban real estate (from 1955 urban tax records) is positive, and as poor if it is zero. Urban real estate constitutes non-agricultural wealth uncorrelated with the apricot production function. Wealthy farmers’ average annual urban rental income was 5,702 pesetas, while their average annual water expenditure was only 500 pesetas (rising to 1,619 pesetas in 1963, the highest-expenditure sample year). This large gap supports the assumption that wealthy farmers could always afford water purchases.

Q: What is the model’s treatment of soil moisture dynamics and why does it matter? A: Soil moisture (M_it) evolves according to an agricultural engineering formula: it increases with rainfall and irrigation purchases (each unit adding 432,000 liters divided by plot area) and decreases via evapotranspiration (ET), subject to a field capacity (FC) ceiling and a permanent wilting point (PW) lower bound. This storage structure creates intertemporal substitution: water purchased early partially substitutes for future purchases, but at a cost (evaporative loss). The dynamics mean poor farmers who pre-buy water before the critical season lose some of that investment to evaporation, generating a real efficiency loss relative to the quota that delivers water closer to when it is biologically needed.
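The moisture law of motion described above can be sketched as a bounded water balance. This is a minimal illustration, not the authors' exact formula: the function name, the unit convention (liters per square meter), and the example inputs are all assumptions, and evapotranspiration is simply passed in as a number rather than computed from weather data.

```python
LITERS_PER_CUARTA = 432_000  # volume of one auction unit, as stated in the text


def next_moisture(moisture, rain, units_bought, plot_area, et, pw, fc):
    """One-week soil moisture update, bounded by PW below and FC above.

    All quantities are in liters per square meter except plot_area (square
    meters) and units_bought (number of cuartas). Inputs are hypothetical.
    """
    inflow = rain + units_bought * LITERS_PER_CUARTA / plot_area
    return min(max(moisture + inflow - et, pw), fc)


# One cuarta on a hypothetical 10,000 square meter plot adds 43.2 liters per
# square meter before evapotranspiration is subtracted.
after_irrigation = next_moisture(
    moisture=100.0, rain=0.0, units_bought=1, plot_area=10_000,
    et=20.0, pw=50.0, fc=200.0,
)
```

The two bounds matter for the dynamics: water bought when the plot is already near field capacity is lost, and moisture cannot be drawn below the wilting point.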

Q: What are the two sources of potential inefficiency the authors identify? A: The first is inefficiency due to heterogeneity: if farmers differ in ex-post productivity (captured by idiosyncratic shocks epsilon_it), allocating water to a less productive farmer at a given moment is wasteful. Markets correct this inefficiency (they direct water to highest-valuation buyers) while quotas do not. The second is inefficiency due to decreasing marginal returns (DMR): because the production function is concave in water, giving water to a farmer with already-high soil moisture is less productive than giving it to a farmer with low moisture. Quotas naturally avoid DMR inefficiency by allocating uniformly; markets with liquidity constraints exacerbate DMR inefficiency by directing scarce critical-season water to wealthy farmers who may have already accumulated moisture from prior purchases.

Q: What is the main quantitative result of the welfare analysis? A: Switching from the market auction to the fixed quota system increased welfare by 23.4 real pesetas per farmer per tree, representing a 6 percent increase in total apricot production relative to the market counterfactual. This is computed as the difference in yearly mean welfare per tree per farmer (net of irrigation costs and excluding water expenditures, which are transfers) between the quota and market allocations using the estimated structural model.

Q: Under what conditions is a quota more efficient than a market with liquidity constraints? A: Quotas dominate markets when three conditions hold simultaneously: (1) farmers are relatively homogeneous in productivity (so the market’s advantage of directing water to high-valuation buyers is small), (2) liquidity constraints are significant (so the market misallocates water away from constrained high-valuation farmers), and (3) the production function is concave in water (so uniform allocation is efficient when farmers are homogeneous). The authors find all three conditions hold in Mula. Conversely, markets dominate quotas when heterogeneity in productivity is large relative to heterogeneity in wealth.
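The trade-off between the two sources of inefficiency can be seen in a toy calculation. The square-root production function, the productivity values, and the water splits below are illustrative assumptions, not estimates from the paper; they only show how concavity favors uniform allocation when farmers are alike and how that ranking flips when productivity differs enough.

```python
from math import sqrt


def total_output(allocations, productivity):
    """Sum of a_i * sqrt(w_i): a concave production function (an assumption)."""
    return sum(a * sqrt(w) for a, w in zip(productivity, allocations))


# Ten units of water, two farmers.
quota_split = [5, 5]    # uniform allocation
skewed_split = [9, 1]   # most water goes to one farmer

# Homogeneous productivity: the uniform split yields more total output.
homogeneous = [1.0, 1.0]
quota_homog = total_output(quota_split, homogeneous)
skewed_homog = total_output(skewed_split, homogeneous)

# Strongly heterogeneous productivity: steering water to the more productive
# farmer yields more total output than the uniform split.
heterogeneous = [3.0, 1.0]
quota_heter = total_output(quota_split, heterogeneous)
skewed_heter = total_output(skewed_split, heterogeneous)
```

In the first case the skewed split stands in for a market distorted by liquidity constraints; in the second it stands in for a market that correctly favors the high-productivity farmer.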

Q: How is the transformation rate parameter gamma estimated and interpreted? A: The transformation rate gamma measures how soil moisture above the permanent wilting point converts into apricot output (in pesetas) during the critical season, via the production function h() = gamma * (M_it - PW) * KS(M_it) * Z(w_t). It is identified from variation in purchasing patterns across seasons and variation in moisture across farmers within the same season. The preferred specification (column 3 of Table 3) yields gamma_L = 0.05. With average moisture per tree (accounting for the hydric stress coefficient) of 873.93 during the critical season, a farmer earns on average 29.09 pesetas per tree per week during the critical season, or 407.25 pesetas per tree per year.
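The structure of h() can be sketched in code. Only the multiplicative form, the weeks 18 to 32 critical season, and gamma = 0.05 come from the text. The linear ramp used for the hydric stress coefficient KS and the PW and FC values are placeholders chosen for illustration, so the numbers this sketch produces are not comparable to the paper's figures.

```python
GAMMA = 0.05  # transformation rate from the preferred specification


def in_critical_season(week):
    """Seasonal dummy Z(w_t): 1 in weeks 18 to 32, 0 otherwise."""
    return 18 <= week <= 32


def hydric_stress(moisture, pw, fc):
    """Placeholder KS: linear ramp from 0 at PW to 1 at FC (an assumption)."""
    if fc <= pw:
        raise ValueError("fc must exceed pw")
    return min(max((moisture - pw) / (fc - pw), 0.0), 1.0)


def weekly_output(moisture, week, gamma, pw, fc):
    """h() = gamma * (M - PW) * KS(M) * Z(w), in pesetas per tree per week."""
    z = 1.0 if in_critical_season(week) else 0.0
    return gamma * max(moisture - pw, 0.0) * hydric_stress(moisture, pw, fc) * z


# Hypothetical moisture levels and bounds, inside and outside the season.
in_season = weekly_output(150.0, week=20, gamma=GAMMA, pw=50.0, fc=200.0)
off_season = weekly_output(150.0, week=10, gamma=GAMMA, pw=50.0, fc=200.0)
```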

Q: How does ignoring liquidity constraints bias demand estimates? A: If one estimates demand using the full sample (poor and wealthy farmers pooled), a decrease in demand during the critical season when prices rise conflates two effects: (1) the standard price effect (fewer farmers have valuations above the price) and (2) the liquidity constraint effect (some farmers with valuations above the price still cannot buy because they lack cash). Attributing the second effect to price sensitivity overstates the demand elasticity, biasing its absolute value upward.
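A stylized calculation shows the direction of this bias. It assumes valuations and cash holdings are independent and uniform on [0, 1], that half of the pooled sample is cash constrained, and that elasticity is measured as a midpoint arc elasticity; none of these choices come from the paper.

```python
def share_buying(price, poor_share):
    """Share of farmers who buy at a given price in a pooled sample.

    Unconstrained farmers buy when valuation exceeds price: 1 - price.
    Constrained farmers also need cash of at least the price: (1 - price)**2.
    """
    unconstrained = 1.0 - price
    constrained = (1.0 - price) ** 2
    return (1.0 - poor_share) * unconstrained + poor_share * constrained


def arc_elasticity(demand, p_low, p_high):
    """Midpoint arc elasticity of demand between two prices."""
    q_low, q_high = demand(p_low), demand(p_high)
    dq = (q_high - q_low) / ((q_high + q_low) / 2.0)
    dp = (p_high - p_low) / ((p_high + p_low) / 2.0)
    return dq / dp


# Elasticity of the unconstrained (preference-only) demand curve.
true_elasticity = arc_elasticity(lambda p: share_buying(p, 0.0), 0.4, 0.6)
# Elasticity measured on a pooled sample where half the farmers are constrained.
pooled_elasticity = arc_elasticity(lambda p: share_buying(p, 0.5), 0.4, 0.6)
```

Under these assumptions the pooled curve falls faster with price than the preference-only curve, so the pooled elasticity is larger in absolute value.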

Q: What robustness checks do the authors provide against unobserved heterogeneity? A: The authors provide four pieces of evidence that wealthy and poor farmers do not have systematically different underlying preferences: (1) wealthy and poor farmers are not geographically sorted into different locations (both groups appear in subareas 1, 2, 4, and 7); (2) wealthy and poor farmers grow the same bulida apricot variety; (3) outside the critical season, wealthy and poor farmers purchase statistically similar amounts of water; and (4) the purchasing divergence is significant only during the critical season when prices are high, precisely the pattern predicted by the liquidity constraint mechanism.

Q: What are the policy implications for water allocation in developing countries? A: The paper implies that before introducing water markets in regions where farmers may be liquidity constrained, policymakers should assess the magnitude of those constraints. If liquidity constraints are significant and farmers are relatively homogeneous in productivity, a quota system or a market supplemented with credit provision may deliver higher efficiency than a pure market. The standard presumption that markets outperform quotas can reverse when poor farmers cannot access credit to purchase water at the times they most need it.

Q: How does this paper relate to Che et al. (2013)? A: Che, Gale, and Kim (2013) assume agents consume at most one unit with linear utility and find that markets always dominate quotas, though some non-market mechanisms with resale outperform markets. Donna and Espín-Sánchez extend this framework by allowing multiple discrete units, a concave utility function, and intertemporal dynamics. Under these extensions, the efficiency ranking between markets and quotas is theoretically indeterminate, and the authors show empirically that quotas can dominate markets. Both papers agree that non-market mechanisms with resale outperform both markets and simple quotas.

Liquidity constraint (paper’s sense): A farmer is liquidity constrained when they lack sufficient cash to purchase water at the prevailing auction price, even if their valuation (marginal productivity of water) exceeds that price. In Mula, poor farmers without urban real estate income faced this constraint during the critical season when prices peaked, because they had already spent their harvest proceeds from the prior year and lacked access to credit markets.

Soil moisture (M_it): The state variable measuring water accumulated in a farmer’s plot, computed using the agricultural engineering evapotranspiration formula. Moisture increases with rainfall and irrigation purchases (each auction unit contributing 432,000 liters divided by plot area) and decreases via evapotranspiration. It is bounded below by the permanent wilting point (PW) — below which trees die — and above by field capacity (FC). Moisture creates intertemporal substitution in demand.

Critical season: The period corresponding to apricot fruit growth stages II and III and the Early Post-Harvest (EPH) period, spanning approximately weeks 18–32 (early May to early August). This is when the bulida apricot tree transforms water into fruit at the most rapid rate, when water demand peaks biologically, and when auction prices rise to their highest levels. It is the season during which liquidity constraints are binding.

Transformation rate (gamma): The parameter in the apricot production function that measures the rate at which excess soil moisture (above the permanent wilting point) converts into apricot output (measured in real pesetas) during the critical season. Estimated at gamma_L = 0.05 in the preferred specification (column 3 of Table 3). It is identified from cross-seasonal variation in purchasing patterns and cross-farmer variation in moisture levels.

Inefficiency due to decreasing marginal returns (DMR): One of two sources of allocation inefficiency identified in the paper. It arises when a farmer with already-high soil moisture receives water, yielding less additional output than if that water had gone to a farmer with lower moisture, given the concavity of the production function. Quotas avoid this inefficiency by allocating uniformly; markets with liquidity constraints exacerbate it by directing critical-season water to wealthy farmers who may have accumulated moisture from earlier purchases.

Cuarta (quarter): The unit of water sold at Mula auctions, representing the right to use water flowing through the main channel for three hours. At approximately 40 liters per second of flow, each cuarta carried approximately 432,000 liters of water. Water rights and land rights were held independently; farmers who participated in auctions owned only land, while waterlords separately owned canal usage rights.
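The volume figure follows directly from the stated flow rate and duration, as this short check shows. The per-session total combines it with the 40 units sold each Friday; the variable names are illustrative.

```python
FLOW_LITERS_PER_SECOND = 40  # approximate flow in the main channel
HOURS_PER_CUARTA = 3         # duration of one cuarta
UNITS_PER_SESSION = 40       # cuartas sold at each weekly Friday session

liters_per_cuarta = FLOW_LITERS_PER_SECOND * HOURS_PER_CUARTA * 3600
liters_per_session = liters_per_cuarta * UNITS_PER_SESSION

print(liters_per_cuarta)   # 432000
print(liters_per_session)  # 17280000
```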

Conditional choice probability (CCP) estimator: The two-step estimation procedure used to recover demand parameters from wealthy farmers’ purchasing choices. In Step 1, transition probability matrices for observable state variables (moisture, week, price, rainfall) are computed and CCPs are estimated via multinomial logit. In Step 2, the value function is forward-simulated using these transition matrices and parameters are estimated by GMM, following Hotz et al. (1994).
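The transition-matrix part of Step 1 can be sketched as a frequency estimator on a discretized state. This is a generic illustration under assumed inputs (a single integer-coded state series and a uniform fallback for unvisited states), not the authors' implementation, and it leaves out the multinomial logit and the forward simulation.

```python
def transition_matrix(states, n_states):
    """Estimate P[i][j] = Pr(next state = j | current state = i) from counts.

    states   -- observed sequence of integer state codes in range(n_states)
    n_states -- number of discrete states
    Rows for states that are never visited fall back to a uniform distribution
    (an assumption made here to keep every row a valid probability vector).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([count / total for count in row])
    return matrix


# Hypothetical weekly price bins: 0 = low, 1 = high, 2 = never observed.
price_bins = [0, 1, 1, 0, 1]
P = transition_matrix(price_bins, n_states=3)
```

In a forward simulation, each row of such a matrix would be used to draw next week's state given the current one.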