A Cognitive Theory of Reasoning and Choice
What this paper finds — and why it matters
Bordalo, Gennaioli, Lanzani, and Shleifer develop a cognitive theory of choice in which a decision maker’s attention to the features of options is determined by her categorization of the current problem against a memory database of problems she solved in the past. The core claim is that before solving a problem, the decision maker asks “what kind of problem is this?” and resolves it by selecting the category — indexed by a prototype attention-plus-context vector and a time-discounted frequency — whose similarity to the current problem is maximized. This problem recognition step then pins down which features (price, quality, probabilities) receive attention, which in turn shapes valuation and choice.
The model formalizes two-step choice. In step one (recognition), the decision maker jointly chooses an attention vector alpha_P and a category c* to maximize a separable similarity function S[(alpha_P, kappa_P), (alpha_c, kappa_c)] weighted by category frequency F_c, plus a Type I extreme-value shock that yields a logit probability over categories. In step two, she maximizes perceived value over the menu using the endogenously determined weights. Perceived hedonic value of feature i shrinks toward the menu average when alpha_{P,i} < 1; perceived probabilities compress toward uniform when the event-attention weight falls below 1, producing probability overweighting of unlikely events. Full attention recovers expected utility.
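The recognition step can be illustrated with a minimal sketch of the logit probabilities. It rests on assumptions not confirmed by the summary above: the deterministic score of a category is taken to be F_c times its similarity, the shock scale is a hypothetical parameter, and the similarity values are made-up illustrative numbers.

```python
import math

def recognition_probabilities(similarity, frequency, scale=1.0):
    """Logit probability of recognizing each category.

    Assumption: the deterministic score of category c is F_c * S_c, and the
    Type I extreme-value shock has scale `scale` (a hypothetical parameter).
    """
    scores = {c: frequency[c] * similarity[c] / scale for c in similarity}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Illustrative (made-up) similarities of the current problem to each prototype.
similarity = {"buying": 0.8, "consuming": 0.5}

# Equal experience: the more similar category is recognized more often.
equal = recognition_probabilities(similarity, {"buying": 1.0, "consuming": 1.0})

# More consuming experience can overturn the similarity advantage.
skewed = recognition_probabilities(similarity, {"buying": 1.0, "consuming": 2.0})
```

Because recognition is probabilistic, the same person facing the same problem can land in either category, which is the source of the multi-modal attention discussed in this summary.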
The model yields three structural predictions that hold without changing tastes or information. First, within-person multi-modal attention: because categorization is stochastic, the same person can cluster on entirely different features (e.g., the base rate vs. the likelihood in an inference problem) across otherwise identical choice occasions. Second, systematic context-driven instability: when an irrelevant context feature kappa_{P,i} drifts away from a category’s diagnostic kappa_{c,i}, the probability of that category falls discontinuously, causing a discrete switch in the attention profile and hence in valuation. Third, experience-driven heterogeneity: people more frequently exposed to a category (higher F_c) are more likely to use it, producing persistent differences in price elasticities or probability weighting at constant income and tastes.
Applied to riskless consumer choice, the paper introduces two categories: “buying” (full attention to price, partial to quality: alpha_{M_g}=1 > alpha_{Q_g}=alpha) and “consuming” (full attention to quality, partial to price: alpha_{Q_g}=1 > alpha_{M_g}=alpha). A jam problem categorized as buying yields valuation v = alpha * q - eta * p; categorized as consuming, v = q - alpha * eta * p. The valuation jumps discontinuously as context crosses a threshold kappa*, which shifts when relative category frequency F_{buy}/F_{con} changes. This framework accounts for context-dependent price elasticities (Wakefield and Inman 2003), poverty-driven excess price focus (Shah et al. 2018), de-commoditization through advertising, and mental accounting anomalies including opportunity cost neglect and the sunk cost fallacy. Opportunity cost neglect arises because the consuming category neglects capital gains (alpha_{con,Delta_M}=0), and the sunk cost fallacy arises because the buying category neglects quality shocks (alpha_{buy,Delta_Q}=0).
Applied to statistical judgment, the paper introduces two categories — “frequency estimation” (attention alpha_1=1 to a single i.i.d. draw from a known DGP) and “agnostic inference” (attention alpha_S=1 to the share of heads as a sufficient statistic). The threshold N* separates recognition: for sequence length N_P < N*(F_{freq}/F_{inf}), the decision maker categorizes as frequency and correctly assesses odds; for N_P >= N*, she switches to inference and overweights balanced sequences, producing the Gambler’s Fallacy. The same competition between categories also accounts for base rate neglect, conjunction fallacy, and correlation neglect, with the bias strengthening as sequences grow longer.
Applied to risky choice, bottom-up salience (sensory prominence and contrast) interacts with categorization. A publicity shock drawing attention to a low-probability contamination risk raises similarity to “consuming,” triggering a category switch that amplifies attention to quality broadly and reduces attention to price, producing large valuation drops disproportionate to the actual probability shift. This mechanism generates the framing effects of prospect theory without a stable S-shaped utility function: gain and loss frames correspond to different contexts activating different categories.
Scope conditions: the theory applies when features and their values are fully known to the decision maker (no uncertainty about attributes), so the distortions take the form of altered sensitivity to known features rather than missing information. The set of categories C is taken as given in the formal analysis, though the authors discuss endogenization as future work.
Q: What is the paper’s central departure from standard rational inattention and noisy-perception models?
A: Standard models (Sims 2003, Woodford 2012, Enke and Graeber 2023) produce unimodal, stably weighted valuations — the decision maker’s weighting of features is a smooth function of payoff-relevant costs or priors. In this paper, the weighting is determined by problem recognition, which is discrete and stochastic, producing within-person multi-modal attention: the same person can cluster on entirely different features across identical problems. The authors cite direct evidence from Bordalo, Conlon, Gennaioli, Kwon, and Shleifer [20] showing bimodal clustering on base rates vs. likelihoods in statistical problems, a pattern inconsistent with stable-weighting models.
Q: How is perceived value distorted when the attention weight on a hedonic feature is below 1?
A: The perceived value of hedonic feature i is u_i(alpha_P) = alpha_{P,i} * u_i + (1 - alpha_{P,i}) * u_bar_i, where u_bar_i is the average value of that feature across options in the menu. An attention weight of zero collapses perceived variation in that feature to zero; full attention recovers the true value. The implication is that under-attention shrinks the decision maker’s effective sensitivity to a known attribute, causing systematic under- or over-valuation relative to a rational benchmark while tastes (marginal utilities) are held fixed.
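The shrinkage formula above can be checked with a short sketch. The menu of quality values is a made-up example, and applying a single attention weight to the whole feature is a simplifying assumption.

```python
def perceived_values(true_values, attention):
    """Perceived value of one hedonic feature for every option in the menu.

    Implements u_i(alpha) = alpha * u_i + (1 - alpha) * u_bar, where u_bar is
    the menu average of the feature and `attention` lies in [0, 1].
    """
    menu_average = sum(true_values) / len(true_values)
    return [attention * u + (1 - attention) * menu_average for u in true_values]

qualities = [2.0, 4.0, 6.0]  # hypothetical quality levels of three options

full = perceived_values(qualities, 1.0)     # [2.0, 4.0, 6.0]: true values
partial = perceived_values(qualities, 0.5)  # [3.0, 4.0, 5.0]: halfway to the mean
none = perceived_values(qualities, 0.0)     # [4.0, 4.0, 4.0]: no perceived variation
```

Options above the menu average are undervalued and options below it are overvalued, which is why the same attention weight can produce either direction of misvaluation.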
Q: How is perceived probability distorted?
A: With attention weight alpha_{P,W} on event W, the perceived probability of event e is P(e)^{alpha_{P,W}} / sum_{e'} P(e')^{alpha_{P,W}}, which compresses the distribution toward uniform as alpha_{P,W} falls toward 0 and recovers the true distribution at alpha_{P,W}=1. In the jam example, under-attention to the small probability of spoilage causes the decision maker to overestimate the risk of contamination. For multi-dimensional event vectors the formula generalizes multiplicatively, allowing “editing out” of entire event dimensions (e.g., urn selection in a balls-and-urns problem) when their attention weight hits zero.
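The compression toward uniform can be seen in a small sketch of the one-dimensional formula. The 2% spoilage probability is a made-up number, and all probabilities are assumed strictly positive.

```python
def perceived_probabilities(probs, attention):
    """Perceived distribution P(e)^alpha / sum over e' of P(e')^alpha.

    `probs` are true probabilities (assumed strictly positive) and
    `attention` is the event-attention weight in [0, 1].
    """
    weights = [p ** attention for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

true_probs = [0.02, 0.98]  # hypothetical: jam is spoiled / jam is fine

full = perceived_probabilities(true_probs, 1.0)     # recovers the true distribution
partial = perceived_probabilities(true_probs, 0.5)  # spoilage is overweighted
none = perceived_probabilities(true_probs, 0.0)     # uniform over the two events
```

With partial attention the unlikely event receives more than its true 2% weight, which matches the overestimation of contamination risk described in the jam example.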
Q: What is the mechanism for context-dependent price elasticity?
A: When context kappa_P is below threshold kappa*(F_{buy}/F_{con}), the decision maker categorizes the problem as “buying” and her valuation is v = alpha * q - eta * p, giving a high price sensitivity (coefficient eta) and attenuated quality sensitivity (coefficient alpha < 1). Above kappa*, she categorizes as “consuming” and valuation is v = q - alpha * eta * p, reversing the emphasis. Because the threshold kappa* is increasing in relative frequency F_{buy}/F_{con}, a decision maker with more buying experience has a higher threshold and thus behaves as more price-elastic at any given context level. These elasticity differences arise without any change in the true marginal utility of money eta or quality q.
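The threshold rule can be sketched as follows, under stated assumptions: context is collapsed to a single scalar, the threshold kappa_star is passed in rather than derived from F_{buy}/F_{con}, recognition is deterministic, and a context exactly at the threshold is assigned to “consuming.” All numbers are hypothetical.

```python
def valuation(q, p, kappa, kappa_star, alpha=0.5, eta=1.0):
    """Return (category, valuation) for quality q and price p.

    Below the threshold the problem is recognized as "buying":
        v = alpha * q - eta * p        (price sensitivity eta)
    Otherwise it is recognized as "consuming":
        v = q - alpha * eta * p        (price sensitivity alpha * eta)
    """
    if kappa < kappa_star:
        return "buying", alpha * q - eta * p
    return "consuming", q - alpha * eta * p

# Same jam (q = 10, p = 6), same tastes, two different contexts.
low_context = valuation(q=10.0, p=6.0, kappa=0.2, kappa_star=0.5)   # ("buying", -1.0)
high_context = valuation(q=10.0, p=6.0, kappa=0.8, kappa_star=0.5)  # ("consuming", 7.0)

# A higher threshold (more buying experience) keeps the same context in "buying".
experienced_buyer = valuation(q=10.0, p=6.0, kappa=0.8, kappa_star=0.9)
```

The jump from -1.0 to 7.0 happens with q, p, alpha, and eta all held fixed; only the recognized category changes.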
Q: How does the model generate the sunk cost fallacy and opportunity cost neglect as a unified phenomenon?
A: Both anomalies arise because buying and consuming categories selectively neglect shocks. In the football example, recognizing the problem as “buying” activates alpha_{buy,Delta_Q}=0, so the blizzard quality shock Delta_q<0 is ignored and the decision maker drives to the game as if the shock did not occur — the sunk cost fallacy. In the wine example, recognizing the problem as “consuming” activates alpha_{con,Delta_M}=0, so the capital gain Delta_p is ignored and the decision maker reports a zero or purchase-price cost — opportunity cost neglect. The unifying mechanism is that each category attends only to the features diagnostic of its prototypical experiences: buying attends to price paid and normal quality; consuming attends to realized quality and partly to price, but not to capital gains.
Q: What comparative static does the model predict for sunk cost susceptibility based on experience?
A: People with higher F_{buy} (more buying experiences, e.g. poverty experiences or having recently purchased but not yet consumed the good) exhibit more sunk cost fallacy and less opportunity cost neglect. Conversely, season ticket holders face many consuming experiences relative to one buying event, raising F_{con} and thus reducing susceptibility to the sunk cost fallacy for sports events. Making the blizzard more salient in the description shifts similarity toward “consuming,” also reducing the sunk cost fallacy through a different channel (bottom-up salience rather than experience).
Q: What is the paper’s explanation for the Gambler’s Fallacy, and what distinguishes it from prior accounts?
A: The Gambler’s Fallacy arises when sequence length N_P exceeds threshold N*(F_{freq}/F_{inf}), causing the decision maker to switch from the frequency category (which attends to the 50:50 fairness of the coin) to the inference category (which attends to the share of heads). Under inference, the decision maker treats balanced and unbalanced sequences as representatives of their “share of heads equivalence class,” and the class of balanced sequences is larger, so balanced sequences receive higher estimated probability — the Gambler’s Fallacy. This differs from Rabin and Vayanos (2010), where the bias stems from a belief that the coin is drawn from a pool; here the decision maker knows the coin is fair (kappa_{P,U}=0.5) but the inference representation causes question substitution rather than a wrong model of the DGP.
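The counting argument behind the equivalence classes can be illustrated with a short sketch. It shows only why the balanced class is larger; how the paper maps class size into an estimated probability for an individual sequence is not specified in this summary, so that step is left out.

```python
from math import comb

def sequence_probability(n_flips):
    """True probability of any one specific sequence of fair-coin flips."""
    return 1 / 2 ** n_flips

def class_probability(n_flips, n_heads):
    """Probability that a fair-coin sequence has exactly n_heads heads,
    i.e. the total mass of its share-of-heads equivalence class."""
    return comb(n_flips, n_heads) / 2 ** n_flips

n = 4
each_sequence = sequence_probability(n)  # 0.0625 for HHHH and for HTHT alike
balanced_class = class_probability(n, 2)  # 0.375: six sequences with two heads
extreme_class = class_probability(n, 4)   # 0.0625: only HHHH
```

Every specific sequence is equally likely, but a decision maker who answers the question about the share of heads instead of the question about the exact sequence will rate balanced sequences as more probable.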
Q: How does the model make the Gambler’s Fallacy testable beyond length effects?
A: The model predicts the bias is stronger for decision makers who recently solved many inference problems (lower F_{freq}/F_{inf}), and weaker when the 50:50 nature of flips is made bottom-up salient in the choice context (because salience raises similarity to the frequency category, hindering recognition of inference). These cognitive proxies — experience frequencies and bottom-up salience — are orthogonal to the statistical content of the problem and thus allow identification of the mechanism separately from changes in information or incentives.
Q: How does the model produce framing effects in risky choice without a stable S-shaped utility function?
A: Gain and loss frames are modeled as different context vectors kappa_P that differentially increase similarity to a “safe outcome” category or a “risk” category. Recognizing the problem as the safe-outcome category shifts attention toward the certain option; recognizing it as the risk category shifts attention toward variance. The reversal of preferences between gain and loss frames (the Asian Disease problem, Tversky and Kahneman 1981) thus emerges from context-driven re-categorization rather than from a fixed probability weighting function. The novel prediction is that framing effects should be stronger for decision makers with more experience with the category activated by each frame, and weaker when bottom-up salience of the alternative frame’s features is raised.
Q: How does bottom-up salience interact with top-down categorization in the contamination example?
A: A publicity shock alpha_{delta,Q_b}>0 raises baseline attention to the spoiled-jam quality feature, increasing the similarity of the current problem to the “consuming” category (where quality is focal). This triggers a category switch for marginal agents, activating the full consuming attention profile — which attends to quality broadly, not just to contamination specifically, and reduces attention to price. The resulting valuation drop is therefore disproportionate to the actual probability of contamination and exhibits price insensitivity, because re-categorization shifts the entire attention profile rather than just updating a single probability.
Q: How does the model relate to and distinguish itself from case-based decision theory (Gilboa and Schmeidler 1995) and analogical reasoning (Mullainathan 2002, Fryer and Jackson 2008)?
A: In Gilboa-Schmeidler and related models, the decision maker uses past cases to resolve uncertainty about unknown attributes of current options; attention is full and the mechanism is extrapolation of payoffs from similar cases. In Mullainathan’s (2002) memory-based model, categories again serve to fill in missing information. In this paper, there is no uncertainty about attributes (features and their values are fully known), and the distortion instead takes the form of altered sensitivity to known features through selective attention. This allows the model to produce biases even in simple problems with full data disclosure, and to explain phenomena like base rate neglect and price insensitivity that are not primarily about missing information.
Q: What does the model predict about within-person versus across-person distributions of valuations?
A: Within a person, attention is multi-modal (bimodal in the two-category case) because categorization is stochastic. However, if many categories are possible across the population, the aggregate distribution of valuations can appear approximately unimodal even though each individual’s distribution is not. This distinction is empirically important: a researcher observing average choices may incorrectly infer smooth preference heterogeneity when the underlying mechanism is discrete category switching.
Q: What cognitive proxies does the model propose for empirical identification?
A: The theory links endogenous attention and choice to three observable (or measurable) proxies: (1) past experience frequencies F_c, measurable from administrative histories, surveys about past exposure, or experimental manipulation of training; (2) contextual similarity, measurable from field or experimental variation in irrelevant context features; and (3) bottom-up salience, experimentally controllable via prominence or contrast manipulations. The key identification logic is that these proxies are payoff-irrelevant — they do not change tastes, information, or the objective choice problem — yet predict systematic shifts in choice through their effect on recognition.
Problem Recognition: The first step in the decision maker’s choice process, in which she jointly selects an attention vector alpha_P and a category c* by maximizing weighted similarity between the current problem (characterized by its context vector kappa_P) and the prototype of a past category (alpha_c, kappa_c), multiplied by the category’s time-discounted frequency F_c. Recognition is not about resolving uncertainty over attributes but about selecting which known attributes to attend to.
Category: A partition element of the decision maker’s memory database, indexed by a prototype attention-plus-context vector (alpha_c, kappa_c) and a frequency scalar F_c. The prototype encodes both the context features diagnostic of experiences in that category (binary alpha_{c,i} for i in Phi_K) and the attention to hedonic and event features (alpha_{c,i} for i in Phi_H union Phi_E) used when solving problems in that category. Examples in the paper: “buying” and “consuming” for riskless choice; “frequency estimation” and “agnostic inference” for statistical judgment.
Attention Weight (alpha_{P,i}): A scalar in [0,1] assigned to feature i of the current problem P. For hedonic features, alpha_{P,i}<1 collapses perceived variation toward the menu average; for event features, alpha_{P,i}<1 compresses perceived probabilities toward uniform. Full attention alpha_{P,i}=1 recovers expected utility. Attention weights are the endogenous output of the recognition step, not fixed preference parameters.
Contextual Similarity S: A separable function measuring how close the current problem (alpha_P, kappa_P) is to a category prototype (alpha_c, kappa_c). It decreases in discrepancies in the attention vector (measured by a strictly increasing, convex function d) and in discrepancies in the values of context features diagnostic of the category (d_i(kappa_{P,i}, kappa_{c,i}) * alpha_{c,i}). Endogenous attention to context is set to reduce sensitivity to discrepancies, not to eliminate them.
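The separable structure can be sketched as a discrepancy that similarity decreases in. Several pieces are assumptions for illustration only: the squared gap stands in for the convex function d and for each d_i, the feature names are hypothetical, and the final mapping from discrepancy to S is not modeled.

```python
def total_discrepancy(alpha_p, alpha_c, kappa_p, kappa_c, diagnostic):
    """Sum of attention and context discrepancies (lower means more similar).

    Assumptions: d(x) = x**2 for attention gaps and d_i = squared difference
    for context gaps. `diagnostic[i]` is the binary prototype weight
    alpha_{c,i} on context feature i (1 if diagnostic of the category).
    """
    attention_gap = sum((alpha_p[i] - alpha_c[i]) ** 2 for i in alpha_c)
    context_gap = sum(
        diagnostic[i] * (kappa_p[i] - kappa_c[i]) ** 2 for i in kappa_c
    )
    return attention_gap + context_gap

# Hypothetical "buying" prototype and a current problem with matching attention.
alpha_c = {"price": 1.0, "quality": 0.5}
alpha_p = {"price": 1.0, "quality": 0.5}
kappa_c = {"in_store": 1.0, "weekday": 1.0}
kappa_p = {"in_store": 1.0, "weekday": 0.0}

# A mismatch on a non-diagnostic context feature leaves the discrepancy at zero.
ignored = total_discrepancy(alpha_p, alpha_c, kappa_p, kappa_c,
                            {"in_store": 1, "weekday": 0})

# The same mismatch counts once the feature is diagnostic of the category.
counted = total_discrepancy(alpha_p, alpha_c, kappa_p, kappa_c,
                            {"in_store": 1, "weekday": 1})
```

Only context features that are diagnostic of a category can push the current problem away from it, which is how payoff-irrelevant context ends up moving recognition.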
Mental Accounting (as categorization): In the paper’s account, non-fungibility, sunk cost fallacy, and opportunity cost neglect all arise because buying and consuming categories selectively attend to different monetary and quality features. The sunk cost fallacy corresponds to alpha_{buy,Delta_Q}=0 (the buying category ignores quality shocks); opportunity cost neglect corresponds to alpha_{con,Delta_M}=0 (the consuming category ignores capital gains). Mental accounts are not separate budget constraints but the by-product of category-specific attention profiles that were calibrated to normal-state experiences and do not generalize to shocks.
Bottom-up Salience: Exogenous attention to a feature driven by sensory prominence (described by alpha_{delta,i} in the problem’s presentation vector) or payoff contrast (the DM attends more to features where her option’s value deviates more from the menu average relative to total menu variance). Bottom-up salience raises baseline attention to a feature before top-down categorization acts, and can trigger a category switch by raising similarity to the category for which that feature is focal.
Gambler’s Fallacy via Question Substitution: In the model, the Gambler’s Fallacy arises when a long sequence length kappa_{P,N} causes recognition of the “agnostic inference” category, which focuses attention on the share of heads alpha_S=1. The decision maker then treats sequences as representatives of a “share of heads equivalence class,” and since the balanced class is larger than the unbalanced class, balanced sequences are assigned higher estimated probability. This is not a belief that the coin is unfair; it is question substitution induced by the inference representation.