Forthcoming in American Economic Review. doi:10.1257/aer.20240712

Random Utility with Unobservable Alternatives

Haruki Kono

Kota Saito

Alec Sandroni

Free to read (green open access).

What this paper finds — and why it matters

This paper addresses a foundational gap in the random utility model (RUM) literature: existing axiomatizations by Falmagne (1978) and McFadden and Richter (1990) assume that whenever a menu is observed, the choice frequencies of all alternatives in that menu are observable. In practice, the choice frequencies of some alternatives are routinely missing. The paper derives the full testable implications of the random utility model for such incomplete datasets, delivering a finite, nonredundant system of linear inequalities as a necessary and sufficient condition for RU-rationalizability.

The empirical backdrop motivates the formal contribution directly. In transportation choice (bus, train, walk, drive), revenue data from transit operators can reveal the market shares of bus and train but not walking or driving without survey data. In school choice, governments observe enrollment across public schools but may lack data on private school selections. In market-share analysis, private firms may not disclose sales figures. In each case, researchers typically aggregate all unobservable alternatives into a single “outside option,” treating it as one composite choice. The paper calls this the outside option approach and establishes its formal limitations.

The main theorem (Theorem 3.2) states that an incomplete dataset is RU-rationalizable if and only if two conditions hold jointly. The first is the classical nonnegativity of Block-Marschak (BM) polynomials. This condition appears in Falmagne’s original characterization and requires certain inclusion-exclusion quantities over observed choice frequencies to be nonnegative. The second is a novel balance condition: for any “essential test collection” of choice sets, a specific net signed sum of BM polynomials across observable arcs crossing the boundary of that collection must be nonnegative. This second condition captures the informational content that is lost when unobservable alternatives are collapsed. The characterization is nonredundant in a strong sense: removing any single inequality from either condition produces a strictly weaker system, so every inequality is independently binding for some dataset.

Proposition 3.5 makes the limitation of the outside option approach precise. The reduced dataset formed by the outside option approach is RU-rationalizable whenever the original incomplete dataset satisfies condition (i) and satisfies condition (ii) for singleton essential test collections. Now suppose the original data satisfy these checked conditions but violate condition (ii) for some non-singleton essential test collection. Such data are not RU-rationalizable, yet the outside option approach still returns a verdict of rationalizability. False acceptance of the random utility model is therefore possible under the outside option approach.

The proofs translate the rationalizability problem into a network flow problem on the hypercube lattice over subsets of alternatives, following Fiorini (2004). Each path from the empty set to the full set of alternatives corresponds to a linear order (ranking). The key methodological innovation is to apply a feasibility theorem from network flow theory, a generalization of the max-flow min-cut theorem, to derive necessary and sufficient conditions in the incomplete-data setting.

The paper also provides an efficient algorithm for computing tight bounds on unobservable choice frequencies, formulated as a minimum-cost transshipment problem. The constraint matrix is the incidence matrix of a network (and hence totally unimodular), so the network simplex algorithm applies directly. The authors apply the algorithm to a lottery-choice dataset from McCausland et al. (2020). In that dataset, 141 participants each chose from subsets of five lotteries, six times per choice set. The authors treat two of the five lotteries as unobservable and compare bound widths. Their method yields significantly tighter bounds than the outside option approach and, critically, correctly identifies that lottery 4 is more desirable than lottery 3 among the unobservable alternatives. The outside option approach yields identical trivial bounds for both lotteries and thus cannot distinguish their relative desirability at all.

Q: What is the central research question? A: The paper asks: what are the testable implications of the random utility model when the choice frequencies of some alternatives are unobservable? The goal is a necessary and sufficient condition for RU-rationalizability under incomplete observation, along with a demonstration of what is lost when the standard outside option approach is used instead.

Q: What is the random utility model and why is it the focus? A: The random utility model posits a probability distribution over strict rankings of alternatives; each individual’s preferences correspond to one ranking. It is a cornerstone of discrete choice analysis in economics. Falmagne (1978) and McFadden-Richter (1990) characterized it under full observability of choice frequencies, making the extension to incomplete data a natural and practically important frontier.
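The following minimal Python sketch makes the definition concrete by showing how a random utility model generates choice frequencies. Here rho(D, x) is the total probability of the rankings under which x is the best element of menu D. The three alternatives, the ranking probabilities, and the function name choice_prob are hypothetical illustrations, not taken from the paper.

```python
from fractions import Fraction

# Hypothetical distribution over strict rankings of three alternatives,
# each ranking listed from best to worst; probabilities sum to one.
ALTERNATIVES = ("a", "b", "c")
RANKING_PROBS = {
    ("a", "b", "c"): Fraction(1, 2),
    ("b", "c", "a"): Fraction(1, 3),
    ("c", "a", "b"): Fraction(1, 6),
}


def choice_prob(ranking_probs, menu, x):
    """rho(menu, x): probability of the rankings whose best element in menu is x."""
    total = Fraction(0)
    for ranking, p in ranking_probs.items():
        # The first alternative of the ranking that lies in the menu is chosen.
        best = next(alt for alt in ranking if alt in menu)
        if best == x:
            total += p
    return total


if __name__ == "__main__":
    print(choice_prob(RANKING_PROBS, {"a", "b"}, "a"))  # prints 2/3
```

Exact fractions are used so that frequencies within a menu sum to exactly one.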

Q: What does “incomplete dataset” mean formally in this paper? A: An incomplete dataset is a nonnegative vector of choice frequencies satisfying two conditions. (i) For menus composed entirely of observable alternatives, the frequencies sum to one. (ii) For menus that include at least one unobservable alternative, the frequencies of the observable alternatives sum to at most one. The gap between this sum and one is the unobserved probability mass on the menu’s unobservable alternatives.
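The Python sketch below checks the two conditions in this definition. It assumes the dataset is stored as a dict keyed by (menu, observable alternative), with every such pair in the domain listed, even when its frequency is zero. The function is_incomplete_dataset and the toy numbers are hypothetical.

```python
from fractions import Fraction


def is_incomplete_dataset(rho, unobservable):
    """Check the defining conditions of an incomplete dataset.

    rho maps (menu, observable alternative) -> choice frequency, with menus
    as frozensets. Every (menu, observable alternative) pair in the domain
    is assumed to be listed, including pairs with frequency zero.
    """
    totals = {}
    for (menu, x), freq in rho.items():
        # Entries must be observable alternatives of their menu, with nonnegative frequency.
        if x not in menu or x in unobservable or freq < 0:
            return False
        totals[menu] = totals.get(menu, 0) + freq
    for menu, total in totals.items():
        if menu & unobservable:
            # Some mass may sit on unobservable alternatives: sum at most one.
            if total > 1:
                return False
        elif total != 1:
            # Fully observable menu: frequencies must sum to exactly one.
            return False
    return True


# Hypothetical data: X = {a, b, c} with c unobservable.
RHO = {
    (frozenset("ab"), "a"): Fraction(3, 5),
    (frozenset("ab"), "b"): Fraction(2, 5),
    (frozenset("ac"), "a"): Fraction(4, 5),   # residual 1/5 on c
    (frozenset("bc"), "b"): Fraction(1, 2),   # residual 1/2 on c
    (frozenset("abc"), "a"): Fraction(1, 2),
    (frozenset("abc"), "b"): Fraction(1, 5),  # residual 3/10 on c
}
```

Exact Fraction arithmetic keeps the equality test for fully observable menus free of rounding error. With floating-point data a tolerance would be needed instead.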

Q: What are Block-Marschak polynomials and why do they appear? A: The Block-Marschak (BM) polynomial K(rho, D, x) is defined by inclusion-exclusion. It is the sum of rho(E, x) over all supersets E of D, with each term carrying the sign (-1)^|E \ D|. In Falmagne’s complete-data characterization, nonnegativity of all BM polynomials is necessary and sufficient for RU-rationalizability. In the incomplete-data setting, nonnegativity of BM polynomials remains necessary but is no longer sufficient.
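The following minimal sketch applies the inclusion-exclusion formula to complete data with three alternatives. The choice frequencies below are hypothetical. They happen to be generated by a random utility model, so Falmagne’s condition holds and every K value comes out nonnegative. Names such as bm_polynomial are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset("abc")
# Hypothetical complete choice data: every menu and every alternative observed.
RHO = {
    (frozenset("a"), "a"): Fraction(1),
    (frozenset("b"), "b"): Fraction(1),
    (frozenset("c"), "c"): Fraction(1),
    (frozenset("ab"), "a"): Fraction(2, 3), (frozenset("ab"), "b"): Fraction(1, 3),
    (frozenset("ac"), "a"): Fraction(1, 2), (frozenset("ac"), "c"): Fraction(1, 2),
    (frozenset("bc"), "b"): Fraction(5, 6), (frozenset("bc"), "c"): Fraction(1, 6),
    (frozenset("abc"), "a"): Fraction(1, 2), (frozenset("abc"), "b"): Fraction(1, 3),
    (frozenset("abc"), "c"): Fraction(1, 6),
}


def supersets_with_gap(D, X):
    """Yield (E, |E \\ D|) for every menu E with D ⊆ E ⊆ X."""
    rest = sorted(X - D)
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            yield D | frozenset(extra), k


def bm_polynomial(rho, D, x, X):
    """K(rho, D, x): alternating-sign sum of rho(E, x) over supersets E of D."""
    return sum((-1) ** k * rho[(E, x)] for E, k in supersets_with_gap(D, X))


# Falmagne's complete-data condition: every BM polynomial is nonnegative.
all_nonneg = all(bm_polynomial(RHO, D, x, X) >= 0 for (D, x) in RHO)
```

For example, K(rho, {a, b}, a) = rho({a, b}, a) - rho({a, b, c}, a) = 2/3 - 1/2 = 1/6. For a fixed x, the K values over all menus containing x sum to one.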

Q: What is the novel condition in Theorem 3.2 beyond BM nonnegativity? A: Condition (ii) of Theorem 3.2 concerns any “essential test collection” C of choice sets. It requires the net observable outflow of C to be nonnegative: the sum of BM polynomials on arcs leaving C, minus the sum on observable arcs entering C. The data do not pin down the flow on unobservable arcs, but that flow must still be nonnegative. This balance condition enforces that requirement across every cut of the network that corresponds to an essential test collection.

Q: What makes the characterization nonredundant, and why does nonredundancy matter? A: The characterization is nonredundant in the sense that for every individual inequality in conditions (i) and (ii), there exists an incomplete dataset that violates only that inequality and satisfies all others. This is established as part (b) of Theorem 3.2. Nonredundancy is essential for identifying precisely which inequalities the outside option approach discards: without it, some of the novel condition (ii) inequalities might be implied by others, and the argument that the outside option approach loses independent information would not hold.

Q: What does the outside option approach actually discard? A: Proposition 3.5 shows that the outside option approach retains only condition (i) (BM nonnegativity) and condition (ii) for singleton essential test collections. All condition (ii) inequalities corresponding to non-singleton essential test collections are discarded. Because the characterization is nonredundant, each discarded inequality is a genuinely independent constraint, meaning a dataset can violate any one of them while satisfying all others — including all conditions the outside option approach checks.

Q: Can the outside option approach produce a false acceptance of the random utility model? A: Yes. If the true incomplete dataset violates condition (ii) for some non-singleton essential test collection but satisfies all other conditions of Theorem 3.2 — including all conditions the outside option approach checks — then the original dataset is not RU-rationalizable, but the reduced dataset formed by collapsing unobservables into one outside option is RU-rationalizable. Researchers using the outside option approach would therefore erroneously conclude that the data-generating process follows a random utility model.

Q: How is the problem translated into a network flow problem? A: The authors build a directed network on the power set of alternatives. There is an arc from D to D ∪ {x} for each alternative x not in D, the source is the empty set, and the terminal is the full set X. Each source-to-terminal path corresponds to a unique linear order. A probability distribution over rankings corresponds to a flow, with flow conservation at interior nodes and total flow equal to one. The BM polynomial of an observable arc equals the required flow on that arc. RU-rationalizability is therefore equivalent to the existence of a feasible flow with these prescribed arc values. A feasibility theorem that generalizes max-flow min-cut characterizes when such a flow exists.
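The minimal Python sketch below builds this network, assuming paths add alternatives in ranking order, best first. It constructs the arcs of the hypercube lattice and confirms that the number of source-to-terminal paths equals the number of linear orders. It then converts a hypothetical distribution over rankings into an arc flow that satisfies conservation. Under this assumed orientation, the flow on arc (A, A ∪ {x}) is the probability that A is exactly the set ranked above x, which is K(rho, X \ A, x). All names and numbers are illustrative and are not the paper’s code.

```python
from collections import defaultdict
from fractions import Fraction

ALTS = ("a", "b", "c")
# Hypothetical distribution over rankings (best first).
RANKING_PROBS = {
    ("a", "b", "c"): Fraction(1, 2),
    ("b", "c", "a"): Fraction(1, 3),
    ("c", "a", "b"): Fraction(1, 6),
}


def subsets(alts):
    """All subsets of alts as frozensets (the nodes of the network)."""
    n = len(alts)
    return [frozenset(a for i, a in enumerate(alts) if (mask >> i) & 1)
            for mask in range(2 ** n)]


def arcs(alts):
    """Arcs (D, D ∪ {x}) for every node D and every x not in D."""
    return [(D, D | {x}) for D in subsets(alts) for x in alts if x not in D]


def count_paths(alts):
    """Number of paths from the empty set to the full set (equals n!)."""
    succ = defaultdict(list)
    for tail, head in arcs(alts):
        succ[tail].append(head)
    full = frozenset(alts)

    def count(node):
        return 1 if node == full else sum(count(h) for h in succ[node])

    return count(frozenset())


def flow_from_rankings(ranking_probs):
    """Send each ranking's probability along the path that adds alternatives best first."""
    flow = defaultdict(Fraction)
    for ranking, p in ranking_probs.items():
        node = frozenset()
        for x in ranking:
            flow[(node, node | {x})] += p
            node = node | {x}
    return flow


def is_unit_flow(flow, alts):
    """Check nonnegativity, conservation at interior nodes, and unit flow from source to sink."""
    net = defaultdict(Fraction)  # outflow minus inflow at each node
    for (tail, head), f in flow.items():
        if f < 0:
            return False
        net[tail] += f
        net[head] -= f
    source, sink = frozenset(), frozenset(alts)
    interior_ok = all(net[D] == 0 for D in subsets(alts) if D not in (source, sink))
    return interior_ok and net[source] == 1 and net[sink] == -1
```

With these numbers, the arc from {c} to {a, c} carries 1/6. This matches K(rho, {a, b}, a) = rho({a, b}, a) - rho({a, b, c}, a) = 2/3 - 1/2 for the choice frequencies the same distribution generates.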

Q: What is the algorithmic contribution for bounding unobservable choice frequencies? A: The bounds problem is formulated as a minimum-cost transshipment problem on the same network. The constraint matrix is the incidence matrix of a network (and hence totally unimodular), so the network simplex algorithm applies and yields exact solutions efficiently. The algorithm produces tight upper and lower bounds for each unobservable choice frequency. It does so by minimizing and maximizing that frequency over all flows satisfying the feasibility constraints that underlie Theorem 3.2.
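The bounds problem can be written as a linear program, sketched here under assumptions. Paths add alternatives best first, so the flow f(A, A ∪ {x}) is the probability that A is exactly the set of alternatives ranked above x. An arc counts as observable when the data identify its BM polynomial K(rho, X \ A, x). The objective uses the fact that x* is chosen from D exactly when the set ranked above x* is disjoint from D. The notation f, the under- and overlined rho, and this arrangement of constraints are ours and may differ from the paper’s formulation.

```latex
\begin{aligned}
\underline{\rho}(D,x^*) \text{ and } \overline{\rho}(D,x^*) \;=\; \min_f \text{ and } \max_f \quad
  & \sum_{A \subseteq X \setminus D} f(A,\, A \cup \{x^*\}) \\
\text{s.t.}\quad
  & \sum_{x \in A} f(A \setminus \{x\},\, A) = \sum_{x \notin A} f(A,\, A \cup \{x\})
    \quad \text{for } \emptyset \neq A \subsetneq X, \\
  & \sum_{x \in X} f(\emptyset,\, \{x\}) = 1, \\
  & f(A,\, A \cup \{x\}) = K(\rho,\, X \setminus A,\, x) \quad \text{for every observable arc}, \\
  & f \ge 0.
\end{aligned}
```

Apart from the normalization at the source, every constraint is either flow conservation or a fixed value on an arc. The feasible set is therefore a network flow polytope, and the two bounds are the minimum and maximum of a linear objective over it.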

Q: How does the paper demonstrate tighter bounds empirically? A: The paper applies its method to a lottery stochastic choice dataset from McCausland et al. (2020), involving 141 participants choosing from subsets of five lotteries, with six repeated choices per choice set. The authors treat two of the five lotteries as unobservable. Their network-flow bounds are significantly tighter than the trivial bounds from the outside option approach. Specifically, their method correctly identifies that lottery 4 is more desirable than lottery 3 among the unobservable alternatives, a distinction the outside option approach cannot draw because it assigns identical trivial bounds to both lotteries.

Q: What is the monotonicity-based lower bound for unobservable choice frequencies? A: Assume monotonicity, a weaker condition than full RU-rationalizability, and suppose D \ {x*} is in the domain. Then the choice frequency of an unobservable alternative x* from menu D has the lower bound L(x*) = sum over observable alternatives a in D \ {x*} of [rho(D \ {x*}, a) - rho(D, a)]. The reasoning is short. rho(D, x*) equals the same sum of differences taken over all alternatives in D \ {x*}, observable or not. Monotonicity makes each difference nonnegative, so dropping the unobservable terms can only lower the total. This lower bound is larger when removing x* from the menu substantially increases observable choice frequencies. That pattern indicates x* was drawing demand away from observables and is therefore relatively desirable.
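The bound can be computed directly from observed frequencies. In this Python sketch the four alternatives, the split X* = {c, d}, and the frequencies are hypothetical. The function monotonicity_lower_bound is illustrative and returns None when the data for D \ {x*} are missing from the domain.

```python
from fractions import Fraction

X_STAR = {"c", "d"}  # hypothetical unobservable alternatives
D = frozenset("abcd")
# Hypothetical observed frequencies (observable alternatives only).
RHO = {
    (frozenset("abcd"), "a"): Fraction(2, 5),
    (frozenset("abcd"), "b"): Fraction(1, 5),
    (frozenset("abd"), "a"): Fraction(1, 2),
    (frozenset("abd"), "b"): Fraction(3, 10),
    (frozenset("abc"), "a"): Fraction(9, 20),
    (frozenset("abc"), "b"): Fraction(1, 5),
}


def monotonicity_lower_bound(rho, menu, x_star, unobservable):
    """Sum over observable a in menu - {x*} of rho(menu - {x*}, a) - rho(menu, a).

    Returns None when the needed frequencies are not in the data, e.g. when
    menu - {x*} is outside the domain.
    """
    smaller = menu - {x_star}
    observable = [a for a in smaller if a not in unobservable]
    if any((smaller, a) not in rho or (menu, a) not in rho for a in observable):
        return None
    return sum(rho[(smaller, a)] - rho[(menu, a)] for a in observable)
```

In this toy example the bound for c is 1/5 and the bound for d is 1/20. By the reasoning above, this suggests c draws more demand away from a and b than d does.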

Q: How does this paper relate to McFadden-Richter (1990)? A: McFadden and Richter (1990) allow for menus to be unobserved but require that when a menu is observed, all its alternative frequencies are observed — a distinct setup from the present paper. Their characterization also involves infinitely many inequalities and is redundant. The present paper’s characterization uses finitely many inequalities and is nonredundant, making it more tractable both theoretically and computationally.

Q: What is the scope of the model regarding which alternatives are unobservable? A: The paper focuses on the case where the set of unobservable alternatives X* is fixed and consistent across all menus: a given alternative is either always observable or always unobservable. The domain of choice sets D is assumed to be an upper set (if a menu is in D, all supersets are too). The paper does not handle cases where observability of an alternative varies by menu.
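The upper-set assumption is easy to verify mechanically when assembling data. It is enough to test one-element extensions. If every menu’s one-element extensions are in the domain, induction on the number of added alternatives shows that every larger superset is too. The helper is_upper_set below is hypothetical.

```python
def is_upper_set(domain, alternatives):
    """True if every superset (within alternatives) of a menu in domain is in domain.

    Only one-element extensions are checked; by induction this covers all
    supersets.
    """
    menus = {frozenset(m) for m in domain}
    full = frozenset(alternatives)
    return all(menu | {x} in menus for menu in menus for x in full - menu)
```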

Incomplete dataset: A nonnegative vector of choice frequencies in which, for menus containing unobservable alternatives, the observable frequencies sum to at most one (not necessarily exactly one), with the residual mass attributable to unobservable alternatives.

Block-Marschak (BM) polynomial: An inclusion-exclusion quantity K(rho, D, x) defined as the alternating-sign sum of rho(E, x) over all supersets E of D; its nonnegativity is the classical Falmagne condition for RU-rationalizability under complete observation.

Essential test collection: A collection C of choice sets used to define the novel balance condition in Theorem 3.2. RU-rationalizability requires that, for each such C, the net observable outflow of BM polynomial values across the boundary of C be nonnegative.

Outside option approach: The empirical practice of aggregating all unobservable alternatives into a single composite “outside option.” The observed frequencies in each menu then sum to at most one, and the residual is assigned to that composite. This approach retains only a subset of the testable implications of the random utility model.

Nonredundant characterization: A system of inequality conditions in which no single inequality is implied by the conjunction of all others; every inequality is independently binding for some dataset. This property is essential for identifying precisely which implications the outside option approach discards.

Network flow representation: A directed network on the power set of alternatives (source: empty set, terminal: full set X) in which each source-to-terminal path encodes a linear order, flow conservation corresponds to probability conservation, and feasibility of a flow with prescribed values on observable arcs is equivalent to RU-rationalizability.

Minimum-cost transshipment problem: The optimization problem used to compute tight bounds on unobservable choice frequencies. It is tractable via the network simplex algorithm because the constraint matrix is the incidence matrix of a network (and hence totally unimodular).

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing. Found an error or a misrepresentation? Corrections are welcome, especially from the authors.