C18 | Macro Paper Warehouse

Optimal Decision Rules When Payoffs are Partially Identified

This paper derives asymptotically optimal statistical decision rules for discrete choice problems when the payoffs associated with some choices are only partially identified. The research question is: how should a decision maker who can bound but not point-identify a payoff-relevant parameter θ use data to make optimal policy choices?

The framework separates two parameter types. The reduced-form parameter µ is point-identified and can be estimated from data. The structural parameter θ — such as the average treatment effect (ATE) in a target population — is set-identified, meaning only that θ ∈ Θ0(µ) can be established, where the identified set is indexed by µ. The decision maker confronts both ambiguity (arising from partial identification of θ given µ) and statistical uncertainty (µ must be estimated).

The authors propose a hybrid optimality criterion that applies minimax reasoning to the partially-identified parameter θ — choosing actions that minimize maximum risk over Θ0(µ) — while applying average (integrated) risk minimization over µ, reflecting the asymmetric nature of the two identification problems. This asymmetric treatment follows the generalized Bayes-minimax principle of Hurwicz (1951).

The optimal decision rule is implemented by computing, for each action, the maximum risk (or regret) over θ ∈ Θ0(µ) conditional on µ, then averaging this maximum risk across either (i) a bootstrap distribution for an efficient estimator µ̂, (ii) a posterior distribution for µ in parametric models, or (iii) a quasi-posterior based on a limited-information criterion in semiparametric models. The optimal action is whichever choice has the smallest average maximum risk.
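The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the scalar µ, the toy identified set [µ − 1, µ + 1], and the Gaussian draws standing in for a bootstrap or quasi-posterior distribution are all assumptions made for the example.

```python
import random
from statistics import fmean


def average_max_risk(action, draws, max_risk):
    """Average the maximum risk R(action, mu) over draws of mu."""
    return fmean(max_risk(action, mu) for mu in draws)


def optimal_action(actions, draws, max_risk):
    """Return the action with the smallest average maximum risk."""
    return min(actions, key=lambda d: average_max_risk(d, draws, max_risk))


# Toy setting (hypothetical): scalar mu, identified set [mu - 1, mu + 1]
# for theta, and worst-case regret for a binary treatment decision.
def toy_max_risk(d, mu):
    lower, upper = mu - 1.0, mu + 1.0
    if d == 0:
        return max(upper, 0.0)   # worst-case regret from not treating
    return max(-lower, 0.0)      # worst-case regret from treating


rng = random.Random(0)
mu_hat, se = 0.2, 0.1            # hypothetical estimate and standard error
draws = [rng.gauss(mu_hat, se) for _ in range(5000)]
best = optimal_action([0, 1], draws, toy_max_risk)
```

In practice the draws would come from one of the three distributions listed above, and `max_risk` would evaluate the bound construction of the application at hand.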

A central theoretical result (Theorems 1 and 4) establishes formal asymptotic optimality for both parametric and semiparametric settings: Bayes and quasi-Bayes decisions with any prior whose density is positive, bounded, and continuous are asymptotically equivalent and optimal. Critically, the optimality of these rules is asymptotically independent of the choice of prior for µ. The authors also establish a necessity result (Theorems 2 and 5): any decision rule not asymptotically equivalent to the Bayes or bootstrap rule is strictly sub-optimal.

A key finding is that “plug-in” rules, which substitute an efficient point estimate µ̂ directly into the oracle decision rule, can be sub-optimal. This failure occurs generically under partial identification because the maximum risk function R(d, µ) is typically only directionally differentiable (not fully differentiable) in µ, owing to max and min operators in intersection bounds, linear program value functions, or other bound constructions. When full differentiability holds, Corollary 1 confirms plug-in rules are optimal; when it fails, asymptotic equivalence with the Bayes rule can break down, and the plug-in rule is then sub-optimal.

The empirical illustration demonstrates the practical consequence. The decision is whether to adopt a job-training program for German male youths, based on 14 RCT studies from Card, Kluve, and Weber (2017). The optimal rule recommends treatment (average quasi-posterior robust welfare contrast b̄n > 0) while the plug-in rule recommends against treatment (plug-in value b(µ̂) < 0). The largest value of the lower bound µ̂k − C‖x0 − xk‖ is −0.3190 (the US study) and the second largest is −0.3298 (the Brazilian study). Because these two values are close relative to the average standard error of 0.034 across studies, the distribution of the lower bound is right-skewed (it behaves like the maximum of two Gaussians), pushing b̄n positive even though b(µ̂) is negative.

The paper extends optimality theory to semiparametric models via a least favorable parametric submodel, introduces the concept of σ-optimality for cases where the average maximum risk criterion is infinite (relevant when the dimension K of µ exceeds 1), and provides detailed implementation guides for treatment assignment under intersection bounds, IV-like estimands, and non-separable panel data, as well as for optimal pricing decisions where revealed-preference demand theory bounds counterfactual demand responses via linear programming.

Scope conditions: optimality results apply to discrete action spaces, require efficient estimation of µ, require the identified set Θ0(µ) to be known as a set-valued mapping, and assume no “first-order ties” (the oracle decision is unique at µ0). The asymptotic framework is local, mimicking the finite-sample problem where µ is not known with certainty.

Q: What is the core decision problem this paper addresses?

A: A decision maker must choose from a finite set of actions D = {0, 1, …, D}. Payoffs depend on a structural parameter θ that is only set-identified — the data can establish θ ∈ Θ0(µ) but not pin down θ exactly. The reduced-form parameter µ is point-identified and estimated from data. The decision maker faces both ambiguity (which θ in Θ0(µ) is true?) and sampling uncertainty (what is µ?). The paper asks how to construct decision rules that are optimal in large samples under this dual uncertainty.

Q: What is the proposed optimality criterion, and why is it asymmetric across parameters?

A: The criterion applies minimax reasoning to the partially-identified θ — the maximum risk over Θ0(µ) given µ is the relevant loss — and integrates this maximum risk over µ using Lebesgue measure on local perturbations h = √n(µ − µ0) of a fixed µ0. The asymmetry reflects the fact that θ is not updated by the data (the prior for θ is not identified), while µ can be learned efficiently from the data. Full minimax over both (θ, µ) is rarely tractable even for simple binary treatment problems; the asymmetric approach yields tractable optimal rules for a broad empirically relevant class of settings.

Q: What are the Bayes, bootstrap, and quasi-Bayes implementations of the optimal rule?

A: In all three cases, the decision maker computes R̄n(d), the average maximum risk for action d, and chooses the action that minimizes it. The Bayes rule averages R(d, µ) over the posterior πn(µ|Xn) for µ using Bayes’ theorem with a prior π on M. The bootstrap rule averages R(d, µ̂*) over bootstrap redraws µ̂* of the efficient estimator µ̂. The quasi-Bayes rule (for semiparametric models) uses a limited-information quasi-posterior N(µ̂, (nÎ)^{-1}) combining a Gaussian quasi-likelihood with a prior for µ. All three implementations are asymptotically equivalent and optimal under the regularity conditions of Theorems 1 and 4.

Q: What do Theorems 1 and 2 (and their semiparametric analogues Theorems 4 and 5) establish?

A: Theorem 1 establishes sufficiency: Bayes decisions with any prior in the class Π are asymptotically equivalent to each other and are optimal; any rule asymptotically equivalent to such a Bayes decision is also optimal. Theorem 2 establishes necessity: any rule in the admissible class D that is not asymptotically equivalent to the Bayes rule has strictly higher average excess risk at any µ0 where asymptotic equivalence fails. Together, these theorems fully characterize the class of asymptotically optimal rules and show that the Bayes/bootstrap class is not merely sufficient but also necessary for optimality.

Q: When are plug-in rules sub-optimal, and when are they optimal?

A: Plug-in rules substitute an efficient point estimate µ̂ directly into the oracle decision δo(µ̂). If R(d, µ) is fully differentiable at µ0 for all oracle-optimal actions d, then the directional derivative is linear and plug-in and Bayes rules are asymptotically equivalent; Corollary 1 confirms plug-in rules are then optimal. However, under partial identification, max and min operators in bound constructions — intersection bounds, linear program value functions, revealed-preference bounds — generically induce only directional (non-linear) differentiability of R(d, µ). In these cases asymptotic equivalence can fail, and Theorem 2 implies plug-in rules are sub-optimal. Manski (2021, 2023) documents poor finite-sample performance of plug-in rules numerically; the authors’ necessity result provides a general theoretical explanation under the asymptotic average risk criterion.

Q: How does the treatment assignment empirical illustration demonstrate the difference between optimal and plug-in rules?

A: Using data from Ishihara and Kitagawa (2021) with K = 14 RCT studies from Card, Kluve, and Weber (2017) and Lipschitz constant C = 0.25, the decision is whether to adopt a job-training program for German male youths or female youths in 2010 (GDP growth 3.48%, unemployment 9.45%). For male youths, the largest lower bound value µ̂k − C‖x0 − xk‖ is −0.3190 (US study) and the second-largest is −0.3298 (Brazilian study), separated by only 0.0108 against an average standard error of 0.034 across studies, so the lower bound distribution is right-skewed (maximum of two near-tied Gaussians). This right-skew pushes the quasi-posterior mean b̄n positive, yielding a treatment recommendation, while the plug-in value b(µ̂) is negative, yielding a non-treatment recommendation — a concrete reversal of the policy decision. For female youths, the minima and maxima are better separated, the distribution is near-Gaussian, and b̄n ≈ b(µ̂), so both rules agree on treatment.
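The right-skew argument can be checked with the standard closed form for the mean of the maximum of two independent Gaussians. The sketch below is illustrative only: it uses just the two leading studies, treats their estimates as independent, and assigns each the average standard error of 0.034, none of which is stated in the summary above. It covers the lower bound alone, not the full contrast b̄n.

```python
from math import sqrt
from statistics import NormalDist


def mean_of_max(m1, m2, s1, s2):
    """E[max(X1, X2)] for independent X1 ~ N(m1, s1^2), X2 ~ N(m2, s2^2)."""
    theta = sqrt(s1 ** 2 + s2 ** 2)   # std. dev. of X1 - X2 under independence
    alpha = (m1 - m2) / theta
    std = NormalDist()                # standard normal
    return m1 * std.cdf(alpha) + m2 * std.cdf(-alpha) + theta * std.pdf(alpha)


# Lower-bound values reported for the US and Brazilian studies.
us, brazil = -0.3190, -0.3298
se = 0.034                            # assumption: same s.e. for both studies

plug_in = max(us, brazil)             # max of the point estimates
averaged = mean_of_max(us, brazil, se, se)
gap = averaged - plug_in              # upward shift caused by the max operator
```

Under these assumptions the averaged lower bound sits about 0.014 above the plug-in lower bound, which is the direction of the shift described in the answer above.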

Q: What are intersection bounds and why do they generate directional differentiability?

A: Intersection bounds arise when the ATE is bounded in K separate observational studies by lower bounds bL,k(µk) and upper bounds bU,k(µk). The combined identified set uses bL(µ) = max_{1≤k≤K} bL,k(µk) and bU(µ) = min_{1≤k≤K} bU,k(µk). Even if each component bound is smooth in µk, the max and min operators make bL and bU only directionally differentiable (not fully differentiable) in µ. The directional derivative is positively homogeneous of degree one but non-linear, which is the property that drives the wedge between Bayes and plug-in rules.
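A two-study worked example makes the non-linearity concrete. It is a simplified illustration, assuming K = 2 and taking each component bound to be the identity, so that the lower bound is f(µ) = max(µ1, µ2); the symbols f, h, and c are introduced only for this example.

```latex
% Tie case: \mu_{0,1} = \mu_{0,2}
\dot f_{\mu_0}[h]
  = \lim_{t \downarrow 0}
    \frac{\max(\mu_{0,1} + t h_1,\ \mu_{0,2} + t h_2) - \max(\mu_{0,1}, \mu_{0,2})}{t}
  = \max(h_1, h_2).

% Positively homogeneous of degree one:
\dot f_{\mu_0}[c h] = c\,\dot f_{\mu_0}[h] \quad \text{for all } c \ge 0.

% But not linear, since a linear map would satisfy
% \dot f_{\mu_0}[h] + \dot f_{\mu_0}[-h] = 0, whereas
\dot f_{\mu_0}[h] + \dot f_{\mu_0}[-h]
  = \max(h_1, h_2) - \min(h_1, h_2)
  = |h_1 - h_2|.

% No-tie case: \mu_{0,1} > \mu_{0,2} gives the linear derivative
\dot f_{\mu_0}[h] = h_1.
```

The derivative is linear away from ties and non-linear at a tie, which is why near-ties between studies are where the Bayes and plug-in rules can diverge.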

Q: How does the paper extend to semiparametric models, and what technical tool does it use?

A: In semiparametric models, the data distribution depends on both µ ∈ R^K and an infinite-dimensional nuisance parameter η. Integrating over local perturbations of η as well as µ raises measure-theoretic problems in infinite-dimensional spaces. The authors instead restrict attention to local perturbations of µ0 within a least favorable parametric submodel, which is the direction that makes the problem hardest. The quasi-posterior N(µ̂, (nÎ)^{-1}) is then used as the averaging distribution, combining a Gaussian quasi-likelihood with a prior for µ. Theorem 4 establishes optimality and Theorem 5 establishes necessity under these semiparametric conditions, mirroring the parametric Theorems 1 and 2.

Q: What is σ-optimality and why is it needed?

A: When the dimension K of µ exceeds 1, the integrated average excess risk criterion R({δn}; µ0) — which integrates over Lebesgue measure on R^K — may be infinite for all decision sequences in D, making the criterion uninformative. σ-optimality approximates the improper Lebesgue prior on h by a sequence of proper priors indexed by σ, and requires that the decision rule minimize the resulting criterion for all σ. Theorem 3 shows that the limiting behavior of σ-optimal rules coincides with that of the Bayes rule δ*n(·; π), preserving the practical implementation.

Q: How is the optimal pricing application structured and what role do revealed-preference bounds play?

A: A monopolist observes repeated cross-sections of individual demands across B budget sets and must choose a price vector from D = O ∪ C, where O contains observed prices and C contains counterfactual prices. For observed prices, average demand is identified; for counterfactual prices, only bounds are available. Following Kitamura and Stoye (2019), the space of goods is partitioned into GARP-compatible regions, and sharp bounds on counterfactual demand are computed by solving linear programs over the mass allocated to each region subject to GARP consistency constraints. The reduced-form parameter µ collects empirical choice probabilities across observed budget-region cells, estimated consistently by sample frequencies. The optimal pricing decision averages the linear-program bound solutions across quasi-posterior draws of µ.

Q: How does this approach relate to minimax and conditional Γ-minimax approaches?

A: Full minimax over (θ, µ) requires strong distributional assumptions and tractable finite-sample distributions; the authors note that no minimax treatment rule exists even for binary treatment with binary outcomes and estimated bounds. Conditional Γ-minimax (DasGupta and Studden, 1989; Giacomini, Kitagawa, and Read, 2021) fixes a prior for µ and takes minimax over the set of priors for θ conditional on µ; this is closely related to the authors’ approach but can be conservative when the marginal prior for µ varies. The authors’ framework fixes the marginal prior for µ and takes minimax over θ ∈ Θ0(µ) conditional on µ, which is shown to arise as the equilibrium of a two-player zero-sum game where adversarial nature chooses a prior for θ ∈ Θ0(µ) conditional on µ and the available data for µ.

Q: What is the technical contribution regarding directionally differentiable functions?

A: Hirano and Porter (2009) derived asymptotic optimality for treatment rules under fully differentiable welfare contrasts. This paper extends that theory to settings with directional (but not full) differentiability — a generic feature whenever bounds involve max/min operators or linear program values. The key technical building block is the asymptotic distribution of the quasi-posterior mean of directionally differentiable functions (Propositions 2 and 3 in Appendix C). While Kitagawa, Montiel Olea, Payne, and Velez (2020) characterized the asymptotic behavior of the posterior distribution of such functions, this paper instead characterizes the frequentist distribution of the posterior mean — a distinct and novel contribution to the literature on asymptotics for non-smooth functions (Dümbgen, 1993; Fang and Santos, 2019).

Q: What are the key scope conditions and limitations of the optimality results?

A: The action space D must be finite and discrete (continuous pricing must be approximated by a grid of whole-currency units, as noted in the introduction). The identified set mapping Θ0(·) must be known. Efficient estimation of µ is required, along with a consistent estimator of its asymptotic variance for quasi-Bayes implementation. The optimality criterion assumes “no first-order ties”: the oracle decision must be unique at µ0. The framework is asymptotic (local perturbations around a fixed µ0), and the theory is designed for settings where deriving exact finite-sample optimal rules is intractable. The results also do not cover settings in which θ affects the distribution of the data: θ enters only through payoffs, while the data are informative about µ alone.

Partially-identified parameter (θ): A structural parameter — such as the ATE in a target population — about which the data can establish only set membership θ ∈ Θ0(µ), not a point value. The identified set Θ0(µ) is indexed by the point-identified reduced-form parameter µ.

Oracle decision (δo(µ)): The infeasible first-best decision that minimizes maximum risk over the identified set Θ0(µ) for a known value of µ. It serves as the benchmark against which practical rules are evaluated; any data-dependent rule can only do weakly worse.

Maximum risk (R(d, µ)): The supremum of risk r(d, θ, µ) = Eθ[l(d, Y, θ, µ)] over all θ ∈ Θ0(µ) conditional on µ. Under the regret criterion for binary treatment, R(0, µ) = (bU(µ))+ and R(1, µ) = −(bL(µ))−, where (x)+ = max(x, 0) and (x)− = min(x, 0), so both regrets are nonnegative.

Robust welfare contrast (b(µ)): In the treatment assignment application, b(µ) = (bU(µ))+ + (bL(µ))−, whose sign determines the oracle decision: treat if b(µ) ≥ 0. The optimal rule replaces b(µ) with its quasi-posterior mean b̄n.
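The regret formulas and the robust welfare contrast can be written out directly. The sketch below is illustrative, with hypothetical function names; it assumes the convention (x)+ = max(x, 0) and (x)− = min(x, 0), which is the one under which the formulas above give nonnegative regrets and b(µ) = R(0, µ) − R(1, µ).

```python
def pos(x):
    """Positive part (x)+ = max(x, 0)."""
    return max(x, 0.0)


def neg(x):
    """Negative part (x)- = min(x, 0), assumed convention."""
    return min(x, 0.0)


def max_regret(d, b_lower, b_upper):
    """Worst-case regret of action d when the ATE lies in [b_lower, b_upper]."""
    if d == 0:
        return pos(b_upper)      # not treating forgoes at most (b_upper)+
    if d == 1:
        return -neg(b_lower)     # treating loses at most -(b_lower)-
    raise ValueError("binary treatment: d must be 0 or 1")


def robust_welfare_contrast(b_lower, b_upper):
    """b = (b_upper)+ + (b_lower)-, equal to R(0) - R(1)."""
    return pos(b_upper) + neg(b_lower)


def oracle_decision(b_lower, b_upper):
    """Treat (1) if the contrast is nonnegative, otherwise do not treat (0)."""
    return 1 if robust_welfare_contrast(b_lower, b_upper) >= 0 else 0


# Hypothetical bounds: the ATE is known to lie in [-0.2, 0.5].
example = oracle_decision(-0.2, 0.5)
```

The optimal rule has the same structure, but applies the sign check to the quasi-posterior mean of the contrast instead of the contrast evaluated at µ̂.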

Directional differentiability: A function f : M → R^k is directionally differentiable at µ0 if limits of (f(µ0 + tn hn) − f(µ0))/tn exist for all sequences tn ↓ 0 and hn → h, yielding a directional derivative ḟµ0[·] that is positively homogeneous but not necessarily linear. Max/min operators and linear program value functions are generically only directionally differentiable, not fully differentiable. This property is what causes plug-in rules to fail.

Quasi-posterior: In semiparametric models, a posterior-like distribution for µ formed by combining a limited-information Gaussian quasi-likelihood N(µ̂, (nÎ)^{-1}) with a prior π, yielding πn(µ|Xn) ∝ exp(−½(µ − µ̂)^T (nÎ)(µ − µ̂)) π(µ). Used in place of a full Bayesian posterior when the exact likelihood of the data-generating process is unavailable.

σ-optimality: An optimality concept that replaces the improper Lebesgue prior on local perturbations h ∈ R^K with a sequence of proper priors indexed by σ, used when the average excess risk criterion is infinite for K > 1. Theorem 3 establishes that the σ-optimal decision rule converges to the Bayes rule as σ → ∞.

Plug-in rule (δplug_n): A decision rule formed by substituting an efficient point estimate µ̂ directly into the oracle decision: δplug_n = δo(µ̂). Optimal when R(d, µ) is fully differentiable (Corollary 1), but generically sub-optimal under partial identification because directional differentiability of R(d, µ) breaks the asymptotic equivalence between the plug-in and Bayes rules.

The Effect of Omitted Variables on the Sign of Regression Coefficients

Masten and Poirier demonstrate a previously unrecognized asymmetry in the coefficient stability literature: depending on how omitted variable bias is measured, it can be substantially easier for omitted variables to flip a regression coefficient’s sign than to drive it to zero. The paper focuses specifically on Oster (2019b), a widely used robustness framework with approximately 5,500 Google Scholar citations as of December 2025, and shows that Oster’s sensitivity parameter δ — commonly interpreted as the ratio of selection on unobservables to selection on observables — exhibits a structural problem when used to assess sign robustness.

The core theoretical result (Theorem 2) is that, in Oster’s sensitivity analysis, the sign change breakdown point is bounded above by 1 for any value of R²_long. Since researchers typically treat |δ| = 1 as the cutoff for a robust result, this implies that no empirical result is robust to sign changes under Oster’s framework, even when the explain away breakdown point is far larger than 1. The mechanism is a vertical asymptote in the identified set for βlong that occurs precisely at δ = 1, arising from near multicollinearity between the treatment X and the covariates. At this asymptote, the bias-adjusted estimand becomes discontinuous: βlong can jump from a positive to a negative value as δ crosses 1, even when δ is changed by a negligible amount.

The paper illustrates this with the bias-adjusted estimand formula. Under Oster’s Proposition 1 (which requires δ = 1 plus an auxiliary proportionality assumption), the point estimate for the social capital application is 0.532. But if δ = 1 without the auxiliary assumption, the identified set becomes {−0.0855, 1.8947}. For δ = 0.99, the identified set includes {−18.66, −0.0868, 1.736}. The baseline OLS estimate is 0.17, and the explain away breakdown point (correct) is −32.0, while the sign change breakdown point is only 0.586 — well below the conventional robustness threshold of 1.

The authors propose a modified robustness measure that adds Assumption A5: an explicit bound M on the magnitude of omitted variable bias (|βlong − βmed| ≤ M). Under this restriction, the sign change breakdown point can exceed 1, making robust sign conclusions possible. The choice of M requires substantive justification by the researcher.

Two meta-analyses covering 58 empirical papers document the practical extent of the problem. For papers published in top-five journals from 2019–2021 that cite Oster (2019b), the median explain away breakdown point is 2.65, while the median sign change breakdown point is 1.15 with M = 10|β̂med| and 0.96 without the M restriction. At the 90th percentile, the explain away point is 13.22, while the sign change point (M = 10|β̂med|) is only 1.66. Across both meta-analytic samples, for more than 50% of regressions the sign of βlong must be assumed a priori in order to interpret the explain away breakdown point as evidence of sign robustness.

Scope conditions: The results apply specifically to Oster’s linear regression coefficient stability framework under the assumption of exogenous controls (cov(W1, W2) = 0, Assumption A4). The authors note this exogeneity assumption is strong in many applications. The paper does not claim the results extend to other sensitivity analysis frameworks (e.g., Cinelli and Hazlett 2020). The methods are implemented in the companion Stata module regsensitivity.

Q: What is the central finding of the paper?

A: The sign change breakdown point for Oster’s δ is bounded above by 1 (Theorem 2), regardless of how large the explain away breakdown point is. Since |δ| = 1 is the conventional robustness threshold, this implies that, under Oster’s framework, no result is ever robust to a sign change. The explain away breakdown point can simultaneously be very large — e.g., −32.0 in the social capital application — while the sign change breakdown point is only 0.586.

Q: What are the two kinds of breakdown points the paper distinguishes?

A: The explain away breakdown point answers: what is the smallest |δ| required for the data to be consistent with a zero causal effect? The sign change breakdown point answers: what is the smallest |δ| required for the data to be consistent with a causal effect of opposite sign? These two quantities are often equal but are not generally equivalent, and the sign change breakdown point can be strictly smaller than the explain away breakdown point.

Q: What is the mechanism behind the sign change breakdown point being bounded above by 1?

A: The identified set for βlong has a vertical asymptote precisely at δ = 1, arising because the sensitivity analysis allows treatment X and the covariates (W1, W2) to approach near multicollinearity. Near this asymptote, omitted variable bias can be arbitrarily large while δ remains close to 1. This discontinuity allows the bias-adjusted estimand to jump across zero — changing sign — even as δ is changed by an infinitesimal amount near 1.
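A toy curve shows how a vertical asymptote separates the two breakdown points. The function below is purely hypothetical: it is not Oster's cubic characterization, only a simple rational function chosen to have an asymptote at δ = 1 and a zero at δ = 3. The grid search is likewise a simplification for illustration.

```python
def toy_beta(delta):
    """Hypothetical bias-adjusted value: asymptote at delta = 1, zero at delta = 3."""
    return (3.0 - delta) / (1.0 - delta)


def breakdown_points(beta_fn, grid):
    """Smallest |delta| on the grid giving a zero value, and an opposite-sign value."""
    baseline_sign = 1 if beta_fn(0.0) > 0 else -1
    explain_away = None
    sign_change = None
    for delta in sorted(grid, key=abs):          # scan outward from delta = 0
        value = beta_fn(delta)
        if explain_away is None and value == 0:  # exact zero; fine for this toy grid
            explain_away = abs(delta)
        if sign_change is None and value * baseline_sign < 0:
            sign_change = abs(delta)
    return explain_away, sign_change


# Grid of delta values in steps of 0.01, skipping the asymptote itself.
grid = [i / 100 for i in range(-500, 501) if i != 100]
explain_away, sign_change = breakdown_points(toy_beta, grid)
```

On this toy curve the value never passes through zero near δ = 1: it jumps from large positive to large negative. The sign flips at the first grid point past 1, while a zero effect is not reached until δ = 3.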

Q: How sensitive is Oster’s bias-adjusted point estimator near δ = 1?

A: Extremely sensitive. In the social capital application, Oster’s Proposition 1 formula (which assumes δ = 1 with the auxiliary proportionality condition) yields an estimate of 0.532. But without the auxiliary assumption, at δ = 1 the identified set is {−0.0855, 1.8947}; at δ = 0.99 it includes {−18.66, −0.0868, 1.736}; at δ = 1.01 it includes {−0.0843, 2.133, 15.64}. These are not minor perturbations: the estimand is discontinuous in δ at exactly the value where Oster’s formula evaluates it.

Q: What modification do the authors propose to recover sign robustness?

A: They propose adding Assumption A5, which bounds the magnitude of omitted variable bias: |βlong − βmed| ≤ M for a researcher-specified M ≥ 0. Under this restriction, the identified set BI(δ, R²_long, M) is the unrestricted set intersected with [βmed − M, βmed + M], and it becomes possible for the sign change breakdown point to exceed 1. The practical difficulty is that M must be chosen with substantive justification. The authors show via meta-analysis that, for more than 50% of regressions in their samples, interpreting the explain away breakdown point as evidence of sign robustness requires the implicit choice M = |βmed|, which is equivalent to assuming the sign of βlong is already known.
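The effect of the bound M can be illustrated by filtering a set of candidate values. This is a sketch under stated assumptions, not the regsensitivity implementation: the function names are hypothetical, the candidate values are the ones quoted above for δ = 0.99, and βmed = 0.17 is assumed for illustration by reusing the baseline OLS estimate.

```python
def restrict_identified_set(candidates, beta_med, M):
    """Keep candidate values of beta_long with |beta_long - beta_med| <= M."""
    lo, hi = beta_med - M, beta_med + M
    return [b for b in candidates if lo <= b <= hi]


def allows_sign_change(candidates, beta_med):
    """True if any remaining value has the opposite sign from beta_med."""
    return any(b * beta_med < 0 for b in candidates)


set_at_delta_099 = [-18.66, -0.0868, 1.736]   # values quoted for delta = 0.99
beta_med = 0.17                               # assumption, for illustration only

# M = 10 * |beta_med|: the extreme value -18.66 is excluded,
# but a small negative value survives.
loose = restrict_identified_set(set_at_delta_099, beta_med, 10 * abs(beta_med))

# M = |beta_med|: the interval is [0, 2 * beta_med], so no opposite-sign
# value can survive by construction.
tight = restrict_identified_set(set_at_delta_099, beta_med, abs(beta_med))
```

With the looser bound a sign change is still possible at δ = 0.99; with M = |βmed| it is ruled out before looking at the data, which is the sense in which that choice assumes the sign.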

Q: What do the meta-analyses show about the gap between explain away and sign change breakdown points in practice?

A: For 34 primary regressions from top-five journal papers (2019–2021) with R²_long = 1, the median explain away breakdown point is 2.65 while the median sign change breakdown point (M = 10|β̂med|) is 1.15 and without the M restriction is 0.96. At the 90th percentile, the explain away point is 13.22 versus a sign change point (M = 10|β̂med|) of only 1.66. The second meta-analysis (141 regressions from 55 papers, 2008–2013) produces qualitatively similar results.

Q: Why does the paper flag the implicit sign assumption embedded in many applications of Oster’s method?

A: Using the explain away breakdown point as evidence of sign robustness implicitly requires that M = |βmed|, which is equivalent to constraining βlong ∈ [0, 2βmed] — that is, assuming the sign of βlong is the same as the sign of βmed. The paper shows (Table 4) that across both meta-analytic samples, more than 50% of regressions make this implicit sign assumption in order to interpret the explain away breakdown point as informative about sign robustness.

Q: What is δ, and what are its interpretive limitations?

A: δ is the ratio of (cov(X, γ′2,long W2)/var(γ′2,long W2)) to (cov(X, γ′1,long W1)/var(γ′1,long W1)), measuring the relative magnitude of selection on unobservables versus observables. As Cinelli and Hazlett (2020) show, it is a double ratio: the ratio of the treatment-unobservable association to the treatment-observable association, divided by the ratio of their outcome effects. This double-ratio structure leads to counter-intuitive behavior: a single omitted variable that is only modestly related to treatment can produce δ values far from 1 if the observable control is also only weakly related to treatment, even if the omitted variable is not strongly confounding in an absolute sense.
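The definition of δ as a ratio of two regression slopes can be made concrete on simulated data. This is only a definitional sketch: in practice W2 is unobserved, so δ cannot be computed from data, and the helper names and the four-observation toy data are assumptions made for the example.

```python
from statistics import fmean


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def slope(x, index):
    """Coefficient from regressing x on index: cov(x, index) / var(index)."""
    return cov(x, index) / cov(index, index)


def oster_delta(x, index_obs, index_unobs):
    """Selection on the unobservable index relative to the observable index."""
    return slope(x, index_unobs) / slope(x, index_obs)


# Toy data (hypothetical): the two indices are uncorrelated, as under A4,
# and treatment loads on them with coefficients 2 and 3.
index_obs = [1.0, 2.0, 3.0, 4.0]
index_unobs = [1.0, -1.0, -1.0, 1.0]
x = [2 * a + 3 * b for a, b in zip(index_obs, index_unobs)]

delta = oster_delta(x, index_obs, index_unobs)   # ratio of slopes: 3 / 2
```

Because the indices already incorporate the outcome coefficients γ1,long and γ2,long, rescaling either index changes δ, which is one way to see the double-ratio structure described above.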

Q: What assumption is required for the entire sensitivity analysis framework, and how restrictive is it?

A: Assumption A4 requires that all observed covariates W1 are uncorrelated with all unobserved covariates W2 (exogenous controls). The authors note this is a strong assumption in many empirical settings. A companion paper (Diegert, Masten, and Poirier 2025a) addresses the case where controls are endogenous.

Q: What do the authors recommend as best practice?

A: They recommend two practices: (1) plotting the full estimated identified set for the coefficient of interest across a range of assumptions about omitted variables, rather than relying on a single bias-adjusted point estimate; and (2) reporting sign change breakdown points as robustness summary statistics in addition to (or instead of) explain away breakdown points. Both are implemented in the companion Stata module regsensitivity.

Explain Away Breakdown Point: The smallest value of the sensitivity parameter |δ| required for the data to be consistent with a zero causal effect (βlong = 0). This is the quantity computed by Oster’s Proposition 2 and commonly reported as “Oster’s delta.”

Sign Change Breakdown Point: The smallest value of |δ| required for the data to be consistent with a causal effect of opposite sign from the baseline estimate. The paper proves this is bounded above by 1 in Oster’s framework, regardless of the magnitude of the explain away breakdown point.

Oster’s δ: The ratio of the coefficient from regressing treatment X on the omitted variable index (γ′2,long W2) to the coefficient from regressing X on the observed covariate index (γ′1,long W1), measuring relative selection on unobservables versus observables. Interpreted as a double ratio: (treatment-unobservable association / treatment-observable association) ÷ (outcome effect of unobservable index / outcome effect of observable index).

Identified Set BI(δ, R²_long): The set of values of βlong consistent with the observed data and a given value of δ and R²_long. Characterized as roots of a cubic polynomial. Has a vertical asymptote at δ = 1, meaning the set can include arbitrarily large or small values of βlong as δ approaches 1.

Bias Magnitude Restriction (Assumption A5): A bound M ≥ 0 on the magnitude of omitted variable bias: |βlong − βmed| ≤ M. Adding this assumption intersects the identified set with [βmed − M, βmed + M], allowing the sign change breakdown point to potentially exceed 1 and making sign robustness conclusions possible.

Coefficient Stability Analysis: A class of empirical methods that assess omitted variable bias by comparing regression coefficients across specifications that include different sets of covariates. The intuition is that if adding observed controls substantially raises R² but barely moves the coefficient, further omitted variable bias is likely small. Formalized by Altonji, Elder, and Taber (2005) and extended by Oster (2019b).

Near Multicollinearity (in this context): The situation in which treatment X and the combined covariate vector (W1, W2) are nearly collinear. In Oster’s framework, this arises precisely at δ = 1 and produces the vertical asymptote in the identified set, making the bias-adjusted estimand discontinuous and potentially unbounded near this value.