The Dynamics of Verification when Searching for Quality
What this paper finds — and why it matters
This paper develops a dynamic principal-agent model in which a principal seeks to select exactly one project from a stream of possibilities emerging over time, while a biased agent (who wants any project selected, regardless of quality) reports project quality each period. The principal cannot observe quality directly but can pay a cost c to verify it. Monetary transfers are unavailable. The central question is how verification and selection rules should optimally evolve over time as new options arrive.
The model is set in discrete time with an infinite horizon (extended to finite horizons in Section 6.1). Each period, a project of quality h with probability q = λΔ or quality l with probability 1 − q arrives i.i.d. The principal selects at most once; the agent receives utility 1 from any selection and 0 otherwise; the principal’s payoff equals project quality net of verification costs. Both parties share discount factor δ = e^{−ρΔ}.
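As a small illustration of how the per-period primitives follow from the continuous-time rates, the sketch below maps an arrival rate λ, a discount rate ρ, and a period length Δ into q and δ. The function name and the numerical values are illustrative assumptions, not taken from the paper.

```python
import math


def period_parameters(arrival_rate, discount_rate, period_length):
    """Return (q, delta) with q = lambda * Delta and delta = exp(-rho * Delta)."""
    q = arrival_rate * period_length
    if not 0.0 < q < 1.0:
        # q is a probability, so the period must be short enough for this to hold.
        raise ValueError("q = lambda * Delta must lie strictly between 0 and 1")
    delta = math.exp(-discount_rate * period_length)
    return q, delta


# Hypothetical values: lambda = 2, rho = 0.5, Delta = 0.1.
q, delta = period_parameters(arrival_rate=2.0, discount_rate=0.5, period_length=0.1)
```

Shrinking the period length lowers q and pushes δ toward 1 at the same time, which is why both parameters are tied to Δ.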
When verification costs are low (c ≤ h) and the horizon is effectively infinite, the optimal mechanism exhibits decreasing skepticism: verification of high-quality reports occurs with a probability that is strictly declining over time, hitting zero at an endogenous deadline T* = ⌈(1−q)(δr − l) / (qc(1−δ))⌉. At that deadline, the principal selects any project irrespective of quality. Before the deadline, the agent reports truthfully — proposing only high-quality projects — and is incentivized by the threat of verification catching a lie, which triggers permanent exclusion. As the deadline approaches, the agent’s continuation value rises (guaranteed allocation arrives sooner), so the loss from a detected lie grows, and less verification is needed to deter misreporting. The deadline length is weakly increasing in h and r and decreasing in l and c; as c → 0, T* → ∞ and the principal’s payoff converges to the first-best of qh/(1−δ(1−q)).
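The deadline and first-best formulas above can be evaluated directly. The following is a minimal sketch: the function names and parameter values are hypothetical, and r is simply passed in as a number because this summary uses it in the formula without defining it.

```python
import math


def endogenous_deadline(q, delta, r, l, c):
    """T* = ceil((1 - q) * (delta * r - l) / (q * c * (1 - delta)))."""
    return math.ceil((1 - q) * (delta * r - l) / (q * c * (1 - delta)))


def first_best_payoff(q, delta, h):
    """Principal's first-best payoff q * h / (1 - delta * (1 - q))."""
    return q * h / (1 - delta * (1 - q))


# Hypothetical parameter values, chosen only for illustration.
t_star = endogenous_deadline(q=0.2, delta=0.9, r=1.0, l=0.1, c=0.3)
t_star_cheaper = endogenous_deadline(q=0.2, delta=0.9, r=1.0, l=0.1, c=0.15)
benchmark = first_best_payoff(q=0.2, delta=0.9, h=1.0)
```

With these numbers the raw ratio is roughly 106.7, so the ceiling gives a deadline of 107 periods; halving c lengthens the deadline, in line with the stated comparative statics.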
When verification costs are high (h < c < c̄, where c̄ is an explicitly computed threshold), deterministic selection is suboptimal. The optimal mechanism has two sequential phases: a randomization phase (periods 1 through T_R = ⌊log(h/c)/log(1−q)⌋ + 1) in which, after a high-quality report, the principal randomizes between selecting the project and permanently forgoing selection, without any verification; and a subsequent verification phase matching the low-cost structure. Verification is strictly backloaded: the principal never uses both tools in the same period, and randomization always precedes verification. The intuition is that verification acts as a reward to the agent (guaranteeing allocation when h is realized), so delaying it lets earlier periods exploit the prospect of future verification to relax incentive constraints across more periods, accumulating gains that justify the high verification cost.
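The length of the randomization phase can be computed from the stated formula. The sketch below uses a hypothetical function name and illustrative parameter values; it does not check c < c̄, since the threshold c̄ is not given in this summary.

```python
import math


def randomization_phase_length(q, h, c):
    """T_R = floor(log(h / c) / log(1 - q)) + 1, for the high-cost case h < c."""
    if not 0 < h < c:
        raise ValueError("the randomization phase only arises when 0 < h < c")
    return math.floor(math.log(h / c) / math.log(1 - q)) + 1


# Hypothetical values: q = 0.2, h = 1.
t_r_moderate = randomization_phase_length(q=0.2, h=1.0, c=1.5)
t_r_costly = randomization_phase_length(q=0.2, h=1.0, c=2.0)
```

Because floor(x) + 1 is the smallest integer strictly above x, T_R is the first integer T with (1−q)^T < h/c, so a larger gap between c and h lengthens the randomization phase.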
When the horizon is short (T ≤ T̄ := ⌊−(1−q)l/(qc)⌋) and l < 0 (static bias), increasing skepticism emerges: verification probability rises toward 1 in the final period. This occurs because a shrinking horizon reduces the agent’s continuation value, weakening the punishment for a detected lie, so more verification is required to maintain incentive compatibility. The paper also establishes that under renegotiation-proofness (Ray 1994), the optimal mechanism takes the same qualitative form as the full-commitment case but with permanent exclusion replaced by a mechanism restart. The leading application is board oversight of CEO-proposed acquisitions, motivated by the Smith v. Van Gorkom Delaware Supreme Court ruling; Graham et al. (2020) is cited as broad empirical support for decreasing oversight of CEOs over time.
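The short-horizon threshold can be checked numerically. This is a minimal sketch with hypothetical function names and parameter values; it only reports whether the stated condition for increasing skepticism (T ≤ T̄ and l < 0) holds, and makes no claim about finite horizons above T̄.

```python
import math


def short_horizon_threshold(q, l, c):
    """T-bar = floor(-(1 - q) * l / (q * c))."""
    return math.floor(-(1 - q) * l / (q * c))


def increasing_skepticism_applies(horizon, q, l, c):
    """True when the horizon is short (T <= T-bar) and static bias is present (l < 0)."""
    return l < 0 and horizon <= short_horizon_threshold(q, l, c)


# Hypothetical values: q = 0.2, l = -0.5, c = 0.3.
t_bar = short_horizon_threshold(q=0.2, l=-0.5, c=0.3)
```

When l ≥ 0 the threshold is non-positive, so no horizon of at least one period satisfies T ≤ T̄; this is why the regime requires static bias.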
Q: What is the core agency conflict in the model? A: The agent receives utility 1 from any selection regardless of quality, while the principal’s payoff equals quality minus verification costs. The agent always prefers immediate selection, while the principal prefers waiting for high quality, formalized by the condition qh + (1−q)l < qh/(1−δ(1−q)). This is “dynamic bias.” “Static bias” additionally arises when l < 0, meaning the principal prefers not allocating to allocating a low-quality project; this second source of conflict is more common in static settings.
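The two bias conditions can be written as simple checks. The sketch below uses hypothetical function names and illustrative numbers; the inequalities are the ones stated in the answer above.

```python
def has_dynamic_bias(q, delta, h, l):
    """Dynamic bias: q*h + (1-q)*l < q*h / (1 - delta*(1-q))."""
    select_immediately = q * h + (1 - q) * l
    wait_for_high_quality = q * h / (1 - delta * (1 - q))
    return select_immediately < wait_for_high_quality


def has_static_bias(l):
    """Static bias: the principal prefers no allocation to a low-quality project."""
    return l < 0


# Hypothetical values: a patient principal facing a poor low-quality project.
patient_case = has_dynamic_bias(q=0.2, delta=0.9, h=1.0, l=0.1)
# Hypothetical values: low quality is nearly as good as high, so waiting is not worthwhile.
impatient_case = has_dynamic_bias(q=0.2, delta=0.5, h=1.0, l=0.9)
```

In the first case selecting immediately is worth 0.28 against roughly 0.71 from waiting, so dynamic bias is present; in the second the comparison reverses.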
Q: What is the endogenous deadline T* and what determines its length? A: T* = ⌈(1−q)(δr − l)/(qc(1−δ))⌉. It is weakly increasing in h and r (higher upside makes waiting worthwhile), weakly decreasing in l (a less costly low type shortens the horizon), and decreasing in c (cheaper verification makes longer search feasible). The term δr − l reflects the value of an additional quality draw relative to selecting low quality. As c → 0, T* → ∞ and the principal’s payoff converges to the first-best.
Q: Why does the verification probability decline over time under decreasing skepticism? A: As the deadline T* approaches, the agent’s continuation value from truthful play rises because guaranteed allocation is nearer. The loss from having a lie detected — permanent exclusion — therefore grows in absolute expected terms. Since more severe punishment requires less verification to deter misreporting, the minimum verification probability that satisfies the low type’s incentive compatibility constraint falls strictly over time, reaching zero exactly at T*.
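The declining path can be illustrated with a stylized backward recursion. This is a sketch under assumptions that are not spelled out in this summary: the agent's value at the deadline is 1, before the deadline a truthful agent is selected when a high-quality project arrives and otherwise continues (u_t = q + (1−q)·δ·u_{t+1}), and verification just keeps the low type indifferent, x_t = 1 − δ·u_{t+1}. The function name and numbers are hypothetical, and the paper's exact treatment of the final periods may differ.

```python
def verification_path(q, delta, deadline):
    """Return (u, x): agent values and verification probabilities for t = 1..deadline."""
    u = [0.0] * (deadline + 2)
    u[deadline] = 1.0  # assumption: any project is selected at the deadline
    for t in range(deadline - 1, 0, -1):
        # assumption: truthful play selects on a high draw, otherwise continues
        u[t] = q + (1 - q) * delta * u[t + 1]
    # low-type indifference: a lie is selected with probability 1 - x_t,
    # while a truthful low report yields delta * u[t + 1]
    x = [1 - delta * u[t + 1] for t in range(1, deadline)]
    x.append(0.0)  # no verification at the deadline itself
    return u[1:deadline + 1], x


# Hypothetical values: q = 0.2, delta = 0.9, deadline in period 5.
values, probs = verification_path(q=0.2, delta=0.9, deadline=5)
```

Under these assumptions the agent's value rises toward 1 as the deadline nears, so the indifference condition is met with progressively less verification.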
Q: When is randomization of the selection rule optimal, and when is verification strictly better? A: Randomization is optimal if and only if c > h — when verification would guarantee a negative ex-post payoff for the principal. When c ≤ h, replacing randomization probability (1 − p̂_h) with verification probability x_h = 1 − δu_{t+1} maintains incentive compatibility while yielding a net gain to the principal proportional to h − c > 0 per period. The condition c > h is both necessary and sufficient for the randomization-augmented mechanism to dominate.
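The cost comparison can be summarized as a small classifier. In this sketch the function name and return labels are hypothetical, and c_bar is passed in as a number because its formula is not given in this summary.

```python
def mechanism_regime(c, h, c_bar):
    """Classify the verification-cost regime from the comparison of c with h and c_bar."""
    if c <= 0:
        raise ValueError("verification cost c must be positive")
    if c <= h:
        # verification yields a per-period net gain proportional to h - c >= 0
        return "verification_only"
    if c < c_bar:
        # h < c < c_bar: randomization becomes part of the optimal mechanism
        return "randomize_then_verify"
    return "outside_summary"  # c >= c_bar is not described in this summary


# Hypothetical values: h = 1 and an assumed threshold c_bar = 3.
low_cost = mechanism_regime(c=0.5, h=1.0, c_bar=3.0)
high_cost = mechanism_regime(c=1.5, h=1.0, c_bar=3.0)
```

The boundary case c = h falls in the verification-only branch, matching the statement that randomization is optimal only when c > h.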
Q: Why is verification backloaded when c > h? A: Verification guarantees allocation whenever h is realized, which is a valuable reward for the agent. Deploying this reward later allows earlier randomization-phase periods to exploit the prospect of future verification to relax incentive constraints across multiple periods, accumulating gains. Moving verification earlier incurs the same static cost but forgoes these accumulated gains, so backloading verification is optimal. The principal never randomizes and verifies in the same period.
Q: What are the two phases in Theorem 2 and how long does each last? A: The randomization phase runs from period 1 through T_R = ⌊log(h/c)/log(1−q)⌋ + 1; during this phase the principal randomizes allocation after a high-quality report (with the outside-option probability declining toward 0) but never verifies. The verification phase runs from T_R + 1 through a deadline at T* or T* + 1, with verification probability declining over time exactly as in Theorem 1. The total deadline is T* = T_R + ⌊(h − c − (l − δr)/(1−δ))(1−q)/(qc)⌋.
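The total deadline in the high-cost regime can be evaluated from the expression above. The sketch below is illustrative: the function name and parameter values are hypothetical, T_R is passed in as an already computed integer, and r is treated as a given number because this summary does not define it.

```python
import math


def high_cost_deadline(q, delta, h, l, r, c, t_r):
    """T* = T_R + floor((h - c - (l - delta*r) / (1 - delta)) * (1 - q) / (q * c))."""
    periods_after_randomization = math.floor(
        (h - c - (l - delta * r) / (1 - delta)) * (1 - q) / (q * c)
    )
    return t_r + periods_after_randomization


# Hypothetical values: q = 0.2, delta = 0.9, h = 1, l = 0, r = 1, c = 1.5,
# with t_r = 2 being the value the T_R formula gives for these q, h, and c.
t_star_high_cost = high_cost_deadline(
    q=0.2, delta=0.9, h=1.0, l=0.0, r=1.0, c=1.5, t_r=2
)
```

Here the floor term evaluates to 22, so the deadline falls in period 24; the h − c term is negative in this regime and shortens the post-randomization stretch relative to the low-cost expression.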
Q: Under what conditions does increasing skepticism emerge? A: Increasing skepticism arises when the horizon is finite and short, specifically when T ≤ T̄ = ⌊−(1−q)l/(qc)⌋, which requires l < 0 (static bias present). In this regime, verification probability rises to 1 in the final period T. Before T, the agent’s continuation value shrinks as fewer drawing opportunities remain, weakening the punishment for detected lies, so verification must increase to maintain incentive compatibility. Under static bias, decreasing skepticism therefore emerges only when the horizon is long enough to overcome that bias.
Q: How does the renegotiation-proofness extension modify the optimal mechanism? A: Under renegotiation-proofness following Ray (1994), the mechanism cannot indefinitely withhold allocation following a detected lie, because both parties would prefer to restart rather than receive zero forever. The optimal renegotiation-proof mechanism takes the same qualitative form as Theorems 1 and 2, but permanent exclusion is replaced by a restart to the first period whenever a lie is verified during the verification phase or allocation is withheld during the randomization phase after a high-quality report. Deadlines, verification dynamics, and the phase structure are otherwise unchanged.
Q: What is the three-region form of the value function? A: Lemma 4 identifies thresholds u_low < u_high such that: for promised utility u ∈ [0, u_low], x_h(u) = 0 (no verification; only randomization); for u ∈ [u_low, u_high], dV/du = h − c (verification is interior, slope equals net benefit of verification); and for u > u_high, x_h(u) + y(u) = 1 (verification is at maximum). The slope h − c is constant on the middle region because increasing verification by ε raises promised utility by qε and the objective by q(h−c)ε, yielding a constant marginal rate.
Q: What revelation-principle simplifications reduce the problem? A: Lemmas 1–3 establish: (i) only high-type reports are ever verified (x_l = 0), since verification of the low type cannot improve principal payoffs; (ii) following verified truthfulness, allocation occurs with probability 1 (p*_{hh} = 1); (iii) the high type’s incentive constraint never binds in the optimal solution; and (iv) only the low type’s incentive compatibility constraint binds. These reduce the optimization to four free variables — x_h, p̂_h, p̂_l, û_l — subject to two binding constraints.
Q: How does the paper relate to Kovac et al. (2013)? A: The model builds most directly on the principal-agent stopping problem of Kovac et al. (2013), which lacks costly verification. The key addition is the verification technology: the paper shows that when c ≤ h, verification eliminates the need for the randomized selection rules that arise in Kovac et al. (2013). Their randomization logic resurfaces in the randomization phase when c > h, where the analysis applies and extends their innovations.
Q: What empirical and institutional evidence motivates the model? A: The Smith v. Van Gorkom Delaware Supreme Court ruling (1985) established that boards must make meaningful efforts to become informed (that is, to exercise verification) as part of their duty of care in acquisition approvals; the Trans Union board was found grossly negligent after approving an acquisition following a twenty-minute presentation with no written materials. Graham et al. (2020) provide broad empirical support for decreasing board oversight of CEOs over time, consistent with the paper’s decreasing skepticism prediction. Gompers et al. (2020), on VC analysts’ project evaluation processes, also illustrate the model’s general applicability.
Decreasing skepticism: The property of the optimal mechanism whereby the principal verifies high-quality reports with a probability that strictly declines over time, reaching zero at the endogenous deadline. Reflects diminishing concern about misrepresentation as the agent’s continuation value — and thus the cost of a detected lie — rises as the deadline approaches.
Endogenous deadline (T*): The period at which the principal allocates any project irrespective of quality, ending the mechanism. Determined by T* = ⌈(1−q)(δr − l)/(qc(1−δ))⌉, balancing the value of waiting for additional quality draws against verification costs; weakly increasing in h and r, decreasing in l and c.
Static bias vs. dynamic bias: Dynamic bias denotes the conflict that the principal prefers waiting for high quality while the agent prefers immediate selection. Static bias is the additional conflict (arising when l < 0) that the principal prefers withholding allocation to selecting a low-quality project, mirroring the agent-prefers-higher-action conflict in standard static models. Without static bias, decreasing skepticism always obtains; with static bias, a short horizon can flip the dynamics to increasing skepticism.
Backloaded verification: The property that when c > h, verification is deployed only after a complete randomization phase, never simultaneously with randomization. Arises because verification acts as a reward to the agent by guaranteeing allocation when high quality is realized, and delaying this reward allows its incentive-relaxation benefits to compound across more randomization-phase periods.
Randomization phase: The initial phase (periods 1 to T_R) in the high-cost regime, in which the principal randomizes the allocation decision after a high-quality report (outside option selected with declining probability) without using the verification technology. The randomization probability is set to keep the low type indifferent between truthful reporting and misreporting.
Increasing skepticism: The opposite verification dynamic from decreasing skepticism, arising when the horizon is short (T ≤ T̄) and l < 0 (static bias). Verification probability rises over time toward 1 in the final period, because the agent’s continuation value shrinks as drawing opportunities dwindle, weakening the deterrent effect of detection and requiring more frequent verification to maintain incentive compatibility.
Incentive compatibility via verification: The mechanism through which the principal deters low-type misreporting: by verifying a reported high-quality project with probability x_h, and punishing detected lies with permanent exclusion (or restart under renegotiation-proofness). This strictly dominates selection randomization when c ≤ h because the net per-period gain equals h − c > 0 while maintaining the same incentive compatibility condition for the low type.