Forthcoming [Review of Economic Studies] doi:10.1093/restud/rdag045

Markov-Perfect Equilibria in Differential Games—With an Application to Climate Policy

Niko Jaakkola

Florian Wagener


What this paper finds — and why it matters

This paper by Jaakkola and Wagener addresses a long-standing open problem in the theory of differential games: how to make Markov-perfect equilibria (MPE) well-defined when best-response policy functions are generically discontinuous in the state variable. The paper’s primary contribution is methodological — it introduces discontinuous Markovian strategies into differential games and proves that, under this extension, (i) payoffs can always be computed and (ii) unique best responses exist for almost all strategy profiles of opponents. The authors then apply this framework to derive the entire set of symmetric MPE in a canonical non-cooperative climate mitigation model (van der Ploeg and de Zeeuw, 1992), finding welfare results that are quantitatively large and policy-relevant.

The technical difficulty the paper resolves is that discontinuous policy functions can cause the ordinary differential equation governing state dynamics to lack classical solutions, making payoffs undefined. Prior literature responded either by restricting strategies to continuous functions — which rules out many natural best responses and imposes an unjustified constraint on the strategy space — or by allowing discontinuities only in “admissible” profiles, which makes each player’s strategy set depend on opponents’ choices and thus violates the basic structure of non-cooperative game theory. The authors’ solution is to adopt Filippov solutions (differential inclusions that convexify dynamics at discontinuities), so that a well-defined state trajectory and payoff exist for every strategy profile, not just admissible ones.

The paper’s three main theorems cover existence (Theorem 1), characterization (Theorem 2), and symmetric equilibrium conditions (Theorem 3). Theorem 1 establishes that, given any fixed set of potential jump points, the best-response correspondence maps almost all opponent strategy profiles to a unique Markovian best response — “almost all” in the sense of prevalence on infinite-dimensional function spaces. Theorem 2 provides necessary and sufficient conditions for a strategy to be a best response: it must satisfy the maximum principle where the value function is differentiable, value discontinuities may only occur at jump points of opponents’ strategies where the player cannot unilaterally push the state back to the low-stock side, and the value at any such interface must exceed the static optimum. Theorem 3 translates these into conditions for symmetric Nash equilibrium.

Applied to the van der Ploeg–de Zeeuw climate model (N symmetric countries choosing emissions a_i, with carbon stock x evolving as x-dot = sum(a_i) - delta*x, and flow utility u(x, a_i) = a_i - (1/2)a_i^2 - dx), the paper characterizes the complete set of symmetric MPE. The unique continuous globally defined equilibrium (the linear MPE, previously established by Rowat 2007) is shown to be weakly Pareto-dominated by every other MPE with a continuous value function. The best equilibria feature discontinuous strategies that act like stock-conditioned trigger strategies: when the carbon stock falls below a target steady state x*, players respond with a discrete upward jump in emissions to rapidly return the economy to x*; when carbon rises above x*, players increase emissions only gradually, creating a threat of drifting to a higher-pollution steady state that disciplines deviations. In a calibrated example with N=10, delta=0.02, rho=0.02, and damage parameter d=0.5, the linear equilibrium steady state is approximately 2.5 times the first-best level, while the best continuous-value MPE steady state is approximately 1.2 times the first-best level. Choosing the best equilibrium rather than the linear equilibrium closes between 50 and 100 percent of the welfare gap to the first-best outcome, depending on initial conditions. The paper also identifies particularly bad equilibria involving value-function discontinuities (coordination failures in which no single country can unilaterally stop the carbon stock from rising past a threshold) that can yield welfare outcomes worse than the linear equilibrium at high carbon levels.

The scope of the methodological results covers differential games with a single state variable and strategies that are real-analytic except at finitely many points. Extension to multiple state variables is left for future work. The climate application is restricted to the symmetric linear-quadratic van der Ploeg–de Zeeuw framework, chosen to facilitate comparison with prior literature.

Q: What is the fundamental technical problem with MPE in differential games that this paper resolves?

A: In differential games with Markovian strategies, best-response policy functions are generically discontinuous in the state variable. Discontinuous right-hand sides in the state dynamics ODE can prevent existence or uniqueness of classical solutions, making payoffs undefined for some strategy profiles. Prior literature either restricted attention to continuous strategies (causing non-existence of best responses to many profiles) or defined “admissible” strategy sets that depend on opponents’ choices (violating non-cooperative game theory structure). This paper resolves both problems for the single-state-variable case.

Q: How does the paper make payoffs well-defined under discontinuous strategies?

A: The paper adopts Filippov solutions — differential inclusions that replace the dynamics at a discontinuity point with a convex hull of the left and right limits. At a “push-push” discontinuity (where dynamics push the state toward the jump point from both sides), the Filippov solution remains at the jump point and flow payoffs are a weighted average of left and right actions. This ensures a well-defined trajectory and payoff for every strategy profile, not just “admissible” ones, restoring the standard non-cooperative game-theoretic structure.
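
The push-push case can be illustrated with a minimal sketch. It assumes a symmetric profile whose total dynamics are x-dot = N*a - delta*x on each side of a jump point x_hat, and it assumes the sliding flow payoff uses the same convex weights as the dynamics. The function names and all numbers below are hypothetical and are not taken from the paper.

```python
def filippov_weight(f_left: float, f_right: float) -> float:
    """Weight on the left-side velocity that makes the convex combination zero.

    f_left  : state velocity just below the jump point (must be > 0)
    f_right : state velocity just above the jump point (must be < 0)
    """
    if not (f_left > 0 > f_right):
        raise ValueError("not a push-push point: need f_left > 0 > f_right")
    # Solve lam * f_left + (1 - lam) * f_right = 0 for lam.
    return -f_right / (f_left - f_right)


def sliding_flow_payoff(u_left: float, u_right: float, lam: float) -> float:
    """Flow payoff while the state rests at the jump point.

    Assumption: payoffs are averaged with the same weights as the dynamics.
    """
    return lam * u_left + (1.0 - lam) * u_right


# Hypothetical illustration values (not the paper's calibration of strategies).
N, delta, x_hat = 10, 0.02, 100.0
a_left, a_right = 0.3, 0.1                # symmetric emissions below / above x_hat
f_left = N * a_left - delta * x_hat       # positive: stock pushed up from below
f_right = N * a_right - delta * x_hat     # negative: stock pushed down from above
lam = filippov_weight(f_left, f_right)
```

With these numbers the two one-sided velocities are equal in size and opposite in sign, so the weight splits evenly between the left and right actions and the convexified velocity at x_hat is zero.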

Q: What does Theorem 1 establish, and what does “almost all” mean in this context?

A: Theorem 1 establishes that, for any fixed collection of jump points, each player has a unique Markovian best response to almost every profile of opponents’ strategies. “Almost all” is in the sense of prevalence on infinite-dimensional function spaces (following Hunt, Sauer, and Yorke 1992): the set of profiles for which a unique best response fails to exist is shy (measure-zero analog in infinite dimensions) and nowhere dense. This resolves the long-standing open problem of making MPE well-founded in differential games.

Q: What are the necessary and sufficient conditions for a best response given by Theorem 2?

A: A strategy phi_i is a best response to the opponents’ profile if and only if: (i) at all points where the value function is differentiable, the strategy satisfies the maximum principle; (ii) the value function is decreasing in the state (monotonicity); (iii) value discontinuities may occur only at opponents’ jump points where player i cannot unilaterally move the state back to the low-stock region; (iv) at any such interface, the value must be at least as large as the static optimum u(x, a_i)/rho; and (v) the value is differentiable at push-push steady states. These conditions extend the standard maximum principle with local requirements that restrict which discontinuities are possible.
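
For orientation, condition (i) can be read against the textbook Hamilton-Jacobi-Bellman equation for an infinite-horizon problem discounted at rate rho. The block below is that standard form written in this summary's notation, not a quotation of the paper's theorem; the symbol V_i for player i's value function is introduced here for illustration, and the action set is assumed to allow an interior maximum.

```latex
\rho V_i(x) = \max_{a_i} \Big\{ u(x, a_i)
  + V_i'(x) \Big( a_i + \sum_{j \neq i} \phi_j(x) - \delta x \Big) \Big\},
\qquad
\text{interior first-order condition: } 1 - a_i + V_i'(x) = 0
\;\Longrightarrow\; a_i = 1 + V_i'(x).
```

The first-order condition uses only the emission terms a_i - (1/2)a_i^2 of the flow utility. Combined with condition (ii), a decreasing value function implies V_i'(x) <= 0 wherever the derivative exists, so interior emissions stay at or below 1.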

Q: What is the van der Ploeg–de Zeeuw model and why is it used here?

A: The van der Ploeg–de Zeeuw (1992) model has N symmetric countries choosing emissions a_i, with carbon stock evolving as x-dot = sum(a_i) - delta*x, and flow utility u(x, a_i) = a_i - (1/2)a_i^2 - dx. It is linear-quadratic, so a linear MPE exists and is analytically tractable, and prior literature (Dockner and Long 1993; Rowat 2007; Dockner and Wagener 2014) has studied it extensively. The paper uses it as a benchmark to demonstrate that the new methods yield novel and economically important results for even well-understood models.

Q: What is the linear equilibrium and why does it produce poor welfare outcomes?

A: The linear equilibrium phi_L(x) = alpha + beta*x, with beta negative, is the unique continuous globally defined MPE (Rowat 2007). In it, emissions decrease with the carbon stock because each player anticipates that opponents will also reduce emissions when carbon is high. This strategic substitutability creates adverse dynamic free-riding: players try to exploit the fact that high carbon stock will cause opponents to cut back, so each has an incentive to emit more when carbon is low. In the calibrated example, the linear equilibrium steady state is approximately 2.5 times the first-best level.
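
The steady state implied by a symmetric linear strategy follows from setting N*(alpha + beta*x) = delta*x. A minimal sketch of that arithmetic, with a forward-Euler check of the dynamics, is given below. The coefficients alpha and beta are hypothetical values chosen only so that beta is negative; they are not the paper's equilibrium coefficients, and the function names are illustrative.

```python
def steady_state_linear(N: int, delta: float, alpha: float, beta: float) -> float:
    """Stock x at which N * (alpha + beta * x) = delta * x."""
    return N * alpha / (delta - N * beta)


def simulate_stock(x0, N, delta, alpha, beta, dt=0.1, steps=20_000):
    """Forward-Euler integration of x' = N * (alpha + beta * x) - delta * x."""
    x = x0
    for _ in range(steps):
        x += dt * (N * (alpha + beta * x) - delta * x)
    return x


# Hypothetical coefficients (assumption): beta < 0, emissions fall with the stock.
N, delta = 10, 0.02
alpha, beta = 0.5, -0.01
x_bar = steady_state_linear(N, delta, alpha, beta)
x_end = simulate_stock(x0=0.0, N=N, delta=delta, alpha=alpha, beta=beta)
```

Because beta is negative, delta - N*beta is positive, so the steady state is well defined and the simulated stock converges to it from any starting level.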

Q: What do the best equilibria look like, and why do they achieve high welfare?

A: The best equilibria feature a target steady state x* near the first-best level and a discontinuous upward jump in emissions when carbon falls slightly below x*. This jump rapidly undoes any reduction and returns the stock to x*, eliminating the strategic incentive to free-ride on others’ reductions. When carbon rises above x*, emissions increase only slightly, causing the economy to drift slowly toward a higher-pollution steady state; the threat of this bad outcome disciplines overshooting. This mechanism is analogous to a trigger strategy but is conditioned on the stock level rather than on past actions, making it compatible with Markovian strategies.

Q: How large are the welfare gains from the best equilibrium relative to the linear equilibrium?

A: In the calibrated example with N=10, delta=0.02, rho=0.02, and d=0.5, the best continuous-value MPE steady state is approximately 1.2 times the first-best level, compared to 2.5 times for the linear equilibrium. Choosing the best equilibrium closes between 50 and 100 percent of the welfare gap between the linear equilibrium and the first-best outcome, depending on initial conditions. The paper characterizes this as a quantitatively large, first-order welfare improvement.
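
One natural way to read "share of the welfare gap closed" is the ratio below. It is an illustrative definition, not necessarily the paper's exact measure; the symbols W^{best}, W^{L}, and W^{FB} (welfare under the best MPE, the linear MPE, and the first-best, each evaluated from initial stock x_0) are introduced here as assumptions.

```latex
\text{share closed}(x_0)
  = \frac{W^{\mathrm{best}}(x_0) - W^{L}(x_0)}{W^{FB}(x_0) - W^{L}(x_0)}
```

Under this reading, a share of 1 means the best equilibrium attains first-best welfare from x_0, and the reported range of 50 to 100 percent reflects how the ratio varies with x_0.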

Q: What are “coordination failure” equilibria and when do they arise?

A: Coordination failure equilibria feature discontinuities not only in the strategy (emission rate) but also in the value function itself. They arise when no single country can unilaterally prevent the carbon stock from rising past a threshold — formally, when N * a_max < delta * x at the discontinuity point. In such cases, if opponents are emitting heavily, no individual country can stop atmospheric carbon from rising even if it emits nothing, making heavy emission a best response. All players following this logic simultaneously produce a self-fulfilling collapse to high emissions. At high carbon levels these equilibria can yield welfare outcomes worse than the linear equilibrium.

Q: What is the paper’s main policy implication for climate negotiations?

A: The paper argues that international climate negotiations should be understood as a coordination problem over which of many MPE is played, rather than as bargaining over a limited cooperative surplus in a dynamic prisoners’ dilemma. Since the best equilibria are self-enforcing (they are Nash equilibria, not cooperative solutions), they do not require external enforcement. The paper suggests effective agreements may involve threshold-based commitments — sharp decarbonisation if a carbon target is met, but acceptance of a substantially higher stabilisation target (e.g., 2.5 degrees C rather than 2 degrees C) if the first target is missed — to create the discontinuous strategic incentives that support good equilibria.

Q: How does the paper handle the previously identified “local MPE” that could not be extended to the entire state space?

A: Prior work (Dockner and Long 1993; Rubio and Casino 2002; Dockner and Wagener 2014) constructed nonlinear equilibria that were only locally defined, and the validity of such equilibria was questioned (Rowat 2007; Bernhard 2024) because they were undefined on the full state space. The present paper’s framework allows discontinuous strategies, so these locally defined equilibria can be extended into globally defined, discontinuous MPE. Most previously discovered equilibria are shown to be nested within the larger set of all symmetric MPE identified here.

Q: What mathematical tools are used to prove the main results?

A: The proofs rely on the theory of viscosity solutions to Hamilton-Jacobi-Bellman equations (Bardi and Capuzzo-Dolcetta 2008), building on and extending results of Barles, Briani, and Chasseigne (2013, 2014) on optimal control with discontinuous dynamics. A key departure from Barles et al. is that the paper cannot assume controllability of the dynamics near discontinuities without imposing undue restrictions on opponents’ strategies. The application of these results to a fixed-point condition of the best-response correspondence to construct MPE conditions is described as entirely novel.

Q: What are the scope conditions and limitations of the methodological results?

A: The main results (Theorems 1–3) apply to differential games with a single state variable and strategies that are real-analytic except at finitely many points with one-sided derivatives everywhere. The climate application is further restricted to the symmetric linear-quadratic van der Ploeg–de Zeeuw framework. Extension to multiple state variables is acknowledged as future work. The welfare calibration results are specific to the parameter values N=10, delta=0.02, rho=0.02, d=0.5.

Markov-perfect equilibrium (MPE): A Nash equilibrium in Markovian strategies, where each player’s strategy conditions only on the current state variable and not on the history of play. The paper makes this concept well-founded in differential games by allowing discontinuous strategies, ensuring payoffs can be computed for all strategy profiles and unique best responses exist for almost all profiles of opponents’ strategies.

Filippov solution: A solution concept for ordinary differential equations with discontinuous right-hand sides, which replaces the dynamics at a discontinuity point with a convex hull of the left and right limits. Used in this paper to define well-specified state trajectories and payoffs even when players’ strategies have jumps, eliminating the need to restrict strategy sets to “admissible” profiles.

Discontinuous Markovian strategy: A policy function phi: X -> A that maps the state to an action and is real-analytic except at finitely many points, with well-defined one-sided derivatives everywhere. The key innovation of the paper — allowing such strategies makes differential games well-behaved as standard non-cooperative games while capturing the generically discontinuous nature of optimal policy functions.

Push-push steady state: A steady state at a discontinuity point of a strategy where the dynamics push the state toward that point from both sides. Under Filippov solutions the state remains at such a point, with flow payoffs being a weighted average of left and right actions. Theorem 2 requires the value function to be differentiable at these points in equilibrium.

Coordination failure equilibrium: An MPE featuring discontinuities in both the strategy and the value function, arising when no single player can unilaterally move the state across a threshold. At high carbon levels, if opponents emit heavily, individual emission cuts are ineffective; heavy emission becomes a best response for all, sustaining a self-fulfilling high-emission outcome. These equilibria can yield welfare outcomes worse than the linear equilibrium.

Linear equilibrium: The unique continuous globally defined symmetric MPE in the van der Ploeg–de Zeeuw model, characterized by emissions decreasing linearly in the carbon stock. It involves adverse strategic substitutability (each player reduces emissions in response to high carbon because opponents do likewise) and is weakly Pareto-dominated by every other MPE with a continuous value function.

Skiba point: A state at which the optimal policy is discontinuous because the value function has distinct left and right derivatives, corresponding to the boundary between two basins of attraction with different long-run outcomes. In this paper, the steady state of a best equilibrium is a Skiba-type point: below it, emissions jump up to return rapidly to the target; above it, emissions increase only gradually.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing. Found an error or a misrepresentation? Flag it here; corrections are welcome, especially from the authors.