Forthcoming [Review of Economic Dynamics] doi:10.1016/j.red.2026.101349

Optimal Combination of Patent Instruments in a Cumulative-Innovation Growth Model

Davin Chor

Edwin L.-C. Lai

What this paper finds — and why it matters

Layer 1: Overview

This paper develops a tractable general equilibrium model of endogenous growth driven by cumulative innovation, and uses it to characterize optimal patent policy — both for patent breadth (via a “non-infringing inventive step” requirement) and patent length — with a focus on their welfare implications and optimal combination.

The central motivation is that cumulative innovation creates positive knowledge spillovers: each new idea strictly builds on the best existing technology, and the disclosure that patenting requires diffuses knowledge to future innovators. Because private firms do not internalize these spillovers, the decentralized equilibrium features strictly lower R&D investment than the social optimum. The key wedge is an intertemporal spillover effect: firms discount future profits at a rate that includes the hazard of being superseded (rho + lambda*v*L, where v is the share of labor in R&D), while the social planner uses only the pure time preference rate (rho). Appropriability and business-stealing externalities exactly offset each other, so the intertemporal spillover is the sole source of under-investment.

The model has a continuum of differentiated varieties, a single labor input, a Poisson idea arrival process (rate lambda per R&D worker), and productivity improvements drawn i.i.d. from a standardized Pareto distribution with shape parameter theta > 1. The Pareto structure yields the key tractability: the log of the k-th best productivity level is Gamma-distributed with mean k/theta, which allows closed-form welfare expressions. In steady state, all outcomes depend on just three deep parameters: the discount rate rho, the Pareto shape theta, and the innovative capacity lambda*L.
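The Pareto-to-Gamma step can be checked by simulation. The sketch below is illustrative only and is not code from the paper: it assumes the frontier is normalized to Z_0 = 1, draws each improvement ratio from a standard Pareto with shape theta using Python’s `random.paretovariate`, and compares the sample mean of ln Z_k with k/theta. The function name, seed, and sample size are hypothetical choices.

```python
import math
import random

def simulate_log_frontier(k, theta, n_paths=20000, seed=0):
    """Sample mean of ln Z_k when Z_k is a product of k i.i.d. Pareto(theta) ratios.

    ASSUMPTION: Z_0 = 1, a normalization made for this sketch.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        # ln of a standard Pareto(theta) draw is exponential with rate theta,
        # so the sum of k such logs is Gamma-distributed with mean k / theta.
        total += sum(math.log(rng.paretovariate(theta)) for _ in range(k))
    return total / n_paths

theta, k = 4.0, 5
sample_mean = simulate_log_frontier(k, theta)
print(f"sample mean of ln Z_k: {sample_mean:.3f}, k/theta: {k / theta:.3f}")
```

The two printed numbers should be close, with the gap shrinking as `n_paths` grows.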

The patent breadth instrument is formalized as a “non-infringing inventive step” (NIS) requirement B >= 1: a new idea must deliver a productivity at least B times the current patent-holder’s productivity to qualify for a patent. Raising B creates two opposing forces. The “profit effect” extends incumbent monopoly duration by reducing the hazard rate of supersession (from lambda*v*L to lambda*v*L*B^{-theta}), raising innovation incentives. The “hurdle effect” raises the bar an idea must clear to be patentable, reducing the expected return to R&D. These forces generate a non-monotonic (inverted-U) relationship between R&D effort and B (Proposition 2): there is a unique B_v that maximizes the innovation rate, with dv/dB > 0 for B < B_v and dv/dB < 0 for B_v < B < B_0 (the upper bound beyond which no R&D occurs). Explicitly, B_v = [lambda*L / (rho*(theta-1))]^{1/theta}. Proposition 3 further establishes that in economies whose innovative capacity falls just below the threshold for positive growth at B=1, a well-chosen NIS can shift the economy from a zero-growth to a positive-growth steady state.
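The inverted-U can be illustrated numerically. In the sketch below, the closed form for B_v and the patentability probability B^{-theta} are taken from this summary. The function `rd_share` is an assumption: it is a reconstruction of v(B) from a free-entry condition combined with the profit share [B(1+theta) - theta] / (B(1+theta)), and it is not quoted from the paper. It is at least consistent with the stated results, since it is positive at B = 1 exactly when lambda*L > rho*theta and it peaks at the stated B_v.

```python
def patentable_prob(B, theta):
    """Probability that an arriving idea clears the NIS hurdle: B^(-theta)."""
    return B ** (-theta)

def innovation_maximizing_breadth(rho, theta, lam_L):
    """B_v = [lambda*L / (rho*(theta-1))]^(1/theta), as stated in the text."""
    return (lam_L / (rho * (theta - 1.0))) ** (1.0 / theta)

def rd_share(B, rho, theta, lam_L):
    """R&D labor share v(B).

    ASSUMPTION: reconstructed for this sketch from a free-entry condition;
    not quoted from the paper.
    """
    numerator = B * (1.0 + theta) - theta - rho * theta * B ** theta / lam_L
    return numerator / (B * (1.0 + theta))

rho, theta, lam_L = 0.07, 4.0, 1.0                 # baseline values from the text
B_v = innovation_maximizing_breadth(rho, theta, lam_L)
grid = [1.0 + 0.01 * i for i in range(101)]        # B from 1.00 to 2.00
B_best = max(grid, key=lambda B: rd_share(B, rho, theta, lam_L))

print(f"B_v (closed form)      = {B_v:.3f}")
print(f"grid argmax of v(B)    = {B_best:.2f}")
print(f"patentable share at B_v = {patentable_prob(B_v, theta):.3f}")
```

Under these assumptions the grid maximizer lands next to the closed-form B_v, and v(B) is lower on both sides of it.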

The welfare-maximizing breadth B_w is shown to be unique, binding (B_w > 1), and strictly below B_v (Propositions 4 and 5). The welfare optimum trades off the dynamic gain from greater innovation against the static consumer surplus loss from higher markup power. Because the dynamic gain is still positive when B < B_v (R&D is still rising) but the static loss grows continuously in B, the welfare maximum necessarily occurs in the region where research is still increasing, that is, B_w < B_v.

Numerically, at baseline parameters (rho = 0.07, theta = 4, lambda*L = 1), B_w = 1.14 and the equilibrium R&D share is v(B_w) = 0.22, implying an asymptotic maximum real wage growth rate of 4.8%. The optimal breadth is most sensitive to theta (Pareto tail thickness) and less sensitive to rho and lambda*L.

When patent length (Omega) is added as a second instrument, the model yields a sharp result: the welfare-maximizing policy sets Omega → infinity together with B = B_w (Proposition 6). Unlike patent breadth, patent length has no hurdle effect — a longer patent duration raises R&D monotonically (dv/dOmega > 0, Lemma 2). With no diminishing returns to innovation effort in this model (the Poisson arrival rate is proportional to vL), the marginal dynamic gain from extending Omega always strictly outweighs the marginal static loss, so infinite patent length is always superior to any finite length. With Omega = 20 years (the TRIPS standard), the baseline calibration implies B_w = 1.13 and v(B_w) = 0.21 — only slightly below the infinite-length benchmark — suggesting the qualitative infinite-length result has limited quantitative bite for realistic patent durations.

Proposition 7 shows that patent breadth and patent length are policy complements: when patent length is exogenously constrained to a finite value, the welfare-maximizing breadth increases in Omega (dB_w/dOmega > 0). Intuitively, a longer patent raises the dynamic gain from research, which raises the marginal value of stronger breadth protection; a shorter patent duration therefore calls for a narrower optimal breadth.

The paper provides a unified rationalization of several empirical puzzles: the weak or negative relationship between patent strength and innovation rates (Sakakibara-Branstetter 2001 on Japan; Bessen-Maskin 2009 on US software) is consistent with B being set above B_v, where the hurdle effect dominates; the causal evidence in Galasso-Schankerman (2014) that patents impede cumulative knowledge accumulation is consistent with the hurdle effect operating at the margin.

Layer 2: Deep Dive

What is the identification strategy, and is this a theoretical or empirical paper?

This is a purely theoretical paper. There is no empirical identification strategy. The core contribution is an analytically tractable general equilibrium model in which the key results (Propositions 1–7) are derived from first-order conditions, comparative statics, and the application of the intermediate value theorem. The Pareto distribution of productivity improvements is the key parametric assumption that enables closed-form expressions for welfare and the growth rate.

What is the key model departure from Kortum (1997) and Eaton-Kortum (2001)?

Kortum (1997) and Eaton-Kortum (2001) model ideas as drawn from a stationary distribution over productivity levels — new ideas may or may not surpass the existing frontier, and as ideas accumulate it becomes progressively less likely that a new draw beats the current best. This generates growth only if the workforce grows. Chor and Lai instead model productivity improvements (ratios Z_{k+1}/Z_k) as i.i.d. Pareto draws, so each new idea strictly improves on the frontier regardless of how many ideas have arrived. This cumulative structure generates endogenous growth with a constant workforce and introduces knowledge spillovers that are absent in Kortum (1997).
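The contrast between the two idea processes can be seen in a stylized simulation. This sketch does not replicate either paper’s model: it only compares draws treated as productivity levels from a fixed Pareto distribution with draws treated as improvement ratios applied to the frontier. The function name, the normalization Z_0 = 1, and the number of ideas are hypothetical choices made here. Logs are used so that the cumulative frontier does not overflow.

```python
import math
import random

def count_frontier_improvements(n_ideas, theta, cumulative, seed=0):
    """Count how many of n_ideas arriving ideas raise the productivity frontier."""
    rng = random.Random(seed)
    log_frontier = 0.0            # ln of the frontier, normalized to Z_0 = 1
    improvements = 0
    for _ in range(n_ideas):
        log_draw = math.log(rng.paretovariate(theta))  # ln of a standard Pareto draw
        # Cumulative case: the draw is the ratio Z_{k+1}/Z_k applied to the frontier.
        # Stationary case: the draw is itself a productivity level.
        log_candidate = log_frontier + log_draw if cumulative else log_draw
        if log_candidate > log_frontier:
            log_frontier = log_candidate
            improvements += 1
    return improvements

n = 2000
stationary = count_frontier_improvements(n, 4.0, cumulative=False)
cumulative = count_frontier_improvements(n, 4.0, cumulative=True)
print(f"stationary-level draws that beat the frontier: {stationary} of {n}")
print(f"cumulative ratio draws that beat the frontier: {cumulative} of {n}")
```

With stationary draws only a small and shrinking fraction of ideas set a new record, whereas with ratio draws essentially every idea advances the frontier.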

What exactly is the ‘non-infringing inventive step’ (NIS) and how does it differ from other breadth concepts in the literature?

The NIS requirement B stipulates that a new idea must achieve a productivity at least B times the productivity of the current best patent (i.e., Z_new >= B * Z_current) to be patentable and non-infringing (what the paper calls ‘leading breadth’). The paper notes this is distinct from, though related to, the patentability requirements studied by O’Donoghue (1998), which focused on the minimum improvement to qualify for a new patent but not necessarily on infringement. It also differs from the Gilbert-Shapiro (1990) and Klemperer (1990) breadth concepts, which focus on horizontal product differentiation (consumer willingness to substitute away from a patent) rather than vertical quality improvements. In the paper’s model, both patentability and non-infringement requirements are captured by a single parameter B, with the simplifying assumption that meeting the B hurdle is both necessary and sufficient for non-infringement.

What are the three externalities in the model, and which one drives the market-planner wedge?

Three externalities are present: (1) The intertemporal spillover effect: firms do not internalize that their innovation raises the knowledge base for future innovators. (2) The appropriability effect: firms capture only private profits, not the full consumer surplus gain from each innovation. (3) The business-stealing effect: each innovator imposes a negative externality on the incumbent patent-holder by eroding their profits. Effects (2) and (3) exactly offset each other in the Pareto specification, so only the intertemporal spillover effect remains. This is verified formally: the market equilibrium condition features a discount rate of rho + lambda*v*L (including the creative destruction hazard), whereas the social planner’s problem involves only rho. The wedge between v_eqm and v_SP stems entirely from this higher effective discount rate in decentralized equilibrium.
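The size of the discount-rate wedge can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper’s equilibrium or planner condition: it only compares the present value of a constant unit flow discounted at rho with the same flow discounted at rho plus a supersession hazard. The R&D share v = 0.2 is a hypothetical value chosen for illustration.

```python
def present_value_unit_flow(discount_rate):
    """Present value of a constant unit flow discounted at a constant rate."""
    return 1.0 / discount_rate

rho = 0.07      # baseline time preference rate from the text
lam_L = 1.0     # baseline innovative capacity lambda*L from the text
v = 0.2         # ASSUMPTION: hypothetical R&D labor share, for illustration only

planner_pv = present_value_unit_flow(rho)             # discounts at rho
firm_pv = present_value_unit_flow(rho + lam_L * v)    # discounts at rho + lambda*v*L

print(f"planner valuation of a unit flow: {planner_pv:.2f}")
print(f"firm valuation of a unit flow:    {firm_pv:.2f}")
```

The firm’s valuation is several times smaller than the planner’s at these values, which is the sense in which the higher effective discount rate depresses private R&D incentives.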

Why is the welfare-maximizing patent breadth strictly less than the innovation-rate-maximizing breadth?

At B_v, research effort is at its maximum, but this is achieved by granting patent-holders strong protection, which imposes a large static consumer surplus loss. For B between B_w and B_v, increasing B raises the static loss further, and the additional innovation is not enough to compensate; for B > B_v, research effort falls while the static loss remains. The welfare optimum trades off the dynamic benefit (higher innovation) against the static cost (monopoly pricing). Because welfare must also account for the static loss in each period, and this loss is already large at B_v, the welfare optimum is achieved at a lower level of protection. Formally, dU_0/dB < 0 for all B in [B_v, B_0), and since the optimum is binding (B_w > 1), the unique welfare maximum lies in the open interval (1, B_v).

Why is the optimal patent length infinite?

Unlike patent breadth, patent length has only a profit effect and no hurdle effect: a longer patent strictly raises R&D effort (Lemma 2). Moreover, the model has no diminishing returns to innovation effort: the Poisson arrival rate of ideas is simply proportional to the total number of R&D workers at each date (lambda*v*L), so each additional unit of research labor generates the same expected innovation flow regardless of how much research has already been done. This means the marginal dynamic gain from raising Omega (via increased innovation) is approximately constant, while the marginal static loss (additional consumer surplus ceded per period) is also roughly constant. The dynamic gain always strictly exceeds the static loss as long as the economy can sustain positive R&D (Lemma 1 condition holds), so Omega → infinity is always welfare-improving. This result breaks down if one introduces diminishing returns to R&D (e.g., a fishing-out effect or a congestion externality in research).

Are patent breadth and patent length policy substitutes or complements?

They are policy complements (Proposition 7): when patent length is shorter (e.g., exogenously constrained by TRIPS or ethical considerations), the welfare-maximizing breadth B_w is lower; conversely, a longer patent length calls for a higher optimal breadth. This is because a longer patent length increases the dynamic gain from research, which raises the marginal value of also increasing breadth (since breadth further amplifies the monopoly profit effect). Formally, d^2U^l_0/(dB d Omega) > 0 at B_w, implying dB_w/d Omega > 0 by the implicit function theorem.
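The implicit-function step can be written out as follows. This is a sketch under two assumptions made here: that U^l_0 is twice continuously differentiable in (B, Omega), and that the second-order condition holds strictly at B_w, so the second derivative in B is negative there.

```latex
\begin{align*}
\frac{\partial U^{l}_{0}}{\partial B}\bigl(B_w(\Omega),\,\Omega\bigr) &= 0
  && \text{(first-order condition defining } B_w\text{)} \\[4pt]
\frac{\partial^{2} U^{l}_{0}}{\partial B^{2}}\,\frac{dB_w}{d\Omega}
  + \frac{\partial^{2} U^{l}_{0}}{\partial B\,\partial\Omega} &= 0
  && \text{(differentiating with respect to } \Omega\text{)} \\[4pt]
\frac{dB_w}{d\Omega}
  &= -\,\frac{\partial^{2} U^{l}_{0} / (\partial B\,\partial\Omega)}
            {\partial^{2} U^{l}_{0} / \partial B^{2}} \;>\; 0
  && \text{(positive cross-partial, negative second derivative)}
\end{align*}
```

The sign follows because the numerator is positive by the stated cross-partial condition and the denominator is negative at an interior maximum.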

What is the quantitative calibration, and what are the key numerical results?

The calibration is illustrative rather than structural. Baseline: rho = 0.07 (matching real stock market returns as in Kortum 1997), theta = 4, and lambda*L = 1 (one expected new idea per variety per year). The stated implication of theta = 4 is that expected profits equal 25% of per-variety expenditure. Note, however, that the profit share 1/(1+theta) evaluates to 0.20 at theta = 4, and a 25% share would correspond to theta = 3, so the wording is slightly inconsistent on this point. The baseline values yield: B_w = 1.14 (infinite patent length), v(B_w) = 0.22 (22% of labor in R&D), and an asymptotic maximum real wage growth rate of 4.8%. The optimal breadth B_w is most sensitive to theta: lowering theta (fatter tail, larger average improvements) raises B_w substantially. Under a finite patent length of Omega = 20, the results change minimally: B_w = 1.13, v(B_w) = 0.21.

How does the model handle the possibility that economies with low innovative capacity might not innovate at all without policy?

When lambda*L < rho*theta, the economy has no R&D in the decentralized equilibrium at B = 1 (v(1) < 0 per equation 22). However, Proposition 3 shows that if lambda*L falls in the intermediate range rho*(theta-1)*(theta^2/(theta^2-1))^theta < lambda*L < rho*theta, there exists a range of binding NIS values B > 1 that can shift the economy from zero to positive growth. Setting B = B_v achieves this transition. This is because the profit effect of introducing a binding NIS can more than offset the hurdle effect in this regime, making it profitable for some workers to engage in R&D.
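The width of this intermediate range can be computed directly. The sketch below reads the lower bound as rho*(theta-1)*(theta^2/(theta^2-1))^theta and the upper bound as rho*theta, and evaluates both at the baseline rho and theta from this summary. The function names and the mid-range value of lambda*L are hypothetical; the B_v formula is the one stated in the overview.

```python
def prop3_capacity_range(rho, theta):
    """Bounds on lambda*L for which a binding NIS can start growth (Proposition 3)."""
    lower = rho * (theta - 1.0) * (theta ** 2 / (theta ** 2 - 1.0)) ** theta
    upper = rho * theta
    return lower, upper

def innovation_maximizing_breadth(rho, theta, lam_L):
    """B_v = [lambda*L / (rho*(theta-1))]^(1/theta), as stated in the text."""
    return (lam_L / (rho * (theta - 1.0))) ** (1.0 / theta)

rho, theta = 0.07, 4.0                       # baseline values from the text
lower, upper = prop3_capacity_range(rho, theta)
lam_L_mid = 0.5 * (lower + upper)            # ASSUMPTION: hypothetical capacity inside the range
B_v_mid = innovation_maximizing_breadth(rho, theta, lam_L_mid)

print(f"range: {lower:.4f} < lambda*L < {upper:.4f}")
print(f"B_v at lambda*L = {lam_L_mid:.4f}: {B_v_mid:.4f}")
```

At these parameter values the range is narrow, and the NIS that achieves the transition is binding (B_v above 1) for a capacity inside it.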

What are the key welfare-improving scope conditions for the NIS policy?

The welfare gain from a binding NIS requires Assumption 1: lambda*L > rho*theta. This ensures the economy already features positive R&D at B = 1, and that the innovative capacity is large enough so the dynamic gains from raising B above 1 exceed the static consumer surplus losses. Without this condition, the NIS may either fail to generate R&D (if lambda*L is very low) or may tip the economy into R&D via Proposition 3’s mechanism, but welfare-optimality of the NIS still requires the economy be in a regime where the profit effect dominates for small B. Additionally, the NIS must remain below B_v to generate any dynamic gain.

How does the model relate to Japan’s narrow patent breadth policy from 1960-1993?

The paper cites Ordover (1991) and Maskus-McDaniel (1999) to note that Japan deliberately adopted narrow patent breadth to encourage more incremental innovation and technology catch-up. In the model’s terms, Japan was setting B close to 1 (or even at 1) to lower the hurdle for new patents, maximizing the number of patentable ideas. This is consistent with a strategy of maximizing the innovation rate (operating near B_v or even below it), potentially at the cost of some dynamic welfare optimization. The Apple v. Samsung example illustrates that the US tends toward broader patent breadth (higher B) than Japan, consistent with the model’s international variation in NIS standards.

How does the paper handle the price markup and profit structure under the NIS?

Under Bertrand competition with limit pricing, the incumbent with the best patentable technology sets price equal to the marginal cost of the second-best technology (the previous patent-holder). The price markup m = Z_k/Z_{k-1} is drawn from a Pareto distribution with shape theta and lower bound 1 (no NIS) or B (with NIS). From equation (19), expected flow profits as a share of per-variety expenditure are Pi = [B*(1+theta) - theta] / [B*(1+theta)], which reduces to 1/(1+theta) at B = 1. As B rises, Pi increases (higher average markups from a higher minimum improvement), which is the profit effect. The expected log productivity of the k-th patentable idea is E[ln Z~_k] = k/theta + k*ln(B), confirming that higher B raises not just the probability threshold but also the expected productivity of successful innovations.
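The two expressions above are easy to evaluate. The sketch below uses the profit expression attributed to equation (19), Pi = [B(1+theta) - theta] / (B(1+theta)), and the stated formula for E[ln Z~_k]. The function names and the grid of B values are hypothetical choices made here.

```python
import math

def profit_share(B, theta):
    """Pi = [B(1+theta) - theta] / (B(1+theta)), attributed to equation (19)."""
    return (B * (1.0 + theta) - theta) / (B * (1.0 + theta))

def expected_log_productivity(k, B, theta):
    """E[ln Z~_k] = k/theta + k*ln(B), as stated in the text."""
    return k / theta + k * math.log(B)

theta = 4.0   # baseline Pareto shape from the text
for B in (1.0, 1.14, 1.5):
    pi = profit_share(B, theta)
    elog = expected_log_productivity(10, B, theta)
    print(f"B={B:.2f}  Pi={pi:.3f}  E[ln Z_10]={elog:.3f}")
```

Both quantities rise with B, and at B = 1 the profit share equals 1/(1+theta).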

What are the limitations and potential extensions noted by the authors?

The authors acknowledge several limitations and propose extensions: (1) The model assumes fully cumulative innovation — each idea strictly builds on the frontier. Generalizing to partial cumulativeness (where some ideas are non-cumulative or only partially built on existing knowledge) is flagged as a natural extension. (2) The analysis is confined to a single-country setting. A multi-country extension would allow study of cross-border patent policy spillovers and optimal international IPR harmonization (e.g., under TRIPS). (3) The model does not allow directed research — firms cannot target specific varieties. Relaxing this could introduce additional policy margins. (4) The model abstracts from imitation threats, which Gallini (1992) shows can make broader patent protection optimal.

How does the paper compare to O’Donoghue (1998) and O’Donoghue-Zweimüller (2004)?

O’Donoghue (1998) shows a patentability requirement can raise social welfare in a partial equilibrium setting, and Hunt (2004) finds an inverted-U relationship between innovation rate and requirement strength; both echo Chor-Lai’s findings. O’Donoghue-Zweimüller (2004) embed patentability in a quality-ladder endogenous growth model but focus more on innovation effects than welfare. The contribution of Chor-Lai relative to these papers is: (i) a fully general equilibrium treatment with explicit welfare analysis; (ii) derivation of both the welfare-maximizing breadth and the innovation-maximizing breadth and proof that B_w < B_v; (iii) extension to jointly optimal patent breadth and length, showing infinite patent length is optimal; and (iv) the Pareto-Gamma tractability that yields closed-form expressions and enables clean comparative statics on three deep parameters.

What robustness checks does the paper provide?

The paper notes in the main text that results are robust to removing the scale effect (the feature that the innovation rate increases in L). An online appendix (referenced but not included in this draft) proves that the main qualitative results — inverted-U in innovation vs. B, unique welfare-maximizing B_w < B_v, and infinite optimal patent length — survive in a model variant without the scale effect. The numerical sensitivity analysis in Section 3.4 also demonstrates robustness of the qualitative findings across wide ranges of rho (0.02 to 0.12) and theta (2 to 6) and lambda*L.

Key Concepts

Non-Infringing Inventive Step (NIS) requirement: A patent policy parameter B >= 1 stipulating that a new idea must achieve a productivity at least B times that of the current best patent to qualify for a patent and be deemed non-infringing. In the paper’s usage, this simultaneously captures both the patentability requirement and the leading breadth (protection of incumbents against near-imitation), and is used interchangeably with ‘patent breadth.’

Cumulative innovation: An innovation process in which each new idea strictly improves upon the existing technological frontier. Formally, the productivity improvement Z_{k+1}/Z_k is drawn i.i.d. from a Pareto distribution with support [1, infinity), so each arriving idea always delivers a strictly positive productivity gain over the current best technology. This contrasts with non-cumulative models (e.g., Kortum 1997) where draws are from a stationary distribution and may fall below the frontier.

Profit effect (of patent breadth): The mechanism by which a higher NIS requirement B reduces the hazard rate that an incumbent patent-holder is superseded (from lambda*v*L to lambda*v*L*B^{-theta}), thereby extending the expected duration of monopoly power and raising the value of each patent. This increases R&D incentives by raising expected profits from successful innovation.

Hurdle effect (of patent breadth): The mechanism by which a higher NIS requirement B reduces the probability that any given arriving idea is patentable (probability B^{-theta}), thereby lowering the expected return to engaging in R&D. This discourages research effort and is the force that eventually dominates when B becomes sufficiently large, causing the innovation rate to fall.

Innovative capacity: The product lambda*L, where lambda is the per-worker Poisson arrival rate of ideas and L is the total labor endowment. All steady-state outcomes in the model depend on lambda and L only through this product, not their individual values. It is the key parameter determining whether positive R&D equilibrium exists (requires lambda*L > rho*theta) and the magnitude of welfare gains from patent policy.

Intertemporal spillover externality: The sole market failure driving under-investment in R&D in this model’s Pareto specification. Because the knowledge embodied in each marketed innovation diffuses freely and becomes the base for subsequent cumulative improvements, private innovators do not internalize the benefit their R&D confers on future innovators. This causes firms to use an effective discount rate of rho + lambda*v*L (including the creative destruction hazard) rather than rho alone, leading to strictly less R&D than the social optimum. Appropriability and business-stealing externalities exactly cancel in this model.

Policy complementarity (breadth and length): The property that the welfare-maximizing patent breadth B_w is increasing in patent length Omega: dB_w/d Omega > 0. When the patent authority is constrained to set a shorter patent length, the optimal breadth should also be narrower, and vice versa. This arises because a longer patent length raises the marginal dynamic benefit of providing stronger breadth protection.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing.