Forthcoming [Quarterly Journal of Economics] doi:10.1093/qje/qjag021

Codification, Technology Absorption, and the Globalization of the Industrial Revolution

Réka Juhász (University of British Columbia, CEPR, NBER)

Shogo Sakabe (AI Lab, CyberAgent, Japan)

David E. Weinstein (Columbia University, NBER, CEPR)

What this paper finds — and why it matters

Layer 1: Overview

Research question and motivation: Why did the First Industrial Revolution (IR) spread to Meiji Japan—and to essentially no other non-Western country—during the first wave of globalization? The paper tests Mokyr’s hypothesis that “technical literacy,” i.e., the codification of engineering, commercial, and industrial knowledge in the local vernacular, was a necessary condition for absorbing IR technologies. The motivating puzzle: after opening to trade (1858) and the Meiji Restoration (1868), 80% of Japanese exports were still primary products as late as ~1883 and real per capita GDP growth was only 0.6%/yr (1870-1883/85); then in a brief 13-year window (1883-1896) the manufacturing export share tripled and stabilized at around 60% of exports until WWII.

Data and setup: The authors build several novel datasets. (1) A cross-language measure of codification, built by scraping national/major libraries and WorldCat for technical books (agriculture, applied sciences, commerce, industry, technology) in 33 languages, 1500-1930. (2) “British Patent Relevance” (BPR): the cosine similarity (TF-IDF, unigrams+bigrams) between the digitized synopses of all British patents 1780-1852 (from Woodcroft 1857) and a hand-curated corpus of 460 English-language 19th-century technical manuals matched to SITC industries. BPR measures the world supply of codifiable IR knowledge by industry and is deliberately not based on what Japan translated (to avoid endogeneity). (3) The first harmonized, bilateral, industry-level trade dataset for the 19th century: 37 regions, 93 industries, quinquennial 1880-1910, built from four reporting countries (Japan, the US, Belgium, and Italy). Outcomes are annualized industry export growth ({1880,1885} to {1905,1910}) and, in robustness checks, productivity/comparative-advantage growth following Costinot et al. (2012) and Amiti-Weinstein (2018).
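
The BPR construction described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, a smoothed IDF, and a single pooled patent document; the paper's actual preprocessing and weighting choices are not specified in this summary, and the toy texts and industry labels below are hypothetical.

```python
import math
from collections import Counter

def ngrams(text):
    """Lowercased unigrams plus bigrams, split on whitespace (an assumption)."""
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using a smoothed IDF."""
    counts = [Counter(ngrams(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy texts standing in for patent synopses and industry manuals.
patents = "improved spinning machine for cotton yarn steam engine boiler"
manuals = {
    "cotton_textiles": "spinning machine operation for cotton yarn",
    "raw_silk": "rearing silkworms and reeling cocoons by hand",
}

vecs = tfidf_vectors([patents] + list(manuals.values()))
bpr = {industry: cosine(vecs[0], v) for industry, v in zip(manuals, vecs[1:])}
```

In this toy setup the industry whose manual shares vocabulary with the patent text receives a positive score, while the industry with no shared terms scores zero, which is the intuition behind using text overlap as a proxy for codifiable knowledge.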

Main findings (with magnitudes): A Japanese industry with a one-standard-deviation higher BPR experienced annual export growth ~12 percentage points faster and annual productivity (comparative-advantage) growth ~1.2 percentage points faster (coefficients 0.121*** and 0.012***). Cross-sectionally, the BPR-growth relationship is positive and significant only for Japan and other codifying countries: for non-Japan regions the BPR coefficient is negative (-0.030***), while English-, French-, and the “top-4 codified” (English/French/German/Italian) regions show positive coefficients (0.042**, 0.032**, 0.078***), smaller than Japan’s. Coefficients for low-income and Asian regions tend to be negative (divergence), though not always significantly so. Time-series: when Japanese export growth is regressed on BPR from 1875 to varying end-years, the BPR coefficient is negative and significant in the 1875-1880 placebo window (Japan resembled the periphery), flips sign around 1890, and is positive and significant at the 1% level by 1895, coinciding with Japan’s catch-up in codification.
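
To make the magnitudes concrete, the following sketch shows how a coefficient on standardized BPR maps into a growth gap in percentage points. It assumes growth is measured as an annualized log difference and that BPR enters in standard-deviation units; both are assumptions for illustration, and the function names are hypothetical.

```python
import math

def annualized_log_growth(x_start, x_end, years):
    """Average yearly log growth between two dates (assumed outcome definition)."""
    return math.log(x_end / x_start) / years

def predicted_growth_gap(beta, bpr_gap_in_sd):
    """Difference in annual growth implied by a BPR gap measured in SDs."""
    return beta * bpr_gap_in_sd

# Coefficients as reported in the summary above.
BETA_EXPORTS = 0.121
BETA_PRODUCTIVITY = 0.012

# A one-SD BPR gap, expressed in percentage points per year.
gap_exports_pp = 100 * predicted_growth_gap(BETA_EXPORTS, 1.0)
gap_productivity_pp = 100 * predicted_growth_gap(BETA_PRODUCTIVITY, 1.0)
```

Under these assumptions, the two gaps correspond to the "~12" and "~1.2" percentage-point figures quoted in the text.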

Mechanism and the Meiji “natural experiment”: In 1870, 84% of all technical books were in four languages (English, French, German, Italian); an Arabic-only reader had access to just 71 technical books. Japan started ordinary but codified explosively: technical-book growth jumped from 1.6%/yr (1600-1860) to 8.8%/yr (1870-1900); translated technical books rose from 8 (1500-1860) to 608 by 1900; Japanese technical books in the NDL grew from 706 (1880) to 2,823 (1890). State provision solved a public-goods/coordination problem: the government built English-Japanese dictionaries (ETSJ 1862/1866, FSEJ 1871) creating standardized Japanese jargon from Chinese glyphs, and 74% of identified technical-book translators (1870-1885) were government employees. Implication: low-cost vernacular access to technical knowledge was a necessary (not sufficient) condition for IR diffusion; where regions were linguistically/geographically distant from Western Europe, codification required state provision (a Gerschenkronian role for the state).
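
As a rough check on the pace of codification, the NDL counts quoted above (706 books in 1880, 2,823 in 1890) can be converted into an implied yearly rate. This sketch assumes constant compound growth over the decade, which is an illustrative simplification and not necessarily how the paper computes its own growth rates.

```python
def compound_annual_growth(start, end, years):
    """Constant yearly rate g such that start * (1 + g) ** years == end."""
    return (end / start) ** (1.0 / years) - 1.0

# NDL technical-book counts as quoted in the summary.
ndl_cagr = compound_annual_growth(706, 2823, 1890 - 1880)
print(f"Implied NDL growth, 1880-1890: {ndl_cagr:.1%} per year")
```

A roughly fourfold increase in ten years works out to about 15% per year under this assumption, well above the 8.8%/yr average the summary reports for 1870-1900 (a figure covering a longer window and possibly a different source).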

Layer 2: Deep Dive

What is the identification strategy and the main threats to it?

The strategy is two-pronged. (1) Cross-sectional: regress region-industry export growth on BPR interacted with region-group dummies, with exporter fixed effects, exploiting that BPR is global (not Japan-specific) and that Japan was uniquely a codifier in the periphery. If codification is the mechanism, only codifying regions should show a positive BPR-growth link. (2) Time-series: exploit the sharp timing of Japanese codification (two well-demarcated periods, before and after Japan attained technical literacy in the 1880s) by estimating the BPR coefficient on Japanese export growth from 1875 to rolling end-years. The 1875-1880 window serves as a placebo (Japan not yet technically literate). The main threat is omitted-variable bias: BPR may be correlated with distance to the technology frontier, fundamental comparative advantage, Meiji institutional reforms, or industry steam-intensity. The cross-section addresses the ‘BPR matters everywhere’ and income/geography confounds; the timing addresses slow-moving confounds (literacy, Tokugawa culture, gradual reforms), since reforms like tax/banking/railroads were mostly in place by 1875, 15-37 years before the BPR effect appears.
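
The two regressions described above can be written schematically as follows. The notation is an assumption introduced here for illustration (the paper's exact specification, controls, and error structure may differ): $g_{ri}$ is annualized export growth of industry $i$ in region $r$, $\alpha_r$ is an exporter fixed effect, $G_k$ is a region group (for example Japan, English-speaking, low-income), and $T$ is the end-year of the growth window.

```latex
% Cross-section: BPR interacted with region-group dummies, exporter fixed effects
g_{ri} = \alpha_r + \sum_{k} \beta_k \,\bigl(\mathrm{BPR}_i \times \mathbf{1}[r \in G_k]\bigr) + \varepsilon_{ri}

% Time series (Japan only): one regression per end-year T, starting in 1875
g^{\mathrm{JPN}}_{i,\,1875 \to T} = \alpha_T + \beta_T \,\mathrm{BPR}_i + \varepsilon_{iT}
```

In this notation, the codification hypothesis predicts $\beta_k > 0$ only for codifying groups, and a $\beta_T$ that is non-positive for $T = 1880$ and turns positive once Japan becomes technically literate.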

How are the cross-section and time-series results distinguished from confounders empirically?

In the cross-section, income terciles (High/Medium/Low) and an Asia dummy are added: no region group replicates Japan’s positive pattern, and the poorest and Asian regions show negative (divergence) coefficients. In the time series, the 1875-1880 placebo yields a negative and significant BPR coefficient for Japan itself, identical in sign to non-codifiers; the coefficient then flips to positive and significant by 1895. Conventional ‘opening to trade’ (1858) or ‘Meiji Restoration’ (1868) stories cannot explain this timing, because the effect appears 37 and 27 years later, respectively.

What heterogeneity is documented?

Japan’s BPR coefficient is larger (though not always significantly) than that of European codifiers, consistent with Japan having more to learn from British patents as a late industrializer. Among non-codifiers, low-income and Asian regions show negative BPR-growth relationships (divergence). Within codifiers, English- and French-speaking regions individually have positive but smaller and less precisely estimated coefficients; pooling the top-4 codified languages sharpens significance (0.078***). The time-series point estimates for Japan slowly decline after 1900 (not significantly), consistent with Japan shifting to Second Industrial Revolution technologies and becoming less reliant on older IR ones.

What robustness checks are run?

(1) Alternative patent corpora: results are nearly identical using British patents 1853-1879 (full text and AI-summarized) and US patents 1836-1860 and 1861-1879 (coefficients 0.121, 0.116, 0.111, 0.115), though later/US patents lower the R-squared, suggesting the 1780-1852 IR patents best explain Japanese export growth. (2) Productivity instead of exports (Costinot et al. 2012 comparative-advantage growth): qualitatively the same, 1.2 pp/yr for a 1-SD BPR increase, with deterioration in non-codifiers. (3) Confounders: controlling for British-colony status (insignificant) and industry steam-power intensity (French 1860s data) does not affect results. (4) Sample selection: dropping non-manufacturing sectors, excluding Asian destination markets, and dropping major export products (textiles, iron/metal) all leave the results intact, indicating broad-based change.

How does this paper relate to and differ from prior work?

It builds on Mokyr (2011) on ‘technical knowledge’/‘access costs’ for European industrialization, extending it outside Europe with a Gerschenkronian twist (state as provider of the codification public good). It contributes to the technology-adoption-lags literature (Comin and Hobijn 2010; ~45-year average lags) by offering a friction explanation. It departs from prior Meiji studies (Sussman-Yafeh 2000; Tang; Morck-Nakamura; Bernhofen-Brown) that found banking, railroads, and constitutional/monetary reforms had little measurable growth impact, and offers codification as the resolution to ‘what drove the Meiji Miracle,’ consistent with Broadberry et al. (2025), who date Japan’s convergence to ~1890 and attribute it to manufacturing productivity. It also extends the knowledge-codification literature (Dittmar 2011; Brown 2024; Abramitzky-Sin 2014) by linking codified vernacular knowledge directly to industry growth rather than indirect outcomes like city growth.

What are the policy implications and their scope conditions?

Public provision of technical knowledge in the vernacular can relax a critical bottleneck to industrialization, especially for regions linguistically/geographically distant from the technology frontier where the market undersupplies this public good. Scope conditions: codification is necessary but NOT sufficient. The Meiji model required complementary investments: language/jargon standardization, mass education for absorptive capacity (literacy >90% for army conscripts by 1909; ~40% of elementary class time on science), tacit-knowledge acquisition (2,400 hired foreigners providing 9,506 person-years of training; study-abroad missions), and tax capacity (1873 Land Tax Reform). China’s post-1949 codification under Zhou did not yield sustained growth until Maoist policies (Great Leap, Cultural Revolution) ended: ‘the exception that proves the rule.’

What external-validity evidence is offered beyond Japan?

The Meiji codification model was studied and transplanted by Park Chung Hee in South Korea (took power in 1961; KIST; researcher counts rose sharply) and Zhou Enlai in China (premier from 1949; Russian-language translation drive with the USSR as the ‘Britain’). In 1950, Japan had ~70,000 technical books, China ~1,000, and Korea <100; China surpassed 30,000 by the early 1960s. Korea’s per capita income clearly rises after Park; China’s codification did not translate into growth until after 1976. The paper presents these cases as suggestive rather than causal, and the appendix adds discussions of British India and Late Imperial Russia.

What are notable caveats and measurement choices?

BPR deliberately uses British 1780-1852 patent synopses and English manuals (Britain was the IR leader; Japan hired British instructors and used British textbooks; the choice avoids endogeneity from Japanese translation choices). It excludes tacit knowledge and secrecy-protected innovation by design. English codification is likely underestimated (the British Library could not be scraped after a 2023 cyberattack; the Library of Congress was used instead). German patents/trade data were excluded for coverage/reliability reasons. Linguistic-distance evidence on 1870/1913 GDP is explicitly not interpreted causally. The aggregate growth correlations for Japan, Korea, and China are described as suggestive, not causal.

Key Concepts

Codification (of technical knowledge): The creation of a means of transmitting engineering, commercial, and industrial knowledge—via language creation and written messages (manuals, textbooks, dictionaries)—that does not require direct contact between the knowledge originator and the recipient (Cowan and Foray 1997). In the paper’s sense it is a non-rival public good that the market undersupplies.

Technical literacy / technical knowledge: Following Stevens (1995) and Mokyr, the codified engineering, commercial, and industrial practices a practitioner needs to set up and run modern factory-based manufacturing; the paper measures it as the stock of vernacular technical books (agriculture, applied sciences, commerce, industry, technology), excluding theoretical/hard-science and non-firm subjects like medicine.

British Patent Relevance (BPR): An industry-level measure equal to the cosine similarity (TF-IDF weighted) between the vectorized text of British patent synopses (1780-1852) and the vectorized text of English technical manuals for that industry; it proxies how much codifiable IR knowledge a given industry stood to gain, and is independent of what was actually translated into Japanese.
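
Written out, the definition above amounts to the following, where the symbols are notation assumed here for illustration: $\mathbf{p}$ is the TF-IDF vector of the British patent synopses and $\mathbf{m}_i$ is the TF-IDF vector of the English technical manuals matched to industry $i$.

```latex
\mathrm{BPR}_i
  = \cos(\mathbf{p}, \mathbf{m}_i)
  = \frac{\mathbf{p} \cdot \mathbf{m}_i}{\lVert \mathbf{p} \rVert \, \lVert \mathbf{m}_i \rVert}
```

With standard non-negative TF-IDF weights, this raw similarity lies between 0 and 1. The regression coefficients quoted elsewhere in this summary are stated per one-standard-deviation change in BPR.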

Access costs: Mokyr’s (2011) term for the cost of obtaining usable technical knowledge; the paper argues vernacular codification (dictionaries, translations) lowered these costs, and that linguistic distance from English/Latin-Greek roots and physical distance from Europe raised them.

Technology absorption / absorptive capacity: The complementary conditions needed to use codified knowledge (prior language/jargon development, literacy and scientific training, and tacit knowledge), all of which the Meiji state invested in (dictionaries, compulsory education, ‘live machines’/foreign instructors, study-abroad missions).

Defensive modernization (Gerschenkronian state role): The paper’s reading that an existential external threat aligned the Japanese elite behind aggressive state-led adoption of Western science, casting the state as the critical agent supplying the codification public good in late industrialization—a Gerschenkronian extension of Mokyr applied outside Europe.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing.