Forthcoming [Quarterly Journal of Economics] doi:10.1093/qje/qjaf054

Enlightenment Ideals and Belief in Progress in the Run-up to the Industrial Revolution

Ali Almelhem

Murat Iyigun

Austin Kennedy

Jared Rubin

What this paper finds — and why it matters

This paper tests Joel Mokyr’s claim that Britain’s industrialization was preceded and enabled by a cultural shift — specifically, that Enlightenment ideals produced a “progress-oriented” view of science that diffused to artisans and craftsmen. The central research question is whether and when the language of science became more progress-oriented in the build-up to the Industrial Revolution, and whether this shift was concentrated in volumes directly linked to industrial production.

The authors start from 420,081 volumes in the HathiTrust Digital Library (HDL) and, after removing duplicates and Latin-language volumes, retain 173,031 unique volumes printed in England and written in English between 1500 and 1900. Because copyright law prohibits downloading full text, they use HDL’s Extracted-Features “bag of words” dataset. They apply Latent Dirichlet Allocation (LDA), choosing the number of topics by cross-validated perplexity minimization, which yields an optimal T=60. Topic-pair co-occurrence analysis identifies three categories (science, religion, and political economy), each anchored by three defining topics. Volume-level category weights are derived by multiplying each topic’s weight by its category coefficient. The resulting classification yields 50,090 science volumes, 102,565 political economy volumes, and 14,124 religion volumes.
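The category-weight step can be sketched as follows. This is a minimal illustration, assuming that a volume is a vector of topic weights, that each category has one coefficient per topic, and that the products are summed over topics and rescaled to shares. The function name `category_weights`, the four-topic toy example, and every number are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def category_weights(
    topic_weights: List[float],
    coefficients: Dict[str, List[float]],
) -> Dict[str, float]:
    """Return each category's share of a volume.

    topic_weights: the volume's LDA topic mixture (sums to 1).
    coefficients: for each category, one coefficient per topic
        (hypothetical values; the paper derives its own).
    """
    # Multiply each topic weight by the category coefficient and sum.
    raw = {
        category: sum(w * c for w, c in zip(topic_weights, coefs))
        for category, coefs in coefficients.items()
    }
    total = sum(raw.values())
    # Rescaling to shares is an assumption made here for readability.
    if total == 0:
        return {category: 0.0 for category in raw}
    return {category: value / total for category, value in raw.items()}


# Toy example with 4 topics instead of 60.
volume = [0.5, 0.3, 0.2, 0.0]
coefs = {
    "science": [1.0, 0.0, 0.0, 0.5],
    "religion": [0.0, 1.0, 0.0, 0.0],
    "political_economy": [0.0, 0.0, 1.0, 0.5],
}
weights = category_weights(volume, coefs)
```

In this toy case the volume comes out as half science, with the remainder split between religion and political economy. A volume "at a nexus" would be one whose shares are concentrated in two categories at once.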

Progressive sentiment is measured using a seven-word dictionary (progress, improvement, stride, betterment, advance, rise, amelioration) assembled from thesaurus synonyms for “progress,” manually vetted by all four authors, and restricted to words attested in the Oxford English Dictionary before 1643 (Newton’s birth year). Sentiment for each volume equals the count of progress-dictionary words divided by total word count. An analogous optimism-sentiment placebo dictionary is constructed separately.
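The sentiment measure described above reduces to a simple ratio over a bag of words. The sketch below assumes exact matching on lowercase tokens; whether the paper also matches inflected forms such as "advances" is not stated in this summary, so that choice, the helper names, and the sample sentence are illustrative assumptions.

```python
import re
from collections import Counter

# The seven dictionary words listed in the summary.
PROGRESS_WORDS = {
    "progress", "improvement", "stride", "betterment",
    "advance", "rise", "amelioration",
}


def bag_of_words(text: str) -> Counter:
    """Lowercase and tokenize on letters only (a simplifying assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def progress_sentiment(word_counts: Counter) -> float:
    """Share of a volume's words that belong to the progress dictionary."""
    total = sum(word_counts.values())
    if total == 0:
        return 0.0
    hits = sum(word_counts[w] for w in PROGRESS_WORDS)
    return hits / total


sample = "The progress of the art and the improvement of trade"
score = progress_sentiment(bag_of_words(sample))
```

Because the function takes word counts and not raw text, it would work directly on a bag-of-words input of the kind the summary says HDL provides.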

Industrial relevance is scored using the digitized indexes of all five volumes of Appleby’s Illustrated Handbook of Machinery (1877–1903); the top industrial root words are crane (weight 51), electr (42), weight (37), rope (27), and cost (27). Each volume receives an industry score equal to the weighted occurrence of industrial root words normalized by volume length.

Three main findings emerge. First, the language of science and religion showed little overlap beginning in the 17th century — that is, the secularization of science predates the onset of industrialization. Science volumes shifted from approximately 40 percent religious content around 1700 to only about 10 percent by 1850, with scientific content rising correspondingly from roughly 40 percent to over 60 percent. This trend was stable from 1650 through 1900.

Second, while scientific volumes became more progress-oriented during the Enlightenment, this progressive shift was concentrated in volumes at the nexus of science and political economy. Volumes of “pure” science were largely neutral with respect to progress sentiment, and those at the science-religion nexus had on average negative progress sentiment. The marginal effect of scientific content on progress sentiment was greatest for volumes mixing science and political economy, and most of the increase in predicted sentiment at that nexus occurred during the 18th century, remaining stable thereafter. A placebo test using optimism sentiment finds the opposite pattern: volumes at the science-political economy nexus were among the least optimistic, while the most optimistic language appeared at the religion-political economy nexus. This rules out the interpretation that the measured shift reflects a general increase in positive affect rather than specifically progress-oriented language.

Third, volumes employing industrial terminology that also sat at the science-political economy nexus were distinctively progressive beginning in the mid-18th century. At the 90th percentile of industry score, predicted progress sentiment at the science-political economy nexus was positive throughout the sample; at zero industry score, it was negative until the mid-18th century. Volumes at the religion-political economy nexus showed modestly positive and time-stable progress sentiment regardless of industry score.

The paper concludes that it was the pragmatic, applied volumes — those bridging science and political economy, written for artisans and a broader literate public rather than for the human-capital elite alone — that embodied the cultural values Mokyr identifies as central to Britain’s industrialization.

Q: What gap in the existing literature does this paper address?

A: Prior work on the deep cultural roots of economic growth rarely tracks how culture changes over time, relying instead on cross-sectional variation or qualitative case studies. No earlier study had assembled quantitative evidence that the language of science itself became more progress-oriented, or that this change reached beyond elite thinkers to artisans and craftsmen. The paper provides such evidence by analyzing 173,031 volumes spanning four centuries.

Q: Why does the paper restrict the progress-sentiment dictionary to words attested before 1643?

A: Words that entered English only after 1643 (Newton’s birth year) could not have appeared in volumes from the early Enlightenment, so including them would bias sentiment scores toward the later part of the sample. The restriction ensures the dictionary is applicable and unbiased across the full 1500–1900 period. The final retained words are: progress, improvement, stride, betterment, advance, rise, amelioration.

Q: How does LDA classify volumes, and how is T=60 selected?

A: LDA treats each volume as a bag of words and models it as a mixture of latent topics, where each topic is a probability distribution over words and Dirichlet priors govern the mixtures. The number of topics T is selected by minimizing perplexity on held-out data via 4-fold cross-validation, rotating training and test sets across folds; this procedure yields T=60 as optimal. Each volume is then represented as a mixture over those 60 topics.
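The selection rule can be illustrated without fitting a model. The sketch below assumes perplexity is computed as the exponential of the negative per-token held-out log-likelihood (the standard definition) and that the candidate T with the lowest mean perplexity across folds is chosen. Averaging across folds is an assumption, and the fold perplexities are invented numbers, not results from the paper.

```python
import math
from statistics import mean
from typing import Dict, List


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Held-out perplexity: exp of the negative per-token log-likelihood."""
    return math.exp(-log_likelihood / n_tokens)


def select_num_topics(fold_perplexities: Dict[int, List[float]]) -> int:
    """Pick the T whose mean held-out perplexity across folds is lowest."""
    return min(fold_perplexities, key=lambda t: mean(fold_perplexities[t]))


# Hypothetical perplexities from 4 folds for three candidate values of T.
scores = {
    40: [1520.0, 1498.0, 1510.0, 1505.0],
    60: [1450.0, 1462.0, 1447.0, 1455.0],
    80: [1471.0, 1480.0, 1466.0, 1475.0],
}
best_t = select_num_topics(scores)
```

In a full pipeline, each entry in `scores` would come from training LDA on three folds and evaluating the held-out log-likelihood on the fourth, repeated so that every fold serves once as the test set.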

Q: What are the three categories and their anchor topics?

A: Political Economy is anchored by topics on law/public opinion, governance/parliament, and trade/price/labour. Religion is anchored by topics on church/Christian doctrine, God/faith/sin, and virtue/fame/religion. Science is anchored by topics on engineering/steam/electricity, chemistry/acid/heat, and geometry/equations/trigonometry. These three sets of topics were selected for high corpus-wide importance and mutual independence.

Q: What does the finding on science-religion separation imply for timing?

A: The separation of scientific and religious language was already visible by 1600 and firmly established by the mid-17th century, well before the Industrial Revolution conventionally dated to the mid-18th century. This supports Mokyr’s argument that the secularization of science was an Enlightenment-era precursor to industrialization rather than a product of it. The trend remained stable from 1650 through 1900.

Q: How does the progressive sentiment differ between pure science and the science-political economy nexus?

A: Volumes of pure science were largely neutral with respect to progress-oriented language and in some periods showed slightly negative predicted progress sentiment. The science-religion nexus showed consistently negative progress sentiment. By contrast, volumes at the science-political economy nexus showed the highest level of progress sentiment from the mid-18th century onward; most of the growth in predicted sentiment at this nexus occurred during the 18th century, after which it remained stable.

Q: What does the placebo optimism test show?

A: The optimism sentiment scores are nearly the mirror opposite of the progress scores: the most optimistic language appears at the religion-political economy nexus, while volumes at the science-political economy nexus are among the least optimistic. This dissociation rules out the interpretation that the measured progress-sentiment rise reflects a general shift toward positive language rather than a specific cultural embrace of science as a tool for improving human welfare.

Q: How is the industrial score constructed and what are the most heavily weighted terms?

A: The authors digitized the detailed indexes of all five volumes of Appleby’s Illustrated Handbook of Machinery (1877–1903), restricted to words attested before 1643, and weighted each industrial root word by its index frequency. Each corpus volume’s industry score equals the sum of (word count × index weight) across all industrial words, normalized by volume length, yielding a score between 0 and 1. The top-weighted terms are crane (51), electr (42), weight (37), rope (27), and cost (27).
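A rough sketch of this scoring rule follows. It uses only the five top-weighted roots quoted above; the full index is not reproduced in this summary. Two details are assumptions and not taken from the paper: a word is matched to a root by prefix, and weights are rescaled by the largest weight so that the score stays between 0 and 1.

```python
from collections import Counter
from typing import Dict

# Index weights for the five top root words reported in the summary.
INDEX_WEIGHTS: Dict[str, int] = {
    "crane": 51, "electr": 42, "weight": 37, "rope": 27, "cost": 27,
}


def industry_score(word_counts: Counter, weights: Dict[str, int]) -> float:
    """Weighted share of a volume's words that match industrial roots.

    Assumptions (not from the paper): a word matches a root if it starts
    with it, and weights are rescaled by the largest weight so the score
    stays between 0 and 1.
    """
    total = sum(word_counts.values())
    if total == 0:
        return 0.0
    max_weight = max(weights.values())
    weighted = 0.0
    for word, count in word_counts.items():
        for root, weight in weights.items():
            if word.startswith(root):
                weighted += count * weight / max_weight
                break  # count each word against at most one root
    return weighted / total


volume = Counter({"crane": 2, "electricity": 1, "the": 7})
score = industry_score(volume, INDEX_WEIGHTS)
```

Prefix matching is crude (for example, "costume" would match the root "cost"), so a real implementation would likely need a stemmer or a curated word list.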

Q: What is the key result linking industrial scores to progressive sentiment?

A: At the science-political economy nexus, volumes with industry scores at the 90th percentile had persistently positive predicted progress sentiment throughout the sample, while volumes at that nexus with zero industry score had negative predicted sentiment until the mid-18th century. The shift to positive sentiment for high-industry volumes at this nexus occurred in the mid-18th century — roughly coinciding with the onset of Britain’s industrialization — and those volumes remained the most progress-oriented in the corpus thereafter.

Q: What is the paper’s interpretation of the science-political economy nexus finding in relation to Mokyr?

A: The authors interpret volumes at the science-political economy nexus as pragmatic, applied works aimed at a broader literate audience including artisans and craftsmen, not exclusively the human-capital elite. These are precisely the volumes Mokyr’s “Industrial Enlightenment” thesis predicts would carry progress-oriented cultural values into the mechanical and artisanal pursuits that drove industrialization. The finding that pure-science volumes were not especially progressive, while applied volumes bridging science and political economy were, is consistent with Mokyr’s argument that it was the diffusion of Enlightenment ideals to skilled practitioners — not just to elite scientists — that mattered.

Q: What qualitative examples support the quantitative findings?

A: Martin Clare’s The Motion of Fluids (1735) explicitly addresses “the Unlearned” and states in its preface that the work is meant to be “of singular Use and Benefit to Mankind,” a direct expression of the progress-oriented language the algorithm detects. George Stephenson’s 1831 railway report argues that rail infrastructure would allow Ireland to “reciprocate with England and with other nations, the products of industry,” exemplifying how progress-oriented language pervaded industrial writing by the early 19th century. These examples illustrate that the high progress-sentiment scores for industrial volumes at the science-political economy nexus reflect genuine rhetorical content rather than measurement artifacts.

Q: What are the paper’s limitations regarding early sample periods?

A: The corpus is thin in earlier eras, particularly around 1550, so results from the earliest decades must be interpreted with caution. The HDL data derive from OCR of digitized scans of very old books, which introduces errors such as the misread “long s” (e.g., “juftice” for “justice”) that require manual correction. Additionally, the bag-of-words model discards word order, which may obscure some semantic distinctions.

Q: What future research directions do the authors identify?

A: The authors propose applying the same textual analysis techniques to test whether English-language volumes began reflecting greater freedom of expression in the run-up to Britain’s economic takeoff, connecting to the literature on European political fragmentation and the marketplace of ideas. They also suggest applying the approach to corpora in other languages — Dutch (following McCloskey’s argument about bourgeois values) and Spanish (to examine whether the Counter-Reformation and Spain’s economic lag are reflected in cultural attitudes toward progress and science).

LDA (Latent Dirichlet Allocation): An unsupervised generative statistical model that treats each document as a bag of words and extracts latent topics as multinomial distributions over vocabulary; used here to reduce 173,031 volumes to mixtures of 60 topics without imposing prior scholarly interpretations.

Progressive Sentiment Score: The number of words in a volume that belong to a seven-word dictionary of progress synonyms (progress, improvement, stride, betterment, advance, rise, amelioration), divided by the volume’s total word count; measures the cultural orientation toward the betterment of humankind as embedded in text.

Industrial Score: A volume-level measure equal to the weighted count of industrial root words — derived from the indexes of Appleby’s Illustrated Handbook of Machinery (1877–1903) — normalized by volume length; captures the degree to which a volume’s vocabulary overlaps with industrial production terminology.

Science-Political Economy Nexus: The region of the topic simplex where volumes carry substantial weight in both the science and political economy categories but low weight in religion; the paper finds this is where progress-oriented language was most concentrated from the mid-18th century onward, interpreted as applied science aimed at artisans and a broader literate public.

Industrial Enlightenment: Joel Mokyr’s (2009) concept describing the diffusion of Enlightenment ideals about the practical utility of science into the mechanical and artisanal pursuits that drove Britain’s industrialization; the paper provides quantitative support for this thesis by showing that industrial volumes at the science-political economy nexus were distinctively progress-oriented.

Culture of Growth: Mokyr’s (2016) broader argument that a pan-European network of elite intellectuals fostered a progress-oriented view of science — the idea that scientific understanding could improve the human condition — and that this cultural norm, in combination with Britain’s stock of skilled craftsmen, made industrialization possible.

Bag of Words: A representation of text that records only word frequencies within a document, discarding word order; used here both because HDL copyright restrictions prevent full-text download and because it is the input format required by LDA.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing.