Forthcoming, Journal of Monetary Economics. doi:10.1016/j.jmoneco.2026.103926

Comment on: Is it AI or data that drives market power?

Miao Ben Zhang

Free to read · Green open access

What this paper finds — and why it matters

This paper is a published comment by Miao Ben Zhang (USC Marshall School of Business) on Mihet, Rishabh, and Gomes (2025), “Is It AI or Data That Drives Market Power?” Zhang identifies three contributions of the commented paper and benchmarks each against the existing literature, offering targeted suggestions for strengthening the analysis.

The first contribution Zhang discusses is the commented paper’s distinction between raw data, AI capability, and processed data. Raw data is modeled as a by-product of production linearly related to firm size; processed data is modeled as the abundance of signals improving the precision of firms’ next-period productivity predictions. The commented paper’s key modeling innovation is a formula linking raw data (n_{i,t}), firm-level AI capability (z_i), and processed data (n_{i,t}-tilde): processed data equals a weighted sum of an information entropy effect — e^(-z_i) * (-n_{i,t} * ln(n_{i,t})) — and an AI capability effect — (1 - e^(-z_i)) * n_{i,t} * e^(n_{i,t}). Zhang notes this formula implies that the marginal value of raw data can turn negative for firms with low AI capability, consistent with information-theoretic constraints from the rational inattention literature (Sims, 2003). Zhang requests more empirical support for this equation, specifically asking whether low-AI firms exhibit lower TFP than high-AI firms at similar data-intensity levels, and encouraging discussion of existing measures of data-processing ability such as human capital in data engineering and ML pipeline automation.
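The sign change Zhang describes can be checked numerically. The following is a minimal sketch, assuming the formula exactly as transcribed in this summary. The function names `processed_data` and `marginal_value` and the chosen values of n and z are illustrative assumptions, not taken from either paper.

```python
import math


def processed_data(n: float, z: float) -> float:
    """Processed data implied by raw data n > 0 and AI capability z >= 0."""
    weight = math.exp(-z)              # weight on the entropy term
    entropy_effect = -n * math.log(n)  # information entropy effect
    ai_effect = n * math.exp(n)        # AI capability effect
    return weight * entropy_effect + (1.0 - weight) * ai_effect


def marginal_value(n: float, z: float) -> float:
    """Analytic derivative of processed_data with respect to raw data n."""
    weight = math.exp(-z)
    return (-weight * (1.0 + math.log(n))
            + (1.0 - weight) * (1.0 + n) * math.exp(n))


if __name__ == "__main__":
    # Hypothetical AI capability levels, evaluated at raw data n = 1.
    for z in (0.01, 0.5, 2.0):
        print(f"z={z}: marginal value at n=1 is {marginal_value(1.0, z):.3f}")
```

Under these assumed values, the marginal value of raw data at n = 1 is negative for z = 0.01 and positive for z = 2.0, which is the pattern the comment attributes to the formula.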

The second contribution is the commented paper’s modeling of a secondary market for trading processed data among firms. Zhang notes that facilitating processed data markets — for example via APIs or structured knowledge sharing — can, per the commented paper’s simulation and empirical analysis, democratize innovation and reduce market concentration, enabling even low-AI firms to compete. Zhang flags that the paper is silent on firm acquisition as an alternative channel for accessing processed data, arguing this omission is significant given that processed data, unlike ideas or technologies, is less portable and cannot be obtained simply by poaching skilled employees.

The third contribution is the commented paper’s empirical strategy. The commented paper constructs firm-level proxies for AI intensity and data intensity, then exploits two exogenous technological shocks — the advent of AWS cloud computing and transformer-based architectures — to identify causal effects of improvements in compute and processed data accessibility. The evidence shows that compute improvements disproportionately benefit data-rich firms, while processed data access disproportionately benefits low-AI firms. The central empirical message is that access to raw data tends to foster market concentration, whereas access to processed data tends to reduce market concentration. Zhang raises a measurement concern: the commented paper relies on firm-level Herfindahl-Hirschman Index (HHI) calculations based on time-varying, text-based industry definitions (Hoberg and Phillips, 2016). Zhang argues a positive effect on this HHI could reflect either genuine firm growth relative to competitors or reclassification of the firm into different, possibly more concentrated, sectors — making the HHI measure alone insufficient to support claims about product market concentration. Zhang recommends complementing this with industry-level concentration measures anchored to fixed baseline industry codes (FIC codes from Hoberg and Phillips, 2016), constructed at the FIC-year level, following the approach of Gutierrez and Philippon (2017) on industries’ growth and median Q.
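Zhang's measurement concern can be illustrated with a toy calculation. The sketch below assumes HHI is the sum of squared sales shares within a firm's peer group. The sales figures and peer groups are made up for illustration and do not come from the commented paper or from the Hoberg and Phillips data.

```python
def hhi(sales):
    """Herfindahl-Hirschman Index: sum of squared sales shares (0 to 1 scale)."""
    total = sum(sales)
    return sum((s / total) ** 2 for s in sales)


# Hypothetical peer groups; the focal firm is always the first entry.
baseline = [40.0, 30.0, 30.0]      # focal firm and two text-based peers
after_growth = [60.0, 30.0, 30.0]  # same peers, focal firm has grown
after_reclass = [40.0, 60.0]       # focal firm unchanged, peer set redefined

hhi_baseline = hhi(baseline)
hhi_growth = hhi(after_growth)
hhi_reclass = hhi(after_reclass)

print(hhi_baseline, hhi_growth, hhi_reclass)
```

In both scenarios the measured HHI rises above the baseline, although only the first involves the focal firm gaining share against an unchanged set of competitors. This is the ambiguity that motivates anchoring concentration measures to fixed industry codes.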

No quantitative magnitudes from regressions or calibrations are reported in the comment itself, as this is a discussion piece rather than an original empirical paper. All claims above are drawn directly from the text.

Q: What are the three contributions of Mihet, Rishabh, and Gomes (2025) that Zhang identifies? A: First, the paper explicitly models the distinct roles of raw data, AI capability, and processed data, linking the information entropy literature to firm production. Second, it models a secondary market for trading processed data among firms, relevant for policy on data sharing platforms. Third, it empirically tests the model’s predictions using firm-level proxies and two exogenous technological shocks.

Q: What is the core formula linking raw data, AI capability, and processed data in the commented paper? A: Processed data (n_{i,t}-tilde) equals e^(-z_i) * (-n_{i,t} * ln(n_{i,t})) plus (1 - e^(-z_i)) * n_{i,t} * e^(n_{i,t}), where z_i is firm-level AI capability and n_{i,t} is raw data. The first term captures the information entropy effect (which can reduce or negate the value of raw data for low-AI firms) and the second captures the AI capability effect (where AI turns raw data into abundant useful signals).

Q: Why can the marginal value of raw data turn negative, according to the framework? A: Information-theoretic constraints — long studied through concepts like Shannon entropy and Sims’s rational inattention — imply that unprocessed raw data may harm rather than help firms that lack adequate processing capabilities. Zhang situates this in the broader macro-finance literature on information choice (Sims, 2003; Veldkamp, 2011).
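The sign of the marginal value can be read off by differentiating the formula as it is transcribed in this summary. The derivation below is a worked sketch under that assumption, not a quotation from either paper.

```latex
\tilde{n}_{i,t} = e^{-z_i}\,\bigl(-n_{i,t}\ln n_{i,t}\bigr)
                + \bigl(1 - e^{-z_i}\bigr)\, n_{i,t}\, e^{n_{i,t}}

\frac{\partial \tilde{n}_{i,t}}{\partial n_{i,t}}
  = -e^{-z_i}\,\bigl(1 + \ln n_{i,t}\bigr)
  + \bigl(1 - e^{-z_i}\bigr)\,\bigl(1 + n_{i,t}\bigr)\, e^{n_{i,t}}

% Limiting cases:
%   z_i \to 0:      derivative \to -(1 + \ln n_{i,t}), negative whenever n_{i,t} > 1/e
%   z_i \to \infty: derivative \to (1 + n_{i,t})\, e^{n_{i,t}}, positive for all n_{i,t} > 0
```

For firms with AI capability near zero the entropy term dominates, so additional raw data lowers processed data once n_{i,t} exceeds 1/e. For firms with high AI capability the second term dominates and the derivative is positive.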

Q: What empirical suggestion does Zhang make regarding the raw data versus AI capability distinction? A: Zhang asks whether, in the commented paper’s sample of publicly traded firms with measures of data intensity and AI intensity, low-AI firms exhibit lower TFP (following Imrohoroglu and Tuzel, 2012) than high-AI firms at similar levels of data intensity. Zhang also encourages discussion of anecdotal evidence for negative information entropy effects and of existing measures of data-processing ability such as human capital in data engineering, annotation, cleaning, or ML pipeline automation (Abis and Veldkamp, 2024).

Q: What is the policy relevance of the secondary market for processed data? A: The commented paper’s simulation and empirical analysis show that facilitating processed data markets (e.g., via APIs or structured knowledge sharing) can democratize innovation and reduce market concentration, enabling even low-AI firms to compete. This aligns with recent literature on secondary markets for structured data and foundation model outputs (Gans, 2018, 2024; Conti et al., 2023, 2024; Athey, 2019). Platforms may have incentives to restrict processed data access, potentially reinforcing incumbent power (Carballa Smichowski et al., 2023).

Q: What channel does Zhang argue the commented paper neglects in its analysis of market concentration? A: Zhang argues the paper is silent on firm acquisition as an alternative means by which firms access processed data, noting that processed data is less portable than ideas or technologies — it cannot be obtained simply by poaching a skilled employee. Zhang contends this acquisition channel appears central to the paper’s focus on market concentration and encourages the authors to include a discussion of it.

Q: What is the central empirical finding of the commented paper regarding raw versus processed data and market concentration? A: Access to raw data tends to foster market concentration, while access to processed data tends to reduce market concentration. The evidence shows that compute improvements (proxied by the AWS shock) disproportionately benefit data-rich firms, while processed data accessibility (proxied by the transformer architecture shock) disproportionately benefits low-AI firms, consistent with theoretical predictions.

Q: What is Zhang’s specific concern about the HHI measure used in the commented paper? A: The commented paper constructs firm-level HHI using time-varying, text-based industry definitions (Hoberg and Phillips, 2016). Zhang argues a positive effect on this HHI is ambiguous: it could reflect genuine firm growth relative to competitors or reclassification of the firm into different, possibly more concentrated, sectors. Zhang concludes that the HHI measure alone is not strong enough to support claims about product market concentration.

Q: What robustness check does Zhang recommend for the empirical analysis? A: Zhang recommends constructing industry-level concentration measures at the FIC-year level using fixed baseline FIC codes from Hoberg and Phillips (2016), available at the Hoberg-Phillips Data Library. The authors could then analyze how industries with high versus low average or median AI intensity and data intensity respond to the two technological shocks in terms of concentration. Zhang cites Gutierrez and Philippon (2017) as an example of this approach and notes it would help distinguish within-industry dynamics from shifts in firm business focus, aligning with best practices from De Loecker, Eeckhout, and Unger (2020) on persistent market power.
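The recommended robustness check amounts to grouping firm-year sales by a fixed industry code and year, then computing HHI within each cell. The following is a minimal sketch of that grouping step, assuming each firm maps to a single code held fixed over time. The function `fic_year_hhi`, the firm names, the code labels, and the sales figures are hypothetical and are not drawn from the Hoberg-Phillips Data Library.

```python
from collections import defaultdict


def hhi(sales):
    """Herfindahl-Hirschman Index: sum of squared sales shares (0 to 1 scale)."""
    total = sum(sales)
    return sum((s / total) ** 2 for s in sales)


def fic_year_hhi(rows, baseline_fic):
    """Compute HHI for each (fixed industry code, year) cell.

    rows: iterable of (firm, year, sales) tuples
    baseline_fic: dict mapping firm -> industry code held fixed over time
    """
    groups = defaultdict(list)
    for firm, year, sales in rows:
        groups[(baseline_fic[firm], year)].append(sales)
    return {key: hhi(values) for key, values in groups.items()}


# Toy inputs: firm names, codes, and sales are made up for illustration.
baseline_fic = {"A": "fic_1", "B": "fic_1", "C": "fic_2"}
rows = [
    ("A", 2010, 50.0), ("B", 2010, 50.0), ("C", 2010, 10.0),
    ("A", 2011, 75.0), ("B", 2011, 25.0), ("C", 2011, 12.0),
]
panel = fic_year_hhi(rows, baseline_fic)

for key in sorted(panel):
    print(key, round(panel[key], 4))
```

Because the firm-to-industry mapping never changes in this sketch, any movement in a cell's HHI comes from sales shifting among the same set of firms rather than from firms being reassigned to a different industry.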

Raw data: A by-product of firms’ production, modeled as linearly related to firm size; represents unprocessed observations that have not yet been transformed into useful signals. It is distinguished from processed data, which is what actually improves productivity predictions.

Processed data: Modeled as the abundance of signals that improves the precision of firms’ predictions of their next-period productivity (following Farboodi and Veldkamp, 2022). Unlike ideas or technologies, processed data is less portable and cannot easily be transferred by poaching skilled employees.

AI capability (z_i): Firm-level ability to transform raw data into processed data. Firms with low AI capability may receive negative marginal value from additional raw data due to information entropy effects; firms with high AI capability extract large gains from the same raw data.

Information entropy effect: The component of the raw-to-processed-data transformation — e^(-z_i) * (-n_{i,t} * ln(n_{i,t})) — that captures the information-theoretic cost of possessing raw data without adequate processing capability. At low AI capability, this effect can reduce or negate the precision of signals.

Secondary market for processed data: A market in which firms trade processed data, modeled in the commented paper as a platform or API-based exchange. The commented paper’s analysis shows this market can democratize innovation and reduce market concentration by enabling low-AI firms to access processed data they cannot produce internally.

Firm-level HHI (text-based): Herfindahl-Hirschman Index calculated using time-varying, text-based industry definitions (Hoberg and Phillips, 2016). Zhang identifies a measurement ambiguity: a positive effect on this measure could reflect genuine competitive gains or reclassification into more concentrated sectors.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing. Found an error or a misrepresentation? Corrections are welcome, especially from the authors.