Forthcoming [Review of Economic Studies] doi:10.1093/restud/rdag022

Normal Approximation in Large Network Models

Michael P Leung

Hyungsik Roger Moon


What this paper finds — and why it matters

This paper proves a central limit theorem (CLT) for network formation models with strategic interactions and homophilous agents, addressing a foundational inferential gap in the econometrics of large networks. The setting is one where the econometrician observes a single large network — the asymptotic framework sends network size n to infinity — which is the empirically relevant case for most network datasets. The network moments of interest are averages of node-level statistics (1/n) Σ ψ_i, where ψ_i can capture degree, clustering coefficients, or subnetwork counts (triangles, k-stars) that have been used for structural inference in network formation games.
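As a concrete illustration of the node statistics ψ_i and the moment (1/n) Σ ψ_i, the following minimal Python sketch computes degree, triangle counts, and local clustering from an adjacency structure. The function names, the toy four-node network, and the convention that clustering is zero for nodes of degree below two are assumptions made for this example, not taken from the paper.

```python
from itertools import combinations

def degree(adj, i):
    """psi_i = number of links of node i."""
    return len(adj[i])

def triangles(adj, i):
    """Number of triangles that include node i."""
    return sum(1 for j, k in combinations(sorted(adj[i]), 2) if k in adj[j])

def clustering(adj, i):
    """Share of i's neighbor pairs that are linked (0.0 if degree < 2, by convention)."""
    d = degree(adj, i)
    if d < 2:
        return 0.0
    return triangles(adj, i) / (d * (d - 1) / 2)

def network_moment(adj, stat):
    """Sample average (1/n) * sum_i psi_i for a node statistic psi."""
    return sum(stat(adj, i) for i in adj) / len(adj)

# Hypothetical network: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for stat in (degree, triangles, clustering):
    print(stat.__name__, network_moment(adj, stat))
```

Each statistic here depends only on links among a node and its immediate neighbors, which is the kind of locality the CLT relies on.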

The model is a pairwise-stability network formation game built on a latent-space (geometric graph) structure. Each node i has an i.i.d. type (X_i, Z_i), where X_i is a continuously distributed position vector capturing homophilous attributes. Two nodes i and j form a link if a joint-surplus function V(·) exceeds zero, where V depends on the scaled distance r_n^{-1}‖X_i − X_j‖ between positions, a vector of strategic interaction statistics S_{ij} (functions of neighboring links), node attributes Z_i, Z_j, and an i.i.d. utility shock ζ_{ij}. Homophily enters as a monotonicity requirement: V is decreasing in the distance component, so dissimilar nodes are less likely to link. Sparsity is ensured by setting r_n = (κ/n)^{1/d}, which keeps expected degree asymptotically bounded.
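To see why the scaling r_n = (κ/n)^{1/d} keeps the network sparse, consider a deliberately stripped-down case: positions uniform on the unit cube and a link whenever ‖X_i − X_j‖ ≤ r_n. This drops the shocks, attributes, and strategic terms of V, so it is only a sketch of the sparsity mechanism, not the paper's model. The function names and parameter values are assumptions for the example.

```python
import math
import random

def sparsity_radius(n, kappa, d):
    """Return r_n = (kappa / n) ** (1 / d)."""
    return (kappa / n) ** (1.0 / d)

def average_degree(n, kappa, d=2, seed=0):
    """Average degree of a toy graph linking i and j when ||X_i - X_j|| <= r_n.

    Assumption: only the distance threshold is kept; shocks, attributes and
    strategic interactions are omitted to isolate the sparsity scaling.
    """
    rng = random.Random(seed)
    r = sparsity_radius(n, kappa, d)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    links = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= r:
                links += 1
    return 2.0 * links / n

# The average degree should hover around a constant rather than grow with n.
for n in (100, 400, 900):
    print(n, round(average_degree(n, kappa=1.0), 2))
```

In d = 2 the expected number of other nodes within distance r_n of an interior node is about n·π·r_n² = πκ, which does not depend on n; nodes near the boundary have somewhat fewer.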

Strategic interactions enter through S_{ij}, which depends on links involving neighbors of i or j (local externalities), generating chains of cross-sectional dependence that are the central obstacle to the CLT. The paper identifies two distinct sources of dependence: (1) link interdependencies from best-response chains, where the realization of one link influences neighboring links; and (2) global coordination in equilibrium selection, where agents may condition on a common signal.
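The interplay between the link rule and the strategic statistic S_{ij} can be sketched with a toy example. Under transferable utility, pairwise stability amounts to A_{ij} = 1{V_{ij} > 0}, and because V_{ij} depends on other links through S_{ij}, a stable network is a fixed point. The surplus below (linear in a common-neighbor indicator, with hypothetical parameters theta1 and theta2) is an assumption for illustration, not the paper's specification, and the iteration is just one simple way to reach a fixed point when theta2 ≥ 0.

```python
import math
import random

def linked(links, a, b):
    """True if the undirected pair {a, b} is currently linked."""
    return (min(a, b), max(a, b)) in links

def shares_neighbor(links, i, j, n):
    """Toy S_ij: 1 if i and j have a common neighbor, else 0."""
    return int(any(linked(links, i, k) and linked(links, j, k)
                   for k in range(n) if k != i and k != j))

def surplus(links, i, j, pos, shock, theta1, theta2, r):
    """Hypothetical V_ij: falls with scaled distance, rises with S_ij when theta2 >= 0."""
    s_ij = shares_neighbor(links, i, j, len(pos))
    return theta1 + theta2 * s_ij - math.dist(pos[i], pos[j]) / r + shock[(i, j)]

def iterate_to_stability(pos, shock, theta1, theta2, r):
    """Apply A_ij = 1{V_ij(A) > 0} to all pairs, starting from the empty network.

    With theta2 >= 0 the surplus is non-decreasing in A, so the link set can
    only grow and the loop stops at a fixed point after finitely many rounds.
    """
    if theta2 < 0:
        raise ValueError("this sketch relies on theta2 >= 0 for convergence")
    n = len(pos)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    links = set()
    while True:
        updated = {(i, j) for i, j in pairs
                   if surplus(links, i, j, pos, shock, theta1, theta2, r) > 0}
        if updated == links:
            return links
        links = updated

def is_pairwise_stable(links, pos, shock, theta1, theta2, r):
    """Check A_ij == 1{V_ij(A) > 0} for every pair."""
    n = len(pos)
    return all(((i, j) in links)
               == (surplus(links, i, j, pos, shock, theta1, theta2, r) > 0)
               for i in range(n) for j in range(i + 1, n))

rng = random.Random(0)
n, kappa = 60, 1.0
r = (kappa / n) ** 0.5  # r_n = (kappa / n)^(1/d) with d = 2
pos = [(rng.random(), rng.random()) for _ in range(n)]
shock = {(i, j): rng.gauss(0.0, 1.0) for i in range(n) for j in range(i + 1, n)}
links = iterate_to_stability(pos, shock, theta1=1.0, theta2=0.25, r=r)
print(len(links), is_pairwise_stable(links, pos, shock, 1.0, 0.25, r))
```

Starting from the empty network selects one particular stable network; other starting points or update orders may select a different one, which is where equilibrium selection enters.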

The main technical contribution is adapting “stabilization” conditions from the literature on geometric graphs (Penrose and Yukich 2003, 2008) to the strategic setting. Exponential stabilization (Assumption 5) requires that the radius of stabilization R_i — the smallest neighborhood of i such that ψ_i depends only on nodes within that neighborhood — has a distribution with exponential tails. This bounds the effective dependence neighborhood and provides the weak dependence structure needed for the CLT.

To verify stabilization from primitive conditions, the paper employs branching process theory. The key construct is the “strategic neighborhood” C_i^+, built from the component of i in the network of non-robust links D (pairs whose link outcome can be changed by strategic interactions) together with the one-step neighborhoods of the nodes in that component. The paper bounds |C_i^+| by the size of a subcritical Galton-Watson branching process: if the mean offspring is below 1 (subcriticality, Assumption 7, stated as ‖h*‖_m < 1), the process dies out almost surely and its total size has exponential tails, yielding the required stabilization. The subcriticality condition directly restricts the strength of strategic interactions and is the network analog of the condition ‖β‖ < 1 in linear autoregressive models. A second condition (Assumption 8, decentralized selection) requires that equilibrium selection operates independently across disjoint strategic neighborhoods, ruling out global coordination; this holds under myopic best-response dynamics.
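The role of subcriticality can be illustrated by simulating a Galton-Watson process directly. The sketch below assumes Poisson offspring with a scalar mean, which is a simplification chosen for the example; the paper's bounding process and the norm ‖h*‖_m are more general. All function names are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def total_progeny(rng, mean_offspring, cap=10**6):
    """Total number of individuals ever born, starting from one ancestor."""
    alive, total = 1, 1
    while alive > 0 and total < cap:
        children = sum(poisson(rng, mean_offspring) for _ in range(alive))
        total += children
        alive = children
    return total

def simulate(mean_offspring, reps=20000, seed=0):
    """Simulate `reps` independent processes and return their total sizes."""
    rng = random.Random(seed)
    return [total_progeny(rng, mean_offspring) for _ in range(reps)]

sizes = simulate(0.5)
mean_size = sum(sizes) / len(sizes)
# Empirical tail probabilities P(size > w) for a few thresholds w.
tail = [sum(s > w for s in sizes) / len(sizes) for w in (5, 10, 20)]
print(mean_size, tail)
```

With mean offspring m < 1 the expected total size is 1/(1 − m), so the simulated mean should be close to 2 for m = 0.5, and the tail frequencies should fall off quickly as the threshold grows.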

For inference, the paper proposes two procedures. The first is a network HAC variance estimator hat_Σ_n = (1/n) Σ_i Σ_j k(d_{ij}/b_n) hat_ψ_i hat_ψ_j^T, where k(·) is a kernel, d_{ij} is the path distance in A, and b_n is a bandwidth. The second is a network bootstrap that resamples nodes with replacement. Both are shown to be consistent (Theorem 3). In simulations with n up to 500 and strategic interaction strength θ_2 varied from 0 to 0.5, the network HAC estimator achieves nominal 5% rejection rates and 95% coverage once n reaches 500, while the bootstrap slightly over-rejects in small samples; the performance of all procedures degrades as θ_2 increases.
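A minimal sketch of the network HAC formula for a scalar statistic follows. It computes path distances by breadth-first search and applies a kernel weight to each pair of demeaned statistics. The Bartlett kernel and the treatment of unreachable pairs as having zero weight are assumptions for this example; the function names are hypothetical.

```python
from collections import deque

def path_distances(adj, source):
    """Breadth-first search distances from source; unreachable nodes are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def bartlett(x):
    """Bartlett kernel: an illustrative choice, not taken from the paper."""
    return max(0.0, 1.0 - abs(x))

def network_hac(adj, psi, bandwidth, kernel=bartlett):
    """Scalar version of (1/n) sum_i sum_j k(d_ij / b_n) psi_hat_i psi_hat_j."""
    n = len(psi)
    mean = sum(psi) / n
    centered = [p - mean for p in psi]
    total = 0.0
    for i in range(n):
        # Pairs with no connecting path never appear here, so they get weight zero.
        for j, d_ij in path_distances(adj, i).items():
            total += kernel(d_ij / bandwidth) * centered[i] * centered[j]
    return total / n

# Hypothetical path network 0-1-2-3 with psi_i equal to the degree of node i.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
psi = [len(adj[i]) for i in range(4)]
print(network_hac(adj, psi, bandwidth=0.5))
print(network_hac(adj, psi, bandwidth=2.0))
```

With this kernel and a bandwidth below one, only the d_{ij} = 0 terms survive and the estimate reduces to the sample variance of ψ; larger bandwidths add covariance terms for nodes that are close in path distance.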

The scope conditions are explicit: the CLT applies to sparse networks (expected degree bounded), undirected networks with local externalities, models admitting a pairwise-stability equilibrium, and equilibrium selection satisfying decentralization. Extensions to directed or denser networks are left for future work.

Q: What is the primary research question and why does it require new theory? A: The paper asks when sample averages of network statistics — degree, clustering, subnetwork counts — satisfy a CLT in strategic network formation models observed as a single large network. Standard CLT proofs require weakly dependent observations, but strategic interactions generate chains of link dependence of a priori unbounded length, and multiple equilibria allow global coordination, both of which can destroy asymptotic normality. Prior work (Leung 2019b; Menzel 2024) established laws of large numbers but not CLTs, which require stronger conditions.

Q: What is the stabilization condition and why is it the right formulation of weak dependence? A: Exponential stabilization (Assumption 5) requires that the radius of stabilization R_i, the smallest K such that ψ_i depends only on the K-neighborhood of i in the network, has a distribution with exponential tails: lim sup_{w→∞} w^{-η} max{log τ_{b,ε}(w), log τ_p(w)} < 0 for some η ∈ (0,1]. This implies that each node’s statistic depends, with high probability, only on a neighborhood of stochastically bounded size, which is a vanishing fraction of the network, making {ψ_i} weakly dependent. The condition is a modification of stabilization conditions from the geometric graph literature (Penrose and Yukich 2003, 2008) adapted to allow strategic interactions.
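The lim sup condition can be unpacked into an explicit tail bound. Writing τ(w) as shorthand for max{τ_{b,ε}(w), τ_p(w)} (notation introduced here only for brevity), a strictly negative lim sup means the scaled log-tail is eventually below some negative constant:

```latex
\limsup_{w \to \infty} w^{-\eta} \log \tau(w) < 0
\quad \Longrightarrow \quad
\exists\, c > 0,\ w_0 < \infty : \quad
\tau(w) \le \exp\!\left(-c\, w^{\eta}\right) \ \text{ for all } w \ge w_0 .
```

With η = 1 this is a pure exponential bound; with η < 1 it is a stretched-exponential bound, which is weaker but still decays faster than any polynomial.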

Q: How does the paper connect the abstract stabilization condition to primitive model conditions? A: The paper defines the strategic neighborhood C_i^+ as the union of one-step network neighborhoods of nodes in C_i, i’s component in the non-robust link network D (where D_{ij} = 1 iff the link A_{ij} can be switched by strategic interactions). The size |C_i^+| controls the radius of stabilization. The paper maps breadth-first exploration of C_i onto a Galton-Watson branching process; under subcriticality (mean offspring < 1, i.e., ‖h*‖_m < 1), |C_i^+| then has exponential tails, which yields exponential stabilization with η = 1 (Theorem 2).

Q: What is the subcriticality condition and what does it restrict? A: Subcriticality (Assumption 7) requires that the mean interaction-strength measure satisfies ‖h*‖_m < 1, where h* bounds the probability that a given link is non-robust as a function of node attributes. This restricts how strongly the existence of one link influences the probability of neighboring links. The authors explicitly analogize this to the condition ‖β‖ < 1 in linear autoregressive models: both bound the magnitude of “autoregressive” dependence below one to prevent explosive propagation of dependence.
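The autoregressive analogy can be made concrete with a geometric series. As a simplification, let a scalar m stand in for ‖h*‖_m and let Z_k denote the size of generation k in a Galton-Watson process with Z_0 = 1 and mean offspring m; these symbols are introduced here for illustration only.

```latex
\begin{align*}
\text{branching:}\quad & \mathbb{E}[Z_k] = m^k, &
\mathbb{E}\Big[\sum_{k \ge 0} Z_k\Big] &= \sum_{k \ge 0} m^k = \frac{1}{1-m} \quad (m < 1), \\
\text{AR(1):}\quad & \frac{\partial y_{t+k}}{\partial \varepsilon_t} = \beta^k, &
\sum_{k \ge 0} |\beta|^k &= \frac{1}{1-|\beta|} \quad (|\beta| < 1).
\end{align*}
```

In both cases the cumulative effect of a single unit (one ancestor, one shock) is finite exactly when the per-step multiplier is below one, and it diverges as the multiplier approaches one.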

Q: What is the decentralized selection condition and what does it rule out? A: Assumption 8 (decentralized selection) requires that the equilibrium selection mechanism operates independently across disjoint strategic neighborhoods: A_{H_l} = λ_{|H_l|}(r^{-1}T_{H_l}, ζ_{H_l}) for each disjoint strategic neighborhood H_l. This rules out global coordination where agents condition on a common signal (such as the type of a particular node) to jointly select an equilibrium. The condition is satisfied by myopic best-response dynamics and is described as the single-network analog of requiring equilibrium selection to be independent across networks under many-network asymptotics.

Q: What is the structure of the CLT proof? A: The proof has two steps. Step 1 proves a CLT for the Poissonized model where the number of nodes N_n ~ Poisson(n), leveraging results from Penrose and Yukich (2008) for geometric graphs extended to the strategic setting. Step 2 is a de-Poissonization argument that transfers the Poissonized CLT back to the fixed-n model. The abstract CLT (Theorem 1) requires Assumptions 5 and 6, and Theorem 2 establishes that the primitive conditions, notably subcriticality (Assumption 7) and decentralized selection (Assumption 8), imply Assumption 5 with η = 1.

Q: How does the network HAC estimator work and what are its consistency conditions? A: The estimator is hat_Σ_n = (1/n) Σ_i Σ_j k(d_{ij}/b_n) hat_ψ_i hat_ψ_j^T, where d_{ij} is the path distance between i and j in the observed network A, k(·) is a kernel function, b_n is a bandwidth, and hat_ψ_i = ψ_i(N_n) − (1/n) Σ_j ψ_j(N_n) is the demeaned statistic. Consistency (hat_Σ_n →^p Σ_n) is established under appropriate conditions on the bandwidth b_n (Theorem 3). The bandwidth plays the same role as in time-series HAC estimation, controlling the window over which covariances are summed.

Q: What do the simulations show about finite-sample performance? A: Using a DGP with X_i ~ U([0,1]^2), ζ_{ij} ~ N(0,1), and θ_2 varying from 0 to 0.5 to control strategic interaction strength, the network HAC estimator achieves nominal 5% rejection rates and 95% coverage at n ≥ 500 across all settings. The bootstrap slightly over-rejects in small samples. Performance of all procedures degrades as θ_2 increases (stronger strategic interactions), consistent with the theoretical condition that subcriticality must hold. These results support practical use of the inference procedures based on Theorem 1.

Q: How does this paper relate to prior work on CLTs for network data? A: Kojevnikov et al. (2021) prove a CLT for node-level data conditional on the network, but this does not apply to network formation because the network is the outcome, not a conditioning variable. Leung (2019b) and Menzel (2024) prove laws of large numbers for strategic network formation but not CLTs. Kuersteiner (2019) takes a different approach using a conditional mixingale assumption. The paper’s abstract CLT extends Penrose and Yukich (2008) by modifying the stabilization condition to accommodate strategic interactions; the primitive conditions are new and use branching process tools that build on Leung (2019b).

Q: What network moments can the CLT be applied to? A: The CLT applies to any average of node statistics ψ_i that depends only on the K-neighborhood of i in the network (Assumption 4 with finite K). Explicit examples include average degree (ψ_i = Σ_j A_{ij}), average clustering coefficient, and counts of connected subnetworks such as triangles and k-stars. Subnetwork counts have been used as the basis for structural identification and estimation of network formation games (Sheng 2020), making the CLT directly applicable to inference in those models.

Q: What are the scope limitations and directions for future work? A: The CLT applies to sparse undirected networks with local externalities (Assumption 2), homophily in positions (Assumption 1), and equilibrium selection satisfying decentralization (Assumption 8). It does not cover directed networks, denser networks where expected degree grows with n, or models with global link externalities. The authors name two priorities for future work: extending the results to directed and denser networks, and developing more powerful inference procedures that exploit network structure.

Stabilization (exponential): The condition that the radius of stabilization R_i — the smallest neighborhood of i beyond which ψ_i does not depend on further nodes — has a distribution with exponential tails (lim sup_{w→∞} w^{-η} log τ(w) < 0 for η ∈ (0,1]). This is the paper’s operative formulation of weak dependence for network statistics and is adapted from geometric graph theory to the strategic setting.

Strategic neighborhood (C_i^+): The union of one-step neighborhoods of nodes in i’s component in the non-robust link network D. A link (i,j) is non-robust (D_{ij} = 1) if strategic interactions can change its realization — i.e., the surplus V can be positive under some interaction configurations and non-positive under others. The size of C_i^+ governs the radius of stabilization and hence the degree of cross-sectional dependence.
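The construction of C_i^+ can be sketched in a few lines once the non-robust link network D is given. In the example below, D and the network used for one-step neighborhoods are both supplied as hypothetical adjacency dictionaries, and a node's one-step neighborhood is taken to be the node itself plus its neighbors in that second network. That reading of "one-step neighborhood" is an assumption of this sketch; the paper's formal definition should be consulted for the exact network involved.

```python
def component(adj, start):
    """Nodes reachable from start along links in adj (start included)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strategic_neighborhood(non_robust, network, i):
    """Union of one-step neighborhoods of the nodes in i's component of D.

    Assumption: the one-step neighborhood of a node is the node itself plus
    its neighbors in `network`, a hypothetical input for this sketch.
    """
    hood = set()
    for j in component(non_robust, i):
        hood.add(j)
        hood.update(network.get(j, ()))
    return hood

# Hypothetical five-node example: D links 0-1; the network also links 1-2 and 3-4.
non_robust = {0: {1}, 1: {0}}
network = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
print(sorted(strategic_neighborhood(non_robust, network, 0)))
print(sorted(strategic_neighborhood(non_robust, network, 3)))
```

Nodes 0 and 3 end up in disjoint strategic neighborhoods here, which is the situation in which decentralized selection requires their outcomes to be determined separately.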

Subcriticality (‖h*‖_m < 1): The condition that the mean-field interaction strength measure satisfies ‖h*‖_m < 1, where h* bounds the conditional probability that a link is non-robust. Subcriticality ensures that breadth-first search of the strategic neighborhood is dominated by a subcritical Galton-Watson process (mean offspring < 1), preventing explosive growth of the dependence neighborhood. The paper explicitly frames this as the network analog of ‖β‖ < 1 in autoregressive models.

Decentralized selection (Assumption 8): The requirement that the equilibrium selection mechanism assigns outcomes independently across disjoint strategic neighborhoods: A_{H_l} = λ_{|H_l|}(r^{-1}T_{H_l}, ζ_{H_l}) for each disjoint H_l. This rules out global coordination — agents conditioning on a common signal to select among equilibria — while permitting local coordination within strategic neighborhoods. Satisfied by myopic best-response dynamics.

Pairwise stability: The solution concept underlying the model. A network A satisfies pairwise stability under transferable utility if A_{ij} = 1{V_{ij} > 0}, meaning a link forms exactly when the joint surplus is positive. This is the equilibrium condition from which the strategic interaction statistics S_{ij} and non-robustness indicators D_{ij} are derived.

Network HAC estimator: The variance estimator hat_Σ_n = (1/n) Σ_i Σ_j k(d_{ij}/b_n) hat_ψ_i hat_ψ_j^T, where d_{ij} is the path distance in the observed network, k(·) is a kernel, and b_n is a bandwidth. It is the network analog of heteroskedasticity- and autocorrelation-consistent (HAC) estimators in time series, using path distance in place of temporal lag distance.

Homophily (in this paper’s sense): The property that the joint-surplus function V is decreasing in its first argument r_n^{-1}‖X_i − X_j‖ (scaled positional distance), so nodes that are more dissimilar in position are less likely to form links. Combined with the sparsity scaling r_n = (κ/n)^{1/d}, this ensures that linking probabilities decay with distance in social space and that the network remains sparse as n grows.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing. Corrections are welcome, especially from the authors.