Forthcoming [Quarterly Journal of Economics] doi:10.1093/qje/qjae027 Online 1 Sep 2024 · Issue forthcoming

Measuring and Mitigating Racial Disparities in Tax Audits

Hadi Elzayn

Evelyn Smith

Thomas Hertz

Cameron Guage

Arun Ramesh

Robin Fisher

Daniel E Ho

Jacob Goldin


What this paper finds — and why it matters

Overview

Research Question. Do Black taxpayers face higher IRS audit rates than non-Black taxpayers, despite race-blind audit selection? And if so, why — and what would mitigation look like?

Data and Methodology. The authors use comprehensive administrative microdata covering approximately 148 million individual income tax returns and 780,627 operational audits for tax year 2014, supplemented with 71,878 research audits from the IRS National Research Program (NRP) pooled over 2010-2014. Because neither the researchers nor the IRS observe taxpayer race, the authors employ Bayesian Improved First Name Surname Geocoding (BIFSG), which imputes the probability that a taxpayer is Black from first name, surname, and Census Block Group. They develop a novel partial identification strategy: two estimators (a probabilistic estimator and a linear estimator) that, under conditions verified using a matched North Carolina voter-registration dataset containing self-reported race, asymptotically bound the true racial audit disparity from below and above respectively. To address the selective labels problem — underreporting is observable only for audited returns — the authors combine operational audit data with NRP random-sample audits to simulate counterfactual audit selection algorithms.

Main Findings.

Magnitude of the disparity. The probabilistic estimator implies a racial audit disparity of 0.81 percentage points; the linear estimator implies 1.34 percentage points. Against a base audit rate of 0.54% for the overall U.S. population in 2014, these bounds imply that Black taxpayers are audited at between 2.9 and 4.7 times the rate of non-Black taxpayers.

Role of the EITC. The disparity is concentrated among EITC claimants. The estimated disparity within the EITC population is 1.96 to 2.90 percentage points, compared to only 0.10 to 0.18 percentage points among non-EITC claimants. In relative terms, Black EITC claimants are audited at 2.9 to 4.4 times the rate of non-Black EITC claimants. A formal decomposition attributes 70-73% of the overall disparity to higher audit rates among Black EITC claimants, 20-21% to racial differences in EITC claiming rates, and 7-8% to differential audit rates among non-EITC filers. Within EITC claimants, 78.5% of the observed audit disparity is attributable to the Dependent Database (DDb) program.

Source of the disparity: algorithmic objective. Using counterfactual audit selection algorithms estimated on NRP data, the authors find that allocating EITC audits to maximize detected total underreporting (from any source) would produce audit rates of 0.74% for Black EITC claimants versus 1.63% for non-Black EITC claimants, reversing the disparity. In contrast, the status quo, which prioritizes detecting overclaimed refundable credits, yields 3.00% for Black claimants versus 1.04% for non-Black claimants. The primary driver is that the prevalent types of noncompliance differ by race: dependent-claiming errors are more common among Black EITC claimants (dependent error rate of 26.6% vs. 16.3% for non-Black claimants), while the largest-dollar underreporting, which arises through underreported business income, is disproportionately concentrated among non-Black EITC claimants. An algorithm focused on refundable credit overclaims implicitly targets dependent errors and therefore selects Black taxpayers at higher rates.
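The comparison between the two objectives comes down to simple arithmetic on the four audit rates quoted above. The sketch below is illustrative only: the function and variable names are hypothetical, and it uses only the figures reported in this summary.

```python
def disparity_and_ratio(black_rate, nonblack_rate):
    """Return (gap in percentage points, Black / non-Black rate ratio)."""
    return black_rate - nonblack_rate, black_rate / nonblack_rate

# Audit rates (percent of EITC claimants) as quoted in this summary.
status_quo = disparity_and_ratio(3.00, 1.04)
total_underreporting = disparity_and_ratio(0.74, 1.63)

print(f"status quo: gap = {status_quo[0]:+.2f} p.p., ratio = {status_quo[1]:.2f}")
print(f"total underreporting: gap = {total_underreporting[0]:+.2f} p.p., "
      f"ratio = {total_underreporting[1]:.2f}")
```

The status quo gap of 1.96 p.p. matches the lower bound of the within-EITC disparity, and the negative gap under the total-underreporting objective is the reversal described above.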

Prediction model bias. Even conditional on the refundable-credit objective, the status quo disparity (1.96 p.p.) exceeds the disparity that would arise under an oracle that uses actual rather than predicted refundable credit overclaims (1.08 p.p.), suggesting that prediction errors are unevenly distributed by race. The refundable credit prediction algorithm generates a disparity of 1.75 p.p., approximately 60% larger than the oracle. The authors find suggestive evidence of missingness in birth certificate data (paternal information is disproportionately missing for children claimed on Black taxpayers’ returns) and differential predictive accuracy in the DDb risk score across race.

Operational consequences. Switching the objective from refundable credit overclaims to total underreporting would shift the composition of audited returns from predominantly dependent-eligibility issues (80% of refundable credit oracle-selected returns contain a dependent error) toward business income (86% of total-underreporting oracle-selected returns have business income underreporting). EITC returns with substantial business income (gross receipts above $25,000) cost on average $369.70 to audit versus $23.09 for other EITC returns. Holding the audit rate fixed, the switch would raise average examination costs by nearly an order of magnitude, while also increasing detected underreporting (mean adjustment of $22,578 per return under the total underreporting oracle versus $9,595 under the refundable credit oracle).

Scope Conditions. Results pertain primarily to tax year 2014. The paper finds similar patterns for tax years 2010, 2012, 2016, and 2018. The analysis covers Black versus non-Black taxpayers; disparities for other racial and ethnic groups are not the focus. The selective labels identification strategy relies on the NRP random-audit sample and the bounding conditions verified in the North Carolina matched data.

Q&A

Q1. Why can’t the disparity be attributed simply to Black taxpayers being more likely to claim the EITC, combined with EITC claimants facing higher audit rates generally?

The authors test this directly by estimating racial audit disparities separately within EITC claimants and non-claimants. If differential EITC claiming rates were the full explanation, the within-EITC disparity would be close to zero. Instead, the disparity among EITC claimants (1.96-2.90 p.p.) is larger in absolute terms than the overall disparity (0.81-1.34 p.p.), indicating that Black EITC claimants face substantially higher audit rates than non-Black EITC claimants even holding EITC claimant status fixed. The formal decomposition attributes 70-73% of the overall disparity to differential audit rates within the EITC claimant population, not to differential claiming rates across the population.

Q2. How does the partial identification strategy work, and what are its key identifying assumptions?

The authors derive two estimators of the racial audit disparity that use BIFSG-imputed race probabilities rather than observed race. The probabilistic estimator weights each taxpayer’s contribution by their estimated probability of being Black; it is downward-biased when there is a positive residual covariance between audits and true race after conditioning on imputed race (E[Cov(Y,B|b)] > 0). The linear estimator regresses audit status on imputed race probability; it is upward-biased when there is a positive residual covariance between audits and imputed race after conditioning on true race (E[Cov(Y,b|B)] > 0). When both covariance terms are positive, the probabilistic and linear estimates bound the true disparity from below and above, respectively. The authors verify that both covariance terms are positive and statistically significant (p < 0.01) in the matched North Carolina dataset, for the full population and for the EITC population specifically.
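A small simulation can make the bounding logic concrete. The sketch below is not the authors' code: the data-generating process, the audit rule, and all function names are hypothetical. It assumes that the imputed probability b is calibrated (true race is drawn with probability b) and that audits depend positively on both true race and imputed probability, so both covariance conditions hold by construction. The audit probabilities are set far above realistic levels to keep sampling noise small.

```python
import random

def probabilistic_estimator(y, b):
    """Difference between probability-weighted audit rates."""
    black = sum(bi * yi for yi, bi in zip(y, b)) / sum(b)
    nonblack = sum((1 - bi) * yi for yi, bi in zip(y, b)) / sum(1 - bi for bi in b)
    return black - nonblack

def linear_estimator(y, b):
    """OLS slope from regressing audit status y on imputed probability b."""
    n = len(y)
    mean_y, mean_b = sum(y) / n, sum(b) / n
    cov = sum((yi - mean_y) * (bi - mean_b) for yi, bi in zip(y, b))
    var = sum((bi - mean_b) ** 2 for bi in b)
    return cov / var

def true_disparity(y, race):
    """E[Y | B=1] - E[Y | B=0], computable only when race is observed."""
    black = [yi for yi, ri in zip(y, race) if ri == 1]
    nonblack = [yi for yi, ri in zip(y, race) if ri == 0]
    return sum(black) / len(black) - sum(nonblack) / len(nonblack)

def simulate(n=20000, seed=0):
    """Toy data: b is calibrated, and audits depend on both B and b."""
    rng = random.Random(seed)
    y, b, race = [], [], []
    for _ in range(n):
        bi = rng.choice([0.1, 0.5, 0.9])      # imputed Pr[Black]
        ri = 1 if rng.random() < bi else 0    # true race, calibrated to b
        p_audit = 0.1 + 0.3 * ri + 0.2 * bi   # hypothetical audit rule
        y.append(1 if rng.random() < p_audit else 0)
        b.append(bi)
        race.append(ri)
    return y, b, race

y, b, race = simulate()
lower, upper = probabilistic_estimator(y, b), linear_estimator(y, b)
print(f"probabilistic {lower:.3f} <= true {true_disparity(y, race):.3f} <= linear {upper:.3f}")
```

In this toy setup the true disparity falls between the two estimates. Setting either coefficient in the audit rule to zero removes the corresponding covariance term, and the matching bound becomes tight up to sampling noise.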

Q3. Does the racial audit disparity within EITC claimants disappear when comparing taxpayers with similar levels of underreporting?

No. The authors use NRP data to estimate audit rates by race within each underreporting decile among EITC claimants. Within every decile of the underreporting distribution, the estimated audit rate for Black taxpayers exceeds that for non-Black taxpayers. An oracle algorithm that selects returns in descending order of actual underreporting produces an audit rate of 0.74% for Black EITC claimants and 1.63% for non-Black EITC claimants — the opposite of the status quo pattern (3.00% for Black, 1.04% for non-Black). This rules out total-dollar underreporting as the primary driver of the observed disparity.
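The oracle described here can be sketched in a few lines: rank returns by actual underreporting, audit the top share, and compute probability-weighted audit rates by group. Everything below is a hypothetical illustration. The data are invented, the function name is not from the paper, and the sketch ignores the NRP sampling weights that a real analysis would need.

```python
def oracle_audit_rates(returns, audit_share):
    """Audit the top `audit_share` of returns by actual underreporting.

    `returns` is a list of (underreporting, p_black) pairs. Group audit rates
    are probability-weighted, in the spirit of the probabilistic estimator.
    """
    n_audits = round(audit_share * len(returns))
    ranked = sorted(returns, key=lambda r: r[0], reverse=True)
    audited = ranked[:n_audits]
    black_rate = sum(p for _, p in audited) / sum(p for _, p in returns)
    nonblack_rate = sum(1 - p for _, p in audited) / sum(1 - p for _, p in returns)
    return black_rate, nonblack_rate

# Hypothetical toy data: (dollars of underreporting, imputed Pr[Black]).
toy_returns = [
    (9000, 0.1), (7000, 0.2), (5000, 0.8), (3000, 0.9), (1000, 0.5),
    (500, 0.1), (200, 0.7), (100, 0.3), (50, 0.6), (0, 0.8),
]
black_rate, nonblack_rate = oracle_audit_rates(toy_returns, audit_share=0.3)
print(f"Black: {black_rate:.2f}, non-Black: {nonblack_rate:.2f}")
```

Because the largest underreporting amounts in the toy data sit on returns with low imputed probabilities, the oracle audits the non-Black group at the higher rate, which mirrors the direction of the paper's finding without reproducing its magnitudes.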

Q4. Why does focusing audit selection on refundable credit overclaims specifically lead to higher audit rates for Black taxpayers?

Two mechanisms operate simultaneously. First, EITC eligibility is linked to children, so detecting erroneously claimed dependents generates large refundable credit adjustments. The dependent error rate is higher among Black EITC claimants than non-Black EITC claimants (26.6% vs. 16.3% in the probabilistic estimate, or 30.8% vs. 15.4% in the linear estimate). Second, the highest-dollar noncompliance via underreported business income is disproportionately concentrated among non-Black EITC claimants: among EITC claimants in the top 1% of business income underreporting, the probabilistic estimate shows 0.05% are Black compared to 0.21% non-Black. An algorithm aimed at refundable credit overclaims implicitly targets dependent errors and therefore selects Black taxpayers at higher rates; one aimed at total underreporting would prioritize business income underreporting instead and therefore select non-Black taxpayers at higher rates.

Q5. How do the simulated algorithms compare to the actual IRS algorithms?

The authors cannot directly replicate the IRS’s confidential DDb algorithm, but they provide three pieces of evidence that their refundable credit prediction algorithm is a reasonable proxy. First, public governmental documents describe DDb’s stated goal as identifying taxpayers who do not meet refundable credit eligibility requirements. Second, when selecting audits based on predicted refundable credit overclaims using largely the same features available to the IRS, the authors generate a disparity (1.75 p.p.) close to the status quo disparity (1.96 p.p.). Third, operational audits of EITC returns are strongly associated with the authors’ predicted refundable credit overclaims measure but show a much weaker association with predicted total underreporting.

Q6. What does the status quo disparity exceeding the refundable credit oracle disparity reveal about prediction model design?

The status quo disparity (1.96 p.p.) is approximately 80% larger than the disparity that would arise if the IRS were perfectly informed about actual refundable credit overclaims and selected accordingly (oracle disparity: 1.08 p.p.). The refundable credit prediction algorithm generates a disparity of 1.75 p.p., approximately 60% larger than the oracle. This gap between the oracle and prediction disparity is consistent with prediction errors being distributed unevenly by race. The authors find that birth certificates of children claimed on Black taxpayers’ returns are substantially more likely to be missing paternal identity information, which may reduce the predictive accuracy of the DDb model for this population. They provide suggestive evidence that modifying the predictive features used could reduce the disparity without substantially degrading credit overclaim detection.
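The percentage comparisons in this answer follow directly from the three disparities quoted. A minimal check, using hypothetical variable names and only the figures reported in this summary:

```python
def pct_larger(value, baseline):
    """Percent by which `value` exceeds `baseline`."""
    return 100 * (value / baseline - 1)

oracle = 1.08       # disparity under the refundable-credit oracle (p.p.)
status_quo = 1.96   # observed disparity among EITC claimants (p.p.)
predicted = 1.75    # disparity under the authors' prediction algorithm (p.p.)

print(f"status quo vs oracle: {pct_larger(status_quo, oracle):.0f}% larger")
print(f"prediction vs oracle: {pct_larger(predicted, oracle):.0f}% larger")
```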

Q7. What are the downstream operational consequences of switching the algorithmic objective?

Switching from refundable credit overclaims to total underreporting would shift audited issues from dependent eligibility (80% of refundable credit oracle-selected returns have a dependent error) toward business income (86% of total underreporting oracle-selected returns have business income underreporting). Auditing business income returns is substantially more resource-intensive: $369.70 per return on average for returns with gross receipts above $25,000, versus $23.09 for other EITC returns. Holding the current EITC audit rate fixed, the share of audited returns with substantial business income would rise from 3% to 93%, raising total examination costs by nearly an order of magnitude. However, because total detected underreporting per audited return would also rise substantially (mean of $22,578 vs. $9,595), the increase in detected noncompliance would exceed the increase in audit costs, and the qualitative pattern persists even when accounting for higher per-return costs.
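The cost multiple can be approximated from the figures above. The sketch below is a back-of-the-envelope calculation, not the authors' method: it assumes the average cost within each category stays fixed when the audit mix changes, and the function and constant names are hypothetical.

```python
COST_BUSINESS = 369.70  # avg. audit cost, EITC returns with gross receipts above $25,000
COST_OTHER = 23.09      # avg. audit cost, other EITC returns

def mean_audit_cost(business_share):
    """Average cost per audit given the share of audits with substantial business income."""
    return business_share * COST_BUSINESS + (1 - business_share) * COST_OTHER

cost_now = mean_audit_cost(0.03)
cost_switched = mean_audit_cost(0.93)
print(f"${cost_now:.2f} -> ${cost_switched:.2f} per audit "
      f"({cost_switched / cost_now:.1f}x)")
```

Under this simplification the average cost per audit rises roughly tenfold, in line with the order-of-magnitude increase described above.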

Q8. Is the disparity consistent across years, and is it driven by a particular audit type?

The authors find comparable audit disparities for tax years 2010, 2012, 2016, and 2018, confirming the 2014 results are not year-specific. The disparity is concentrated in correspondence audits: the estimated disparity in correspondence audit rates is 0.804-1.328 p.p. for the full population, while the disparity in field/office audit rates is only 0.010-0.016 p.p. The disparity is present in both pre-refund and post-refund audits, though pre-refund audits show a larger disparity even among correspondence audits alone. Among EITC claimants, the correspondence audit channel is nearly entirely responsible for the group-level disparity.

Q9. What heterogeneity exists within EITC claimants?

The disparity is especially pronounced among unmarried male EITC claimants with dependents: among this subgroup, the audit rate for Black men exceeds the audit rate for non-Black men by more than 4 percentage points, and both are an order of magnitude above the overall U.S. population audit rate. Disparities are smaller among joint filers, unmarried women, and unmarried men without dependents, though the ratio of Black to non-Black audit rates remains substantial across all subgroups. The concentration of the disparity among unmarried men with dependents is consistent with the role of dependent-claiming errors, which are more likely to arise in family structures characterized by nonmarital cohabitation — a pattern more prevalent among Black Americans due to lower marriage rates.

Q10. Can the disparity be attributed to disparate treatment — i.e., race-conscious selection?

The authors rule out disparate treatment for the EITC population. The DDb audit selection process for EITC returns is automated (no manual review), and the IRS does not use race or geography as an input into audit selection. The disparity is therefore the product of disparate impact: race-neutral selection criteria interact with racially correlated patterns of tax return characteristics to produce differential audit rates. For higher-income non-EITC taxpayers, where audit selection may involve human classifiers, the authors cannot rule out disparate treatment.

Key Concepts

Audit Disparity (D). Defined in the paper as D = E[Y|B=1] - E[Y|B=0], the difference in audit rates between Black taxpayers (B=1) and non-Black taxpayers (B=0). This is a group-level difference in selection rates, not conditional on any other characteristic, and is the primary estimand throughout.

Probabilistic Disparity Estimator. An estimator that calculates group-specific audit rates by weighting each taxpayer’s contribution by their BIFSG-imputed probability of being Black (or non-Black). It is shown to be downward-biased when E[Cov(Y,B|b)] > 0, i.e., when there is residual positive association between true race and audits after conditioning on imputed race.

Linear Disparity Estimator. An estimator based on regressing audit status (Y) on BIFSG-imputed race probability (b). It is shown to be upward-biased when E[Cov(Y,b|B)] > 0, i.e., when imputed race probability predicts audits even after conditioning on true race. Together, the probabilistic and linear estimators form bounds on the true disparity under conditions verified empirically.
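The direction of each bias can be seen from the law of total covariance. The derivation below is a sketch under one added assumption not stated in this summary, namely that the imputed probability is calibrated, E[B | b] = b. The hat notation for the two estimators is introduced here for illustration and may differ from the paper's.

```latex
% Assumption (not stated in this summary): b is calibrated, E[B \mid b] = b,
% so E[B] = E[b], Var(B) = E[b](1 - E[b]), and Cov(B, b) = Var(b).
\begin{align*}
D &= \frac{\operatorname{Cov}(Y,B)}{\operatorname{Var}(B)}, \qquad
\hat{D}_p \to \frac{\operatorname{Cov}(Y,b)}{E[b]\,(1-E[b])}, \qquad
\hat{D}_l \to \frac{\operatorname{Cov}(Y,b)}{\operatorname{Var}(b)} \\[4pt]
\operatorname{Cov}(Y,B) &= E[\operatorname{Cov}(Y,B \mid b)]
  + \operatorname{Cov}\bigl(E[Y \mid b],\, E[B \mid b]\bigr)
  = E[\operatorname{Cov}(Y,B \mid b)] + \operatorname{Cov}(Y,b) \\
\Rightarrow\quad \hat{D}_p &\to D - \frac{E[\operatorname{Cov}(Y,B \mid b)]}{\operatorname{Var}(B)} \\[4pt]
\operatorname{Cov}(Y,b) &= E[\operatorname{Cov}(Y,b \mid B)] + D\,\operatorname{Cov}(B,b)
  = E[\operatorname{Cov}(Y,b \mid B)] + D\,\operatorname{Var}(b) \\
\Rightarrow\quad \hat{D}_l &\to D + \frac{E[\operatorname{Cov}(Y,b \mid B)]}{\operatorname{Var}(b)}
\end{align*}
```

When both expected conditional covariances are positive, the probabilistic estimator converges to a value below D and the linear estimator to a value above it, which is the bounding result described in the two entries above.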

BIFSG (Bayesian Improved First Name Surname Geocoding). A probabilistic race imputation method that uses Bayes’ rule under a conditional independence assumption (first name, surname, and geography are independent given race) to compute Pr[Black | first name, surname, Census Block Group]. Applied here to all of the approximately 148 million tax returns; calibrated and validated against matched North Carolina voter registration data with self-reported race.
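The Bayes’ rule step can be sketched as follows. This is a minimal illustration of the conditional independence calculation, not the IRS's or the authors' implementation: the function name is hypothetical, only two groups are modeled, and every probability is invented to make the arithmetic easy to follow.

```python
def bifsg_posterior(prior, p_first, p_surname, p_geo):
    """Pr[race | first name, surname, geography] via Bayes' rule.

    Assumes the three features are independent given race, so the joint
    likelihood is the product of the per-feature likelihoods.
    """
    unnormalized = {
        race: prior[race] * p_first[race] * p_surname[race] * p_geo[race]
        for race in prior
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}

# All numbers below are hypothetical.
posterior = bifsg_posterior(
    prior={"black": 0.2, "non_black": 0.8},        # Pr[race]
    p_first={"black": 0.02, "non_black": 0.01},    # Pr[first name | race]
    p_surname={"black": 0.03, "non_black": 0.01},  # Pr[surname | race]
    p_geo={"black": 0.004, "non_black": 0.002},    # Pr[block group | race]
)
print(posterior)
```

With these invented inputs the posterior probability of "black" is 0.75: each feature is individually more likely under that group, and the product of the three likelihood ratios outweighs the lower prior.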

Selective Labels Problem. The problem that noncompliance (underreporting) is observed only for returns selected for audit, not for the full filing population. In this paper it means the IRS cannot directly observe the underreporting distribution for unaudited returns. The authors address this using NRP random-audit data, which allows estimation of the unaudited underreporting distribution and construction of counterfactual selection algorithms.

Algorithmic Objective. The paper distinguishes between (1) the prediction component of audit selection — which model to use to forecast noncompliance — and (2) the objective component — what type of noncompliance to predict and pursue (overclaimed refundable credits versus total underreporting from any source). The paper finds that the objective, not just prediction error, is an independent driver of the racial audit disparity.

Dependent Database (DDb) Program. The IRS’s primary EITC audit selection program, responsible for approximately 75% of audited EITC returns in 2014. DDb flags returns based on rules, heuristics, and proprietary risk scores, with the stated goal of identifying taxpayers who do not meet refundable credit eligibility requirements. Selection through DDb is fully automated, without human classifier review.

National Research Program (NRP). A stratified random sample audit program through which the IRS conducts near-line-by-line examinations of a small fraction of the filing population each year (approximately 2% of audited returns in 2014). The paper pools 71,878 NRP audits from 2010-2014 to identify the distribution of underreporting in the full EITC filing population and to estimate counterfactual selection algorithms.

How this summary was made. Bibliographic fields are pulled from Crossref and OpenAlex and are not model-generated. The summary was drafted from the open-access manuscript, checked by a claim-grounding and calibration review pass, and approved before publishing. Corrections are welcome, especially from the authors.