The Case For Applying AI to FOIA Processing

Jason R. Baron - March 15, 2023

How Does Access To Government Records Work When An Agency Holds Hundreds of Millions of Emails?

Almost 30 years before the 1966 enactment of the Freedom of Information Act, the National Archives opened a reading room to the public for the purpose of accessing government records.

From the agency’s earliest incarnation in 1934, through to the present day, the laudable mission of the National Archives and Records Administration (NARA) has been to “make access happen” (borrowing from the language of NARA’s current Strategic Plan).

Fast forward to the present: after a series of OMB-NARA memoranda issued starting in 2012 setting deadlines for the transition to fully electronic recordkeeping, under M-23-07 agencies now face an extended deadline of June 2024 to (i) manage all federal records in electronic or digital formats, and (ii) begin accessioning into NARA their permanent records created after that date solely in those formats.

Those requirements, coupled with the widescale adoption of NARA’s Capstone approach to email archiving by hundreds of governmental components, all reduce to one thing: larger agencies will be increasingly awash in archiving tens to hundreds of millions of emails along with other forms of electronic records, all of which are subject to FOIA.

Unfortunately, achieving timely access to responsive electronic records is increasingly aspirational when electronic record repositories reach millions in size. There are two fundamental challenges FOIA staff face when attempting to access these vast pools of public records.

First, for reasons that have been well-known in e-discovery legal circles for over a decade, the size of the universe of potentially corporate records, coupled with imprecision and ambiguity of language, makes it difficult to find all or even almost all responsive records to a party’s request for documents simply by using automated keyword searches.

These searches tend to bring back a tremendous number of false positive “hits” that need to be reviewed for possible withholding, while also missing a large number of potentially responsive records.

And second, in the special case of government records, many are chock full of one or more types of exempt information that must be reviewed before release, most prominently personal information, including but not limited to what is known as PII (personally identifiable information).

Together, the adequacy-of-search challenge and the sensitivities in huge numbers of records have the potential to cause substantial FOIA delays, over and above all existing, well-known, and vexing FOIA processing issues.

Moreover, the second challenge — especially with respect to the need to withhold personal information — threatens access for many future decades to the permanent records that will be arriving at NARA in substantial quantities after June 2024.
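The first challenge, keyword searches that return false positives while missing responsive records, can be made concrete with the two standard retrieval measures, precision and recall. The sketch below is a toy illustration only: the six documents, the search term, and the "responsive" labels are all invented for this example and are not drawn from any agency's records.

```python
# Toy illustration: the documents, keyword, and relevance labels below
# are invented for this sketch.
documents = {
    "doc1": "Memo on the pipeline safety inspection schedule",
    "doc2": "Sales pipeline forecast for the third quarter",
    "doc3": "Inspection of the crude oil conduit near the river crossing",
    "doc4": "Lunch order for the staff meeting",
    "doc5": "Pipeline rupture incident report and follow-up",
    "doc6": "Hiring pipeline for new analysts",
}

# Documents a human reviewer would judge responsive to a hypothetical
# request about oil pipeline safety.
responsive = {"doc1", "doc3", "doc5"}


def keyword_search(docs, keyword):
    """Return the IDs of documents whose text contains the keyword."""
    keyword = keyword.lower()
    return {doc_id for doc_id, text in docs.items() if keyword in text.lower()}


def precision_recall(hits, relevant):
    """Precision: share of hits that are relevant.
    Recall: share of relevant documents that were hit."""
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


hits = keyword_search(documents, "pipeline")
precision, recall = precision_recall(hits, responsive)
print(f"hits={sorted(hits)} precision={precision:.2f} recall={recall:.2f}")
```

In this toy set, the term "pipeline" pulls in the sales and hiring documents (false positives) and misses the document that says "conduit" instead (a false negative), so only half of the hits are responsive and one of the three responsive documents is never found.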

Consider the case of White House email records preserved by NARA as a harbinger of things to come government-wide.

With the advent of the Presidential Records Act of 1978 (PRA), coupled with litigation over White House email (most notably in the landmark PROFS case, Armstrong v. Executive Office of the President, 1 F.3d 1274 (D.C. Cir. 1993)), successive administrations have been preserving White House email records and attachments.

A best current estimate is that over 600 million email records covered under the PRA, consisting of the equivalent of between 1 billion and 3 billion pages of records counting attachments, have been transferred into the legal custody of NARA starting with the Reagan White House. How many of those emails are open and accessible to the public?

The answer is something on the order of less than a tenth of 1% — consisting of emails requested by Congress during the nomination process for Justices John Roberts, Elena Kagan, and Brett Kavanaugh, each of whom generated records while working in a prior White House, as well as emails that have been opened due to other litigation and FOIA requests.

The other 99.9% of White House email records remain inaccessible, absent the filing of future FOIA requests and lawsuits.
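To put those percentages in absolute terms, the short calculation below works through the arithmetic. It is a back-of-the-envelope sketch that treats the estimates above as round numbers (600 million emails, an open share of at most one tenth of 1%, and 1 billion to 3 billion pages); the variable names are illustrative only.

```python
# Round figures taken from the estimates above; treat results as rough.
total_emails = 600_000_000                      # "over 600 million"
pages_low, pages_high = 1_000_000_000, 3_000_000_000

# "Less than a tenth of 1%" means fewer than 1 in every 1,000 emails.
open_ceiling = total_emails // 1000
closed_floor = total_emails - open_ceiling

# Average pages per email implied by the page estimates.
pages_per_email_low = pages_low / total_emails
pages_per_email_high = pages_high / total_emails

print(f"open emails: under roughly {open_ceiling:,}")
print(f"closed emails: roughly {closed_floor:,} or more")
print(f"pages per email: {pages_per_email_low:.1f} to {pages_per_email_high:.1f}")
```

On these round numbers, fewer than about 600,000 emails are open, while more than 599 million, averaging somewhere between roughly 1.7 and 5 pages each, still await review.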

What can be done to make access happen at agencies on a timelier basis? Again, the e-discovery experience is illuminating.

A form of machine learning, known as “technology assisted review,” is now routinely used by lawyers in the private sector in large, complex cases both to (i) search through hundreds of terabytes of corporate records, and (ii) filter records for possible withholding under grounds of privilege.

Lawyers in general counsel offices at some government agencies already are familiar with and employ this form of advanced search method to greatly reduce the time it takes to find responsive records in litigation; however, few FOIA offices are using this type of software at present.

Additionally, practical AI tools are just around the corner that may be able to do a good job of assisting in segregating at least some types of FOIA exempt material in documents, a development that hopefully will soon make review by FOIA staff and attorneys much more efficient.

The 2018-2020 FOIA Advisory Committee’s Final Report (Recommendation #22) called on the Archivist of the US to work with government agencies, academia and industry to help accomplish the goal of fostering the use of AI in FOIA processing, and NARA should continue to take the lead in doing so.

The problem of delays caused by the increase in the volume of agency records in electronic form is a looming public policy issue that needs to be understood both by the FOIA requestor community and by Congress.

In testimony before the Senate Homeland Security and Governmental Affairs Committee in March 2022, I proposed that an AI Advisory Committee be formed to oversee swifter adoption of state-of-the-art technologies to aid in FOIA processing. And of course, agencies and particularly their FOIA staff need the resources to implement improvements in this area.

The paradoxical truth is that the welcome transition to electronic government, heralded for the last decade by NARA, will make searching and reviewing agency records more difficult absent 21st century technology being brought to bear.

Without AI methods being deployed, well-meaning FOIA offices throughout large agencies will simply be overwhelmed by the task of complying with FOIA deadlines and conducting the reasonable searches that FOIA requires.

For the health of our democracy, we must find the means to be able to provide timely access to the overwhelming percentage of the government’s records (including from the White House) that exist only in electronic form.

Making progress in solving the problem of access through the use of AI methods is a key to ensuring that FOIA continues to provide meaningful access to government records and the history of our Nation.

This post was originally written by Jason R. Baron for the Americans for Prosperity Foundation 2023 Sunshine Week essay series on how government transparency and the Freedom of Information Act have transformed society.


Jason R. Baron is a Professor of the Practice in the College of Information Studies at the University of Maryland. He previously served as a trial lawyer and senior counsel at the Department of Justice, and as the first Director of Litigation at the National Archives and Records Administration. He has served as co-chair of the D.C. Bar’s E-discovery and Information Governance Committee and is a current member of the 2022-2024 FOIA Advisory Committee to the Archivist. Among his awards while in public service, Jason is a recipient of the Justice Tom C. Clark Outstanding Government Lawyer Award from the Federal Bar Association. He is a frequent media contributor on recordkeeping controversies, with appearances on CNN, MSNBC, NBC News, Good Morning America, and NPR’s All Things Considered, and citations in the New York Times, Washington Post, Wall Street Journal, TIME Magazine, and numerous other media outlets. He received his B.A. magna cum laude with honors from Wesleyan University and his J.D. from Boston University School of Law.