Shaking Up the Archives: Computational Archival Science Accelerates Historical Research and Digital Readiness

Laurie Robinson - February 20, 2024

Educators train the next generation of archivists in this interdisciplinary field

Archivists, the custodians of history, are facing a paradigm shift. Traditional archival processing involves painstaking manual work: poring over documents, deciphering handwriting, and contextualizing records. However, the proliferation of digital information and its sheer scale, sometimes described as the digital tsunami, has changed this, requiring novel approaches to managing, preserving, and analyzing records. This is where Computational Archival Science (CAS), a field devised in 2016, comes into play, blending archival expertise with computational thinking. Thanks to modern computing, what once took archivists years can now be accomplished in months or even weeks.

CAS allows for data mining: extracting patterns and information from vast datasets. This process enables scholars to glean insights across thousands of documents simultaneously, revealing trends and connections that would otherwise remain obscured. Founding CAS partners include Victoria Lemieux (U. British Columbia), William Underwood (U. Maryland), Mark Hedges (King's College London), Mark Conrad (AI-Collaboratory), Maria Esteva (U. Texas Austin), and Michael Kurtz (deceased). Lemieux and Hedges have co-hosted a series of eight CAS workshops, the latest of which can be viewed online.

“CAS is about addressing all aspects of archival work in dealing with the digital tsunami,” says University of Maryland College of Information Studies (INFO) Professor and AI-Collaboratory Co-Founder Richard Marciano. The latest definition, found on the CAS portal, reads:

“CAS is a transdisciplinary field grounded in archival, information, and computational science that is concerned with the application of computational methods and resources, design patterns, sociotechnical constructs, and human-technology interaction, to large-scale (big data) records/archives processing, analysis, storage, long-term preservation, and access problems, with the aim of improving and optimizing efficiency, authenticity, truthfulness, provenance, productivity, computation, information structure and design, precision, and human technology interaction in support of acquisition, appraisal, arrangement and description, preservation, communication, transmission, analysis, and access decisions.”

Computational Archival Science in the Classroom

A national network of researchers across a dozen institutions, including four iSchools and five cultural partners, is bringing CAS into library and information science (LIS) classrooms, training the next generation of archivists and historians in this interdisciplinary field. Through an Institute of Museum and Library Services (IMLS) grant, the “Piloting Network,” they created the Computational Archival Science Education System (CASES), a specialized instructional tool designed to nurture this new academic discipline. CASES is a platform that serves both instructors and students, providing a rich array of resources that include datasets, computational notebooks, and lesson plans.

“This project focuses on master’s-level LIS students to target the professional development of future practitioners across the U.S. The Pilot Network aims at supporting faculty and students by building a community of educators and practitioners, dedicated to modernizing archival and library education. The ultimate goal is to contribute to the development of faculty and library digital leaders,” says Marciano.

In their computational notebooks, built on an open-source, cloud-based interactive platform called Jupyter Notebooks, students combine software code, computational output, explanatory text, and multimedia resources in a single document. Jupyter Notebooks facilitate interactive learning by allowing students to execute code and see immediate results. This hands-on approach allows students to understand the complexities of data manipulation, analysis, and visualization. Students gain practical experience with computational tools, from digitization processes to metadata schema applications and beyond.
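
To give a sense of what executing code and seeing immediate results can look like, here is a minimal sketch of a single notebook cell. It is not taken from the CASES materials: the sample records, the term_counts helper, and the min_length parameter are all hypothetical, invented only for illustration. The cell counts how many record descriptions mention each term, a very small instance of looking for patterns across documents.

```python
from collections import Counter
import re

# Hypothetical sample records, invented for illustration only.
# A real assignment would load an actual archival dataset instead.
records = [
    {"id": "r1", "year": 1942, "description": "Letter regarding relocation of families"},
    {"id": "r2", "year": 1943, "description": "Camp roster listing families and ages"},
    {"id": "r3", "year": 1943, "description": "Letter requesting transfer of records"},
]


def term_counts(records, min_length=4):
    """Count how many records mention each term (case-insensitive).

    Terms shorter than min_length characters are skipped as a crude
    stand-in for stop-word filtering.
    """
    counts = Counter()
    for record in records:
        # A set ensures each term is counted at most once per record.
        words = set(re.findall(r"[a-z]+", record["description"].lower()))
        counts.update(word for word in words if len(word) >= min_length)
    return counts


counts = term_counts(records)

# In a notebook, this output appears directly beneath the cell.
for term, n in sorted(counts.items()):
    if n > 1:
        print(f"{term}: mentioned in {n} records")
```

A student could change min_length or add a record and rerun the cell to see the counts update, which is the kind of immediate feedback described above.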

The content students work with in their notebooks comes from the Advanced Information Collaboratory (AIC), an online virtual international research network launched in February 2020. The AIC focuses on: (1) exploring the opportunities and challenges of “disruptive technologies” for archives and records management (including digital curation, machine learning, artificial intelligence), (2) leveraging the latest technologies to unlock the hidden information in massive stores of records, (3) pursuing multidisciplinary collaborations to share relevant knowledge across domains, (4) training current and future generations of information professionals to think computationally and rapidly adapt new technologies to meet their increasingly large and complex workloads, and (5) promoting ethical information access and use.

“Our experience indicates that CAS requires students to ‘get their hands dirty’ with the technology to truly understand it,” says Marciano. “This implies that students should learn by doing lab-based in-class activities and group or individual assignments that help to build their experience and skills in working with and applying technology to a range of archival tasks. A recent example of this is documented by INFO students themselves after taking part in a 15-week graduate course focused on implementing digital curation through hands-on experiential learning.”

Beyond Master’s Education: Training for Professionals and Doctoral Students

INFO is also involved in two other IMLS-funded initiatives: DCIP (Digital Curation for Information Professionals) and LEADING (LIS Education And Data Science Integrated Network Group).

For those who want to delve deeper into CAS, DCIP is the first professional certificate program designed to enhance their digital skills. This nine-month online certificate program is divided into three progressive courses, each designed to move participants from fundamental to advanced levels of digital curation. What sets the program apart is its commitment to practicality, culminating in a hands-on capstone project that allows students to apply their acquired knowledge in real-world scenarios.

The curriculum is diverse and forward-thinking, tapping into various topics crucial for contemporary digital curation. Students delve into the world of synthetic data and generative artificial intelligence (AI), exploring how AI can assist in the creation and management of digital assets. The program addresses the critical skills needed for the digital preservation of legacy file formats, ensuring that vital historical and cultural records are not lost to technology’s relentless march forward. Students learn how to index handwritten text, which remains a significant challenge for archivists dealing with older documents. The course also introduces concepts such as data-driven reparative description, a modern approach ensuring accurate and inclusive portrayals of communities and historical events in data records. 

LEADING, an IMLS-supported initiative that builds on LEADS, is led by Jane Greenberg at Drexel University and focuses on doctoral students and early- and mid-career information professionals. LEADING’s model includes community hubs at the University of New Mexico College of Libraries and Learning Sciences, UC San Diego Library, and OCLC, and partner nodes from leading libraries, archives, and data centers. The AI-Collaboratory is one of 18 member nodes that serve as project mentoring sites, with Greg Jansen leading CAS-related projects.

The DCIP and LEADING initiatives are increasing both the number of students and professionals reached and their exposure to computational training. Together, these investments in the future of archival teaching total over $2M at Drexel and INFO.

Towards a National Educator Network

To address the shortage of CAS educators and the many challenges of providing CAS education, a concerted effort is being made to stimulate a productive interchange of ideas among educators of various disciplines. This dialogue aims to nurture CAS as an emerging transdiscipline, paving the way for a new chapter in archival and library education. The TALENT Network (Training of Archival & Library Educators with iNnovative Technologies) is supporting this through a transformative project funded by IMLS.

TALENT is a pioneering national educator initiative bringing together 25 experts from across the U.S., including archivists, librarians, and educators in LIS, along with historians, learning scientists, cognitive scientists, computer scientists, and software engineers. This collaboration aims to forge a robust, diverse, and multidisciplinary national community with a shared commitment to cultivating digital acumen and leadership abilities among educators specializing in archival and library studies.

INFO partners with Anne Gilliland at UCLA, who addresses the social and ethical concerns that arise from computational and algorithmic thinking. The project doubles the size of the existing Piloting Network to now include Kent State (Karen Gracy), University of Missouri (Sarah Buchanan), Clayton State (Joshua Kitchens), Drexel U. (Adelaida Alban Medlock), Indiana University Bloomington (Devan Donaldson), and University of Washington (Melanie Walsh). The project also engages Historically Black Colleges and Universities (HBCU) students in the greater Atlanta community through a pilot program with two HBCUs: Spelman College (Archives: Holly Smith; Computer Science: Raquel Hill and Sandrilla Washington) and Clark Atlanta University (Rico Chapman), with coordination support from the Georgia Tech Library (Aisha Johnson). Curriculum development and assessments using Jupyter Notebooks are conducted by Phil Piety (AIC), Andrea Chiba (UCSD), and Rogers Hall and Andrew Hostetler (Vanderbilt U.). INFO partners include Greg Jansen, Mark Conrad, and Michael Kurtz. INFO doctoral students include Lori Perine, Jennifer Proctor, and Rajesh Kumar Gnanasekaran.

Future Directions

“The teaching of CAS is not merely an option but a necessity for the archival profession to stay relevant and responsive to the changing landscape. It will equip professionals with the tools and perspectives needed to navigate the complexities of digital recordkeeping, ensuring that archives remain accessible, trustworthy, and reflective of our evolving society,” say Lemieux and Marciano in an upcoming book chapter entitled “Archival Pedagogies,” edited by James Lowry et al. “This commitment to evolving education in CAS highlights its significance in shaping a future where archives and the archival perspective are more than applicable to the past, but dynamic resources and ways of understanding and engaging with our present world.”