SoDa Symposium: Rehabilitation of Open-ends: Creating a Codebook for Open-ends Using Machine Learning Techniques and Human Intervention That Then Can Be Used to Drive Action

Event Start Date: Tuesday, December 13, 2022 - 12:00 pm

Event End Date: Tuesday, December 13, 2022 - 1:00 pm

Location: Virtual (times in EST), Registration Required

A Panel Discussion with Q&A


Open-ends are a well-known problem in survey research: language can yield extremely rich responses, including bringing to the surface aspects of a question or issue that the researcher might not have known to look for, but the analysis of text is costly and labor-intensive. As a result, there is a tendency to include open-ends as an afterthought, to use them minimally, or to avoid them altogether. Computational methods can potentially help, but they often raise concerns about whether the results they provide are as trustworthy and actionable as other kinds of responses.

We will talk about approaches we’ve been taking to the analysis of open-ends that combine automation with human intervention in order to balance automation against trustworthiness. Two experiments were run independently on the same set of 16,648 Reddit responses to a question about why people who had considered suicide did not end their lives. The first experiment introduced human intervention at the start: a machine learning process that included word clouds and TF-IDF techniques helped human coders develop an actionable codebook. The second experiment used topic modeling, an unsupervised machine learning approach, to pull out latent categories from the open-ends, which then guided a step-by-step content analysis protocol carried out by subject matter experts to identify category labels and descriptions. We will compare and contrast our results at the symposium and, more generally, discuss the potential for techniques of this kind to bring open-ends out of the shadows in survey research.


Carol Haney, Head of Research and Data Science, Qualtrics

Philip Resnik, Professor, Institute for Advanced Computer Studies and Department of Linguistics


Frauke Kreuter, Co-Director, Social Data Science Center (SoDa) and Professor, Joint Program in Survey Methodology, University of Maryland

The SoDa Center at UMD:

The powerful information available in large social science data sets is critical to understanding and addressing many of our nation’s and the world’s most pressing challenges: from COVID-19 to racial, social, and economic injustice, and from climate change to deep and damaging political and cultural divides. To help address these challenges, the University of Maryland has launched a new Social Data Science Center (SoDa) designed to advance research, education, and applications of social data measurement and analysis. This center leverages UMD’s strengths in survey methods, measurement, information management, visualization, and analytics. Facebook is providing support for the center’s research and education programs over the next three years.

Register for Zoom
