(Video) SoDa Symposium: Rehabilitation of Open-ends

Emily Davidson - December 15, 2022

Carol Haney of Qualtrics and Professor Philip Resnik discuss the advantages of open-ends in survey research.


On December 13, 2022, Carol Haney of Qualtrics and Professor Philip Resnik presented “Rehabilitation of Open-ends: Creating a Codebook for Open-ends Using Machine Learning Techniques and Human Intervention That Then Can be Used to Drive Action.”

Open-ends are a well-known problem in survey research: language can yield extremely rich responses, such as illuminating aspects of a question or issue that the researcher might not have known to look for, but the analysis of text is costly and labor-intensive. As a result, there is a tendency to include open-ends as an afterthought, to use them minimally, or to avoid them altogether. Computational methods can potentially help, but they often raise concerns about whether the results they provide are as trustworthy and actionable as other kinds of responses.

Haney and Resnik discussed approaches they had been taking to the analysis of open-ends, which combine automation with human intervention in order to balance automation against trustworthiness. Two experiments were run independently on the same set of 16,648 responses to a question on Reddit about the reasons that people who considered suicide did not ultimately take their own lives. The first experiment introduced human intervention at the start, using a machine learning process that included word clouds and TF-IDF techniques to help human coders develop an actionable codebook. The second experiment used topic modeling, an unsupervised machine learning approach, to pull out latent categories from the open-ends, which then guided a step-by-step content analysis protocol carried out by subject matter experts to identify category labels and descriptions. During the symposium, Haney and Resnik compared and contrasted the results and, more generally, discussed the potential for techniques of this kind to bring open-ends out of the shadows in survey research.
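For readers unfamiliar with TF-IDF, the following is a minimal illustrative sketch in Python, not the presenters' actual pipeline. It scores each word in a response by how frequent it is in that response and how rare it is across all responses, then lists the top-scoring words as candidate cues for a human coder. The function names (`tokenize`, `top_terms`), the simple tokenizer, and the toy responses are all assumptions made for illustration.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase a response and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def top_terms(responses, k=3):
    """Return the k highest TF-IDF terms for each response (hypothetical helper).

    TF is the term count divided by the response length; IDF is
    log(N / document frequency). Ties are broken alphabetically.
    """
    docs = [tokenize(r) for r in responses]
    n_docs = len(docs)

    # Document frequency: in how many responses does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:k]])
    return results

if __name__ == "__main__":
    # Toy open-end responses, invented for this sketch.
    toy_responses = [
        "The checkout process was slow",
        "The delivery was slow, arrived late",
        "The support team was friendly",
    ]
    for response, terms in zip(toy_responses, top_terms(toy_responses, k=2)):
        print(response, "->", terms)
```

Words that appear in every response (here "the" and "was") receive a score of zero, which is why TF-IDF tends to surface the distinctive vocabulary of each response. A production workflow would typically use an established library and a proper stop-word list rather than this hand-rolled version.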

At Qualtrics, Carol Haney is head of research and data science. Her principal research area is online quantitative research, specifically focusing on best practices around sampling, Total Survey Error, and advanced analytics. Haney currently works with multiple commercial clients, mostly in the financial, health, and tech spaces. She has experience running large survey programs that involve customer experience, segmentation, and performance measurement. In 2015, she was honored by Qualtrics as the most valuable player. Prior to Qualtrics, Haney worked in executive positions at Toluna, Harris Interactive, TNS, SPSS, and the National Opinion Research Center at the University of Chicago. For the past five years, she has led all the formative research for the CDC’s anti-smoking ads, a campaign that has in part contributed to the five-year decline in the smoking rate among U.S. adults from 23% to 14%.

Philip Resnik holds a joint appointment as Professor in the University of Maryland Institute for Advanced Computer Studies and the Department of Linguistics, and an Affiliate Professor appointment in Computer Science. He earned his bachelor’s degree in Computer Science at Harvard in 1987 and his Ph.D. in Computer and Information Science at the University of Pennsylvania in 1993, and he joined the University of Maryland faculty in 1996. His industry experience prior to entering academia includes time in R&D at Bolt Beranek and Newman, IBM T.J. Watson Research Center, and Sun Microsystems Laboratories. Resnik’s research focuses on computational modeling of language that brings together linguistic knowledge, domain expertise, and data-driven machine learning methods, with an emphasis on applications in computational social science as well as experience in multilingual text analysis and machine translation, and scientific interests in computational cognitive neuroscience. He holds two patents and has authored or co-authored more than 100 peer-reviewed articles and conference papers. At various times his work has been highlighted in Newsweek, The Economist, New Scientist, and on National Public Radio, and he has been a repeat organizer and panelist at SXSW Interactive. Outside academia, Resnik was a technical co-founder of CodeRyte (clinical natural language processing, acquired in 2012 by 3M Health Information Systems), and is an advisor to Converseon (social strategy and analytics), FiscalNote (machine learning and analytics for government relations), and SoloSegment (web site search and content optimization).

You can watch the full December 13 SoDa Symposium webinar below or on YouTube.