(Video) SoDa Symposium: Trustworthiness in Social Data Science

iSchool News Staff - February 15, 2022

Researchers from UMD and Cornell University discuss the ethical dilemmas in social data science and how ethnographic studies can help guide new research methods.


Illustration by Darren Garrett

Dr. Katie Shilton, Associate Professor at the University of Maryland’s College of Information Studies (iSchool), and Dr. Emanuel Moss, Postdoctoral Associate at Cornell University, led a discussion on the problems with trustworthiness in social data science during a talk hosted by UMD’s Social Data Science Center (SoDa) on Feb. 8. Moderated by Dr. Brian Butler, iSchool Professor and Senior Associate Dean, the discussion focused on three topics: scholarship and research that relies on rich information generated about people through digital interactions, why existing research ethics frameworks like the Belmont Report and the U.S. Common Rule aren’t a perfect fit for guiding social data science, and the ways ethnographic methods can better inform data research ethics.

“Digital data has become a gold mine for what’s being referred to as computational social science, or social data science – the use of digital data to understand people – but new data collection methods have also raised new questions about ethics and participation in this sort of research,” said Shilton.

Shilton and Moss are part of the PERVADE Project, an interdisciplinary collaboration between researchers at six institutions to answer empirical questions in data ethics. When the team started looking at ethics in data science, most of the solutions on the table relied on approaches already developed for social research, and these weren’t appropriate for the emerging paradigms of doing research with big social digital data sets.

In a recent paper, the PERVADE team argued that data scientists should probe both appropriateness and complex potential harms using two lenses directly inspired by ethnography – “participant awareness” and “reflections on power.” The first lens is backed by a suite of techniques that ethnographers developed to gain the trust of participants: establishing an entrée, or permission from participants to be in their space; participant checking to validate findings; and collaborative ethnography, in which researchers work with people to create research goals and collect data. The second lens has led to ongoing efforts to unpack the history of ethnography and its colonial ties, diversify the profession, and shift topics of study to include groups that are more socially powerful than the researcher, or at least equally so.

To think about what awareness and power look like in social data science research, the PERVADE team began by examining the data itself and then mapped social data onto two spectra – automatic to intentional, and private to public. This map is a heuristic for thinking about where data comes from, and it can help people reason about the likely or reasonable expectations they have about how data pertaining to them is collected and used.

“If data science is to be trustworthy, we need to explicitly grapple with the fact that our data and our methods are also used for surveillance and control,” said Moss. “There are shades of gray for all of this, but thinking about where data came from helps serve as a starting point for thinking about ethical implications of data research.”

The full SoDa Symposium event is available to watch on YouTube.

About PERVADE:

The PERVADE project was conceived during the Consortium for the Science of Sociotechnical Systems (CSST) 2016 Summer Institute Catalyst Working Group, an NSF-supported research coordination network (Grant #1144934). PERVADE’s multidisciplinary team conducts a suite of empirical projects to rethink computational research ethics. PERVADE is made up of sub-projects which explore Technical Investigations, User Communities, Computing Research Communities, and Data Ethics Regulators.