SaTC: CORE: Medium: Learning Code(s): Community-Centered Design of Automated Content Moderation

Online platforms bring out the best and worst of free speech: while they help us make connections and share ideas, they can also facilitate hate speech and extremism. Content moderators work to enforce community rules designed to mitigate these negative behaviors, but face a high burden from repeated exposure to toxic content. In principle, automated tools that use natural language processing (NLP) and machine learning (ML) techniques could ease this burden. However, current NLP and ML techniques can be circumvented by determined posters through the use of subtle and coded language, and the moderation tools that use them are often hard for moderators to configure for their community’s norms, policies, and moderation practices. This project leverages the fact that communities already make and enforce diverse speech policies online to 1) teach software to learn nuance from the decisions moderators make in existing communities; 2) support moderators by not only flagging content, but also suggesting decisions and providing explanations for those decisions; and 3) provide auditing tools that help community members know that moderators are acting in accordance with norms and policies. In doing this research, the team will develop tools to support healthier online communities, particularly volunteer-led communities, by strengthening policy enforcement, enabling better working conditions for online moderators (who are often from marginalized communities), creating more flexible software responses to community policies, and supporting adaptability to future regulation of content moderation.

To achieve these goals, the cross-disciplinary project team is conducting iterative cycles that involve empirical needs-finding studies with moderators, development of NLP- and ML-based tools, evaluation, and improvement of those tools. The project’s empirical studies will advance knowledge of how “machine-in-the-loop” moderation (where automated tools make or support moderation decisions) affects moderator working conditions and online participant experiences, and will inform evaluation mechanisms for measuring how well the ML tools respect online community policies and identify unwritten community norms. The project’s design process will make fundamental progress on ML algorithms that learn from few labels by using justifications provided by moderators, and on improved explanations of machine decisions grounded in human rationales. Together, these advances will produce new design methods for ML tools that adapt to complex written policies and identify unwritten social norms, serving multiple stakeholders accountably and transparently.

10/1/2021 - 9/30/2025 (Estimated)

Principal Investigator(s):

Additional UMD Investigator(s):
Michelle Mazurek, Hal Daume

Research Funder:

Total Award Amount:

Research Areas: