How to Make Critical Decision-Making Systems Fairer and Build Trust in AI

Laurie Robinson - March 22, 2024

Researchers from UMD and GW are creating a new measure to make AI systems involved in parole decisions and loan approvals fairer


A university admissions panel reviews two almost identical applications. Both candidates boast the same SAT scores, a testament to their academic aptitude, but one comes from a prestigious preparatory school and the other from a resource-constrained public school. The preparatory school applicant, with access to tutors and tailored instruction, presents a polished case of readiness. In contrast, the other displays equivalent potential molded by perseverance and perhaps a broader set of experiences. The committee then asks which student expended more effort to achieve that score. The student who went to public school—and had none of the privileges that come with an elite education—had to overcome more obstacles, and the admissions committee wants to account for that. 

Humans can note the subtle differences between the two students’ test scores, but would a machine learning system be able to? AI-infused systems help humans make high-stakes decisions all the time—such as loan approvals or parole decisions—but the effort individuals expend to have “gotten where they are” is never considered. Researchers from the University of Maryland (Hal Daumé III, endowed Computer Science [CMNS] professor; Furong Huang, CS assistant professor; Zubin Jelveh, College of Information Studies [INFO] and Department of Criminology and Criminal Justice [BSOS] assistant professor; Tin Nguyen, second-year Computer Science [CMNS] PhD student) and George Washington University (Donald Braman, associate professor of Law) are attempting to change that by creating an algorithm that takes into account measures of fairness that might seem obvious to some people but that machines overlook. 

“There is something that is traditionally measured, which is the number of convictions or SAT scores or whatever, but how you got to that number is traditionally not measured,” says Daumé III. “How you got to that number for a lot of cases makes a difference for what that number actually means.”  

A Scientific Approach to Measuring Effort

The researchers are creating an effort-aware fairness measure for AI systems using two real-world datasets. The first is the Client Legal Utility Engine (CLUE) dataset for recidivism risk, featuring comprehensive records from Maryland, including over 40 million entries from the mid-1980s to 2021. It contains criminal history like arrests and convictions as well as race, gender, and age. The second dataset focuses on credit score classification.

According to Nguyen, “A promising approach to quantifying effort is based on force, which is a phenomenon that comes from physics.” Force denotes the interaction that changes the state of motion of an object. In a social context, effort can be likened to a force that propels individuals forward. That effort is influenced by inertial factors that emerge from historical and systemic disadvantages. For example, African Americans are frequently subjected to systemic biases, such as disproportionately high rates of false arrests. Such historical patterns exemplify the inertia that society imposes on these individuals, representing the resistance they encounter in their pursuit of progress. This “holding back effect” necessitates a greater exertion of effort for the individual to overcome obstacles and achieve positive outcomes. 

In addition to considering the inertial factors that represent the resistance individuals face, the researchers also examine indicators of an individual’s momentum or “velocity” in society. For velocity, the researchers look at indicators that evolve over time, specifically within the context of recidivism risk assessment. For example, over a period of four years, changes in factors such as arrest records, conviction history, and the nature of crimes (violent or non-violent) can be indicative of an individual’s momentum in society. Positive contributions, such as the hours of service provided to the community or constructive activities undertaken while incarcerated, are also considered. These positive indicators showcase a person’s effort towards rehabilitation and integration into society, countering the negative inertia and laying the groundwork for a fair evaluation of their social movement.
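The force analogy above can be sketched in a few lines of code. The weights and variable names below are hypothetical placeholders of our own devising—not the researchers’ actual formulation—but they illustrate the intuition that effort, like Newton’s F = m·a, scales with both systemic resistance (inertia) and the rate of positive change (acceleration):

```python
# Illustrative sketch of the force analogy (F = m * a) applied to effort.
# "inertia" stands in for systemic resistance; "velocity" for a positive
# outcome indicator tracked over time. All values are hypothetical.

def effort_score(inertia, velocity_start, velocity_end, years):
    """Effort ~ inertia * acceleration: greater systemic resistance and
    greater positive change both indicate more effort expended."""
    acceleration = (velocity_end - velocity_start) / years
    return inertia * acceleration

# Two people with identical improvement in a rehabilitation indicator,
# but one faces more systemic resistance (higher inertia):
low_resistance = effort_score(inertia=1.0, velocity_start=0.2,
                              velocity_end=0.8, years=4)
high_resistance = effort_score(inertia=2.5, velocity_start=0.2,
                               velocity_end=0.8, years=4)
print(high_resistance > low_resistance)  # same progress, more effort
```

Under this toy formulation, identical progress made against greater resistance yields a higher effort score—the “holding back effect” the researchers describe.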

“One of the things we’ve been thinking about is whether you can measure a person’s efforts via their trajectories in the sense that you have two people and they’ve both had 10 arrests over the last year,” says Jelveh. “One had escalating arrests and the other had deescalating arrests. It would seem that one person is on the uptick and the other person is cooling off. In a scenario like this, if we agree on what an effort-measure is then the solution is simple. We allow the algorithm to see these different trajectories.”
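Jelveh’s two-defendant example can be made concrete with a simple trend calculation. The least-squares slope below is one plausible way to let an algorithm “see” a trajectory—an assumption on our part, not the team’s published method:

```python
# Hypothetical sketch: two defendants with the same total arrests (10 each
# over four years) but opposite trajectories. A least-squares slope over
# the yearly counts separates "escalating" from "deescalating".

def trajectory_slope(yearly_counts):
    """Least-squares slope of counts over time (positive = escalating)."""
    n = len(yearly_counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(yearly_counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, yearly_counts))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

escalating = [1, 2, 3, 4]    # 10 arrests total, trending up
deescalating = [4, 3, 2, 1]  # 10 arrests total, cooling off

print(trajectory_slope(escalating))    # positive slope
print(trajectory_slope(deescalating))  # negative slope
```

A conventional model that only sees the total count treats both defendants identically; exposing the slope is what distinguishes the person “on the uptick” from the one “cooling off.”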

A Human Study

In the last phase of their project, the researchers will conduct human studies to gauge the correlation between effort-aware fairness scores and public opinion on fairness. Such empirical validation is crucial for advancing responsible and socially acceptable AI systems. 

The first part of the human study focuses on the alignment between the effort-aware metric and human understanding of the effort exerted by individuals. Participants will examine cases involving pairs of defendants who have received contradictory evaluations of their efforts: one assessment made by the effort-aware metric and the other by a conventional baseline. The participants will then judge which defendant has made a greater effort. This task not only sheds light on the relevance of the effort metric to human judgment but also serves to validate or challenge the measure’s standing against traditional baselines. 

The second part of the study presents a different set of participants with information from the CLUE defendants, including recidivism predictions from several AI models. The participants will rank these models based on their perceived fairness. This ranking will provide valuable insights into whether the effort-aware fairness metrics correspond with human notions of fairness.

“I think different people will have different perspectives on what they consider fair outcomes,” says Daumé III. “People who have been in and out of the criminal justice system might perceive what’s fair to be different from judges or from someone who doesn’t have that history. We’re still at the stage where we’re trying to get a holistic view of what people think.” 

Through these findings, the research team aims to refine their metrics, potentially paving the way for more equitable AI models that resonate with public values and contribute to the responsible use of technology in judicial and other critical decision-making processes.

“If it turns out that people do care about effort-aware fairness then literally every algorithm in every context needs to reassess how effort is being measured,” says Jelveh.