World

The Algorithm’s Dilemma: When Does the Law Turn “Unfair”?

Shreya Ram
May 10, 2026
5 min

Image - Herbin Issac

Proponents argue that artificial intelligence offers a structured solution to perennial problems in procedural justice, chief among them the implicit bias and cognitive fallibility of human decision-makers. As criminal justice has gradually become more data-driven, AI can analyse large offender databases, procedural histories, and case details entered by judges to build statistical models that estimate the likelihood of future recidivism. This allows AI to distil information from lengthy legal documents, identify recurring behavioural patterns, and forecast future risks, reducing the tedious work performed by court personnel. Artificial intelligence also promises economic benefits: as of 2022, US courts faced a backlog of 700,000 cases, and detaining those awaiting trial costs U.S. taxpayers $14 billion annually. By using artificial intelligence to expedite the review process, courts could save up to $1 billion annually.

However, with proprietary technology whose workings are accessible only to its programmers comes heightened suspicion of the law’s moral legitimacy among defendants, whose lives are often shaped by the recidivism scores that AI produces. Whilst judges use artificial intelligence to counter their own implicit biases, the system itself is trained on data riddled with stereotypes, whether from past cases or from patterns of prosecutorial discretion, such as charging people of color with offenses that carry heavier sentences. These biased outputs are then recorded as new data points, creating a feedback loop that further entrenches and amplifies the biases in the existing dataset. In this article, I will unpack a key legal artificial intelligence dispute in the United States - the COMPAS controversy investigated by ProPublica - and provide my own judgments on both the plaintiff’s and defendant’s cases.

The COMPAS tool has often been targeted for relying on biased data sources that influence its output, which judges ultimately take into account when informing their final decisions. In State v. Loomis (Wisconsin Supreme Court), petitioner Eric Loomis requested a second hearing on the grounds that the Circuit Court’s consideration of a COMPAS risk assessment during sentencing violated his right to due process. In Loomis, the Wisconsin Supreme Court ultimately allowed the use of COMPAS but required that courts be warned about its limitations.

At this hearing, Loomis offered the testimony of Dr. David Thompson on whether risk assessment tools have a credible role in court judgement, to which he responded: “The Court does not know how the COMPAS compares that individual’s history with the population that it’s comparing them with… There’s all kinds of information that the court doesn’t have, and what we’re doing is we’re mis-informing the court when we put these graphs in front of them and let them use it for sentences.” He argued that the tool runs a tremendous risk of overestimating recidivism levels, often weaving in external data points that may not be applicable to the case. His testimony underscores a fundamental procedural problem: Loomis had no ability to cross-examine the COMPAS algorithm, as he could have done if confronted by a human expert, whose methodology can be challenged under oath. This renders defenses against an algorithm futile, violating the spirit, if not the letter, of the Confrontation Clause.

Loomis’ defense against the alleged misuse of COMPAS was further supported by a 2016 ProPublica investigation that tested the COMPAS system adopted in the State of Florida against COMPAS’s own benchmark - the likelihood of reoffending within the next two years. ProPublica found that the formula was almost twice as likely to falsely flag black defendants as future criminals, while white defendants were more often mislabeled as lower risk. In addition, the investigators found that the algorithm performed poorly at predicting violent crime: only 20% of the people predicted to commit violent crimes actually went on to do so. When a fuller range of crimes was included, such as misdemeanors, the algorithm was more accurate but still far from reliable: 61% of those deemed likely to reoffend were arrested for subsequent crimes within the next two years. Reviewing the underlying COMPAS data, ProPublica found that under false positive classification (FPC), 23.5% of whites who did not reoffend were misclassified as ‘high risk’ defendants, compared to 44.9% of blacks; under false negative classification (FNC), 47.7% of whites who reoffended were misclassified as ‘low risk’ defendants, compared to 28% of blacks. Even when ProPublica ran a statistical test to determine whether this racial disparity could be explained by defendants’ prior crimes or the types of crimes they had been charged with, controlling for race, age, and gender, Black defendants were still 77% more likely to be assigned a higher risk of committing a future violent crime and 45% more likely to be predicted to commit a future crime of any kind.
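To make these rates concrete, the short sketch below computes the two metrics ProPublica relied on, the false positive rate and the false negative rate, from the four cells of a confusion matrix. The counts are hypothetical placeholders chosen only for illustration, not ProPublica’s actual data.

```python
# A minimal sketch of ProPublica's outcome-conditioned error rates,
# computed from hypothetical confusion-matrix counts (not real data).

def error_rates(tp, fp, tn, fn):
    """Return (false positive rate, false negative rate).

    FPR: share of people who did NOT reoffend but were labeled 'high risk'.
    FNR: share of people who DID reoffend but were labeled 'low risk'.
    """
    fpr = fp / (fp + tn)  # conditioned on the actual outcome: no reoffense
    fnr = fn / (fn + tp)  # conditioned on the actual outcome: reoffense
    return fpr, fnr

# Hypothetical (tp, fp, tn, fn) counts for two illustrative groups:
groups = {"group_a": (300, 200, 450, 100), "group_b": (250, 100, 550, 250)}
for name, counts in groups.items():
    fpr, fnr = error_rates(*counts)
    print(f"{name}: FPR={fpr:.1%}, FNR={fnr:.1%}")
```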

In response to these claims, however, Northpointe argued that ProPublica had overlooked aspects of the technology that performed well on other standard measures of fairness. They claimed that the software was not discriminatory against blacks because it satisfies predictive parity: it distinguishes recidivists from non-recidivists equally well for black and white defendants, as measured by the area under the receiver-operating characteristic (ROC) curve, and the likelihood of reoffending implied by any given recidivism score is the same regardless of race. Under false positive prediction (FPP), they found that among those labeled ‘high risk’, 41% of whites and 37% of blacks did not reoffend; under false negative prediction (FNP), among those labeled ‘low risk’, 29% of whites and 35% of blacks reoffended. From this data, Northpointe argued that the algorithm’s scores mean essentially the same thing across different racial groups, even though the specific reasons for errors differ.
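Northpointe’s metrics condition on the opposite side of the table: instead of asking how often each actual outcome is mislabeled, they ask how often each label turns out to be wrong. A minimal sketch, again with the same hypothetical counts:

```python
# A sketch of Northpointe's label-conditioned metrics on the same
# hypothetical counts. FPP and FNP condition on the LABEL; ProPublica's
# FPR and FNR (previous sketch) condition on the actual OUTCOME.

def predictive_errors(tp, fp, tn, fn):
    """Return (false positive prediction, false negative prediction).

    FPP: share of 'high risk' labels given to people who did not reoffend.
    FNP: share of 'low risk' labels given to people who did reoffend.
    """
    fpp = fp / (fp + tp)  # conditioned on the label: 'high risk'
    fnp = fn / (fn + tn)  # conditioned on the label: 'low risk'
    return fpp, fnp

# Same hypothetical (tp, fp, tn, fn) counts as in the previous sketch:
groups = {"group_a": (300, 200, 450, 100), "group_b": (250, 100, 550, 250)}
for name, counts in groups.items():
    fpp, fnp = predictive_errors(*counts)
    print(f"{name}: FPP={fpp:.1%}, FNP={fnp:.1%}")
```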

These “specific reasons” are the core distinction between Northpointe and ProPublica: the two sides differ not only over the results of their analyses but, more profoundly, over their definitions of algorithmic “fairness”. Northpointe countered the claim of bias by rejecting the “classification parity” definition, which compares false positive and false negative classifications across groups, in favour of “predictive parity”, which requires the predictive value of a score to be the same across groups. Because Northpointe found that the rate at which white defendants labeled ‘high risk’ actually reoffended was similar to the rate among Black defendants labeled ‘high risk’, the predictions were, in this conception of fairness, equally reliable for both groups. ProPublica’s argument, by contrast, was built around false positives and negatives, shifting the focus from the mere ‘calibration’ of the scores to the real-world impact of the algorithm’s error rates. Although both definitions of algorithmic fairness are, in a narrow sense, mathematically coherent, it is impossible in a real-world scenario to satisfy both simultaneously: if the population being studied reflects deep social and racial inequity, then the measured recidivism rate will be higher for Black defendants, meaning that a model calibrated to be accurate for both groups must necessarily have different error rates across those groups. This demonstrates that “fairness” towards defendants is not a single, monolithic concept; it is a multifaceted ideal, and the question shifts from whether the algorithm is biased to which type of fairness should be prioritised when the definitions themselves are mutually exclusive. It is imperative that the developers of such statistical models make these choices conspicuous, so as not to violate the due process requirement that evidence be reliable and its limits be known.
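This incompatibility can be shown with a few lines of arithmetic. The sketch below uses entirely made-up numbers: two groups with different underlying recidivism rates face a classifier that is calibrated identically for both, with the same predictive value and the same false negative rate, and the false positive rates diverge by necessity - essentially the pattern ProPublica observed.

```python
# A made-up demonstration of why calibration and equal error rates cannot
# coexist when base rates differ. Both groups get the SAME positive
# predictive value (ppv) and the SAME false negative rate, yet their
# false positive rates are forced apart. All numbers are illustrative.

def confusion_from(n, base_rate, tpr, ppv):
    """Derive confusion-matrix counts from a population size, an actual
    recidivism base rate, a true positive rate, and a positive
    predictive value."""
    positives = n * base_rate        # people who actually reoffend
    negatives = n - positives
    tp = positives * tpr
    fn = positives - tp
    labeled_high = tp / ppv          # total 'high risk' labels implied by ppv
    fp = labeled_high - tp
    tn = negatives - fp
    return tp, fp, tn, fn

for name, base_rate in [("group_a", 0.5), ("group_b", 0.3)]:
    tp, fp, tn, fn = confusion_from(1000, base_rate, tpr=0.7, ppv=0.7)
    fpr, fnr = fp / (fp + tn), fn / (fn + tp)
    print(f"{name}: base rate={base_rate:.0%}, FPR={fpr:.1%}, FNR={fnr:.1%}")

# group_a: base rate=50%, FPR=30.0%, FNR=30.0%
# group_b: base rate=30%, FPR=12.9%, FNR=30.0%
```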

In summary, it is difficult to uncover the root cause of such inaccuracies in the system, but these debates provoke a compelling perspective: perhaps the system is subjective precisely because of its objectivity. Algorithms organize the information they receive into differently weighted factors; hence, while someone who has molested a minor may be categorised as “low risk” because they have a job, someone charged with public intoxication may be rated “high risk” because they are homeless. These risk factors do not tell judges whether the defendant should necessarily go to prison, but rather what the probation conditions ought to be. Such opacity has caused prominent groups such as the Pretrial Justice Institute (PJI) to reverse their fervent advocacy for pretrial risk assessment tools, stating that “pretrial risk assessment tools, designed to predict an individual’s appearance in court without a new arrest, can no longer be a part of our solution for building equitable pretrial justice systems.”

About the author

Shreya Ram

Shreya is a student with an interest in world politics, legal ethics, technology law, and jurisprudence. She hopes to study law at university and is passionate about the critical human decisions that shape the world around us. She is especially interested in exploring how artificial intelligence will modify legal theory, judicial decision-making, and procedural fairness, a field that continues to gain prominence in modern society.