Retooling AI: Algorithm Bias and the Struggle to Do No Harm

Say what you want about the digital ad you received today for the shoes you bought yesterday, but research shows that algorithms are a powerful tool in online retail and marketing. By some estimates, 80 percent of Netflix viewing hours and 33 percent of Amazon purchases are prompted by automated recommendations based on the consumer’s viewing or buying history.

But algorithms may be even more powerful where they’re less visible—which is to say, everywhere else. Between 2015 and 2019, the use of artificial intelligence technology by businesses grew by more than 270 percent, and that growth certainly isn’t limited to the private sector.

From health care and financial services to education and criminal justice, software that turns data points into predictive patterns has become indispensable for leveraging the voluminous information trail we leave in our wake. Algorithms can help doctors make diagnoses, banks avoid risky loans, universities identify high-potential students, and judges recognize likely reoffenders.

They can. Except when they don’t.

Algorithm bias—“discriminatory treatment of individuals by a machine learning system”—happens for several reasons, all traceable to the humans who designed or implemented the algorithm or influenced the dataset it was built on. Because algorithmic software is ubiquitous, so is algorithm bias. For its victims, the consequences are far worse than spending 10 minutes watching Dummy because they previously watched Lars and the Real Girl.

“Algorithmic Horror Shows”
There are many cases of AI gone bad—an early, infamous one being the flash crash of 2010, when algorithmic trading software contributed to a $1 trillion stock selloff over 36 minutes. While the market rebounded relatively quickly, the incident triggered prompt congressional scrutiny and tougher federal safeguards, including the banning of deceptive automated trading known as “spoofing.”

But algorithm bias has proven to be a more pernicious problem. Because it is rooted in human bias, it similarly targets the statistical outlier, whose best means of fighting back is usually a court of law. Congressional attempts to address it have so far been piecemeal.

Karen Yeung, a law and computer science expert from the University of Birmingham, in the United Kingdom, has noted that while algorithms are great for accomplishing “very repetitive, straightforward tasks, […] what we need to attend to is thinking about these technologies as complex sociotechnical systems, particularly when the consequences are concrete for people’s lives.” Wealthy people can generally find access to correct the problem, but others end up trapped in “algorithmic horror shows.”

The most egregious cases tend to affect people of color, with the horror show playing out in the criminal justice system. The facial recognition software used by government, including law enforcement, has been shown to be less accurate for darker-skinned people, particularly women, who are often underrepresented in historical datasets used for machine learning. That inherent bias may be compounded by photographic technology optimized for lighter skin tones. The resultant cases of mistaken identity have ranged from headache-inducing (a revoked driver’s license) to potentially life-altering (a felony arrest).

Predictive technology used in courts and policing can lead to more systemic problems. An investigation by the nonprofit organization ProPublica turned up racial bias in COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), algorithmic software that judges use to assess the risk of recidivism when considering whether to grant bail. According to the report, “blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend,” while whites “are much more likely than blacks to be labeled lower-risk but go on to commit other crimes.”
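The disparity described in that report is a gap in error rates between groups: how often people who did not reoffend were labeled high risk, and how often people who did reoffend were labeled low risk. A minimal sketch of how such a gap might be measured follows. The function name, the record layout and the toy numbers are assumptions made for illustration; none of it is COMPAS data or ProPublica's actual method.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute (false_positive_rate, false_negative_rate) per group.

    records: iterable of (group, labeled_high_risk, reoffended) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, high_risk, reoffended in records:
        if high_risk and not reoffended:
            counts[group]["fp"] += 1   # labeled high risk, did not reoffend
        elif not high_risk and not reoffended:
            counts[group]["tn"] += 1
        elif not high_risk and reoffended:
            counts[group]["fn"] += 1   # labeled low risk, did reoffend
        else:
            counts[group]["tp"] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        fpr = c["fp"] / did_not_reoffend if did_not_reoffend else float("nan")
        fnr = c["fn"] / did_reoffend if did_reoffend else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Invented toy data: 20 people in each of two hypothetical groups.
toy = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", False, True)] * 3 + [("A", True, True)] * 7
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", False, True)] * 5 + [("B", True, True)] * 5
)
rates = error_rates_by_group(toy)
for group, (fpr, fnr) in sorted(rates.items()):
    print(f"group {group}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

In the toy data, group A is wrongly labeled high risk twice as often as group B, while group B is more often wrongly labeled low risk. That is the same shape of imbalance the report describes, even though both groups have an identical number of correct predictions.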

In Chicago, an effort to prevent violent crime through predictive policing ended up targeting innocent people in non-white neighborhoods, because the software designed to identify potential criminals was trained on location-based arrest data.

As more federal and state agencies rely on algorithmic software to prevent fraud and identity theft, gains in quickness and efficiency are offset by the statistical likelihood of more individuals being unjustly penalized.

Life-or-Death Consequences
Even outside the scope of the criminal justice system—when AI is used in the administration of government services, for example—a bad algorithm can mete out draconian punishment.

In California, thousands of people were thrown off Medicare because of an error in an automated system used by the Department of Health Services. In Australia, “robodebt” software designed to calculate and collect welfare overpayments—a job previously handled manually—generated 475,000 notices demanding repayment for nonexistent or inflated debts. In June 2020, in response to a class-action lawsuit, the Australian government announced it would clear or reimburse more than A$1 billion in erroneous charges, which it blamed on the software’s use of income averaging. In addition to the financial impacts, the debacle may have even triggered some suicides.
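Income averaging can manufacture a debt where none exists, because it spreads a year's earnings evenly across every pay period, including the periods when a person earned nothing and legitimately received benefits. The sketch below illustrates the arithmetic only. The payment amount, income threshold and reduction rate are hypothetical values chosen for the example, not the actual Australian welfare rules, and the functions are not the robodebt system's logic.

```python
def benefit_due(income, full_payment=500.0, free_area=300.0, taper=0.5):
    """Benefit owed for one fortnight under a made-up rule: the full payment,
    reduced by `taper` for each dollar of income above `free_area`."""
    reduction = max(0.0, income - free_area) * taper
    return max(0.0, full_payment - reduction)


def overpayment(incomes_for_test, paid):
    """Sum, fortnight by fortnight, any amount paid above what was due."""
    return sum(
        max(0.0, p - benefit_due(income))
        for income, p in zip(incomes_for_test, paid)
    )


# Hypothetical person: unemployed for 13 fortnights (receiving benefits),
# then employed for 13 fortnights at 2,000 per fortnight (receiving nothing).
actual_income = [0.0] * 13 + [2000.0] * 13
paid = [500.0] * 13 + [0.0] * 13

# Assessment using the income actually earned in each fortnight.
true_debt = overpayment(actual_income, paid)

# Assessment using the annual total spread evenly over all 26 fortnights.
average = sum(actual_income) / len(actual_income)
averaged_debt = overpayment([average] * len(actual_income), paid)

print(f"debt using actual fortnightly income: {true_debt:.2f}")
print(f"debt using averaged income:           {averaged_debt:.2f}")
```

Under these assumed rules the person was paid exactly what was due, so the true debt is zero. Averaging treats them as having earned 1,000 in every fortnight, including the ones in which they were out of work, and so produces a debt of 4,550 that was never owed.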

Clearly these are extreme cases, but the life-or-death consequences of algorithm bias do exist and may be starkest in our health care system, where machine learning is a particularly fast-growing trend. As U.S. doctors increasingly rely on algorithmic diagnostic tools, most of the data that trains those tools still comes from just three states: California, New York and Massachusetts. Hardly a cross-section of Americans.

Compounding that geographic bias is the racial bias that continues to seep into our medical software. At least a dozen health algorithms still use race as a factor. As a result, people of color have less-timely access to specialty care and even kidney transplants than do white people.

Even software designed to predict health care costs will be racially biased if it’s fed data reflecting the fact that minorities historically have poorer access to medical treatment.

Still, They Persist
Like race bias, gender bias permeates algorithmic software that then reinforces and perpetuates it. The yawning wage and status gap between men and women in the workforce is a perfect example. In 2015, when a mere 27 percent of American CEOs were women, a Google image search for the term “CEO” was even more disheartening, turning up photos of women only 11 percent of the time. That same year, a Carnegie Mellon study of 1,000 online job seekers showed that men received targeted ads for high-paying jobs six times more often than women did.

What’s frustrating is that these biases exist despite our heightened awareness of them. Last year, when a prominent software developer tweeted that his Apple Card credit limit was 20 times higher than that of his wife, who had the better credit score, Apple cofounder Steve Wozniak replied, “The same thing happened to us. I got 10x the credit limit. We have no separate bank or credit card accounts or any separate assets. Hard to get to a human for a correction though. It’s big tech in 2019.”

That year several bills addressing algorithm bias and related issues stalled in the House or Senate, including the Algorithmic Accountability Act, the Commercial Facial Recognition Privacy Act and the No Biometric Barriers to Housing Act.

Then came 2020 and the pandemic. As it has done with other systemic inequities, COVID-19 has highlighted the problem of algorithm bias and lent fresh urgency to the need to correct it. As the public health crisis spiraled into an economic crisis, a group of U.S. lawmakers tried again to prevent the inadvertent weaponization of AI against Americans. In May, with the unemployment rate hovering around 13 percent, they wrote a letter to House and Senate leadership proposing that further COVID-19 relief for businesses should be contingent on bias screening of any algorithmic software used for employment or financial considerations.

Business leaders also felt the pressure. The dual factors of widespread economic distress and a national reckoning on racial injustice prompted the Mortgage Bankers Association to reverse its support for a U.S. Housing and Urban Development proposal that would have made it harder to prove disparate impact caused by algorithmic lending practices.

Still, COVID-19 has managed to produce a series of fresh algorithmic horror shows. The takeaway: There are some spaces where AI just might not yet belong.

In the United Kingdom, when secondary-school graduates could not safely take their final exams in person, it was decided that grades should be assigned by computer modeling. (For more on that story, see “A-levels fiasco.”)

AI researcher Meredith Broussard, writing of a similar academic catastrophe affecting International Baccalaureate students in the United States, argues that “crude generalizations work for Netflix predictions because the stakes are low… In education, the stakes are much higher. A transcript follows you for years; when I was 25 and well out of college, I applied for a job that asked for my SAT scores.”

Lowering the Stakes
The stakes are high in health care, employment, banking, policing, all the places where algorithms can cause lives to be disrupted or harmed. When that happens, who’s responsible—the algorithm’s creator, or its user? Does the user have to be aware of algorithm bias to be held accountable for it? There is precedent for assigning liability for knowingly using a faulty tool. Beyond that, the legal landscape is murky.

Legislation forcing greater scrutiny of AI used in government and business would help defuse potential legal landmines. In the meantime, business, tech and advocacy groups are suggesting best practices for avoiding algorithm bias.

Among the most comprehensive solutions is the one offered by The Brookings Institution, whose framework for “algorithmic hygiene” can help an organization identify faulty machine learning tools.
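As an illustration of what a basic bias screen can look like, the sketch below compares favorable-outcome rates across groups and flags any group whose rate falls below four-fifths of the best-served group's rate. The four-fifths threshold is a rule of thumb from U.S. employment guidelines and is used here as an assumption. The check is not drawn from the Brookings framework, and the function names and numbers are invented for the example.

```python
def selection_rates(outcomes):
    """outcomes: {group: (favorable_count, total_count)} -> {group: rate}."""
    return {
        group: favorable / total
        for group, (favorable, total) in outcomes.items()
        if total > 0
    }


def disparate_impact_report(outcomes, threshold=0.8):
    """Return {group: (ratio_to_best_rate, flagged)}.

    A group is flagged when its rate is below `threshold` times the highest
    group rate. The 0.8 default is the assumed four-fifths rule of thumb.
    """
    rates = selection_rates(outcomes)
    best = max(rates.values(), default=0.0)
    if best == 0.0:
        # Nobody received a favorable outcome, so no ratio can be computed.
        return {group: (float("nan"), False) for group in rates}
    return {
        group: (rate / best, rate / best < threshold)
        for group, rate in rates.items()
    }


# Invented loan-approval counts for two hypothetical groups.
report = disparate_impact_report({"group_x": (60, 100), "group_y": (30, 100)})
for group, (ratio, flagged) in sorted(report.items()):
    status = "needs review" if flagged else "ok"
    print(f"{group}: ratio {ratio:.2f} ({status})")
```

A flag from a screen like this is a prompt for closer human review, not proof of discrimination, and passing it does not show that a tool is fair.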

Might Makes Right?
With the United States still in the thick of the pandemic, the promise of AI looms especially large. Right now, there’s preliminary research on the use of biometrics—digital eye scans—to diagnose COVID-19. We already have the capacity to use geolocation and social media monitoring to predict new outbreaks, and to use facial recognition software to monitor mask wearing and facilitate contact tracing.

But with algorithms, as with any powerful tool, what we could do and what we should do are often two different things.

