Data Ethics in the Wild: IRL Examples

Meryl Marie
Apr 18, 2021

In a nutshell, the 2015 movie "Spotlight" is "the true story of how the Boston Globe uncovered the massive scandal of child molestation and cover-up within the local Catholic Archdiocese..."

I watched this movie around the same time we had our "Data Ethics" lecture in the General Assembly Data Science Immersive course. There are many documentaries, books, and articles about the ethics of using technologies such as machine learning and artificial intelligence. For example, some US judges use an algorithm to predict whether a defendant is likely to reoffend, and then let that score influence their sentencing decisions. It is no surprise that these scores end up biased. The algorithms are built on data that is inherently biased, because our criminal justice system has historically and unfairly targeted African Americans. In 2014, Eric Holder, then US Attorney General, warned that these algorithms may interfere with "individualized and equal justice." Making assumptions based on statistics without contextual information has proven dangerous.

So what does this have to do with the movie "Spotlight"? *Spoilers ahead*

During the movie (which, again, is a true story), the investigative team at the Boston Globe interviews a former priest who claims that 6% of all priests molest minors. That means that in Boston, where 1,500 priests work and reside, 90 priests could potentially be predatory towards children. Operating under this assumption, the team accesses church directories: records from the Catholic Church that show how the church moved priests around to different parishes. Some priests would inexplicably move somewhere new from one year to the next, and sometimes they were listed as "unassigned" or "on sick leave." These designations formed a pattern among certain priests, and the team created a list of names to investigate.

The team of journalists operates under two assumptions that appear tenuous on the surface. First, the claim that 6% of priests are sexual predators, which gives them a framework of roughly 90 priests to look for. Second, the assumption that priests who moved from parish to parish under the pretense of being "sick" or on leave were part of a broader conspiracy to cover up the abuse and avoid scandal and lawsuits.
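To make those two assumptions concrete, here is a minimal Python sketch. It is not the Globe's actual method. The directory entries, priest names, the set of flagged statuses, and the `flag_for_review` helper with its `min_flags` threshold are all hypothetical, invented for illustration. It shows how a base rate becomes an expected count, and how a status pattern becomes a list of names that still has to be checked by hand.

```python
from collections import Counter

# Assumption 1: the base rate quoted by the former priest, applied to Boston.
BASE_RATE = 0.06
PRIESTS_IN_BOSTON = 1500
expected_cases = round(BASE_RATE * PRIESTS_IN_BOSTON)  # about 90 priests

# Assumption 2: repeated "unassigned" or "sick leave" designations form a
# pattern worth investigating. This set of statuses is a hypothetical stand-in.
SUSPICIOUS_STATUSES = {"unassigned", "sick leave"}


def flag_for_review(entries, min_flags=2):
    """Hypothetical helper: return names whose directory entries carry a
    suspicious status at least `min_flags` times. A flag is a lead, not evidence."""
    counts = Counter(
        name for name, _year, status in entries if status in SUSPICIOUS_STATUSES
    )
    return sorted(name for name, n in counts.items() if n >= min_flags)


# Invented example entries: (name, year, parish assignment or status).
entries = [
    ("Priest A", 1990, "St. X"),
    ("Priest A", 1991, "sick leave"),
    ("Priest A", 1992, "St. Y"),
    ("Priest A", 1993, "unassigned"),
    ("Priest B", 1990, "St. Z"),
    ("Priest B", 1991, "St. Z"),
    ("Priest B", 1992, "sick leave"),
]

print(expected_cases)            # 90
print(flag_for_review(entries))  # ['Priest A']
```

Everything the sketch returns is only as good as its two assumptions: change the base rate or the threshold and the list changes. A name on the list is a reason to pull court records and interview people, not a conclusion.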

In the end, the journalists were right. They published evidence of cover-ups involving 75 priests in the city of Boston. While in this case the pattern was uncovered and successfully investigated, the story shows that statistics and numbers alone, without proper context, are not enough. The team went through a painstaking process of pulling court records and interviewing the people involved. Had they simply published a list of names that seemed suspicious, they would have put their journalistic integrity at risk.

As data scientists, we must remember that each record in a spreadsheet, dataframe, database, or dataset is a separate case. Whatever we investigate, whether it is temperature, prison populations, crime rates, or vaccine data, there is qualitative information that adds context and important details. It is dangerous to spend all day at a desk enveloped in numbers and then assume that the world outside reflects the black and white of a computer screen. We must not forget the grey areas, and we can find inspiration in the journalistic integrity of the team at the Boston Globe, who did their due diligence in pursuit of justice.
