8 March 2023

Ethics in data science

Many people assume there is no connection between data science and ethics

  • Data science is about math/stats and computer science
  • Ethics is about social science and philosophy
  • The reality is that they are completely intertwined

Data ethics

[S]tudies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions

Floridi & Taddeo (2016)

Ethics in data science

Consider this statement:

Data are inherently objective, but people are not.

Do you agree? Why or why not?

FAIR principles

for scientific data management and stewardship

  • Findable
  • Accessible
  • Interoperable
  • Reusable

Indigenous Peoples’ rights & interests

  • FAIR principles largely focus on aspects that will facilitate increased data sharing
  • They largely ignore power differentials and historical contexts
  • This creates a tension for Indigenous Peoples who are asserting greater control over the application and use of Indigenous data & knowledge for collective benefit

CARE principles

for Indigenous data governance

  • Collective benefit
  • Authority to control
  • Responsibility
  • Ethics

Predictive policing

Kristian Lum

What’s an algorithm got to do with it?

Are police records a representative sample?

  • Variation in reporting rates
  • Variation in police attention
  • Variation in rates of enforcement
  • Collecting a random sample is difficult
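A minimal sketch of why records can misrepresent reality, assuming two hypothetical neighborhoods with the same true offense rate but different levels of police attention. The function name, rates, and detection probabilities are illustrative assumptions, not values from Lum's work.

```python
import random


def simulate_records(n_residents, offense_rate, detection_prob, rng):
    """Return (true offenses, recorded offenses) for one neighborhood.

    An offense only enters the records if it is detected, so the
    records reflect police attention as well as behavior.
    """
    true_offenses = 0
    recorded = 0
    for _ in range(n_residents):
        if rng.random() < offense_rate:
            true_offenses += 1
            if rng.random() < detection_prob:
                recorded += 1
    return true_offenses, recorded


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical values: identical offense rates, unequal police attention.
true_a, recorded_a = simulate_records(10_000, 0.10, 0.20, rng)
true_b, recorded_b = simulate_records(10_000, 0.10, 0.60, rng)

print("Neighborhood A: true =", true_a, "recorded =", recorded_a)
print("Neighborhood B: true =", true_b, "recorded =", recorded_b)
```

Under these assumptions the true counts are similar, yet the records suggest neighborhood B has far more crime. The difference comes entirely from the detection probability.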

The punchline

  • ML will reproduce the biases in the data used to train it
  • We need to think about what’s missing from the training data and what types of biases the data encode
  • We need to be aware of the consequences of reinforcing those biases
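The reinforcement described above can be sketched with a toy feedback loop. This is a simplified illustration, not Lum's model: patrols are assigned in proportion to past records, and new records can only arise where patrols are sent. The function `run_feedback` and all numbers are hypothetical.

```python
def run_feedback(initial_records, true_rates, patrols_per_round, rounds):
    """Track each area's share of records when patrols follow past records.

    Uses expected values rather than random draws, so the result is
    deterministic. Returns the list of record shares seen at each round.
    """
    records = list(initial_records)
    history = []
    for _ in range(rounds):
        total = sum(records)
        shares = [r / total for r in records]
        history.append(shares)
        # Patrols go where past records point; each patrol adds records
        # at the area's true rate.
        for i, share in enumerate(shares):
            records[i] += patrols_per_round * share * true_rates[i]
    return history


# Hypothetical setup: both areas have the SAME true rate, but the
# historical records start out skewed 60/40.
history = run_feedback([60.0, 40.0], [0.5, 0.5], 100, 10)

for round_number, shares in enumerate(history, start=1):
    print(round_number, [round(s, 3) for s in shares])
```

In this sketch the initial 60/40 skew never corrects itself, even though the two areas are identical. The allocation rule keeps generating data that confirm the bias it started with.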

Faculty job market

A data‐based guide to the North American ecology faculty job market

Fox (2020)

Motivation

  • Provide ecology faculty job seekers info about the job market
  • Evaluate whether current hiring is diverse & equitable

Approach

  • Determine who was hired into advertised ecology faculty jobs
  • Infer genders from names & pictures

Discuss

Do you see any ethical problems with this approach?

Backlash

Fox’s initial response

It is important to identify any systemic gender disparities in the ecology faculty job market, and to identify their causes so that the disparities can be remedied. Inferring a gender binary from a person’s name, as I did, is standard practice in research on gender disparities in many areas, including in ecology and allied fields. This approach performs well. In cases of ambiguity, standard practice is to resolve the ambiguity where it is feasible to do so, by consulting publicly-available photographs and pronoun use in social media profiles. I did so.

Fox’s final response

Using a gender binary, and inferring the genders of some new hires from their names, is an imperfect approach. Gender is not binary, and ecologists who do not identify as men or women are our colleagues.

Community’s response

My response

Always consider the impact of your intentions

Moving forward

Questions to ask yourself

  • Are the data valid for their intended use?
  • Have we identified & minimized any bias in the data or in the model?
  • Have we identified & minimized any bias in the data scientist?
  • Is the analysis transparent and reproducible?
  • What are the likely misinterpretations of the results and what can be done to prevent them?

Oxford-Munich code of conduct

for professional data scientists

  • Lawfulness
  • Competence
  • Dealing with data
  • Algorithms and models
  • Transparency, objectivity & truth
  • Working alone and with others