Authors
- Dora Zhao*
- Morgan Klaus Scheuerman
- Pooja Chitre*
- Jerone Andrews
- Georgia Panagiotidou*
- Shawn Walker*
- Kathleen H. Pine*
- Alice Xiang
* External authors
Venue
- NeurIPS 2024
Date
- 2024
A Taxonomy of Challenges to Curating Fair Datasets
Abstract
Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.