Authors
- Dora Zhao*
- Morgan Klaus Scheuerman
- Pooja Chitre*
- Jerone Andrews
- Georgia Panagiotidou*
- Shawn Walker*
- Kathleen H. Pine*
- Alice Xiang
* External authors
Venue
- NeurIPS 2024
Date
- 2024
A Taxonomy of Challenges to Curating Fair Datasets
Abstract
Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.
Related Publications
We propose a generative agent that augments training datasets with synthetic data for model fine-tuning. Unlike prior work, which uniformly samples synthetic data, our agent iteratively generates relevant samples on-the-fly, aligning with the target distribution. It prioritizes…
AI technologies have become ubiquitous, influencing domains from healthcare to finance and permeating our daily lives. Concerns about the values underlying the creation and use of datasets to develop AI technologies are growing. Current dataset practices often disregard crit…
Data workers play a key role in the big data industry. Clients hire data workers to collect and annotate data with human identity concepts, like demographic categories or clothing items. Often, such workers are treated as computational—they are expected to quickly and object…