Authors
- Dora Zhao*
- Morgan Klaus Scheuerman
- Pooja Chitre*
- Jerone Andrews
- Georgia Panagiotidou*
- Shawn Walker*
- Kathleen H. Pine*
- Alice Xiang
* External authors
Venue
- NeurIPS 2024
Date
- 2024
A Taxonomy of Challenges to Curating Fair Datasets
Abstract
Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.