Authors

* External authors

Venue

Date

Share

Augmented data sheets for speech datasets and ethical decision-making

Orestis Papakyriakopoulos*

Anna Seo Gyeong Choi*

William Thong

Dora Zhao*

Jerone Andrews

Rebecca Bourke

Alice Xiang

Allison Koenecke*

* External authors

FAccT 2023

2023

Abstract


Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several prominent datasets. Recent works have tried to reverse this trend, for example, by proposing analytical frameworks for ethically evaluating datasets, the standardization of dataset documentation and curation practices, privacy preservation methodologies, as well as tools for surfacing and mitigating representational biases. Little attention, however, has been paid to the realities of operationalizing ethical data collection. To fill this gap, we present a set of key ethical considerations and practical recommendations for collecting more ethically-minded human-centric image data. Our research directly addresses issues of privacy and bias by contributing to the research community best practices for ethical data collection, covering purpose, privacy and consent, as well as diversity. We motivate each consideration by drawing on lessons from current practices, dataset withdrawals and audits, and analytical ethical frameworks. Our research is intended to augment recent scholarship, representing an important step toward more responsible data curation practices.

Related Publications

Measure dataset diversity, don’t just claim it

ICML, 2024
Dora Zhao*, Jerone T. A. Andrews, Orestis Papakyriakopoulos*, Alice Xiang

Machine learning (ML) datasets, often perceived as neutral, inherently encapsulate abstract and disputed social constructs. Dataset curators frequently employ value-laden terms such as diversity, bias, and quality to characterize datasets. Despite their prevalence, these ter…

Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators

FaccT, 2024
Wiebke Hutiri*, Orestis Papakyriakopoulos*, Alice Xiang

The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States…

Ethical Considerations for Responsible Data Curation

NeurIPS, 2023
Jerone Andrews, Dora Zhao*, William Thong, Apostolos Modas, Orestis Papakyriakopoulos*, Alice Xiang

Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensive fairness and robustnes…

  • HOME
  • Publications
  • Augmented data sheets for speech datasets and ethical decision-making

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.