Augmented data sheets for speech datasets and ethical decision-making

Abstract


Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several prominent datasets. Recent works have tried to reverse this trend, for example, by proposing analytical frameworks for ethically evaluating datasets, the standardization of dataset documentation and curation practices, privacy preservation methodologies, as well as tools for surfacing and mitigating representational biases. Little attention, however, has been paid to the realities of operationalizing ethical data collection. To fill this gap, we present a set of key ethical considerations and practical recommendations for collecting more ethically-minded human-centric image data. Our research directly addresses issues of privacy and bias by contributing to the research community best practices for ethical data collection, covering purpose, privacy and consent, as well as diversity. We motivate each consideration by drawing on lessons from current practices, dataset withdrawals and audits, and analytical ethical frameworks. Our research is intended to augment recent scholarship, representing an important step toward more responsible data curation practices.

Authors

  • Orestis Papakyriakopoulos*
  • Anna Seo Gyeong Choi*
  • William Thong
  • Dora Zhao*
  • Jerone Andrews
  • Rebecca Bourke
  • Alice Xiang
  • Allison Koenecke*

*External Authors

Venue

FAccT 2023

Date

2023
