* External authors




Augmented data sheets for speech datasets and ethical decision-making

Orestis Papakyriakopoulos

Anna Seo Gyeong Choi*

William Thong

Dora Zhao

Jerone Andrews

Rebecca Bourke

Alice Xiang

Allison Koenecke*

FAccT 2023



Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several prominent datasets. Recent works have tried to reverse this trend, for example, by proposing analytical frameworks for ethically evaluating datasets, the standardization of dataset documentation and curation practices, privacy preservation methodologies, as well as tools for surfacing and mitigating representational biases. Little attention, however, has been paid to the realities of operationalizing ethical data collection. To fill this gap, we present a set of key ethical considerations and practical recommendations for collecting more ethically-minded human-centric image data. Our research directly addresses issues of privacy and bias by contributing to the research community best practices for ethical data collection, covering purpose, privacy and consent, as well as diversity. We motivate each consideration by drawing on lessons from current practices, dataset withdrawals and audits, and analytical ethical frameworks. Our research is intended to augment recent scholarship, representing an important step toward more responsible data curation practices.

Related Publications

Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color

ICCV, 2023
William Thong, Przemyslaw Joniak*, Alice Xiang

This paper strives to measure apparent skin color in computer vision, beyond a unidimensional scale on skin tone. In their seminal paper Gender Shades, Buolamwini and Gebru have shown how gender classification systems can be biased against women with darker skin tones. While…

Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

AIES, 2023
Keziah Naggita*, Julienne LaChance, Alice Xiang

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centri…

A Reflection on How Cross-Cultural Perspectives on the Ethics of Facial Analysis AI Can Inform EU Policymaking

EWAF, 2023
Chiara Ullstein*, Severin Engelmann*, Orestis Papakyriakopoulos, Jens Grossklags*

The EU AI Act proposal addresses, among other applications, AI systems that enable facial classification and emotion recognition. As part of previous work, we have investigated how citizens deliberate about the validity of AI-based facial classifications in the advertisement…
