Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Keziah Naggita*

Julienne LaChance

Alice Xiang

* External authors

AIES 2023

2023

Abstract

Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an "othering" phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.

