Not My Voice! A Framework for Identifying the Ethical and Safety Harms of Speech Generators

Sony AI

June 4, 2024

In recent years, the rise of AI-driven speech generation has led to both remarkable advancements and significant ethical concerns. Speech generation can be a driver for accessibility and inclusion, giving people with speech impediments the ability to communicate fluently, or enabling creators to translate their work into different languages to reach a wider audience. However, incidents involving AI-generated speech are also making headlines, highlighting the potential for misuse.

Recently, actress Scarlett Johansson’s statement that her voice was used in speech generation products without her consent and against her will has led to outrage among creators and the public. In other instances, synthetic voices have been used in swatting attacks in the United States, where anonymous perpetrators use AI to create fake emergency calls, leading to violent police responses at innocent people's homes. As our research “Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators” notes, “In a growing number of swatting attacks in the United States, anonymous perpetrators create synthetic voices that call police officers to close down schools and hospitals, or to violently gain access to innocent citizens’ homes”.

In another disturbing case, AI-generated voices were used to simulate the voices of deceased children, sharing fake narratives of their tragic deaths on social media. This demonstrates how “speech generation, colloquially referred to as audio deepfakes, is fueling an unprecedented wave of malicious activity.”

These examples underscore the importance of developing generative AI with an acute awareness of ethical and safety risks, rather than retrospectively scrambling to address unintended consequences.

Our Groundbreaking Research at ACM FAccT 2024

But how exactly should we look at the risks of a technology as complex and versatile as speech generation? In our recent research “Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators”, we propose a way forward. The research, presented at the prestigious ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024, breaks new ground through its comprehensive analysis of the ethical and safety risks posed by generative AI systems that create human speech.

As stated in our paper, “The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed”. Motivated by a vision for AI technology that should, first and foremost, serve humans and enable creativity to flourish, the authors, Wiebke Hutiri, Orestis Papakyriakopoulos, and Alice Xiang, argue “[T]hat risks and harms of speech generators are not merely features or failures of the technology. Rather, speech generators exist within complex sociotechnical systems.” We present a new conceptual framework for understanding these risks, focusing on the interactions between technical systems and human stakeholders.

Methodology

Previous taxonomies that inspired this work focus on categorizing specific harm types, rather than considering who is affected, who is responsible, and how this shapes the harms that materialize. Our research therefore sets out to develop a classification that models the interactions between affected stakeholders, responsible stakeholders, and speech generation technology. The key idea is to map the causal pathways to harm, so that leverage points for mitigating the risks of speech generators can be revealed.

However, creating the taxonomy was not without challenges. The affected and responsible entities in speech generation incidents are not always obvious, and several specific harms often co-occur. We grounded our approach in empirical evidence, identifying and analyzing 35 unique reported incidents, occurring between 2019 and 2023, from various AI incident databases.

As our research explains, “We used an iterative design science research approach to develop the conceptual framework for modeling pathways to AI harms and the taxonomy of harms of speech generators in parallel.” Iterative revisions and applications of this taxonomy helped refine our framework, resulting in a detailed classification of harms associated with speech generators.

Key Contributions

Our research presents a new way of categorizing specific harms from speech generation AI according to the exposure of affected individuals: being the subject of, interacting with, suffering due to, or being excluded from speech generation systems. This categorization scheme has enabled us to identify specific harms that were not included in previous taxonomies. For example, we were the first to identify the violation of the right to publicity as a risk that arises when someone’s name, likeness, or other recognizable aspects of their persona, such as their voice, are used for commercial purposes without consent. As the case of Scarlett Johansson demonstrates, specific harms of this nature are already prevalent.

The motives of the creators and deployers of these systems also play a crucial role in the resulting harm. For example, speech generators pose different risks when used for malicious purposes, such as fraud or electoral manipulation, compared to when they are deployed negligently or without considering potential negative consequences. We show that by connecting the harm types and exposure of affected entities with the intent and harm motives of responsible entities, pathways that lead to specific harms can be established. Common pathways can help to identify recurring incident patterns, such as the five patterns we identify: voice clones of voice actors, consumer fraud, swatting at scale, bringing back the dead, and audio deepfakes of politicians.
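To make the idea of harm pathways more concrete, here is a minimal sketch of how such a pathway could be represented in code. The class names, fields, and the example instance are our own illustration rather than anything defined in the paper; only the exposure categories, the intent distinction, and the “voice clones of voice actors” pattern come from the taxonomy described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Exposure(Enum):
    """How an affected individual is exposed to a speech generation system."""
    SUBJECT_OF = "being the subject of"
    INTERACTING_WITH = "interacting with"
    SUFFERING_DUE_TO = "suffering due to"
    EXCLUDED_FROM = "being excluded from"


class Intent(Enum):
    """Broad intent of the responsible entity (illustrative labels, not the paper's exact terms)."""
    MALICIOUS = "malicious"
    NEGLIGENT = "negligent"


@dataclass
class HarmPathway:
    """Connects a responsible entity's intent and motive to an affected entity's exposure and harms."""
    responsible_entity: str
    intent: Intent
    motive: str
    affected_entity: str
    exposure: Exposure
    harms: list[str] = field(default_factory=list)


# Illustrative instance of the "voice clones of voice actors" incident pattern.
# The motive and harm labels here are assumptions for the sake of the example.
voice_actor_pathway = HarmPathway(
    responsible_entity="speech generation provider and its commercial users",
    intent=Intent.NEGLIGENT,
    motive="reduce production costs",
    affected_entity="voice actor whose voice is cloned",
    exposure=Exposure.SUBJECT_OF,
    harms=["violation of the right to publicity"],
)
```

Structuring incidents this way is what allows recurring patterns to surface: pathways that share the same intent, exposure, and harm types can be grouped and compared across incident reports.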

Potential Impact

The taxonomy proposed in our paper can influence policy interventions and decision-making processes in AI development. By providing a clear framework for understanding the complex risks associated with speech generation, this research supports the responsible creation of AI technologies. It also highlights the need for robust regulatory measures to prevent misuse and protect individuals from the harms identified.

Our research also proposes that new methods for evaluating speech generators are needed to better understand and mitigate potentially harmful outputs. Specifically, certain harms, such as a model creating false information, are testable and can be measured through rigorous capability evaluations. Our research advocates for "...identifying capabilities that should be measured, operationalizing these measurements, and recognizing the potential limitations of current evaluation methods." We point out that "while such evaluations are increasingly done for large language models (LLMs), they are not yet common for speech generators, which rely on subjective listener tests that are often unreliable."
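As a rough illustration of what such a capability evaluation could look like in practice, the sketch below runs a set of test prompts through a speech generator and flags outputs that contain false claims. The functions passed in (generate_speech, transcribe, contains_false_claim) are hypothetical placeholders for whichever model, speech-to-text system, and claim checker an evaluator actually uses; the paper does not prescribe this particular setup.

```python
from typing import Callable


def evaluate_false_information_rate(
    prompts: list[str],
    generate_speech: Callable[[str], bytes],      # hypothetical: prompt -> audio
    transcribe: Callable[[bytes], str],           # hypothetical: audio -> transcript
    contains_false_claim: Callable[[str], bool],  # hypothetical: transcript -> verdict
) -> float:
    """Estimate how often a speech generator produces false information.

    Returns the fraction of prompts whose generated (and transcribed)
    output is flagged as containing a false claim.
    """
    if not prompts:
        return 0.0
    flagged = 0
    for prompt in prompts:
        audio = generate_speech(prompt)
        transcript = transcribe(audio)
        if contains_false_claim(transcript):
            flagged += 1
    return flagged / len(prompts)
```

A repeatable, programmatic check of this kind can complement subjective listener tests, because it can be re-run with the same prompts whenever the model changes.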

Our research further distinguishes testable harms from non-testable harms, which include broader system-level risks like the erosion of trust in public information and the pollution of the information ecosystem. Harm pathways like those introduced in this research offer an opportunity to study and understand how these harms emerge, their dependence on various stakeholders, and the leverage points where interventions can be applied to mediate their impact. Modeling harm pathways can help policymakers to “develop more effective strategies to prevent and mitigate these broader, more insidious harms."

Conclusion

The presentation of this paper at ACM FAccT 2024 contributes to the ongoing discourse on AI ethics and safety. As generative AI continues to evolve, the insights provided by our research contribute to guiding the development of policies and technologies that prioritize ethical considerations and societal well-being. The taxonomy of harms of speech generators is not only a tool for researchers and policymakers but also a crucial step towards safer and more responsible AI deployment in our increasingly digital world.

At Sony AI, we believe that developing measures to understand and address these risks is important today, and equally necessary to ensure that the future development of AI technologies aligns with our ethical commitments to society. We are proud to lead the way in this critical area of research and look forward to continued collaboration and innovation to mitigate the harms of generative speech AI.

