The Ethical Challenge of AI Accent Neutralization vs. Sonic Diversity

For many people, a regional accent is worn (or spoken) with pride. The way people speak and the inflections used tell a story about who they are, where they come from and the fabric that makes up their lives.

In audio advertising, however, the concept of broadcasting creatives with localized accents is a bit more complex. For some time now, brands have grappled with a dilemma: Are accents polarizing and off-putting, or are they a tool that can be used to bring people together?

But the accents used in audio advertising matter beyond questions of accent bias and whether audiences are more or less likely to skip ads that use local accents. There are significant cultural and ethical implications that warrant consideration, especially for brands that have pledged a commitment to true multicultural inclusion and diversity.

The theory behind AI accent neutralization

There’s a commonly held (albeit debatable) belief that individuals who are powerful, educated and generally more likable speak in something known as Standard American unaccented English (SAE)—aka, they speak “normally.” Any speech that is different from that constructed norm is called an accent.

As a result, many organizations have sought out new AI applications that render speech in SAE to curb dissatisfaction and improve brand perception. Sanas, a startup with a unique approach to AI voice, is one example. Its software uses speech recognition and synthesis to change a speaker’s accent in near real time.

And while the technology on its own is undoubtedly impressive and shows just how far the science has come, it poses a real ethical concern for the industry. It’s touted as the way of the future, but that raises the question: Whose future, exactly, are we considering?

The impact on multicultural audiences

The Linguistic Society of America highlights a truth that is increasingly important to acknowledge: Everyone who speaks English does so with an accent. The point matters because in everyday conversation, people commonly talk about an accent as something a person either has or doesn’t have. Accents can connect consumers to places, products and history in culturally nuanced ways that a general “neutral-accented standard American English” at times cannot.

But a significant fact remains: Multicultural audiences are substantial across the country, and ignoring their presence does a brand no good.

According to the online linguistics tutoring program Preplay.com, one in five adults speaks another language at home in the U.S., and one in 10 children is an emergent bilingual. These numbers are increasing and they foreshadow a multilingual future for American culture.

This increase in multicultural demographics, as well as the Black Lives Matter movement as a response to 2020’s wave of police brutality and murder of Black men and women, sparked a knee-jerk reaction from brands. They pledged to change how they do business in multicultural spaces or to expand diversity, equity and inclusion initiatives. Enter the ethical dilemma of diversity in audio advertising.

The ethical dilemma: to neutralize or not

Brands and advertisers cannot in one breath promote sonic and cultural diversity and in the next adopt AI accent neutralization or minimization techniques. AI accent neutralization has the potential to whitewash and homogenize an increasingly diverse world of accents as a concession to those who struggle to understand, or simply dislike, accents different from their own.

There’s definitely a relatability effect when Black, Indigenous, and people of color (BIPOC) voice-over (VO) actors with accents deliver a message to other listeners within the BIPOC space. However, if brands limit the use of BIPOC accents to BIPOC silos, they further recenter white, Midwestern-accented VO standards and messaging behavior.

Advertisers’ ability to create a sense of relatability through storytelling and language is imperative, and brands need to embrace and utilize accents in non-pandering demonstrations of cultural attentiveness.

AI voice cloning technology may offer a happy middle ground. Voice cloning creates a virtual version of a real individual’s voice, and it may be the best way to establish a consistent, human-sounding vocal identity for brands to use across assets (websites, audio advertising, call centers).

The screen-to-sonic future

Brands need a recognizable and consistent identity. As the future moves off-screen because of massively popular audio-first mediums (such as podcasts and music streaming), it’s not enough to only have a good logo or visual design. The exploration of a unified voice, be it on the company website, audio advertisement or automated call center, is essential. By using a consistent voice across all marketing and services, companies build trust with customers and unify their brand image and sound.

For those who decide to implement this approach, understanding the difference between consistency and avoidance is key. A brand opting to use one voice, regardless of accent, to establish brand consistency is fine and, in many cases, ideal. It is not, by any means, the same as neutralizing diverse accents because they might be viewed as off-putting or difficult to understand.

When choosing consistent voice talent, brands would do well to prioritize diversity and inclusion during the talent selection process. Rather than shying away from actors with distinct accents, consider how they might play a role in further developing your brand’s sonic identity. Know when and why your brand is using voice talent and be intentional with your selections.

Here’s to accents

Everyone has an accent, and it’s time we start celebrating each and every one. Creatives and advertisers that embrace real voice and culture win big. Let’s not popularize accent modification—let’s popularize better listening practices, instead.