ElevenLabs Unveils Audio Tags for Enhanced AI Speech Performance

ElevenLabs has introduced Audio Tags, a feature of its Eleven v3 model that lets users control not only what AI-generated speech says but also how it is delivered. According to ElevenLabs, the tags give synthesized speech a form of situational awareness: inline cues adjust tone, emotion, and pacing, which can make AI-driven conversations sound noticeably more natural.

Transforming Narration into Performance

The introduction of Audio Tags enables the AI to perform rather than just read. For instance, in a football match highlight video, the AI can increase its intensity alongside the action: “He cuts past one defender — [EXCITED] here comes the cross — [SHOUTING] GOAAAL!” Similarly, in an audiobook, suspense can be heightened with tags like [WHISPERING] or [PAUSE].
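Because the tags are plain bracketed words embedded in the script, a writer can list the cues a line contains before sending it for synthesis. The Python below is a minimal sketch under two assumptions of our own: tags are always uppercase words in square brackets, and each tag governs the text that follows it until the next tag. The function `split_cues` is hypothetical and not part of any ElevenLabs SDK.

```python
import re
from typing import List, Optional, Tuple

# Assumed tag syntax: an uppercase word (spaces allowed) inside square brackets.
TAG_PATTERN = re.compile(r"\[([A-Z][A-Z ]*)\]")


def split_cues(script: str) -> List[Tuple[Optional[str], str]]:
    """Split a tagged script into (tag, text) pairs.

    Text before the first tag is paired with None. Each tag is assumed
    (illustratively) to apply to the text up to the next tag.
    """
    pairs: List[Tuple[Optional[str], str]] = []
    current_tag: Optional[str] = None
    pos = 0
    for match in TAG_PATTERN.finditer(script):
        # Text between the previous tag (or start) and this tag.
        text = script[pos:match.start()].strip()
        if text:
            pairs.append((current_tag, text))
        current_tag = match.group(1)
        pos = match.end()
    # Whatever follows the last tag.
    tail = script[pos:].strip()
    if tail:
        pairs.append((current_tag, tail))
    return pairs


if __name__ == "__main__":
    # Hypothetical audiobook line using tags from the article.
    line = "The door creaks open. [WHISPERING] Is anyone there? [GASP] Nothing."
    for tag, text in split_cues(line):
        print(tag, "->", text)
```

For the sample line this yields `(None, "The door creaks open.")`, `("WHISPERING", "Is anyone there?")`, and `("GASP", "Nothing.")`, which makes it easy to review where each delivery change begins.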

Common Tags for Diverse Scenarios

Audio Tags offer a variety of emotional and physical cues, such as:

Emotional tone: [EXCITED], [NERVOUS], [FRUSTRATED], [TIRED]
Reactions: [GASP], [SIGH], [LAUGHS], [GULPS]
Volume & energy: [WHISPERING], [SHOUTING], [QUIETLY], [LOUDLY]
Pacing & rhythm: [PAUSES], [STAMMERS], [RUSHED]

Tags can also be layered, so a single line can combine cues from several categories (for example, a nervous tone delivered in a whisper). Layering allows more nuanced delivery and can noticeably change the emotion and impact a listener perceives.
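The sketch below shows one way a script writer might compose layered cues and catch typos in tag names. It is an illustration only: the vocabulary is copied from the categories listed above and is not an official or exhaustive list, and writing layered tags as adjacent brackets in front of the text is our assumption, since the exact layering syntax is not specified here. The helpers `cue` and `build_script` are hypothetical.

```python
from typing import Dict, Set

# Illustrative vocabulary taken from the categories listed above.
# ElevenLabs may accept other tags; this set is NOT an official list.
KNOWN_TAGS: Dict[str, Set[str]] = {
    "emotional tone": {"EXCITED", "NERVOUS", "FRUSTRATED", "TIRED"},
    "reactions": {"GASP", "SIGH", "LAUGHS", "GULPS"},
    "volume & energy": {"WHISPERING", "SHOUTING", "QUIETLY", "LOUDLY"},
    "pacing & rhythm": {"PAUSES", "STAMMERS", "RUSHED"},
}
ALL_TAGS: Set[str] = set().union(*KNOWN_TAGS.values())


def cue(text: str, *tags: str) -> str:
    """Prefix text with one or more bracketed tags (assumed layering syntax).

    Raises ValueError if a tag is outside the illustrative vocabulary.
    """
    unknown = [t for t in tags if t.upper() not in ALL_TAGS]
    if unknown:
        raise ValueError(f"unrecognized tag(s): {unknown}")
    prefix = " ".join(f"[{t.upper()}]" for t in tags)
    return f"{prefix} {text}" if prefix else text


def build_script(*segments: str) -> str:
    """Join tagged and untagged segments into one script string."""
    return " ".join(segments)


if __name__ == "__main__":
    print(build_script(
        "The vault is finally open.",
        cue("We actually did it.", "nervous", "whispering"),
        cue("Let's go!", "excited", "loudly"),
    ))
```

Running it prints `The vault is finally open. [NERVOUS] [WHISPERING] We actually did it. [EXCITED] [LOUDLY] Let's go!`. Validating against a local list catches misspelled tags early, at the cost of rejecting any valid tag that the list omits.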

Creative Control for Developers and Storytellers

Eleven v3 interprets these tags with a deeper contextual model that, according to ElevenLabs, can shift tone within a single line and handle interruptions without breaking the natural flow. This gives voice designers, game developers, and storytellers finer creative control, letting them direct an AI performance rather than simply write its script.

Choosing the Right Voice

Professional Voice Clones (PVCs) are not yet fully optimized for Eleven v3 and may produce lower-quality output than they do with earlier models. During the current research preview, ElevenLabs recommends using Instant Voice Clones (IVCs) or designed voices to get the most out of v3's new features. PVC optimizations are expected in the near future.

For more insights on this development, visit the official ElevenLabs blog.


Source: https://blockchain.news/news/elevenlabs-unveils-audio-tags-enhanced-ai-speech-performance