With more than 13,000 skills in the Alexa Skills Store and counting, Alexa developers like you are creating imaginative and new voice experiences every day. And many of you are using audio clips to take your skills to the next level. Some of you are incorporating audio to enhance the voice experience, even adding a second (or even third!) voice personality into the interaction.
With the Alexa Skills Kit (ASK), you can incorporate audio clips beyond Alexa’s text-to-speech (TTS) voice into your skills. There are several different ways you can include audio:
- Short audio files (like sound effects) between TTS responses using Speech Synthesis Markup Language (SSML)
- Longer audio content with playback control (like podcasts) using the AudioPlayer Interface
- Frequently updated content (like news headlines) using the Flash Briefing Skill API
These audio capabilities allow you to create richer, more immersive voice experiences. However, skill builders often run into inconsistent loudness: the audio is either too loud or too quiet compared to Alexa’s TTS. Users leave negative feedback in skill reviews when a difference in loudness disrupts their experience, even if they enjoy the content. For example:
- “Please fix volume. You can’t hear anything.”
- “Love this, but want to be able to hear it.”
- “I had to disable this skill because it is so much LOUDER than the rest of my flash briefing.”
That’s why it’s essential to analyze your audio file, find the loudness measurement, and adjust the audio loudness as needed. Here’s how to check and set your skill’s audio volume.
Audio Spec Requirements
To start, let’s check your audio formats. Below are the spec requirements for the three formats you can use with your Alexa skills:
SSML
- Format: MPEG Version 2 Layer III
- Bitrate: 48 kbps
- Sample rate: 16000 Hz
- Up to five audio clips per response, with a combined length of 90 seconds
Audio Player
- Formats: MP3, AAC/MP4, HLS, PLS and M3U
- Bitrates: 16 kbps to 384 kbps
Flash Briefing
- Format: MP3
- Bitrate: 256 kbps, mono or stereo
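The SSML limits above (at most five clips and 90 seconds of combined audio per response) can be checked before a response is sent. The following is a minimal Python sketch, not part of the Alexa Skills Kit: the function name `build_ssml`, the tuple layout, and the example URL are hypothetical, and it assumes you already know each clip's duration (for example, from your audio editor).

```python
from xml.sax.saxutils import escape, quoteattr

MAX_CLIPS = 5            # SSML limit: up to five audio clips per response
MAX_TOTAL_SECONDS = 90   # SSML limit: combined clip length of 90 seconds


def build_ssml(parts):
    """Build a <speak> document from text and audio parts.

    `parts` is a list of ("text", str) or ("audio", url, duration_seconds)
    tuples (a hypothetical layout chosen for this sketch).
    Raises ValueError if the audio clips exceed the SSML limits.
    """
    clips = [p for p in parts if p[0] == "audio"]
    if len(clips) > MAX_CLIPS:
        raise ValueError(f"{len(clips)} clips, limit is {MAX_CLIPS}")
    total = sum(p[2] for p in clips)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"{total} s of audio, limit is {MAX_TOTAL_SECONDS} s")

    body = []
    for part in parts:
        if part[0] == "text":
            body.append(escape(part[1]))                    # escape &, <, >
        else:
            body.append(f"<audio src={quoteattr(part[1])}/>")
    return "<speak>" + " ".join(body) + "</speak>"


if __name__ == "__main__":
    print(build_ssml([
        ("text", "Welcome back."),
        ("audio", "https://example.com/chime.mp3", 2.5),  # placeholder URL
        ("text", "Here is your quiz."),
    ]))
```

The check only covers clip count and combined length; format, bitrate, and sample rate still need to be verified in your audio tool.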
Recommended Loudness Specification
The loudness of your audio content should be relative to Alexa’s TTS volume. Users shouldn’t need to adjust the volume of their Alexa device in order to comfortably hear the audio content.
We recommend the following loudness specs for your audio content (for the full audio requirements, see Audio Spec Requirements above):
- Program loudness: -14.0 dB LUFS/LKFS, or total RMS value: between -15 and -13 dB
- True-peak value: should not exceed -2 dBFS
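If you log measurements for many clips, the recommendation above can be turned into a simple pass/fail check. This is a sketch under stated assumptions: `check_loudness` is a hypothetical helper, and the ±1 dB tolerance around -14.0 LUFS is an assumption that mirrors the -15 to -13 dB RMS window rather than a documented limit.

```python
TARGET_LUFS = -14.0             # recommended program loudness
RMS_RANGE_DB = (-15.0, -13.0)   # recommended total RMS window
MAX_TRUE_PEAK_DBFS = -2.0       # true peak should not exceed this
LUFS_TOLERANCE = 1.0            # assumption: mirrors the +/-1 dB RMS window


def check_loudness(true_peak_dbfs, lufs=None, rms_db=None):
    """Return a list of problems; an empty list means the clip passes.

    Supply either an integrated loudness (`lufs`) or a total RMS value
    (`rms_db`), matching the "or" in the recommendation.
    """
    if lufs is None and rms_db is None:
        raise ValueError("provide lufs or rms_db")

    problems = []
    if lufs is not None and abs(lufs - TARGET_LUFS) > LUFS_TOLERANCE:
        problems.append(f"loudness {lufs:.1f} LUFS, target {TARGET_LUFS:.1f}")
    if rms_db is not None and not (RMS_RANGE_DB[0] <= rms_db <= RMS_RANGE_DB[1]):
        problems.append(f"RMS {rms_db:.1f} dB outside {RMS_RANGE_DB}")
    if true_peak_dbfs > MAX_TRUE_PEAK_DBFS:
        problems.append(f"true peak {true_peak_dbfs:.1f} dBFS exceeds "
                        f"{MAX_TRUE_PEAK_DBFS:.1f} dBFS")
    return problems


if __name__ == "__main__":
    print(check_loudness(true_peak_dbfs=-2.5, lufs=-14.0))  # passes: []
    print(check_loudness(true_peak_dbfs=-0.5, lufs=-9.8))   # two problems
```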
For a more detailed, technical explanation about loudness concepts, check out this guide on loudness.
Step-by-Step Guide to Verify Loudness of Audio
First, remember to perform regular ear checks of your audio content as you build your skill. This is particularly helpful in balancing all of the audio in your skill and against other content. Even if you follow the guide below as well as our recommended specifications, nothing beats ear checks by live users. Ask those around you to listen to a sample of your content on a device and provide their feedback.
The following steps will show you one way to analyze and normalize your audio content loudness using Audacity, a free open-source tool, with the dpMeter2 plugin.
Analyzing Audio Content Loudness
1. Load your audio file into Audacity.
2. Choose dpMeter2 from the Effect menu. If this is your first time using dpMeter2, you’ll need to change the measurement type to LUFS. Select the RMS button and change the setting to "EBU R128." (The background should change from orange to blue.)
3. Press the Play button and let the meter measure for at least 30 seconds.
4. After the dpMeter2 has finished measuring, find the average LUFS in the Integrated Loudness field.
Per our recommended specifications, this measurement should be -14.0 dB LUFS.
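As a rough cross-check outside Audacity, you can compute a plain RMS level and sample peak with Python's standard library. This sketch is not a LUFS meter: it applies no K-weighting or gating, and it reports the sample peak rather than an oversampled true peak, so it only approximates the "total RMS" alternative in the recommendation. It assumes the clip has been exported as 16-bit PCM WAV, and `rms_and_peak_dbfs` is a hypothetical name. Some tools reference RMS to a full-scale sine wave and will read about 3 dB higher than this function.

```python
import array
import math
import sys
import wave


def rms_and_peak_dbfs(path):
    """Return (rms_dbfs, sample_peak_dbfs) for a 16-bit PCM WAV file.

    All channels are averaged together. Silence returns (-inf, -inf).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)   # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()               # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no audio")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf"), float("-inf")

    mean_square = sum(s * s for s in samples) / len(samples)
    rms_dbfs = 10 * math.log10(mean_square / full_scale ** 2)
    peak_dbfs = 20 * math.log10(peak / full_scale)
    return rms_dbfs, peak_dbfs
```

For a sine wave at half of full scale, this returns roughly -9 dBFS RMS and -6 dBFS peak.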
Adjusting Audio Content Loudness
Note: Adjusting any part of an audio file can affect its overall quality. For example, pushing the highs and lows outside the standard ranges will distort the sound.
1. Analyze your audio to find the current LUFS measurement. For this example, the dpMeter2 measures -9.8 dB LUFS for the Integrated Loudness.
2. Calculate the adjustment (increase or decrease) needed to normalize the loudness.
   Current LUFS:    -9.8 dB
   Target LUFS:     -14.0 dB
   GAIN difference: -4.2 dB (target minus current)
3. Enter the calculated difference in the GAIN Control setting and click Apply.
4. Reset the dpMeter2 and run the analysis again to check the new LUFS loudness.
5. If your calculations are correct, the new reading should be close to -14.0 dB LUFS.
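The arithmetic in step 2 is simply the target minus the current reading. The short sketch below reproduces it and also converts the dB gain into the linear amplitude factor that the gain applies to each sample. The function names are hypothetical, and `peak_after_gain` assumes a pure gain change shifts the true peak by the same number of dB.

```python
def gain_to_target(current_lufs, target_lufs=-14.0):
    """Gain in dB needed to move a clip from current_lufs to target_lufs."""
    return target_lufs - current_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def peak_after_gain(true_peak_dbfs, gain_db):
    """Estimate the true peak after a pure gain change (assumption: no limiting)."""
    return true_peak_dbfs + gain_db


if __name__ == "__main__":
    gain = gain_to_target(-9.8)            # the too-loud example from the steps
    print(f"{gain:+.1f} dB")               # -4.2 dB
    print(f"{db_to_linear(gain):.3f}")     # 0.617
```

For a clip that is too quiet, the gain is positive, so it is worth checking with `peak_after_gain` that the boosted peak still stays at or below -2 dBFS before applying it.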
More Examples
Once you analyze and correct audio loudness, you’ll be able to provide detailed feedback to your content creators on how future audio should be adjusted.
[Figure: audio content that measures as too loud]
[Figure: audio content that measures as too quiet]
Follow the tips outlined above as you incorporate audio into your skills. Remember that loudness can have a big impact on the user experience, and that audio clips, when done right, can delight users and keep them coming back over time.
Source: Alexa