Headroom in digital audio
To fully understand why headroom is so important, let’s first explain in a simple way how an analog audio signal is made digital.
How digital audio is created
Audio is made up of the sum of a large number of sine waves. Sine waves are the primary building blocks of all audio signals. Each sine wave has a certain frequency (in Hz or kHz) and a certain amplitude, expressed as a voltage in volts (V), millivolts (mV) or microvolts (µV). The sine waves also differ in phase, which we will ignore for the moment.
The frequency of the sine wave determines the pitch and its amplitude determines the volume. Such a signal is called an “analog” (audio) signal. You can view such an audio signal with an oscilloscope.
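As a minimal sketch of this idea, the Python snippet below builds a signal by adding sine waves together. The function name `sum_of_sines` and the two example components (440 Hz and 880 Hz) are illustrative assumptions, not values from this article.

```python
import math

def sum_of_sines(components, t):
    """Return the signal value in volts at time t (seconds).

    components is a list of (frequency_hz, amplitude_volts, phase_radians).
    """
    return sum(
        amplitude * math.sin(2 * math.pi * frequency * t + phase)
        for frequency, amplitude, phase in components
    )

# Hypothetical example: a 440 Hz tone at 1.0 V plus a quieter 880 Hz tone at 0.5 V.
components = [(440.0, 1.0, 0.0), (880.0, 0.5, 0.0)]

# Print the summed waveform at a few instants, 0.25 ms apart.
for step in range(5):
    t = step * 0.00025
    print(f"t = {t * 1000:.2f} ms -> {sum_of_sines(components, t):+.3f} V")
```

Real audio contains far more components, but the principle is the same: at every instant the signal is simply the sum of all the sine waves.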
The “Stereo Tool” users among you will know that “Stereo Tool” has an oscilloscope function that can show the input and output graphically:
From analog to digital
The oscilloscope displays the sum of all the sine waves that can be found in the audio. To clearly demonstrate how analog audio is made digital, let’s look at a single sine wave with a fictitious frequency and a fictitious audio level:
We are now going to digitize this continuous sine wave by applying sampling.
Because a digital signal consists only of numbers (“zeros and ones”), the continuous sine wave is measured at regular intervals. Each measurement is a “sample”. The more samples one takes of that sine wave, the more accurately the samples represent it.
The number of samples taken per second is the sampling rate. It is expressed in Hz or kHz, where 1 kHz means 1,000 samples per second. In the example of our sine wave above, 28 samples are taken per period of the sine wave.
– A CD uses 44.1 kHz (44,100 samples per second)
– DAB+ mainly uses 48 kHz (48,000 samples per second), 32 kHz (32,000 samples per second), 24 kHz (24,000 samples per second) and 16 kHz (16,000 samples per second); most radio stations work with 48 kHz as the sampling frequency
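The sampling step can be sketched in a few lines of Python. This is only an illustration: the helper `sample_sine` is a hypothetical name, and the tone frequency is chosen so that exactly 28 samples fit into one period at 48 kHz, as in the example sine wave.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Return num_samples values of a sine wave, taken at sample_rate_hz."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

SAMPLE_RATE_HZ = 48_000   # the rate most DAB+ stations use
SAMPLES_PER_PERIOD = 28   # as in the example sine wave

# A tone that fits exactly 28 samples into one period at 48 kHz (about 1714 Hz).
frequency_hz = SAMPLE_RATE_HZ / SAMPLES_PER_PERIOD
one_period = sample_sine(frequency_hz, SAMPLE_RATE_HZ, SAMPLES_PER_PERIOD)

print(f"{frequency_hz:.1f} Hz sampled at {SAMPLE_RATE_HZ} Hz")
print([round(value, 3) for value in one_period[:8]])
```

Raising the sampling rate, or lowering the tone frequency, puts more samples into each period and so describes the shape of the wave in more detail.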
Highest audio frequency vs sampling rate
The highest audio frequency that can be correctly reproduced by the digital signal depends on the sampling rate used. In theory, the highest audio frequency that can be digitized correctly is half the sampling rate (the Nyquist frequency).
Unfortunately, in practice the highest usable audio frequency is slightly lower than this theoretical limit, because absolutely no frequencies higher than half the sampling frequency MAY APPEAR at the input of the analog-to-digital converter. This is necessary to avoid the “aliasing” effect. Aliasing folds frequencies above the limit back into the audio band as unwanted components (sine waves that were not in the original), a distortion of the audio signal that we can’t allow.
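To make aliasing concrete, here is a small sketch of where a tone ends up after sampling. It assumes ideal sampling with no input filter at all, and uses the standard folding rule for a sampled sine wave; the function name `alias_frequency` is made up for this example.

```python
def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency at which a sampled tone appears, assuming no input filter.

    A tone folds around the nearest whole multiple of the sampling rate.
    """
    nearest_multiple = round(frequency_hz / sample_rate_hz) * sample_rate_hz
    return abs(frequency_hz - nearest_multiple)

# A 10 kHz tone is below half of 48 kHz, so it is reproduced as 10 kHz.
print(alias_frequency(10_000, 48_000))

# A 30 kHz tone is above half of 48 kHz and folds back to 18 kHz,
# a tone that was never present in the original audio.
print(alias_frequency(30_000, 48_000))
```

This is why the input filter has to remove everything above half the sampling rate before conversion: once a tone has folded back into the audio band, it can no longer be told apart from real audio.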
Because the input filter that removes those frequencies is not perfectly steep, the practical limit is usually about 2 kHz lower:
– highest audio frequency at 48 kHz sampling rate is 22 kHz
– highest audio frequency at 44.1 kHz sampling rate is 20.05 kHz
– highest audio frequency at 32 kHz sampling rate is 14 kHz
– highest audio frequency at 24 kHz sampling rate is 10 kHz
– highest audio frequency at 16 kHz sampling rate is 6 kHz
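The list above follows from simple arithmetic: half the sampling rate, minus the filter margin. A short sketch, where the 2 kHz margin is the rule of thumb quoted in the text and not a fixed property of every converter:

```python
FILTER_MARGIN_HZ = 2_000  # the "about 2 kHz" rule of thumb quoted in the text

def nyquist_frequency_hz(sample_rate_hz):
    """Theoretical limit: half the sampling rate."""
    return sample_rate_hz / 2

def practical_bandwidth_hz(sample_rate_hz, margin_hz=FILTER_MARGIN_HZ):
    """Theoretical limit minus the margin lost to the input filter."""
    return nyquist_frequency_hz(sample_rate_hz) - margin_hz

for rate in (48_000, 44_100, 32_000, 24_000, 16_000):
    print(f"{rate / 1000:g} kHz -> {practical_bandwidth_hz(rate) / 1000:g} kHz")
```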
The sound level (amplitude) of the sine wave must also be given a “digital value”. How precisely the amplitude can be stored is determined by the “bit depth” (audio bit depth). For example, CD uses “16 bit”, DVD and Blu-ray use “24 bit”, and pro audio even uses “32 bit”.
“Bit depth”, also called “resolution”, defines the number of possible values that a sample of our sine can have and is indicated in “bits”. If the resolution is 1 bit, only two values are possible: 0 and 1. For each added resolution bit, the number of possible values is multiplied by two:
2 bits = 4 digital values
3 bits = 8 digital values
4 bits = 16 digital values
5 bits = 32 digital values
6 bits = 64 digital values
7 bits = 128 digital values
8 bits = 256 digital values
9 bits = 512 digital values
10 bits = 1 024 digital values
11 bits = 2 048 digital values
12 bits = 4 096 digital values
13 bits = 8 192 digital values
14 bits = 16 384 digital values
15 bits = 32 768 digital values
16 bits = 65 536 values (Compact Disc)
20 bits = 1 048 576 values
24 bits = 16 777 216 values (Digital Processing and Digital Mixing Consoles)
32 bits = 4 294 967 296 values (Digital Mastering)
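The table is just repeated doubling, which a few lines of Python can reproduce. The `quantize` helper is a hypothetical illustration of how an amplitude is rounded to one of those values; it assumes signed integer codes with 1.0 as full scale, which is one common convention and not something this article specifies.

```python
def quantization_levels(bits):
    """Number of distinct values a sample can take at the given bit depth."""
    return 2 ** bits

def quantize(value, bits):
    """Round a value (1.0 = full scale) to the nearest signed integer code."""
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    code = round(value * max_code)
    # Values beyond full scale cannot be represented and are pinned to the limit.
    return max(min_code, min(max_code, code))

for bits in (8, 16, 24, 32):
    print(f"{bits} bits = {quantization_levels(bits):,} values")

# The same amplitude stored coarsely (4 bits) and finely (16 bits).
print(quantize(0.3, 4), quantize(0.3, 16))
```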
The more bits one takes, the more precisely we can capture the amplitude of the audio signal. In practice, this results in a larger dynamic range and a better signal-to-noise ratio. The three bit depths most used in audio broadcasting are the ones annotated in the list above: 16, 24 and 32 bits.
Bit depth vs noise level
Bit resolution and signal-to-noise ratio are closely connected. We won’t go too deep into this matter, as it would lead us too far. The rule is: more bits result in smaller amplitude errors, and the smaller the amplitude errors, the less noise.
It is important to remember that “for every 1 bit of added resolution, the dynamic range over which a signal can be correctly recorded increases by 6 dB”.
In other words, 16 bits give a dynamic range of about 96 dB and 24 bits about 144 dB.
This may be confusing, because when we speak about power, a doubling is 3 dB. In audio, however, a doubling is 6 dB, because we are not talking about power but about voltage. Power is proportional to the square of the voltage: P = U²/R. On a logarithmic scale, squaring becomes a multiplication by 2! So, instead of 3 dB for double the power, it becomes 6 dB for double the voltage!
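The 3 dB and 6 dB figures can be checked with the standard decibel formulas: 10·log10 for a power ratio and 20·log10 for a voltage ratio. The function names below are illustrative only.

```python
import math

def power_ratio_db(ratio):
    """Decibels for a ratio of two powers."""
    return 10 * math.log10(ratio)

def voltage_ratio_db(ratio):
    """Decibels for a ratio of two voltages (power goes with voltage squared)."""
    return 20 * math.log10(ratio)

def dynamic_range_db(bits):
    """Each bit doubles the number of amplitude steps, adding about 6 dB."""
    return voltage_ratio_db(2 ** bits)

print(f"double power:   {power_ratio_db(2):.2f} dB")
print(f"double voltage: {voltage_ratio_db(2):.2f} dB")
for bits in (16, 24):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB")
```

The exact value per bit is about 6.02 dB, which is why the 6 dB rule of thumb works so well.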
On the level meter of “Adobe Audition (CS6)” (showing a 24 dB range in this example), those bits can be represented as follows:
Where can things go wrong in practice?
The most important thing when streaming is that the audio level is set correctly. The level cannot go higher than the top of the dynamic range, i.e. the highest value that the available bits can represent.
On a level meter for digital audio, the scale stops at 0 dB. That is why this point is called “0 dB full scale”, written 0 dB(fs). Higher is not possible because it’s the end of the scale!
When we compare the scale of a level meter for digital audio with the scale of an analog audio level meter, there is a very big difference! On a VU meter for recording analog audio (such as those on cassette decks, tape recorders and mixing consoles), the scale goes higher than 0 dB.
Let’s put both scales together. We compare the scale of a dB(fs) meter (according to IEC 60268-18) with an (American) VU meter (according to IEC 60268-17):
The rule is: “zero dB(fs)” should be considered as a wall. Going beyond that “zero dB(fs)” is basically impossible because, simply put, there are “no more bits left” to encode the audio.
If this rule is violated, it is immediately audible as crackling: first only on the peaks of the music and, if the volume is raised even further, eventually only crackling (or even silence) will be heard.
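What happens at the “wall” can be sketched as follows. This is a simplified illustration that assumes 16-bit signed samples with 1.0 as full scale and a converter that simply pins out-of-range values to the limit; the helper names are hypothetical.

```python
import math

MAX_CODE = 32767    # largest 16-bit signed sample value, i.e. 0 dB(fs)
MIN_CODE = -32768   # smallest 16-bit signed sample value

def to_16bit(value):
    """Convert a value (1.0 = full scale) to a 16-bit code, clipping at the limits."""
    code = round(value * MAX_CODE)
    return max(MIN_CODE, min(MAX_CODE, code))

def clipped_count(amplitude, samples_per_period=28):
    """Count the samples in one sine period that exceed full scale."""
    values = [
        amplitude * math.sin(2 * math.pi * n / samples_per_period)
        for n in range(samples_per_period)
    ]
    return sum(1 for value in values if abs(value) > 1.0)

# Anything above full scale is stored as the same maximum code.
print(to_16bit(1.2), to_16bit(1.5), to_16bit(4.0))

for amplitude in (0.5, 1.0, 1.5, 4.0):
    print(f"amplitude {amplitude}: {clipped_count(amplitude)} of 28 samples clipped")
```

The louder the signal is pushed past full scale, the more samples are flattened to the same value, so the peaks are cut off first and more and more of the waveform follows.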
So it is very important to apply a “safety margin”: “headroom”.
In the pro-audio world, “minus 18 dB(fs)” is generally used as the standard for the maximum reading on the level meter. One also speaks of “18 dB headroom”. In that case, “three bits” (3 × 6 dB) are kept in reserve to absorb any spikes in the sound level of the audio.
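The link between headroom in dB and spare bits can be sketched like this. The sketch assumes samples are expressed as fractions of full scale (1.0 = 0 dB(fs)); the function names are illustrative.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB

def peak_dbfs(samples):
    """Peak level in dB(fs) of samples where 1.0 represents full scale."""
    peak = max(abs(sample) for sample in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak)

def headroom_db(samples):
    """Distance in dB between the highest peak and 0 dB(fs)."""
    return -peak_dbfs(samples)

def spare_bits(headroom):
    """Headroom expressed as a number of unused bits."""
    return headroom / DB_PER_BIT

# Hypothetical audio whose highest peak reaches one eighth of full scale.
samples = [0.05, -0.125, 0.1, -0.02]
print(f"peak: {peak_dbfs(samples):.1f} dB(fs)")
print(f"headroom: {headroom_db(samples):.1f} dB = {spare_bits(headroom_db(samples)):.1f} bits")
```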
Important to know:
– Peaks of 4 dB can still occur, even with highly compressed audio.
– The output of the decoder can be up to 1.7 dB HIGHER than what you send into the encoder.
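One possible reading of these two figures is a worst-case level budget. The sketch below assumes that the 4 dB of peaks and the 1.7 dB of codec overshoot simply add up, and that the 4 dB is measured above the level the processing is set to. Both are assumptions made for illustration, not something stated above, and the function names are hypothetical.

```python
PEAK_ALLOWANCE_DB = 4.0   # peaks that can still occur after compression
CODEC_OVERSHOOT_DB = 1.7  # worst-case level increase from encoder to decoder

def worst_case_decoder_peak_dbfs(processed_level_dbfs):
    """Highest level the decoder output might reach, under the assumptions above."""
    return processed_level_dbfs + PEAK_ALLOWANCE_DB + CODEC_OVERSHOOT_DB

def will_clip(processed_level_dbfs):
    """True if the worst-case decoder peak would exceed 0 dB(fs)."""
    return worst_case_decoder_peak_dbfs(processed_level_dbfs) > 0.0

for level in (-3.0, -6.0, -18.0):
    peak = worst_case_decoder_peak_dbfs(level)
    print(f"level {level:+.1f} dB(fs) -> worst-case peak {peak:+.1f} dB(fs), clips: {will_clip(level)}")
```

Under these assumptions the two figures together call for a little under 6 dB of headroom.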
Also read ITU-R BS.2054 (“Audio levels and loudness”), the “International Telecommunication Union” document on audio levels for television and radio:
It shows that a level of 0 VU corresponds to -20 dBfs. A range of 11 dB is reserved for the peaks, so that a headroom of 9 dB remains free to avoid clipping.
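The arithmetic behind these numbers is short enough to write out. The constants are the figures quoted above; the function names are only illustrative.

```python
ALIGNMENT_DBFS = -20.0     # the level in dBfs that corresponds to 0 VU
PEAK_ALLOWANCE_DB = 11.0   # range reserved for peaks above the VU reading

def vu_to_dbfs(vu_reading):
    """Convert a VU meter reading to the corresponding digital level."""
    return vu_reading + ALIGNMENT_DBFS

def remaining_headroom_db(vu_reading):
    """Headroom left below 0 dBfs once the peak allowance is used up."""
    return -(vu_to_dbfs(vu_reading) + PEAK_ALLOWANCE_DB)

print(vu_to_dbfs(0.0))             # -20.0
print(remaining_headroom_db(0.0))  # 9.0
```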
We see that many complaints about “poor DAB+ sound” at (local) radio stations are caused by overdriving the “studio-to-transmitter link (STL)” and then probably also the “DAB+ encoder” (at the output of the STL, encoding and decoding have already added up to 1.7 dB).
During our listening sessions and measurements, we discovered that a maximum level of about “minus 6 dB(fs)”, i.e. 6 dB of headroom, is almost always used on the DAB+ muxes in Wallonia (Belgium). Broadcasters on the DAB+ muxes in the Netherlands have also been using 6 dB of headroom for a while.
With the help of the limiters and/or clippers in the sound processing, it can of course be ensured that “zero dB(fs)” is not exceeded. But even then, it is possible that the signal after the “digital-to-analog converter (DAC)” is still overdriven. The cause is the conversion from the time domain to the frequency domain and the perceptual coding in the encoder, which is a story in itself. The important thing is to be aware that it can happen.
Of course, many radio broadcasters want to sound as loud as possible. Each local radio station must therefore decide for itself how far it wants to go in this “loudness war”.
Perhaps the (local) radio broadcasters should agree among themselves to use “minus 6 dB(fs)” (or an even larger headroom) as the standard for maximum output, just like in Wallonia and the Netherlands? This way everyone can still set a certain loudness without overloading the (digital) audio.