Audio bit rate compression


How does it work?

If you want to convert an audio signal linearly (without bit rate compression) into a digital signal without any loss of quality, you need a relatively large number of bits. Such a digital signal is called a “PCM-coded signal” (Pulse Code Modulation). PCM encoding is also used for the Compact Disc and for the AES/EBU and AES67 digital audio interfaces.

In PCM, the amplitude of an audio signal is sampled at very short, regular time intervals. Below is an example of how a sine wave is sampled.

Analog sine wave
Digitized sine wave with 28 samples

The continuous analog signal is converted into a series of consecutive pulses corresponding to the amplitude of the analog signal at the time of sampling. The frequency at which sampling is performed is called the “sampling frequency”.

The size of each pulse is measured and converted into a binary number with a certain precision. The precision of the conversion is called the resolution. If you use 16-bit numbers, you have a resolution of 16 bits (also called the bit depth).
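The sampling and conversion steps described above can be sketched in a few lines of Python. This is only an illustration: the function name and the choice of one sine period spread over 28 samples (as in the figure) are assumptions made for the example, not part of any standard.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, num_samples, bit_depth):
    """Sample a sine wave of amplitude 1.0 and convert each sample to a signed integer."""
    max_code = 2 ** (bit_depth - 1) - 1  # largest positive value, 32767 for 16 bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                           # time of the n-th sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # analog value at that time
        codes.append(round(amplitude * max_code))        # nearest 16-bit code
    return codes

# One full period of a sine wave captured in 28 samples (hypothetical values).
pcm = sample_and_quantize(freq_hz=1.0, sample_rate_hz=28.0, num_samples=28, bit_depth=16)
print(pcm[:8])
```

Each entry in the list is one pulse expressed as a 16-bit number; a higher bit depth would simply give finer steps between the possible values.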

A Compact Disc (CD) has a bit depth of 16 bits (per channel) with a sample rate of 44.1 kHz (44,100 16-bit samples per second). The left and right signals are sampled separately, each at 16 bits and 44,100 times per second. A stereo CD therefore delivers 2 x 44,100 samples per second, each with 16-bit resolution. That is 88,200 16-bit values per second, or 1,411,200 bits/s.

With a sampling rate of 48,000 samples per second, the same 16-bit stereo signal requires 1,536,000 bits/s.
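The arithmetic behind these figures is simply sample rate x bit depth x number of channels. A minimal sketch (the function name is chosen here for illustration):

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd_stereo = pcm_bit_rate(44_100, 16, 2)
stereo_48k = pcm_bit_rate(48_000, 16, 2)

print(cd_stereo)   # 1411200
print(stereo_48k)  # 1536000
```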

If you know that DAB can only send 2,304 kbit/s, including the error correction, you won’t get very far with PCM signals for DAB. Using a sampling rate of 44.1 kHz and a resolution of 16 bits means that each second of the analog signal is transformed into 44,100 samples of 16 bits. A small calculation tells us that we need 705,600 bits to digitize 1 second of PCM audio per channel, or 1,411,200 bits for stereo. Such high bit rates are acceptable when the signal is transported over very short distances. However, for many applications such as DAB it is impractical to use PCM. It would mean that only one radio station could fit in a DAB ensemble, which would erase all the benefits of DAB.

The solution is bit rate compression, which means reducing the number of bits used to carry the audio information. Of course, this comes at the cost of some quality.

Fortunately, there are methods that can limit the loss of quality to roughly 3% (97% quality instead of 100%) while using only about 10% of the original bit rate. Since the frequency spectrum for radio broadcasts is a scarce resource that many stations want to use, this 3% loss of quality has to be put into perspective.
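To put numbers on this, the sketch below takes the figures quoted here at face value: 2,304 kbit/s for the ensemble and stereo CD-quality PCM as the source. It ignores the share of the capacity taken by error correction, so the real number of stations is lower; the result is an upper bound for illustration only.

```python
ENSEMBLE_KBIT_S = 2304                 # figure quoted in the text, error correction included
pcm_kbit_s = 44_100 * 16 * 2 / 1000    # stereo PCM, 1411.2 kbit/s
compressed_kbit_s = pcm_kbit_s * 0.10  # about 10% of the original bit rate

def stations_that_fit(capacity_kbit_s, per_station_kbit_s):
    """Whole number of stations that fit, ignoring error correction overhead."""
    return int(capacity_kbit_s // per_station_kbit_s)

print(stations_that_fit(ENSEMBLE_KBIT_S, pcm_kbit_s))         # 1
print(stations_that_fit(ENSEMBLE_KBIT_S, compressed_kbit_s))  # 16
```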

Think about this:

Given all the misery on the FM band over the past 40 years, it is better to see 10 radio stations with 97% quality than 1 radio station with 100% quality.

Not all listeners have equipment that can reproduce 100% perfect sound quality anyway. A small loss in audio quality is therefore perfectly acceptable for radio broadcast.

Conclusion:

Audio bit rate compression and the minor loss of sound quality that comes with it are a necessary evil that everyone has to live with. Moreover, the human ear is almost incapable of detecting the 3% loss of quality.

Audio data compression can further be divided into two types, explained in the following paragraphs.

“Lossless” audio data compression

As the name says, with “lossless” compression you don’t lose any quality. This compression method can exactly reconstruct the original signal. Well-known examples are FLAC and ALAC. Most data compression algorithms of this kind rely on entropy coding, of which Huffman codes are the classic example.

The principle is simple: the most common bit sequences get a shorter code and less common bit sequences get a longer code. Morse code already applies this principle: the very frequently used letter e (•) has a shorter code than the less frequently occurring letter y (- • - -).

In the case of audio, this principle is applied dynamically to blocks of audio data. The bit rate can thus be reduced without any loss of data (or audio quality). However, the reduction in the number of bits required is rather small.
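The principle of shorter codes for more frequent symbols can be illustrated with a small Huffman coder. This is a generic textbook sketch applied to a made-up string of letters, not the scheme used inside FLAC or ALAC.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Return a dict mapping each symbol in data to its Huffman bit string."""
    counts = Counter(data)
    if len(counts) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far})
    heap = [(weight, i, {symbol: ""}) for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # the two least frequent groups
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

message = "eeeeeeeettttaaoy"  # 'e' is frequent, 'y' is rare
codes = huffman_code(message)
encoded = "".join(codes[ch] for ch in message)

print(codes)
print(len(encoded), "bits instead of", 8 * len(message))  # 30 bits instead of 128
```

The frequent letter e ends up with a 1-bit code and the rare letters o and y with 4-bit codes, and because no code is the prefix of another, the original message can be recovered exactly.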

“Lossy” audio data compression

However, “lossless” compression does not lower the bit rate enough to make it usable for radio applications such as DAB or DAB+. That’s why these systems switch to “lossy” compression methods. As the word implies, an exact copy is no longer guaranteed and a (minor) loss of quality is accepted. Fortunately, ways have been developed to limit this loss to a few percent. They mainly exploit the imperfections of the human ear: information that the ear has little or no chance of detecting is removed from the signal.

This type of compression is called “perceptual” coding.

In “lossy compression” the audio signal will be unraveled into pieces of audio information. Those pieces are ranked on a scale from “very important” to “less important”. This ranking uses a ‘hearing curve’ that corresponds to the functioning of human hearing. After that, certain pieces of information are thrown away. Chances are, if you throw out 80% of the least important information, you won’t even hear the difference in the audio signal.

But how can audio information be classified into “very important information” and “less important” or even “superfluous information”?

Simply explained, the AAC bit rate compression method performs a mathematical transformation on the PCM signal for this. This transformation is called the “Modified Discrete Cosine Transform” or MDCT. It is a transformation from the time domain (taking a sequence of samples from the audio signal as input) to the frequency domain (providing a set of frequency components at the output) that expresses the signal as a sum of cosine components. The obtained data is then passed through a perceptual algorithm, leaving only the components that humans can actually hear. These components are then quantized (converted into discrete values) and forwarded as the AAC signal. The decoder converts the received values from the frequency domain back to the time domain to create an audible audio signal.
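The round trip from time domain to frequency domain and back can be sketched with a direct (slow) MDCT in plain Python. This is a teaching sketch under several assumptions: a tiny block of 8 coefficients, a sine window, overlapping blocks that are added together at the decoder, and no perceptual model or quantization at all. A real AAC encoder is far more elaborate.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """2N windowed time samples -> N frequency coefficients."""
    n_coef = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
                for n in range(2 * n_coef))
            for k in range(n_coef)]

def imdct(coeffs):
    """N frequency coefficients -> 2N time samples (aliasing cancels after overlap-add)."""
    n_coef = len(coeffs)
    return [(2.0 / n_coef) * sum(coeffs[k] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
                                 for k in range(n_coef))
            for n in range(2 * n_coef)]

def round_trip(signal, n_coef):
    """Encode overlapping blocks with the MDCT, decode them, and overlap-add the result."""
    window = sine_window(2 * n_coef)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = [signal[start + n] * window[n] for n in range(2 * n_coef)]
        coeffs = mdct(block)       # an encoder would discard and quantize components here
        decoded = imdct(coeffs)
        for n in range(2 * n_coef):
            output[start + n] += decoded[n] * window[n]
    return output

N = 8  # coefficients per block (hypothetical, far smaller than in AAC)
signal = [math.sin(2 * math.pi * 3 * n / 64) + 0.5 * math.cos(2 * math.pi * 7 * n / 64)
          for n in range(64)]
rebuilt = round_trip(signal, N)
# The first and last N samples are covered by only one block, so compare the middle part.
error = max(abs(a - b) for a, b in zip(signal[N:-N], rebuilt[N:-N]))
print(error)
```

Without any quantization the reconstruction error is at the level of floating-point rounding. The loss in a lossy codec comes from what is done to the coefficients between the two transforms, not from the transform itself.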

Important for DAB+:

The window that determines how many samples are processed in one conversion (HE-AAC) can have one of two lengths: 1024 samples or 960 samples. The time interval covered by the samples of one conversion must match the interval at which the HE-AAC information is transmitted.

This is important because most Internet stream decoders only recognize the most frequently used window of 1024 samples.

However, DAB+ works with the rarely used 960-sample window! This is important to know. We will certainly elaborate on this in the near future.

Note: DAB was developed for frames of 24 ms. At 48,000 samples per second (48 kHz), 960 samples take 960/48000 = 0.02 s or 20 ms. The result of 6 of these conversions then fits into the HE-AAC DAB+ superframe of 120 ms. The superframe is transmitted in 5 DAB frames of 24 ms.
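The timing in this note can be checked with exact fractions. The sketch below compares both window lengths at 48 kHz; the constant and function names are chosen for the example.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 48_000
DAB_FRAME_MS = 24
SUPERFRAME_MS = 5 * DAB_FRAME_MS  # 120 ms

def window_duration_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Exact duration of one window in milliseconds."""
    return Fraction(samples * 1000, sample_rate_hz)

for samples in (960, 1024):
    duration = window_duration_ms(samples)
    per_superframe = SUPERFRAME_MS / duration
    fits = per_superframe.denominator == 1  # True only for a whole number of windows
    print(samples, float(duration), float(per_superframe), fits)
```

With 960 samples, exactly 6 windows of 20 ms fill the 120 ms superframe. With 1024 samples a window lasts about 21.33 ms, and 5.625 of them would be needed, which does not fit the frame structure.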
