Audio Conversion & Audio Coding


Efficient transport of audio has become possible thanks to digitisation, which started in the 1980s, and the availability of TCP/IP networks.

But what is digital audio?

We should distinguish between converted digital audio and coded digital audio.

Converted Digital Audio

Conversion is the first step in digitising audio. The aim is to avoid losing audio information compared with the original analogue audio. High-standard conversion from analogue to digital audio therefore uses many bits per audio sample (16, 20, 24 or even 32 bits per sample per channel). This quality is needed for live recording and studio editing.

The Nyquist rule and the quantisation rule are fundamental to reaching this quality.

Nyquist tells us that you must sample at a frequency more than twice the highest audio frequency you want to capture. Studio work covers audio from 0 to 20 kHz (the highest frequency a human can hear), so the sampling frequency must exceed 40 kHz. The professional audio sector decided that 48 kHz would be the standard. It means that high-quality audio needs 48 000 samples per second per channel.
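What happens when the sampling frequency is too low can be illustrated with a small sketch. The sampling theorem asks for a rate above twice the highest audio frequency; a tone above half the sampling frequency is not lost but "folds back" and shows up as a false lower tone (aliasing). The function names below are made up for this illustration.

```python
def min_sample_rate_hz(highest_audio_hz: float) -> float:
    """Lower bound for the sampling frequency: it must EXCEED this value."""
    return 2.0 * highest_audio_hz


def alias_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling.

    The tone is folded into the range 0 .. sample_rate/2.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# A 20 kHz tone survives sampling at 48 kHz unchanged ...
tone_at_48k = alias_frequency_hz(20_000.0, 48_000.0)
# ... but sampled at only 30 kHz it would be heard as a 10 kHz tone.
tone_at_30k = alias_frequency_hz(20_000.0, 30_000.0)
```

This is why a converter filters out everything above half the sampling frequency before it samples.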

On the other hand, the quantisation rule tells us that the number of bits per sample defines the amplitude error of the sampled audio. These quantisation errors will result in noise (quantisation noise).
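The link between bits per sample and quantisation noise can be checked numerically. The sketch below rounds a full-scale sine wave to a given number of bits and measures the signal-to-noise ratio. It also gives the well-known rule of thumb of roughly 6 dB per bit (6.02 × bits + 1.76 dB for a full-scale sine). The quantiser is a simplified uniform one without clipping, and the test tone of 997 Hz is an arbitrary choice for this illustration.

```python
import math


def quantise(x: float, bits: int) -> float:
    """Round x (nominally in -1..1) to the nearest step of a uniform quantiser.

    Simplification: no clipping, so +1.0 itself is also a valid level.
    """
    step = 2.0 / (2 ** bits)
    return round(x / step) * step


def measured_snr_db(bits: int, n: int = 48_000,
                    tone_hz: float = 997.0, rate_hz: float = 48_000.0) -> float:
    """Quantise a full-scale sine and return signal power / noise power in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2.0 * math.pi * tone_hz * i / rate_hz)
        err = quantise(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)


def theoretical_snr_db(bits: int) -> float:
    """Rule of thumb for a full-scale sine: about 6 dB per bit."""
    return 6.02 * bits + 1.76
```

Calling measured_snr_db(8) and measured_snr_db(16) shows the gap of about 48 dB between 8-bit and 16-bit samples.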

Broadcasting uses a sampling frequency of 48 kHz with at least 16 bits per sample per channel. The audio CD uses 44.1 kHz and 16 bits per channel, but this is not the broadcasting standard!

One second of quality audio with a sampling frequency of 48 kHz and 16-bit resolution per sample generates 48 000 samples × 16 bits, equal to a bitstream of 768 kbit/s per channel. That is far too high for wireless transmission. A solution was needed to reduce the number of bits while the quality remained high. Without such a solution, digital radio would never have been possible.
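The arithmetic is easy to repeat for other formats. A minimal sketch (the function name is chosen for this illustration only):

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


mono_48k_16bit = pcm_bitrate_bps(48_000, 16)         # 768 000 bit/s
stereo_48k_16bit = pcm_bitrate_bps(48_000, 16, 2)    # 1 536 000 bit/s
cd_stereo = pcm_bitrate_bps(44_100, 16, 2)           # 1 411 200 bit/s
```

A stereo signal doubles the figure, so a 48 kHz, 16-bit stereo PCM stream already needs about 1.5 Mbit/s.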

The conversion into digital time samples (samples per unit of time) using several bits per sample is called PCM, the abbreviation of Pulse Code Modulation. PCM is the digitally converted signal containing the complete audio information. The samples can be considered pulses, and the code refers to the digital value that encodes the audio amplitude at the moment of sampling.
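As an illustration of "pulses" and "codes", the sketch below takes amplitude values between -1 and +1 and turns each one into a 16-bit signed code, the way a PCM stream stores them. Scaling by 32767 and the little-endian byte order are assumptions made for this example; real converters and file formats may differ in detail.

```python
import math
import struct


def sine_samples(tone_hz: float, rate_hz: float, count: int) -> list:
    """Amplitude of a sine tone at `count` successive sampling moments."""
    return [math.sin(2.0 * math.pi * tone_hz * i / rate_hz) for i in range(count)]


def pcm16_encode(samples) -> bytes:
    """Turn amplitudes in -1.0..1.0 into 16-bit signed little-endian PCM codes."""
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))            # clip anything outside full scale
        codes.append(int(round(x * 32767)))   # one integer code per sample
    return struct.pack("<%dh" % len(codes), *codes)


# One millisecond of a 1 kHz tone at 48 kHz: 48 samples of 2 bytes each.
one_ms = pcm16_encode(sine_samples(1_000.0, 48_000.0, 48))
```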

Audio Coding

Audio coding is an additional layer after audio conversion. Its purpose is to reduce the number of bits by reducing the amount of audio information with minimum loss of quality. This additional layer always comes after the essential high-quality analogue-to-digital conversion. You can never code analogue audio directly: the analogue-to-digital conversion is always the first step to take.

By this, we mean that the input to the audio coder will always be PCM audio. If the audio coder has an analogue input, you can be sure it contains a PCM encoder (analogue-to-digital converter).

The bit reduction (wrongly indicated as audio “compression”) aims to reduce the number of bits so that a recording takes less memory to store or less bandwidth to broadcast. This reduction of bits is one of the basics of DAB.

One of the methods is Huffman coding. Compare it to zipping a file: you get fewer bits, but you keep the integrity of the audio bits. However, the reduction achieved in this way alone stays small. This does not mean that Huffman coding plays no role in digital radio: powerful bit-reduction algorithms such as MP3 and HE-AAC (DAB+) include Huffman coding as an additional stage.
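The principle of Huffman coding can be shown in a few lines: values that occur often get short bit patterns, rare values get long ones, and decoding restores the original exactly. This is a minimal textbook sketch and not the code tables used by any real audio codec; the sample values are invented.

```python
import heapq
from collections import Counter


def huffman_code(symbols) -> dict:
    """Return a dict mapping each distinct symbol to its Huffman bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate case: one symbol only
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)        # the two least frequent groups
        w2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code: dict) -> str:
    return "".join(code[s] for s in symbols)


def decode(bits: str, code: dict) -> list:
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:                # prefix-free: first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


values = [0, 0, 0, 0, 1, 1, -1, 2]            # made-up sample values
code = huffman_code(values)
bits = encode(values, code)
restored = decode(bits, code)                 # identical to `values`
```

With four different values, a fixed-length code would need 2 bits per value, 16 bits in total; the Huffman code needs fewer because the value 0 dominates.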

Once you want to reduce the number of bits further, you cannot keep the integrity and must omit some audio information. In the first instance, we look at the human ear and ask which information it will not notice. We call this perceptual coding.

If you think this through, you can find many possibilities to reduce the audio information.

  • Why should you sample the band from 0 to 30 Hz? The human ear does not hear it, so you can drop it.
  • Why should you use 48 000 samples per second for the band from 30 Hz to 1 kHz if a 2.5 kHz sampling frequency is sufficient?
  • Why should you send the left and right stereo channels individually if both contain much of the same audio?
  • Why should you even send frequencies above 10 kHz when you can replicate the high frequencies with the information in the low-frequency band (see HE-AAC SBR)?

Etc.
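The stereo question in the list above can be made concrete. One common approach, assumed here for illustration and known as mid/side coding, sends the average of left and right plus their difference. When both channels are nearly the same, the difference signal is tiny and needs very few bits. The sample values are invented.

```python
def to_mid_side(left, right):
    """Replace left/right by their average (mid) and half their difference (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [100, 200, -50]
right = [98, 202, -50]                 # almost the same as the left channel
mid, side = to_mid_side(left, right)   # side holds only small values
back_left, back_right = from_mid_side(mid, side)
```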

There is much room to remove redundant audio information from the digital PCM signal. To do so, you first convert the audio samples in the time domain to an equal number of samples in the frequency domain. Then, you can easily apply perceptual algorithms to the frequency samples, which can also be re-coded more efficiently than time samples. Finally, the bit-reduced audio bitstream no longer contains the time samples but the efficiently coded and treated frequency samples, from which the perceptual algorithm has deleted much imperceptible data.
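The idea of moving to the frequency domain and discarding what matters least can be sketched as follows. A plain discrete Fourier transform stands in for the transforms used by real coders, and a simple "drop everything far weaker than the strongest component" rule stands in for a real psychoacoustic model. Both are assumptions made only to keep the example short, as are the signal and the threshold.

```python
import cmath
import math


def dft(x):
    """N time samples in, N frequency samples out."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform, returning the real part of each time sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def drop_weak_bins(spectrum, ratio):
    """Zero every frequency sample weaker than `ratio` times the strongest one."""
    peak = max(abs(c) for c in spectrum)
    return [c if abs(c) >= ratio * peak else 0 for c in spectrum]


N = 64
# A strong tone plus a very weak one (amplitude 1 versus 0.01).
signal = [math.sin(2 * math.pi * 4 * t / N) + 0.01 * math.sin(2 * math.pi * 20 * t / N)
          for t in range(N)]
spectrum = dft(signal)
kept = drop_weak_bins(spectrum, 0.05)
rebuilt = idft(kept)
nonzero = sum(1 for c in kept if c != 0)
worst_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Of the 64 frequency samples only the pair belonging to the strong tone remains, and the rebuilt signal differs from the original by no more than the amplitude of the weak tone that was dropped.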
