Digital representation of analog audio: a brief primer



Dear readers, my name is Felix Harutyunyan. I am a student and a professional violinist. In this article I want to share with you an excerpt from a presentation I gave at the University of Music and Performing Arts Graz on the subject of applied acoustics.

Let's consider the theoretical aspects of converting an analog audio signal into a digital one.
The article does not claim to be comprehensive, but the text contains hyperlinks for further study of the topic.

What is the difference between digital audio and analog?


An analog (continuous) signal is described by a continuous function of time, i.e. it is a continuous curve with a continuous set of possible values (Fig. 1).

fig. 1


A digital signal is a signal that can be represented as a sequence of defined digital values. At any given moment it can take only one definite value from a finite set (Fig. 2).

fig. 2


Within its dynamic range, an analog signal can take on any value whatsoever. An analog signal is converted to digital by two processes: sampling (discretization) and quantization. The order in which the two are applied does not matter.

Sampling is the process of recording (measuring) the value of a signal at certain, usually equal, intervals of time (Fig. 3).

fig. 3


Quantization is the process of dividing the signal's amplitude range into a certain number of levels and rounding the values measured during sampling to the nearest level (Fig. 4).

fig. 4


Sampling splits the signal along the time axis (vertical lines, Fig. 5, left).
Quantization maps the signal onto a set of given values, that is, it rounds the signal to the levels closest to it (horizontal lines, Fig. 5, right).

fig. 5


Together these two processes create a kind of coordinate grid that lets you describe the audio signal with a specific value at any point in time.
A digital signal is a signal to which both sampling and quantization have been applied. Digitization takes place in an analog-to-digital converter (ADC). The greater the number of quantization levels and the higher the sampling rate, the more accurately the digital signal corresponds to the analog one (Fig. 6).

fig. 6
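
To make the two steps concrete, here is a minimal sketch in Python with NumPy; the tone frequency, sampling rate, and bit depth are arbitrary values chosen just for the illustration:

import numpy as np

fs = 8000                       # sampling rate, Hz (arbitrary for the example)
bits = 4                        # bit depth
f = 440.0                       # frequency of the test tone, Hz

# Sampling: measure the signal at equal intervals T = 1/fs
t = np.arange(0, 0.01, 1 / fs)  # the sampling instants
x = np.sin(2 * np.pi * f * t)   # signal values at those instants

# Quantization: round every sample to the nearest of 2**bits levels
levels = 2 ** bits
step = 2.0 / levels             # the range [-1, 1] divided into levels
xq = np.round(x / step) * step  # values snapped to the level grid

print(xq[:5])                   # first few quantized samples

Every value in xq now lies on one of the 16 levels, which is exactly the rounding that Fig. 4 shows graphically.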


The quantization levels are numbered, and a binary code is assigned to each level (Fig. 7).

fig. 7


The number of bits assigned to each quantization level is called the bit depth (quantization depth). The higher the bit depth, the more levels can be represented in binary code (Fig. 8).

fig. 8


The number of quantization levels can be calculated with this formula:

If N is the number of quantization levels and
n is the bit depth, then

N = 2^n



Typical bit depths are 8, 12, 16, and 24 bits. It is easy to calculate that for n = 24 the number of levels is N = 16,777,216.
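
The same calculation for the common bit depths, as a couple of lines of Python:

for n in (8, 12, 16, 24):
    print(f"{n} bits -> {2 ** n:,} quantization levels")
# 8 bits -> 256 ... 24 bits -> 16,777,216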

At n = 1 the audio signal turns into Morse code: either there is a "knock" or there is not. There are also 32-bit floating-point formats. A conventional Audio-CD uses a bit depth of 16 bits. The lower the bit depth, the more heavily the values are rounded and the greater the quantization error.

A quantization error is the deviation of the quantized signal from the analog one, i.e. the difference between the input value X and the quantized value Xq (X − Xq).

Large quantization errors result in severe distortion of the audio signal (quantization noise).

The higher the bit depth, the smaller the quantization error and the better the signal-to-noise ratio (SNR); conversely, at a low bit depth the noise grows (Fig. 9).

fig. 9


Bit depth also determines the dynamic range of the signal, that is, the ratio of its maximum and minimum values. Each additional bit increases the dynamic range by about 6 dB (decibels) (6 dB corresponds to a factor of 2; the grid becomes denser, the gradation finer).

fig. 10. Noise intensity at bit depths of 6 bits and 8 bits
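
The "about 6 dB per bit" rule follows directly from the formula 20·log10(2^n); a quick check in Python:

import math

def dynamic_range_db(bits: int) -> float:
    # each bit doubles the number of levels, adding 20*log10(2), about 6.02 dB
    return 20 * math.log10(2 ** bits)

for n in (1, 4, 8, 16, 24):
    print(f"{n:2d} bits -> {dynamic_range_db(n):6.1f} dB")
# 16 bits -> 96.3 dB, 24 bits -> 144.5 dB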


Quantization (rounding) errors due to an insufficient number of levels cannot be corrected.

quantization noise


signal amplitude at 1-bit (top) and 4-bit (bottom) quantization


Audio Example 1: 8 bit / 44.1 kHz, ~50 dB SNR
Note: If audio files cannot be played online, please download them.


Audio Example 1


Audio Example 2: 4 bit / 48 kHz, ~25 dB SNR


Audio Example 2


Audio Example 3: 1 bit / 48 kHz, ~8 dB SNR


Audio Example 3


Now about sampling.

As mentioned earlier, sampling splits the signal along the time axis, measuring its value after each fixed period of time. This interval is called the sampling period or sampling interval. The sampling frequency, or sampling rate (the well-known sample rate), is the reciprocal of the sampling period and is measured in hertz. If
T is the sampling period and
F is the sampling frequency, then
F = 1/T

For a digital signal to be converted back into an analog one (i.e. for a continuous, smooth function to be accurately reconstructed from discrete, "point" values), one must follow the Kotelnikov theorem (also known as the Nyquist–Shannon sampling theorem).

Kotelnikov's theorem states:
If an analog signal has a spectrum limited to a maximum frequency Fmax, it can be reconstructed unambiguously and without loss from its discrete samples taken at a rate strictly greater than 2·Fmax.
Do you know the number 44.1 kHz? It is one of the standard sampling rates, and this value was chosen precisely because the human ear hears only signals up to 20 kHz. 44.1 is more than twice 20, so all frequencies in a digital signal that are accessible to the human ear can be converted back to analog form without distortion.

But 20 × 2 = 40, so why 44.1? It is all about compatibility with the PAL and NTSC video standards, a point we will not go into today. What happens if you do not follow Kotelnikov's theorem?

When an audio signal contains a frequency higher than half the sampling rate, aliasing occurs: an effect in which different continuous signals become superimposed and indistinguishable once they are sampled.

Aliasing


As can be seen from the previous picture, the sampling points lie so far apart that during interpolation (i.e. when the discrete points are converted back into an analog signal) a completely different frequency is reconstructed by mistake.
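
A minimal sketch of the effect, assuming NumPy: a 7 kHz tone sampled at 8 kHz (Nyquist limit 4 kHz) shows up in the spectrum as a 1 kHz tone.

import numpy as np

fs = 8000                          # sampling rate, Hz; Nyquist limit is 4 kHz
f = 7000                           # tone frequency above the Nyquist limit
n = np.arange(fs)                  # one second of sample indices
x = np.sin(2 * np.pi * f * n / fs)

# The dominant spectral bin lands at the alias frequency |f - fs| = 1000 Hz:
spectrum = np.abs(np.fft.rfft(x))
print(np.argmax(spectrum))         # prints 1000, not 7000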

Audio Example 4: a linearly increasing frequency from ~100 to 8000 Hz. Sampling rate: 16000 Hz. No aliasing.


Spectral analysis


Audio Example 5: the same file. Sampling rate: 8000 Hz. Aliasing is present.


Spectral analysis


Example:
There is audio material whose peak frequency is 2500 Hz. Therefore, the sampling rate must be chosen at no less than 5000 Hz.


The next characteristic of digital audio is the bitrate. Bitrate is the amount of data transmitted per unit of time. It is usually measured in bits per second (bit/s or bps). Bitrate can be constant, variable, or average.

The following formula allows you to calculate the bitrate (valid only for uncompressed data streams):

Bitrate = Sample rate * Bit depth * Number of channels

For example, the Audio-CD bitrate can be calculated as follows:
44100 (sample rate) * 16 (bit depth) * 2 (number of channels, stereo) = 1,411,200 bps = 1411.2 kbit/s
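
The same formula as a small Python helper (the function name is my own, chosen just for the illustration):

def bitrate_bps(sample_rate: int, bit_depth: int, channels: int) -> int:
    # valid only for uncompressed (PCM) streams
    return sample_rate * bit_depth * channels

print(bitrate_bps(44100, 16, 2))          # 1411200 bps
print(bitrate_bps(44100, 16, 2) / 1000)   # 1411.2 kbit/s, the Audio-CD bitrate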

With a constant bitrate (CBR), the amount of data transmitted per unit of time does not change throughout the transmission. The main advantage is that the size of the final file can be predicted fairly accurately. The downside is a suboptimal size/quality ratio, since the "density" of the audio material changes dynamically over the course of a piece of music.

When encoding with a variable bitrate (VBR), the codec selects the bitrate based on the desired quality. As the name implies, the bitrate varies over the course of the encoded audio file. This method gives the best quality/size ratio for the output file. The downside: the exact size of the final file is very hard to predict.

The average bitrate (ABR) is a special case of VBR and occupies an intermediate place between constant and variable bitrate. The target bitrate is set by the user; the encoder still varies it within a certain range, but does not stray from the given average value.

For a given bitrate, VBR quality is usually higher than ABR, and ABR quality, in turn, is higher than CBR: VBR > ABR > CBR.

ABR suits users who want the benefits of VBR encoding but with a relatively predictable file size. ABR usually requires encoding in two passes, since on the first pass the codec does not yet know which parts of the audio material should be encoded at the maximum bitrate.

There are 3 methods for storing digital audio material:

  • Uncompressed (raw) data
  • Losslessly compressed data
  • Lossy compressed data

Uncompressed (RAW) data format


This format contains just a sequence of binary values.
It is in this format that audio material is stored on an Audio-CD. An uncompressed audio file can be opened, for example, in Audacity. Such files have the extension .raw, .pcm, .sam, or no extension at all. RAW does not contain a file header (metadata).

Another format for storing an uncompressed audio stream is WAV. Unlike RAW, WAV contains a file header.

Lossless Audio Formats


The compression principle is similar to that of archivers (WinRAR, WinZip, etc.). The data can be compressed and decompressed again any number of times without loss of information.

How can one prove that lossless compression really leaves the information untouched? It can be proved by the method of destructive interference. Take two audio tracks. Import the original, uncompressed WAV file into the first track. Import the same audio file, losslessly compressed, into the second track. Invert the phase of one of the tracks (mirror it). When both tracks are played simultaneously, the output signal will be silence.

This proves that both files contain absolutely identical information (Fig. 11).

fig. 11
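
The same null test can also be run numerically; here is a minimal sketch assuming both versions have already been decoded to raw 16-bit PCM of equal length (the filenames are hypothetical):

import numpy as np

# Load both versions as raw 16-bit PCM samples (hypothetical filenames)
original = np.fromfile("original.raw", dtype=np.int16)
decoded = np.fromfile("decoded_from_flac.raw", dtype=np.int16)

# Inverting one signal and summing is the same as taking the difference;
# for lossless data the residual must cancel out exactly
residual = original.astype(np.int32) - decoded.astype(np.int32)
print(np.max(np.abs(residual)))   # 0 means the files are bit-identical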


Lossless compression codecs: FLAC, WavPack, Monkey's Audio, ...

Lossy Audio Formats


In lossy compression, the emphasis is not on avoiding information loss but on exploiting subjective perception (psychoacoustics). For example, an adult ear usually does not perceive frequencies above 16 kHz. Using this fact, a lossy codec can simply cut off all frequencies above 16 kHz, since "no one will hear the difference anyway."
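
A crude illustration of such a cutoff, assuming NumPy (real codecs are far more sophisticated; this only shows the idea of discarding "inaudible" spectrum):

import numpy as np

def cut_above(x: np.ndarray, fs: int, cutoff: float = 16000.0) -> np.ndarray:
    # Zero out all spectral components above the cutoff and resynthesize
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spectrum[freqs > cutoff] = 0
    return np.fft.irfft(spectrum, n=len(x))

For instance, cut_above(x, 44100) would return the same signal with everything above 16 kHz removed.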

Another example is the masking effect. Weak amplitudes that overlap with strong ones can be reproduced at lower quality. When low frequencies are loud, quiet mid frequencies are not picked up by the ear. For example, if there is a 1 kHz sound at a level of 80 dB, a 2 kHz sound at 40 dB is no longer heard.

The codec exploits this: the 2 kHz sound can simply be removed.

Spectral analysis of mp3 codec with different compression levels


Lossy compression codecs: mp3, aac, ogg, wma, Musepack ...

Thank you for your attention.

UPD:
If for some reason the audio files do not load, you can download them here: cloud.mail.ru/public/HbzU/YEsT34i4c
