Layer 1: The block structure of the ISO/MPEG Audio encoder and decoder for Layers 1 and 2 has already been shown in an earlier figure. The Layer 1 coder uses fixed subband blocks of 12 decimated samples each. Each scalefactor is represented by 6 b and is transmitted for each subband block unless the bit allocation rule indicates that the subband block and its scalefactor need not be transmitted at all. For each 12-sample block, the signal-to-mask ratio (SMR) is calculated via a 512-point FFT. For each subband, the bit allocation selects one uniform midtread quantizer out of a set of 15 quantizers.
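To make the Layer 1 block processing concrete, here is a minimal Python sketch of scalefactor selection and uniform midtread quantization for one 12-sample subband block. It assumes a 63-entry scalefactor table with values 2^(1 − i/3) (steps of about 2 dB, addressed by the 6 b index) and a quantizer with 2^nbits − 1 levels. The function names are hypothetical, the bit allocation is taken as given rather than derived from the SMR, and the normative tables of the standard are not reproduced.

```python
import numpy as np

# Assumed scalefactor table: 63 entries, 2^(1 - i/3), roughly 2 dB apart.
# A 6 b index (0..62) addresses it; a larger index means a smaller scalefactor.
SCALEFACTORS = 2.0 ** (1.0 - np.arange(63) / 3.0)


def scalefactor_index(block):
    """Index of the smallest tabulated scalefactor that still covers the block peak."""
    peak = np.max(np.abs(block))
    candidates = np.nonzero(SCALEFACTORS >= peak)[0]  # a prefix, since the table decreases
    return int(candidates[-1]) if candidates.size else 0


def midtread_quantize(x, nbits):
    """Uniform midtread quantizer with 2**nbits - 1 levels (zero is a level) on [-1, 1]."""
    steps = (2 ** nbits - 1) // 2  # number of positive levels
    return np.clip(np.round(x * steps), -steps, steps).astype(int)


def midtread_dequantize(q, nbits):
    """Map quantizer indices back to normalized amplitudes."""
    steps = (2 ** nbits - 1) // 2
    return q / steps


def encode_block(block, nbits):
    """Normalize a 12-sample subband block by its scalefactor, then quantize."""
    idx = scalefactor_index(block)
    return idx, midtread_quantize(block / SCALEFACTORS[idx], nbits)


def decode_block(idx, q, nbits):
    """Decoder side: dequantize and rescale with the transmitted scalefactor."""
    return midtread_dequantize(q, nbits) * SCALEFACTORS[idx]


block = np.linspace(-0.4, 0.3, 12)  # one 12-sample subband block
idx, q = encode_block(block, nbits=4)
print(idx, q, decode_block(idx, q, nbits=4))
```

Normalizing by the scalefactor first lets the same small set of quantizers serve blocks of very different levels; the reconstruction error is then bounded by half a quantizer step times the scalefactor.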
The decoding is straightforward: the subband sequences are reconstructed from the 12-sample subband blocks, taking into account the decoded scalefactor and bit allocation information. Each time the subband samples of all 32 subbands have been calculated, they are applied to the synthesis filterbank, which also performs the interpolation and windowing operations, and 32 consecutive 16 b PCM audio samples are calculated. In the ISO/MPEG subjective tests, this Layer 1 codec achieved a mean MOS value of around 4.7 at a rate of 192 kb/s per monophonic channel.
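The numbers in this paragraph imply a simple frame bookkeeping, sketched below in Python: 12 samples in each of 32 subbands correspond to 384 PCM samples per frame, produced in 12 synthesis steps of 32 samples. The 48 kHz sampling rate is an assumption (the text quotes only the bit rate), the function name is hypothetical, and header and padding bits are ignored.

```python
SUBBANDS = 32           # subbands per synthesis step
SAMPLES_PER_BLOCK = 12  # decimated samples per subband block


def layer1_frame(sample_rate_hz, bitrate_bps):
    """Frame bookkeeping for one Layer 1 channel (sketch, no header or padding)."""
    pcm_per_frame = SUBBANDS * SAMPLES_PER_BLOCK  # 12 synthesis steps of 32 samples
    frame_ms = 1000 * pcm_per_frame / sample_rate_hz
    bits_per_frame = bitrate_bps * pcm_per_frame / sample_rate_hz
    bits_per_sample = bitrate_bps / sample_rate_hz  # versus 16 b for the PCM input
    return pcm_per_frame, frame_ms, bits_per_frame, bits_per_sample


print(layer1_frame(48_000, 192_000))  # (384, 8.0, 1536.0, 4.0)
```

At the assumed 48 kHz, 192 kb/s thus amounts to 4 b per sample, a 4:1 reduction relative to 16 b PCM, with a budget of 1536 b per 8 ms frame to be shared among bit allocation, scalefactors, and samples.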
Layer 2: The ISO/MPEG Audio Layer 2 coder is basically similar to the Layer 1 coder but has higher complexity and achieves better performance owing to three modifications. First, the input to the psychoacoustic model is a 1024-point FFT, which gives finer frequency resolution for the calculation of the global signal-to-mask ratio. Second, the overall scalefactor side information is reduced by a factor of about 2: in each subband, blocks of 12 samples are formed, and the scalefactors of three adjacent 12-sample blocks are calculated. Depending on their relative values, only one, two, or all three scalefactors are transmitted; in the case of large dynamic changes, all three may have to be used. Third, finer quantization with up to 16 b amplitude resolution is provided. On the other hand, the number of available quantizers decreases with increasing subband index, which keeps the side information small. Decoding follows that of Layer 1. Because of the scalefactor selection process, descaling has to be based on 3 × 12 = 36 subband samples, which introduces additional delay. The total delay (excluding processing delay) of the Layer 2 codec is 45 ms at a 48 kHz sampling rate.
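The scalefactor sharing described above can be sketched in Python as follows. The merging rule (adjacent indices within a hypothetical tolerance `tol` share one scalefactor) and the 2 b selection code are assumptions for illustration; the standard classifies scalefactor differences with its own decision table. When blocks share, the sketch keeps the largest scalefactor (smallest index) so that no block overloads the quantizer.

```python
def select_scalefactors(idx, tol=1):
    """Decide which of three scalefactor indices (one per 12-sample block) to send.

    idx: three 6 b indices; a smaller index means a larger scalefactor.
    tol: hypothetical tolerance (in index steps) for merging adjacent blocks.
    Returns (pattern, transmitted_indices, indices_used_by_decoder).
    """
    a, b, c = idx
    close_ab = abs(a - b) <= tol
    close_bc = abs(b - c) <= tol
    if close_ab and close_bc:
        s = min(a, b, c)  # one scalefactor covers all three blocks
        return "one", [s], [s, s, s]
    if close_ab:
        s = min(a, b)  # blocks 1 and 2 share
        return "two", [s, c], [s, s, c]
    if close_bc:
        s = min(b, c)  # blocks 2 and 3 share
        return "two", [a, s], [a, s, s]
    return "three", [a, b, c], [a, b, c]  # large dynamic change


def side_info_bits(transmitted, selector_bits=2):
    """6 b per transmitted scalefactor plus an assumed 2 b selection code."""
    return 6 * len(transmitted) + selector_bits


pattern, sent, used = select_scalefactors([10, 11, 10])
print((pattern, sent, used), side_info_bits(sent))  # ('one', [10], [10, 10, 10]) 8
```

For a stationary subband the side information drops from 3 × 6 = 18 b (Layer 1 style) to 8 b under these assumptions, while a transient such as `[5, 20, 40]` still gets all three scalefactors.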
Figure: Block structure of ISO/MPEG audio encoder and decoder, Layer III
Layer 3: The figure above shows the block structure of the ISO/MPEG Layer 3 Audio coder, which introduces many new features. The coder achieves better performance, especially at low bit rates (64 kb/s per monophonic channel), owing to an improved time-to-frequency mapping, an analysis-by-synthesis approach to noise allocation, advanced pre-echo control, and nonuniform quantization with entropy coding. Higher frequency resolution is achieved by a hybrid filterbank: a cascade of the polyphase filterbank and a dynamically windowed MDCT. Dynamic window switching allows the coder to move from higher frequency resolution (18-point MDCT) to lower frequency resolution (6-point MDCT) for subbands above a chosen index when higher time resolution is needed to control time-domain artifacts (pre-echoes) during nonstationary periods of the signal.
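The long/short window switching can be illustrated with a direct (O(N²)) windowed MDCT in Python: a 36-sample window yields 18 coefficients and a 12-sample window yields 6. The sine window is an assumption, chosen because it satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1, so time-domain aliasing cancels on overlap-add. The standard's own window shapes, the transition windows used when switching between block types, and the polyphase stage in front of the MDCT are all omitted; the function names are hypothetical.

```python
import numpy as np


def sine_window(two_n):
    """Sine window of length 2N; symmetric and Princen-Bradley compliant."""
    n = np.arange(two_n)
    return np.sin(np.pi * (n + 0.5) / two_n)


def mdct_matrix(N):
    """N x 2N MDCT basis: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))


def mdct(frame):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    N = len(frame) // 2
    return mdct_matrix(N) @ (sine_window(2 * N) * frame)


def imdct(coeffs):
    """Windowed IMDCT: N coefficients -> 2N time-aliased samples for overlap-add."""
    N = len(coeffs)
    return sine_window(2 * N) * (mdct_matrix(N).T @ coeffs) * 2.0 / N


def analyze_synthesize(x, N):
    """MDCT analysis and overlap-add synthesis with hop N (len(x) a multiple of N)."""
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - N, N):
        out[start:start + 2 * N] += imdct(mdct(padded[start:start + 2 * N]))
    return out[N:N + len(x)]


rng = np.random.default_rng(0)
x = rng.standard_normal(72)
long_ok = np.allclose(analyze_synthesize(x, 18), x)  # 36-sample windows, 18 lines
short_ok = np.allclose(analyze_synthesize(x, 6), x)  # 12-sample windows, 6 lines
print(long_ok, short_ok)
```

With the short blocks, each transform spans only 12 samples, so quantization noise introduced before a sharp attack is confined to a much shorter interval, which is exactly what limits pre-echoes; the price is a threefold coarser frequency resolution.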
The MDCT output samples are nonuniformly quantized, which provides both smaller mean squared errors and better masking, since larger quantization errors can be tolerated when the sample values are large. Huffman coding based on 32 tabulated code tables is applied to represent the quantizer indices efficiently, and run-length coding of zero-valued sequences further increases the efficiency. A buffer maps the variable-length Huffman codewords onto a constant bit rate. To keep the quantization noise in all critical bands below the masking threshold, an iterative analysis-by-synthesis method is employed, in which scaling, bit allocation, quantization, and coding of the spectral data are carried out within two nested iteration loops.
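The two nested loops can be sketched in Python as follows. The power-law quantizer with exponent 3/4 and rounding offset 0.0946 (written here as `floor(y + 0.4054)`) follows the commonly described Layer 3 reference quantizer, but the step-size law `2**(gain/4)` is simplified. `estimate_bits` is a hypothetical stand-in for the Huffman and run-length bit count (zero-valued lines cost nothing, as a crude model of run-length coding), and `allowed_noise` stands in for the per-band masking threshold delivered by the psychoacoustic model.

```python
import numpy as np


def quantize(xr, gain):
    """Power-law quantizer: large lines get coarser effective steps."""
    step = 2.0 ** (gain / 4.0)  # simplified step-size law (assumption)
    return np.sign(xr) * np.floor((np.abs(xr) / step) ** 0.75 + 0.4054)


def dequantize(q, gain):
    """Inverse power law, as a decoder would apply it."""
    return np.sign(q) * np.abs(q) ** (4.0 / 3.0) * 2.0 ** (gain / 4.0)


def estimate_bits(q):
    """Hypothetical stand-in for the Huffman bit count; zero runs are free."""
    nz = np.abs(q[q != 0])
    return float(np.sum(1.0 + 2.0 * np.log2(nz + 1.0)))


def inner_loop(xr, budget_bits):
    """Rate loop: coarsen the global step until the coded size fits the budget."""
    gain = 0
    while True:
        q = quantize(xr, gain)
        if estimate_bits(q) <= budget_bits:
            return q, gain
        gain += 1


def outer_loop(xr, bands, allowed_noise, budget_bits, max_iter=20):
    """Distortion loop: amplify bands whose noise exceeds the allowed level."""
    xr = np.asarray(xr, dtype=float)
    amp = np.zeros(len(bands))  # per-band amplification in 3 dB steps (scalefactors)
    for _ in range(max_iter):
        scaled = xr.copy()
        for b, (lo, hi) in enumerate(bands):
            scaled[lo:hi] *= 2.0 ** (amp[b] / 2.0)
        q, gain = inner_loop(scaled, budget_bits)
        rec = dequantize(q, gain)
        too_noisy = []
        for b, (lo, hi) in enumerate(bands):
            rec_b = rec[lo:hi] / 2.0 ** (amp[b] / 2.0)  # undo the amplification
            if np.mean((xr[lo:hi] - rec_b) ** 2) > allowed_noise[b]:
                too_noisy.append(b)
        if not too_noisy:
            break
        amp[too_noisy] += 1
    return q, gain, amp


rng = np.random.default_rng(0)
xr = 100.0 * rng.standard_normal(32)  # toy "MDCT lines"
bands = [(0, 8), (8, 16), (16, 32)]   # toy scalefactor bands
q, gain, amp = outer_loop(xr, bands, [50.0, 50.0, 50.0], budget_bits=300)
print(gain, amp, estimate_bits(q))
```

The inner loop alone always meets the bit budget; the outer loop then shifts noise between bands by amplifying the ones that violate their threshold, so that the next rate loop spends relatively more bits on them. Reference encoders are usually described as also stopping when every band has been amplified or a scalefactor reaches its limit; this sketch simply caps the iteration count.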
Decoding essentially inverts the encoding process. At a rate of 64 kb/s per monophonic channel, the mean MOS values for Layers 2 and 3, as measured in the ISO/MPEG subjective tests, are around 3.1 and 3.7, respectively. Evidently, the higher complexity of the Layer 3 coder pays off at low bit rates. At a joint-stereo bit rate of 128 kb/s, seven of eight test items achieved a MOS value of 4 or above.