
ISO/MPEG Layers

Layer 1: The block structure of the ISO/MPEG Audio encoder and decoder for Layers 1 and 2 was shown in an earlier figure. The Layer 1 coder uses fixed subband blocks of 12 decimated samples in each of the 32 subbands. Each scalefactor is represented by 6 b and is transmitted for each subband block, unless the bit allocation rule indicates that the subband block and its scalefactor need not be transmitted at all. For each set of 12 subband samples (32 × 12 = 384 input samples) the SMR is calculated via a 512-point FFT. For each subband, the bit allocation selects one uniform midtread quantizer out of a set of 15 quantizers.
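The per-subband coding step described above can be sketched as follows: pick a 6-bit scalefactor for a 12-sample block, normalize the block by it, and quantize each sample with a uniform midtread quantizer. This is a minimal sketch under stated assumptions: the scalefactor table is assumed to follow the form 2·2^(−i/3) (about 2 dB per step), a quantizer with `bits` bits is assumed to have 2^bits − 1 levels, and the names `SCALEFACTORS`, `choose_scalefactor`, `quantize_midtread` and `encode_block` are hypothetical. The 4-bit bit-allocation field that precedes the block in a real bitstream is ignored.

```python
import math

# ASSUMED 6-bit scalefactor table: index i -> 2.0 * 2**(-i/3), i = 0..62
# (adjacent entries about 2 dB apart; index 63 left unused).
SCALEFACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]

def choose_scalefactor(block):
    """Index of the smallest tabulated scalefactor that is >= the block peak."""
    peak = max(abs(s) for s in block)
    for i in range(len(SCALEFACTORS) - 1, -1, -1):   # smallest value first
        if SCALEFACTORS[i] >= peak:
            return i
    return 0   # peak beyond the table: use the largest scalefactor (quantizer clips)

def quantize_midtread(x, bits):
    """Uniform midtread quantizer with 2**bits - 1 levels (zero is a level), bits >= 2."""
    half = 2 ** (bits - 1) - 1          # number of levels on each side of zero
    return max(-half, min(half, round(x * half)))

def encode_block(block, bits):
    """Code one 12-sample subband block; bits == 0 means 'not transmitted'."""
    assert len(block) == 12
    if bits == 0:
        return None, [], 0              # neither samples nor scalefactor are sent
    sf_index = choose_scalefactor(block)
    sf = SCALEFACTORS[sf_index]
    codes = [quantize_midtread(s / sf, bits) for s in block]
    return sf_index, codes, 6 + 12 * bits   # 6-bit scalefactor + 12 coded samples

# Usage: one period of a sine with peak 0.5 in a single subband, 4 bits per sample.
block = [0.5 * math.sin(2 * math.pi * n / 12) for n in range(12)]
sf_index, codes, nbits = encode_block(block, bits=4)
print(sf_index, codes, nbits)
```

Choosing the smallest scalefactor that still covers the peak keeps the normalized samples inside [−1, 1] while using as much of the quantizer range as possible.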

Decoding is straightforward: the subband sequences are reconstructed from the 12-sample subband blocks, taking into account the decoded scalefactor and bit allocation information. Each time the subband samples of all 32 subbands have been calculated, they are applied to the synthesis filterbank, which also includes interpolation and windowing operations, and 32 consecutive 16 b PCM audio samples are calculated. In the ISO/MPEG subjective tests, this Layer 1 codec achieved a mean MOS value of around 4.7 at a rate of 192 kb/s per monophonic channel.
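The descaling step of the decoder, and the frame arithmetic implied by 32 subbands × 12 samples, can be sketched as below. This is a minimal sketch that assumes the same hypothetical scalefactor table (2·2^(−i/3)) and midtread quantizer (2^bits − 1 levels) as an encoder would use; `dequantize_block` and `frame_timing` are illustrative names, and the polyphase synthesis filterbank itself is not modeled.

```python
# ASSUMED scalefactor table, matching a 2 * 2**(-i/3) encoder-side table.
SCALEFACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]

def dequantize_block(sf_index, codes, bits):
    """Rescale quantizer indices of one subband block back to sample values."""
    if bits == 0:
        return [0.0] * len(codes)       # subband not transmitted: output zeros
    half = 2 ** (bits - 1) - 1          # midtread quantizer with 2**bits - 1 levels
    sf = SCALEFACTORS[sf_index]
    return [sf * q / half for q in codes]

def frame_timing(fs_hz, bitrate_bps, subbands=32, block_len=12):
    """PCM samples, duration (ms) and bit budget of one frame."""
    samples = subbands * block_len      # 32 * 12 = 384 PCM samples per frame
    duration_ms = 1000.0 * samples / fs_hz
    bits = bitrate_bps * samples / fs_hz
    return samples, duration_ms, bits

# Usage: a 4-bit block with scalefactor index 6, and the frame budget of a
# monophonic channel at 192 kb/s and 48 kHz.
print(dequantize_block(6, [0, 7, -7], 4))
print(frame_timing(48000, 192000))
```

At 48 kHz one frame of 384 samples lasts 8 ms, so a 192 kb/s channel has 1536 bits per frame to share between side information and subband samples.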

Layer 2: The ISO/MPEG Audio Layer 2 coder is basically similar to the Layer 1 coder, but it has higher complexity and achieves better performance owing to three modifications. First, the input to the psychoacoustic model is a 1024-point FFT, giving finer frequency resolution for the calculation of the global signal-to-mask ratio. Second, the overall scalefactor side information is reduced by a factor of around 2: in each subband, blocks of 12 samples are formed and the scalefactors of three adjacent 12-sample blocks are calculated. Depending on their relative values, only one, two or all three scalefactors are transmitted; in the case of large dynamic changes, all three may have to be used. Third, a finer quantization with up to 16 b amplitude resolution is provided. On the other hand, the number of available quantizers decreases with increasing subband index, which keeps the side information small. Decoding follows that of Layer 1. Because of the scalefactor selection process, descaling has to be based on 3 × 12 = 36 subband samples, which introduces additional delay. The total delay (without processing delay) of the Layer 2 codec is 45 ms at a 48 kHz sampling rate.
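The scalefactor-sharing idea can be illustrated with a simplified decision rule. This is a sketch only: the standard decides with a tabulated classification of scalefactor differences, whereas here neighbouring blocks simply share a scalefactor when their indices differ by at most a hypothetical threshold `max_diff`. The returned 2-bit codes are assumed to follow the scalefactor selection information (scfsi) convention commonly described for Layer 2: 0 = three scalefactors, 1 = blocks 1 and 2 share, 2 = one for all, 3 = blocks 2 and 3 share. A shared scalefactor takes the smallest index (the largest scalefactor) so that no block in the group is clipped.

```python
def select_scalefactors(indices, max_diff=2):
    """Choose which of three scalefactor indices (one per 12-sample block) to send.

    SIMPLIFIED, HYPOTHETICAL rule: blocks share when indices differ by <= max_diff.
    Returns (scfsi, transmitted_indices, per_block_indices).
    """
    a, b, c = indices
    close_ab = abs(a - b) <= max_diff
    close_bc = abs(b - c) <= max_diff
    if close_ab and close_bc and abs(a - c) <= max_diff:
        s = min(a, b, c)                 # largest scalefactor covers all three
        return 2, [s], [s, s, s]
    if close_ab:
        s = min(a, b)
        return 1, [s, c], [s, s, c]      # blocks 1 and 2 share
    if close_bc:
        s = min(b, c)
        return 3, [a, s], [a, s, s]      # blocks 2 and 3 share
    return 0, [a, b, c], [a, b, c]       # large dynamic change: send all three

def side_info_bits(transmitted):
    """2-bit selection code plus 6 bits per transmitted scalefactor."""
    return 2 + 6 * len(transmitted)

# One frame covers 3 * 12 = 36 samples per subband, i.e. 36 * 32 = 1152 PCM samples.
FRAME_SAMPLES = 3 * 12 * 32

# Usage: a stationary subband versus one with an attack in the first block.
for idx in ([20, 21, 20], [5, 20, 21], [5, 20, 40]):
    scfsi, sent, used = select_scalefactors(idx)
    print(idx, scfsi, sent, side_info_bits(sent))
```

For a stationary subband the side information drops from 18 b to 8 b, which is the kind of saving behind the "factor of around 2" reduction; transients still cost all three scalefactors.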

Figure: Block structure of ISO/MPEG audio encoder and decoder, Layer III

Layer 3: The figure above shows the block structure of the ISO/MPEG Layer 3 audio coder, which introduces many new features. The coder achieves better performance, especially at low bit rates (64 kb/s per monophonic channel), due to an improved time-to-frequency mapping, an analysis-by-synthesis approach to noise allocation, advanced pre-echo control, and nonuniform quantization with entropy coding. Higher frequency resolution is achieved with a hybrid filterbank: a cascade of the polyphase filterbank and a dynamically windowed MDCT. Dynamic window switching allows the coder to move from higher frequency resolution (18-point MDCT) to lower frequency resolution (6-point MDCT) for subbands above a chosen index when higher time resolution is needed to control time artifacts (pre-echoes) during nonstationary periods of the signal.

The MDCT output samples are nonuniformly quantized, which provides both smaller mean squared errors and masking, because larger errors can be tolerated when the samples being quantized are large. Huffman coding based on 32 tabulated code tables represents the quantizer indices efficiently, and run-length coding of zero-valued sequences increases the efficiency further. A buffer maps the variable-length Huffman codewords onto a constant bit rate. To keep the quantization noise in all critical bands below the masking threshold, an iterative analysis-by-synthesis method is employed, in which scaling, bit allocation, quantization and coding of the spectral data are carried out within two nested iteration loops.
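The nonuniform quantizer and the two nested iteration loops can be sketched on a toy spectrum. This is a simplified illustration, not the standard's encoder. The power-law rule |ix| = nint((|xr|/step)^0.75 − 0.0946) follows the form usually given for Layer 3 encoders, but the offset-free step 2^(gain/4) is a simplification. `count_bits` is a HYPOTHETICAL stand-in for the Huffman tables, and the band layout, start gain and iteration limits are assumptions. The inner (rate) loop coarsens the global step until the bits fit the budget. The outer (distortion) loop amplifies each band whose noise exceeds its allowed level, then reruns the inner loop.

```python
import math

def quantize(xr, gain):
    """Power-law quantizer: |ix| = nint((|xr| / step)**0.75 - 0.0946)."""
    step = 2.0 ** (gain / 4.0)          # about 1.5 dB per gain unit
    out = []
    for x in xr:
        mag = math.floor((abs(x) / step) ** 0.75 + 0.4054)  # nint(y - 0.0946)
        out.append(mag if x >= 0 else -mag)
    return out

def dequantize(ix, gain):
    """Decoder side: xr = sign(ix) * |ix|**(4/3) * step."""
    step = 2.0 ** (gain / 4.0)
    return [math.copysign(abs(i) ** (4.0 / 3.0), i) * step for i in ix]

def count_bits(ix):
    """HYPOTHETICAL bit cost standing in for Huffman coding."""
    return sum(1 + 2 * abs(i).bit_length() for i in ix)

def inner_loop(xr, budget, gain=-40, max_gain=200):
    """Rate loop: raise the global gain (coarser step) until the bits fit."""
    while True:
        ix = quantize(xr, gain)
        if count_bits(ix) <= budget or gain >= max_gain:
            return gain, ix
        gain += 1

def outer_loop(xr, bands, allowed_noise, budget, max_iter=20):
    """Distortion loop: amplify bands whose noise exceeds the allowed level."""
    amp = [0] * len(bands)              # per-band scalefactor steps
    for _ in range(max_iter):
        factors = [2.0 ** (0.5 * a) for a in amp]
        scaled = list(xr)
        for b, (lo, hi) in enumerate(bands):
            for k in range(lo, hi):
                scaled[k] *= factors[b]
        gain, ix = inner_loop(scaled, budget)   # inner loop nested in outer loop
        rec = dequantize(ix, gain)              # analysis by synthesis
        too_noisy = False
        for b, (lo, hi) in enumerate(bands):
            noise = sum((xr[k] - rec[k] / factors[b]) ** 2 for k in range(lo, hi))
            if noise > allowed_noise[b]:
                amp[b] += 1             # finer effective step in this band
                too_noisy = True
        if not too_noisy:
            break
    return gain, amp, ix

# Usage: a loud lower band and a quiet upper band with a stricter noise limit.
xr = [math.sin(0.3 * k) * (1.0 if k < 8 else 0.1) for k in range(16)]
gain, amp, ix = outer_loop(xr, bands=[(0, 8), (8, 16)],
                           allowed_noise=[1e-2, 1e-4], budget=120)
print(gain, amp, count_bits(ix))
```

Amplifying a band makes its effective step finer but costs bits, so the inner loop responds by raising the global gain; the outer loop stops when every band meets its limit or the iteration limit is reached.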

Decoding mirrors the encoding process. At a rate of 64 kb/s per monophonic channel, the mean MOS values for Layers 2 and 3 measured in the ISO/MPEG subjective tests are around 3.1 and 3.7, respectively; evidently the higher complexity of the Layer 3 coder pays off at low bit rates. At a joint-stereo bit rate of 128 kb/s, seven of eight test items had a MOS value of 4 or above.



Esin Darici Haritaoglu
Wed Jun 18 22:26:24 EDT 1997