Next: ISO/MPEG Standardization Up: Wideband Speech and Previous: Perception Based Coding

AUDIO CODING

With the advent of mass-marketed devices for digital coding and storage of high-fidelity audio, including the Compact Disc (CD), the digital audio tape (DAT), and most recently the MiniDisc (MD) and the digital compact cassette (DCC), the efficient digital coding of high-fidelity audio has become a topic of great interest and activity. Also driving this activity is the need for a digital audio standard for the sound of high-definition TV (HDTV) and for digital audio broadcasting (DAB) of FM channels.

To appreciate the importance of coding digital audio efficiently, with quality essentially indistinguishable from that of an original CD, consider the bit rate that current CDs use to code audio. The sampling rate of a CD is 44.1 kHz, and each sample (for both channels of a stereo signal) is coded with 16-bit accuracy. Hence a total of $44.1 \times 16 \times 2 = 1411.2$ kbps, or 1.41 Mbps, is used to code digital audio on a CD. Current state-of-the-art coding algorithms, such as the Perceptual Audio Coder (PAC) developed at AT&T Bell Labs, are capable of coding 2 channels of digital audio at a total bit rate of 128 kbps with essentially no loss in quality from that of the original CD coding [6,2].
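The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch of the calculation; the function name `pcm_bit_rate` is made up for illustration, and the 128 kbps figure is the PAC rate quoted above.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate in bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


# CD audio: 44.1 kHz sampling, 16 bits per sample, 2 channels.
cd_rate = pcm_bit_rate(44_100, 16, 2)   # 1_411_200 bit/s, i.e. about 1.41 Mbps

# Rate quoted in the text for two-channel perceptual coding.
coded_rate = 128_000

# Compression ratio relative to CD PCM (roughly 11 to 1).
compression_ratio = cd_rate / coded_rate
```

In other words, a 128 kbps coder has to discard or avoid sending roughly ten out of every eleven bits of the CD representation.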

Typical application areas for digital audio are audio production, program distribution and exchange, digital sound broadcasting (DSB), and digital storage (archives, studios, consumer electronics). Digital audio is also useful for interpersonal communications such as video conferencing and multimedia applications, and for enhanced quality TV systems.

First steps to reduce audio bit rates were based on instantaneous companding (e.g., conversion of uniform 14-bit PCM into an 11-bit nonuniform PCM representation) and on various forms of block companding, such as 16-to-14-bit scaling in digital satellite broadcasting systems. The BBC has used the ``near instantaneously companded audio multiplex'' (NICAM) technique for the transmission of sound in broadcast television networks. Such coders provide sufficient dynamic range for audio coding, but they do not reduce bit rates efficiently, since they exploit neither the statistical dependence between samples nor auditory masking effects [3].
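As a rough illustration of block companding, the sketch below scales a block of 16-bit samples down to 14-bit codes by dropping only as many low-order bits as the loudest sample in the block requires, and keeps that shift as a per-block scale factor. The function names and sample values are hypothetical; this is a generic sketch of the idea, not the NICAM algorithm or the format of any particular broadcasting system.

```python
def block_compand(block, in_bits=16, out_bits=14):
    """Return (shift, codes) so that every code fits in out_bits (signed)."""
    max_shift = in_bits - out_bits
    limit = 1 << (out_bits - 1)          # signed range is [-limit, limit)
    shift = 0
    # Increase the shift until the whole block fits into the smaller word.
    while shift < max_shift and not all(-limit <= (s >> shift) < limit for s in block):
        shift += 1
    # Python's >> on negative ints is an arithmetic (floor) shift.
    codes = [s >> shift for s in block]
    return shift, codes


def block_expand(shift, codes):
    """Decoder side: undo the scaling using the transmitted shift."""
    return [c << shift for c in codes]


quiet = [100, -200, 50]          # fits in 14 bits already: no loss
loud = [30000, -12345, 7]        # needs the full 2-bit shift

quiet_shift, quiet_codes = block_compand(quiet)
loud_shift, loud_codes = block_compand(loud)
loud_decoded = block_expand(loud_shift, loud_codes)
```

Quiet blocks pass through unchanged, while loud blocks lose their low-order bits. The scheme never looks at correlation between samples or at what the ear can hear, which is the inefficiency the text points out.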

Good to excellent audio coding performance has been obtained more recently with various frequency-domain coders, both sub-band coders (SBC) and adaptive transform coders (ATC). These coders differ in the number of spectral components and in their strategies for efficiently quantizing the spectral components and masking the resulting coding errors. Frequency-domain coding offers a more direct way than predictive coding to shape the noise and to suppress frequency components that need not be transmitted. In these coders the source spectrum is split into frequency bands, and each frequency component is quantized separately, so the quantization noise associated with a particular band is contained within that band. The number of bits used to encode each frequency component varies: subjectively more important components are quantized more finely, while subjectively less important components are allocated fewer bits or are not encoded at all. This requires a dynamic bit allocation controlled by the short-term spectral envelope of the source signal, so the bit allocation information has to be transmitted to the decoder efficiently as side information.
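One common way to realize such a dynamic bit allocation is a greedy loop that repeatedly gives one more bit to the band whose quantization noise is currently most audible. The sketch below assumes that each band's signal-to-mask ratio (SMR, in dB) is already known and that each added bit lowers the quantization noise by about 6.02 dB, the usual rule of thumb for a uniform quantizer. The function name, the SMR values, and the bit budget are illustrative assumptions, not taken from any of the coders discussed here.

```python
DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer


def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Greedy allocation: one bit at a time to the band with the worst
    noise-to-mask ratio (NMR = SMR - 6.02 * bits)."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        nmr = [
            smr - DB_PER_BIT * b if b < max_bits_per_band else float("-inf")
            for smr, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # noise is already below the masking threshold everywhere
        bits[worst] += 1
    return bits


# Hypothetical per-band signal-to-mask ratios in dB.
smr = [30.0, 12.0, -5.0, 20.0]

tight = allocate_bits(smr, total_bits=8)      # budget runs out first
generous = allocate_bits(smr, total_bits=100) # stops once all noise is masked
```

The third band, whose signal already lies below the masking threshold, receives no bits at all. The resulting list plays the role of the side information that must reach the decoder.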

More recently, transform-based audio coding schemes have been proposed and tested. One example is Dolby's 128 kbps AC-2 coder, a modified version of which has been evaluated in the CCIR process of digital audio broadcast standardization and has been shown to be close in performance to the ISO/MPEG Layer 2 audio coding algorithm at its fixed bit rate. A second example is AT&T's Perceptual Audio Coder (PAC), which extends the idea of perceptual coding to stereo pairs. It uses both L/R (left/right) and M/S (sum/difference) coding, switched in both frequency and time in a signal-dependent fashion.
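The sum/difference idea can be sketched as follows. The conversion between L/R and M/S is standard, but the switching rule in `choose_mode` is a hypothetical energy-compaction test made up for illustration; the text does not describe PAC's actual decision criterion. A real coder would apply such a decision separately per frequency band and per time frame, whereas this sketch looks at a single frame.

```python
def to_mid_side(left, right):
    """Sum/difference transform of one frame of a stereo pair."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform: recovers left and right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(x):
    return sum(v * v for v in x)


def choose_mode(left, right):
    """Hypothetical rule: use M/S when it concentrates the energy in one
    channel better than L/R does. The factor 2 compensates for the 1/2
    scaling above, since E(M) + E(S) = (E(L) + E(R)) / 2."""
    mid, side = to_mid_side(left, right)
    if 2 * min(energy(mid), energy(side)) < min(energy(left), energy(right)):
        return "MS"
    return "LR"


# Nearly mono material: the side channel is almost empty.
centered_mode = choose_mode([1.0, 0.5, -0.25], [1.0, 0.5, -0.25])
# Unrelated channels: M/S gives no advantage.
wide_mode = choose_mode([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

For strongly correlated channels almost all of the energy ends up in the sum signal, so the difference signal can be coded with very few bits.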

Sony's Adaptive Transform Acoustic Coder (ATRAC) was developed for portable digital audio, specifically for Sony's magneto-optical MiniDisc (MD). The coder uses a hybrid frequency mapping that splits the signal into three sub-bands (0-5.5, 5.5-11, and 11-22 kHz), each followed by a dynamically windowed MDCT.
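For readers unfamiliar with the MDCT, the sketch below shows a direct (slow, O(N^2)) MDCT and its inverse with a fixed sine window and 50% overlap-add. It assumes an even number of coefficients per block. It does not model ATRAC's sub-band splitting or its dynamic window switching, and all function names are illustrative.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Window 2N input samples and return N MDCT coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    w = sine_window(two_n)
    n0 = 0.5 + n_half / 2
    return [
        sum(w[n] * block[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Return 2N windowed, time-aliased samples from N coefficients."""
    n_half = len(coeffs)
    two_n = 2 * n_half
    w = sine_window(two_n)
    n0 = 0.5 + n_half / 2
    return [
        (2.0 / n_half) * w[n] * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for k in range(n_half))
        for n in range(two_n)
    ]


def analyze_synthesize(signal, n_half):
    """Transform 50%-overlapped blocks and overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        rec = imdct(mdct(signal[start:start + 2 * n_half]))
        for i, v in enumerate(rec):
            out[start + i] += v
    return out


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
rebuilt = analyze_synthesize(signal, n_half=4)
```

Each block of 2N samples yields only N coefficients, yet the time-domain aliasing of neighbouring blocks cancels in the overlap-add, so every sample covered by two blocks (here samples 4 to 11) is recovered. The first and last half-blocks are covered only once and are therefore not reconstructed in this sketch.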








Generated by latex2html-95.1
Tue Jan 23 15:51:32 EST 1996