First, a bit of background to get everyone on the same page. The current standard for high-fidelity music is "CD quality", which is quantitatively a 16-bit, 44.1 kHz, stereo, PCM (pulse code modulation) recording of music. The 16 bits determine how precisely each volume measurement is recorded. This means that the continuous nature of audio volume is quantized into a discrete number ranging from -2^15 through 2^15 - 1, or -32768 to +32767. The sample rate of 44.1 kHz means that the 16-bit measurement is taken 44,100 times per second. And stereo, of course, means that two separate channels are recorded simultaneously.
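To make the quantization step concrete, here is a minimal sketch in Python. The function name `quantize_16bit` and the choice to scale by 32767 are assumptions made for illustration; scaling symmetrically is one common convention, and it leaves the most negative 16-bit value unused.

```python
def quantize_16bit(x: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed 16-bit sample value.

    Hypothetical helper for illustration only.
    """
    # Clamp so out-of-range input cannot leave the 16-bit range.
    x = max(-1.0, min(1.0, x))
    # Scale to the integer range and round to the nearest level.
    return int(round(x * 32767))


# Number of distinct values a 16-bit sample can take.
levels = 2 ** 16

# Measurements taken each second: 44,100 per channel, two channels.
measurements_per_second = 44_100 * 2

print(quantize_16bit(1.0), quantize_16bit(-1.0), quantize_16bit(0.0))
print(levels, measurements_per_second)
```

Every amplitude between two adjacent levels collapses onto one of them, which is the quantization error the rest of this article keeps coming back to.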
A quick comparison to DVD-A: the DVD-Audio spec allows a six-channel recording at 96 kHz/24 bits, or a two-channel recording at 192 kHz/24 bits, something that audiophiles have dreamed of for some time. DAT (digital audio tape) is similar to CD, but can sample at up to 48 kHz.
A quick calculation shows that "CD quality" audio streams 176,400 bytes per second (about 176 kilobytes per second), or about 1.4 Mbps (megabits per second). This is an immense amount of data to stream, even on a high-bandwidth connection. So the question that has been analyzed over and over again is how to reduce the amount of space sound takes up. This question has been answered with varying degrees of success.
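The arithmetic behind the CD data rate is short enough to write out. This is just a worked calculation from the figures given earlier (16 bits, 44.1 kHz, two channels); the variable names are my own.

```python
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo

bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
bytes_per_second = bits_per_second // 8
megabytes_per_minute = bytes_per_second * 60 / 1_000_000

print(bits_per_second)       # 1411200, about 1.4 megabits per second
print(bytes_per_second)      # 176400
print(megabytes_per_minute)  # roughly 10.6 MB for each minute of music
```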
The obvious way is to reduce either the number of bits per sample or the number of samples per second. The former compounds the fundamental problem of digital music, namely that continuous audio levels are quantized into exact volume levels. Quantizing the music further creates a noticeable change, unpleasant to most people. (Try it yourself by reducing a PCM-encoded file [.wav] from 16 to 8 bits and listening to the difference.) The latter reduces the range of frequencies that can be reproduced. A perfect human ear can hear sounds of almost 20 kHz in frequency. The general rule of thumb is to take at least twice as many samples per second as the highest frequency to be captured, hence 44.1 kHz.
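Both shortcuts can be sketched in a few lines of Python. The helper names here are hypothetical, and the 8-bit values are kept signed for simplicity, so this illustrates the loss of precision rather than the exact layout of an 8-bit .wav file.

```python
def requantize_16_to_8(sample: int) -> int:
    """Keep only the top 8 bits of a signed 16-bit sample."""
    # The arithmetic right shift preserves the sign; result is in -128..127.
    return sample >> 8


def restore_8_to_16(sample8: int) -> int:
    """Scale an 8-bit value back up; the discarded low bits are gone."""
    return sample8 << 8


def max_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate (half the rate)."""
    return sample_rate_hz / 2


original = 12345
round_trip = restore_8_to_16(requantize_16_to_8(original))
print(original - round_trip)      # error introduced by the coarser quantization
print(max_frequency_hz(44_100))   # 22050.0, just above the ~20 kHz limit of hearing
print(max_frequency_hz(22_050))   # 11025.0, halving the rate halves the ceiling
```

With only 256 levels instead of 65,536, each sample can be off by as much as 255 steps of the original scale, which is where the audible roughness comes from.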
It is possible to compress losslessly, that is, without any difference between the original audio file and the result of compressing and decompressing. One such codec is Shorten, developed by Softsound. The man page for shorten describes it this way:
Shorten reduces the size of waveform files (such as audio) using Huffman coding of prediction residuals and optional additional quantization. In lossless mode the amount of compression obtained depends on the nature of the waveform. Those composing of low frequencies and low amplitudes give the best compression, which may be 2:1 or better.
For people who trade live music online (see Etree.org), shorten is the only way to distribute exact copies of CDs. A technical paper on shorten is available if you want more information. It is worth pointing out that shorten is a non-free codec, whose use is restricted by Softsound Inc.
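The phrase "prediction residuals" in the man page excerpt is easier to see with a toy example. The sketch below assumes the simplest possible predictor (each sample is guessed to equal the previous one); it is chosen for illustration and is not claimed to be the predictor shorten actually uses. The function names are hypothetical.

```python
def residuals(samples):
    """Replace each sample with its difference from the predicted value.

    Assumed predictor: the next sample equals the previous one.
    """
    out = []
    prev = 0
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def reconstruct(res):
    """Undo residuals() exactly, which is what makes the scheme lossless."""
    out = []
    prev = 0
    for r in res:
        prev += r
        out.append(prev)
    return out


wave = [1000, 1010, 1022, 1031, 1037, 1040]
res = residuals(wave)
print(res)                       # [1000, 10, 12, 9, 6, 3]
print(reconstruct(res) == wave)  # True
```

After the first value, the residuals are small numbers clustered near zero, and small, frequently repeated values are exactly what an entropy coder such as Huffman coding stores in few bits. A slowly changing (low frequency, low amplitude) waveform gives the smallest residuals, which matches the man page's remark about which signals compress best.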
There is one other emerging lossless codec out there, developed by Xiphophorus, called Ogg Squish. Unlike Shorten and MP3 (which I will get to later), Ogg Squish will be a free and open format. Also unlike Shorten, Ogg Squish is not available yet, though much of the code has been developed.
Those are the primary options for lossless audio compression. Since their compression ratio is only marginally better than 2:1, however, music distribution is now usually accomplished using lossy codecs. The Moving Picture Experts Group (MPEG) is the leader in this respect, with many different lossy codecs available depending on the processing power and bandwidth available. One of these codecs has become famous: MPEG-1 Layer 3 (MP3).
Much of the following has been condensed from the second chapter of MP3: The Definitive Guide, by Scot Hacker.
The question on everyone's mind is how the MP3 codec can create such high-fidelity audio with about a 10:1 compression ratio. The heart of the beast is that it is a "perceptual" codec. That is, it works by eliminating the sounds that are not perceptible to the human ear, and compressing the end result. The perceptual part is the key:
Uncompressed audio, such as that found on CDs, stores more data than your brain can actually process. For example, if two notes are very similar and very close together, your brain may perceive only one of them. If two sounds are very different but one is much louder than the other, your brain may never perceive the quieter signal. And of course your ears are more sensitive to some frequencies than others. The study of these auditory phenomena is called psychoacoustics, and quite a lot is known about the process; so much so that it can be quite accurately described in tables and charts, and in mathematical models representing human hearing patterns.
The book has much more to say on the matter of psychoacoustics, so be sure to read it if you have any interest in investigating that specific part of a lossy codec.
The other notable features of MP3 are variable bit rate (VBR) and Joint Stereo mode. VBR is a technique in which the bit rate changes over the course of the music file, so that parts of the song that require more bits to describe can simply use more bits. This is opposed to constant bit rate encoding, where each second of music is assigned exactly the same amount of space. Joint stereo encodes the stereo channels together, with a bit of "steering" information to allow some stereo sound to be preserved.
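One way to encode two channels "together" is the mid/side (sum and difference) transform, which is one of the techniques used under the joint stereo label. The sketch below shows only that idea, with hypothetical function names; it is not the MP3 bitstream format, and it leaves out the lossy step where an encoder would spend fewer bits on the side signal.

```python
def to_mid_side(left, right):
    """Convert left/right sample lists into mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Invert to_mid_side(): left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1000, 1010, 1022, 1031]
right = [990, 1004, 1020, 1031]

mid, side = to_mid_side(left, right)
print(side)  # small values when the two channels are similar
print(to_left_right(mid, side) == (left, right))
```

When the two channels carry nearly the same material, as they often do, the side signal stays close to zero and needs very few bits, leaving more of the budget for the mid signal.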
MP3 would be a great codec, were it not for one thing: the algorithm to encode MP3 files is encumbered by a patent held by the Fraunhofer Institute. This makes any unlicensed implementation of it illegal in countries that enforce such patents. Although Fraunhofer does not go after free implementations, any commercial implementation must pay patent royalties.
These restrictions have (in part) prompted the Xiphophorus team to create a free perceptual audio codec named Ogg Vorbis. The documentation on Ogg Vorbis is still severely lacking, so it is not possible to obtain an accurate analysis of the codec. It is similar in bit rate and quality to MP3, and currently the goal of the project is to create audio files that have better fidelity at a lower bit rate. It is still not a finished product, and with very few files encoded in the .ogg format, adoption is taking much longer than expected.
MP3 has gained so much popularity because it creates small audio files of very reasonable quality. Ogg Vorbis is still in beta, and Shorten is geared toward live music and burning CDs, since it achieves only a 2:1 compression ratio. More audio codecs are sure to come, and MP3 will eventually be relegated to second-class status.
This is the second article in what I hope will be a series of columns that cover current technological issues in depth. The first was on HDTV.