Kuro5hin.org: technology and culture, from the trenches

Inside: Audio Compression

By zavyman in Media
Sat Nov 11, 2000 at 06:29:14 AM EST
Tags: Music

MP3 has taken all the headlines simply because it does one thing well: it makes it easy to download high-quality music. It is not the only codec out there, but it is no mystery why it does so well, even in the face of better alternatives.


First, a bit of background to get everyone on the same page. The current standard for high-fidelity music is "CD quality", which is quantitatively a 16-bit, 44.1 kHz, stereo, PCM (pulse code modulation) recording of music. The 16 bits determine how precisely each amplitude (volume) measurement is taken: the continuous audio level is quantized into a discrete integer ranging from -2^15 through 2^15 - 1, or -32768 to +32767. The sample rate of 44.1 kHz means that a 16-bit measurement is recorded 44,100 times per second. And stereo, of course, means that two separate channels are recorded simultaneously.
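To make the quantization step concrete, here is a minimal Python sketch. It assumes amplitudes normalised to the range -1.0 to 1.0 (a convention chosen for this example, not part of the CD spec) and simply scales, rounds and clamps each value to a 16-bit code; `quantize_16bit` and `sample_sine` are hypothetical helper names.

```python
import math

SAMPLE_RATE = 44100  # samples per second, per channel


def quantize_16bit(x: float) -> int:
    """Map a continuous amplitude in [-1.0, 1.0] to a signed 16-bit PCM code.

    Scale, round to the nearest integer, then clamp to the 16-bit range
    -32768..32767. Real converters also add dither, which is omitted here.
    """
    code = round(x * 32767)
    return max(-32768, min(32767, code))


def sample_sine(freq_hz: float, n_samples: int, amplitude: float = 0.5) -> list[int]:
    """Sample a sine tone at SAMPLE_RATE and quantize each sample to 16 bits."""
    return [
        quantize_16bit(amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]
```

Everything between two adjacent codes is lost; that rounding error is the quantization noise the article returns to below.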

For comparison, the DVD-Audio spec allows a six-channel recording at 96 kHz/24 bits or a two-channel recording at 192 kHz/24 bits, something that audiophiles have dreamed of for some time. DAT (digital audio tape) is similar to CD, but typically samples at 48 kHz.

A quick calculation shows that "CD quality" audio streams 176,400 bytes per second (about 176 kB/s), or about 1.4 Mbps (megabits per second). This is an immense amount of data to stream even over a high-bandwidth connection. So the question that has been analyzed over and over again is how to reduce the amount of space sound takes up, and it has been answered with varying degrees of success.
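The arithmetic behind these figures is short enough to check in code. This sketch (the function name `pcm_data_rate` is only illustrative) multiplies sample rate, bit depth and channel count, and applies the same formula to the DVD-Audio configurations mentioned above. Note that 176,400 bytes per second comes out to about 172 KB/s if a kilobyte is taken as 1,024 bytes.

```python
def pcm_data_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> tuple[int, int]:
    """Return (bits per second, bytes per second) for uncompressed PCM."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second, bits_per_second // 8


cd_bits, cd_bytes = pcm_data_rate(44100, 16, 2)
# cd_bits == 1_411_200 (about 1.4 Mbit/s), cd_bytes == 176_400 (about 176 kB/s)
dvda_stereo_bits, _ = pcm_data_rate(192000, 24, 2)  # 9_216_000 bit/s
dvda_six_ch_bits, _ = pcm_data_rate(96000, 24, 6)   # 13_824_000 bit/s
```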

The obvious way is to reduce either the number of bits per sample or the number of samples per second. The former compounds the fundamental problem of digital music, namely that continuous audio levels are quantized into discrete volume levels. Quantizing the music more coarsely creates a noticeable change that most people find unpleasant. (Try it yourself: reduce a PCM-encoded file [.wav] from 16 to 8 bits and listen to the difference.) The latter reduces the range of frequencies that can be reproduced. A perfect human ear can hear sounds up to about 20 kHz, and by the Nyquist sampling theorem a signal must be sampled at more than twice its highest frequency to be captured without aliasing; hence 44.1 kHz.
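Both kinds of reduction can be sketched numerically. This example, with hypothetical helper names, drops low-order bits from a 16-bit sine tone and measures the resulting signal-to-noise ratio, and it computes the frequency a tone above the Nyquist frequency (half the sample rate) folds back to after sampling. The truncating requantizer is a simplification; audio editors typically round and dither instead.

```python
import math


def requantize(samples: list[int], from_bits: int = 16, to_bits: int = 8) -> list[int]:
    """Discard the low-order bits of each sample, keeping the original scale.

    Python's >> floors toward negative infinity, so this truncates rather
    than rounds; that is enough to show the effect of coarser steps.
    """
    shift = from_bits - to_bits
    return [(s >> shift) << shift for s in samples]


def snr_db(original: list[int], degraded: list[int]) -> float:
    """Signal-to-noise ratio of `degraded` relative to `original`, in dB."""
    signal = sum(s * s for s in original)
    noise = sum((s - d) ** 2 for s, d in zip(original, degraded))
    return 10 * math.log10(signal / noise)


def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling (Nyquist folding)."""
    f = freq_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)


fs = 44100
tone = [round(30000 * math.sin(2 * math.pi * 1000 * n / fs)) for n in range(4410)]
# Each dropped bit costs roughly 6 dB of SNR, so 8 bits is far noisier than 12.
snr_8 = snr_db(tone, requantize(tone, 16, 8))
snr_12 = snr_db(tone, requantize(tone, 16, 12))
```

For example, `alias_frequency(30000, 44100)` gives 14100: a 30 kHz tone sampled at 44.1 kHz is indistinguishable from a 14.1 kHz one, which is why frequencies above half the sample rate must be filtered out before sampling.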

It is possible to compress losslessly, that is, without any difference between the original audio file and the result of compressing and decompressing. One such codec is Shorten, developed by Softsound. The man page for shorten describes it this way:

Shorten reduces the size of waveform files (such as audio) using Huffman coding of prediction residuals and optional additional quantization. In lossless mode the amount of compression obtained depends on the nature of the waveform. Those composing of low frequencies and low amplitudes give the best compression, which may be 2:1 or better.
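As a rough illustration of "Huffman coding of prediction residuals", the sketch below builds Huffman code lengths for the residuals of a trivial previous-sample predictor. It is not Shorten's code: the predictor, the sample values and the helper name `huffman_code_lengths` are assumptions for illustration, and the cost of transmitting the code table is ignored.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols: list[int]) -> dict[int, int]:
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (weight, unique tiebreak, {symbol: depth so far});
    # the tiebreak keeps heapq from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Residuals of "predict the previous sample" on a slowly varying signal.
samples = [100, 102, 103, 103, 104, 106, 107, 107]
residuals = [b - a for a, b in zip(samples, samples[1:])]
lengths = huffman_code_lengths(residuals)
total_bits = sum(lengths[r] for r in residuals)
# total_bits is 11 here, versus 7 * 16 = 112 bits for seven 16-bit values.
```

Small, frequently repeated residuals get short codes, which is why low-frequency, low-amplitude material compresses best.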

For people who trade live music online (see Etree.org), Shorten is the only way to distribute exact copies of CDs online. A technical paper on Shorten is available if you want more information. It is worth pointing out that Shorten is a non-free codec whose use is restricted by Softsound Inc.

There is one other emerging lossless codec out there, developed by Xiphophorus, called Ogg Squish. Unlike Shorten and MP3 (which I will get to later), Ogg Squish will be a free and open format. Also unlike Shorten, Ogg Squish is not available yet, though much of the code has been developed.

Those are the primary lossless audio codecs. Since their compression ratio is only about 2:1, however, music distribution is now usually accomplished with lossy codecs. The Moving Picture Experts Group (MPEG) is the leader in this respect, with several lossy codecs that trade off the processing power required against the bandwidth required. One of these codecs has become famous: MPEG-1 Audio Layer 3 (MP3).



Much of the following has been condensed from the second chapter of MP3: The Definitive Guide, by Scot Hacker.

The question on everyone's mind is how the MP3 codec can create such high-fidelity audio with about a 10:1 compression ratio. The heart of the beast is that it is a "perceptual" codec. That is, it works by eliminating the sounds that are not perceptible to the human ear, and compressing the end result. The perceptual part is the key:
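To see where a figure like 10:1 comes from, compare the CD bit rate with a typical MP3 bit rate. This sketch assumes 128 kbit/s, a common setting that the article itself does not specify; the helper names are illustrative.

```python
CD_BITS_PER_SECOND = 44100 * 16 * 2  # 1,411,200 bit/s for 16-bit, 44.1 kHz stereo


def compression_ratio(encoded_kbps: int) -> float:
    """Ratio of the CD PCM bit rate to an encoded bit rate given in kbit/s."""
    return CD_BITS_PER_SECOND / (encoded_kbps * 1000)


def encoded_size_bytes(encoded_kbps: int, seconds: int) -> int:
    """Size of a constant-bit-rate stream of the given length."""
    return encoded_kbps * 1000 // 8 * seconds


# A 4-minute song: 42,336,000 bytes as CD audio, 3,840,000 bytes at 128 kbit/s.
ratio_128 = compression_ratio(128)  # 11.025
ratio_160 = compression_ratio(160)  # 8.82
```

So "about 10:1" sits between the 128 and 160 kbit/s settings.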

Uncompressed audio, such as that found on CDs, stores more data than your brain can actually process. For example, if two notes are very similar and very close together, your brain may perceive only one of them. If two sounds are very different but one is much louder than the other, your brain may never perceive the quieter signal. And of course your ears are more sensitive to some frequencies than others. The study of these auditory phenomena is called psychoacoustics, and quite a lot is known about the process; so much so that it can be quite accurately described in tables and charts, and in mathematical models representing human hearing patterns.
The book has much more to say on psychoacoustics, so be sure to read it if you are interested in investigating that specific part of a lossy codec.
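One ingredient of such models, the threshold of hearing in quiet, has a widely used closed-form approximation due to Terhardt. The sketch below evaluates it and discards tones that fall under it. It is only an illustration, not MP3's psychoacoustic model: it ignores masking between sounds, which is what the quoted passage mostly describes, and it assumes levels are already expressed in dB SPL, whereas a real encoder has to assume a playback volume. The function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL).

    Valid for freq_hz > 0; the ear is most sensitive around 3-4 kHz, where
    the threshold dips slightly below 0 dB SPL.
    """
    f = freq_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_components(components: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep only (frequency Hz, level dB SPL) tones louder than the threshold."""
    return [(f, level) for f, level in components if level > threshold_in_quiet_db(f)]


# A 10 dB tone is inaudible at 100 Hz but audible at 3.3 kHz;
# a 40 dB tone at 15 kHz is still below the threshold there.
kept = audible_components([(100.0, 10.0), (3300.0, 10.0), (15000.0, 40.0)])
```

An encoder that spends no bits on components like the discarded ones loses nothing a listener could hear, which is the core of the "perceptual" idea.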

The other notable features of MP3 are variable bit rate (VBR) and joint stereo mode. VBR is a technique in which the bit rate varies throughout the file, so that passages of a song that need more bits to describe can simply use more bits. This is opposed to constant bit rate (CBR) encoding, where each second of music is assigned exactly the same amount of space. Joint stereo encodes the two stereo channels together rather than separately, exploiting the fact that they are usually very similar: either as mid/side (sum and difference) channels or, in intensity stereo mode, as a combined signal with a bit of "steering" information that allows some of the stereo image to be preserved.
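Both ideas can be sketched in a few lines. The bit rates below are the standard MPEG-1 Layer III set; the per-frame "demand" figures fed to `vbr_bitrates` are hypothetical stand-ins for what a psychoacoustic model would report. The mid/side functions show the sum-and-difference form of joint stereo: when left and right are nearly identical, the side channel is close to zero and cheap to encode. The MP3 standard scales by 1/√2 rather than 1/2; this sketch uses 1/2 for readability.

```python
# MPEG-1 Layer III bit rates in kbit/s (free format excluded).
LAYER3_BITRATES = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def vbr_bitrates(demands_kbps: list[float]) -> list[int]:
    """Pick, per frame, the smallest allowed bit rate that covers the demand.

    Demands above 320 kbit/s are capped at the highest allowed rate.
    """
    chosen = []
    for need in demands_kbps:
        rate = next((r for r in LAYER3_BITRATES if r >= need), LAYER3_BITRATES[-1])
        chosen.append(rate)
    return chosen


def to_mid_side(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Convert L/R channels to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Invert to_mid_side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

With CBR every frame would get the same rate regardless of demand; here a quiet frame needing 20 kbit/s gets 32, while a dense one needing 400 is capped at 320.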

MP3 would be a great codec, were it not for one thing: the algorithm to encode MP3 files is encumbered by patents held by the Fraunhofer Institute. This means that implementations need a patent license in countries where those patents are enforced. Although Fraunhofer does not go after free implementations, commercial implementations must pay patent royalties.

These restrictions have (in part) prompted the Xiphophorus team to create a free perceptual audio codec named Ogg Vorbis. The documentation on Ogg Vorbis is still severely lacking, so it is not possible to obtain an accurate analysis of the codec. It is similar in bit rate and quality to MP3, and currently the goal of the project is to create audio files that have better fidelity at a lower bit rate. It is still not a finished product, and with very few files encoded in the .ogg format, adoption is taking much longer than expected.

MP3 has gained so much popularity because it creates small audio files with very reasonable quality. Ogg Vorbis is still in beta, and Shorten is geared toward live music and burning CDs, since it achieves only about a 2:1 compression ratio. More audio codecs are sure to come, and MP3 will eventually be relegated to second-class status.



This is the second article in what I hope will be a series of columns covering current technological issues in depth. The first was on HDTV.


Poll
I usually get my music by...
o downloading MP3s 55%
o purchasing CDs 33%
o burning Shorten files onto CD 2%
o going to live concerts 5%
o I don't listen to music 2%

Votes: 72

Related Links
o Shorten
o Softsound
o Etree.org
o technical paper on shorten
o Xiphophorus
o MPEG
o second chapter
o Ogg Vorbis
o HDTV


Inside: Audio Compression | 14 comments (9 topical, 5 editorial, 0 hidden)
Reference book sub-optimal. (3.00 / 3) (#6)
by Christopher Thomas on Sat Nov 11, 2000 at 02:33:23 AM EST

Much of the following has been condensed from the second chapter of MP3: The Definitive Guide, by Scot Hacker.

I skimmed this book when a co-worker got a copy of it, and was disappointed. The book gives an introduction to perceptual encoding and a medium-depth description of the MP3 format, and then abandons the topics and goes on to devote several chapters to the various MP3 players and resources out there.

A more useful reference would have gone through the intro and then devoted a chapter to in-depth analysis of perceptual encoding approaches, a chapter to detailed specs of the MP3 file format, and a few chapters to the various tradeoffs involved in implementing an MP3 codec. As it was, there was little of real substance present.

I still found it better than nothing - I have the signal processing background to flesh out the missing pieces and so design a perceptual, variable-bitrate codec now - but I certainly hope that better references exist.

Data Compression for Real Programmers (none / 0) (#8)
by westfirst on Sun Nov 12, 2000 at 01:56:02 PM EST

The book Data Compression for Real Programmers is a pretty good, basic introduction to the topic. But it doesn't go into the meaty detail that makes it possible to understand MP3s. Still, there's basic stuff explaining wavelets and how they work.

glaring technical error (2.50 / 2) (#7)
by mikpos on Sun Nov 12, 2000 at 01:25:30 PM EST

The bandwidth of CD-quality sound (aka 1x CD) is about 150 kilobytes/second, not 1.4MB/s -- you're off by a factor of 10.

Besides that, I thought the write-up was a bit fluffy/general for my tastes. I think most people involved with digital equipment (e.g. computers) have a pretty good understanding of how signals are stored. I would have liked more detailed information on audio-specific compression techniques, either lossless (e.g. RAR's "multimedia" setting) or lossy (e.g. MP3).

Also, can article writers please please please include an "Other" option in any poll made? TIA.

Finally, why can't I post this as an editorial comment? Boo.

Your misteak (none / 0) (#9)
by JonesBoy on Mon Nov 13, 2000 at 09:16:50 AM EST

I hate to have to point this out, but...

44100samples/sec*16bits/sample*2channels=1411200 bits/second
1411200bits/second / 8bits/byte / 1024bytes/kbyte=172kilobytes/second

Which sounds about right to me. I think you were confusing MB (megabyte) and Mb (megabit).


Speeding never killed anyone. Stopping did.
oops (none / 0) (#10)
by mikpos on Mon Nov 13, 2000 at 09:35:48 AM EST

Yes, that was my error. Thanks.

Basics of lossless compression (4.00 / 1) (#11)
by zavyman on Mon Nov 13, 2000 at 12:26:13 PM EST

note to self: don't post story so quickly next time...

Standard data compression does not work too well with audio. I've tried it myself; it is time-consuming and has a very weak compression ratio. But there are tricks to compressing audio.

Shorten, like most audio compressors, breaks the audio file into blocks, a method aptly named blocking. If the block size is too small, any savings gained by approximation are eaten up by the per-block overhead, and if the block size is too large, the approximation becomes a poor model. Shorten uses a default block size of 256 samples, but that is user configurable.

Approximation, in this case, is accomplished by linear predictive coding (LPC). As the name suggests, each sample is predicted by a simple linear equation over the previous samples, which models the sound wave with only a small amount of space (7 bits) allocated to the coefficients. The residuals (the differences between the predicted and actual samples) are then recorded and compressed using Huffman encoding.

Sounds, taken in small blocks, are quite regular, allowing most of the waveform to be approximated easily. The goal, however, is lossless encoding, so that the differences are saved and compressed as well.
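The blocking and prediction steps described here can be sketched in Python. For brevity this uses fixed polynomial predictors of orders 0 to 3 (a simpler relative of LPC, the same family FLAC calls fixed predictors) instead of LPC with quantized coefficients, and picks the order per block by smallest total residual magnitude; entropy coding of the residuals is left out. All helper names are hypothetical. Because the decoder rebuilds each sample from the residual plus the same prediction, the round trip is exact.

```python
import math


def split_blocks(samples: list[int], block_size: int = 256) -> list[list[int]]:
    return [samples[i:i + block_size] for i in range(0, len(samples), block_size)]


def predict(s1: int, s2: int, s3: int, order: int) -> int:
    """Fixed polynomial prediction from the previous three samples."""
    return (0, s1, 2 * s1 - s2, 3 * s1 - 3 * s2 + s3)[order]


def residuals(block: list[int], order: int, history: list[int]) -> list[int]:
    """Prediction errors for a block; `history` holds the 3 preceding samples."""
    buf = history[-3:] + block
    return [buf[i] - predict(buf[i - 1], buf[i - 2], buf[i - 3], order)
            for i in range(3, len(buf))]


def reconstruct(res: list[int], order: int, history: list[int]) -> list[int]:
    buf = history[-3:]
    for e in res:
        buf.append(e + predict(buf[-1], buf[-2], buf[-3], order))
    return buf[3:]


def encode(samples: list[int], block_size: int = 256) -> list[tuple[int, list[int]]]:
    history = [0, 0, 0]
    encoded = []
    for block in split_blocks(samples, block_size):
        best = min(range(4), key=lambda k: sum(abs(e) for e in residuals(block, k, history)))
        encoded.append((best, residuals(block, best, history)))
        history = (history + block)[-3:]
    return encoded


def decode(encoded: list[tuple[int, list[int]]]) -> list[int]:
    history = [0, 0, 0]
    out: list[int] = []
    for order, res in encoded:
        block = reconstruct(res, order, history)
        out.extend(block)
        history = (history + block)[-3:]
    return out


tone = [round(1000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1000)]
packed = encode(tone)
assert decode(packed) == tone  # lossless round trip
```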



To put this in perspective, [according to the stats at the shorten tech. paper] gzip compressed a given sound file to 66% of the original size, at only 2.2 times real-time playback. Shorten compressed the file to 42.6% of the original at 13.4 times the speed of real-time playback.

In my experience, live music usually compresses to about 60% of the original size. Shorten was also developed specifically for speech compression. The Xiphophorus team is working on Ogg Squish, which will hopefully use more CPU power to achieve better compression.

pcm vs dsd (5.00 / 1) (#12)
by Defect on Wed Nov 15, 2000 at 11:15:35 AM EST

I know this is a column about digital audio compression, but I don't think DVD-Audio and PCM can be mentioned without referring, at least a little bit, to SACD (Super Audio Compact Disc) or DSD (Direct Stream Digital).

Sony's/Philips' move to Super Audio CDs seems to me the next step in audio evolution, while DVD-Audio is just a postponing of the inevitable. SACD uses DSD rather than PCM when converting analog to digital, which produces an ungodly amazing conversion. A simple comparison of the sampling frequencies of CD, 44.1 kHz (and DVD-Audio, 96 kHz), and SACD, 2.8 MHz (though each DSD sample is a single bit rather than 16 or 24), shows right off the bat that this technology falls under the bigger-numbers-are-better rule.
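For a feel of how a stream of single bits can carry audio, here is a first-order delta-sigma modulator in Python. Real DSD encoders use much higher-order modulators with noise shaping, so treat this only as an illustration; the function name is hypothetical. The rate constant reflects DSD's sample rate of 64 x 44.1 kHz = 2.8224 MHz.

```python
DSD64_SAMPLE_RATE = 64 * 44100  # 2,822,400 one-bit samples per second per channel


def delta_sigma_1bit(samples: list[float]) -> list[float]:
    """First-order delta-sigma modulation of samples in [-1.0, 1.0].

    The integrator accumulates the difference between the input and the
    previous 1-bit output; the sign of the integrator picks the next bit.
    Over many samples the average of the +1/-1 outputs tracks the input.
    """
    integrator = 0.0
    previous = 0.0
    bits = []
    for x in samples:
        integrator += x - previous
        previous = 1.0 if integrator >= 0 else -1.0
        bits.append(previous)
    return bits


# 5,644,800 bit/s for stereo DSD, four times CD's 1,411,200 bit/s.
stereo_dsd_bits_per_second = DSD64_SAMPLE_RATE * 1 * 2
```

The signal level is carried by the density of +1s rather than by multi-bit sample values, which is why a raw sample-rate comparison with PCM overstates the difference.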

You can find out more about sacd and dsd from sony's promo site.

I personally think that media companies should push the next best technology, which I see as SACD, rather than bitch about MP3. The only reason MP3s are popular now is because we now have technology that rivals the mass corporations, so they need to step up and start delivering better quality merchandise, but this is a whole 'nother discussion for a whole 'nother time.
defect - jso - joseth || a link
Wow, never even knew that existed (4.00 / 1) (#13)
by zavyman on Wed Nov 15, 2000 at 01:05:59 PM EST

DSD is definitely amazing, compared to the currently universal PCM format. But sadly, most people cannot tell the difference, and don't really care about the difference.

But screw DVD-A; that format is way too controlled by the MPAA and the RIAA to be useful. SACD at least appears to be open, simple, and unencrypted, bonuses for consumers and producers alike. The only problems are that it is not yet supported on computers and that no home burners are available.

But getting back to the point, compression will be useful as long as bandwidth is limited and demand for quality is low. I think we are seeing the upper limit of audio quality now. Portable devices cannot improve too much anymore (audio quality wise; the formats will certainly change) because of the use of headphones. And only expensive home audio equipment can really tell the difference. Pure hifi sound is marketed to audiophiles. The rest (CD, MD, MP3) is aimed at the consumer.


An old format that did not break through... (none / 0) (#14)
by yunga on Wed Dec 20, 2000 at 08:27:56 PM EST

I think it was late in 1997 that I discovered and adopted "Transform-domain Weighted Interleave Vector Quantization" (take a breath), or TwinVQ for short, and VQF for shorter... That format (supported by most MP3 players) easily gives you files 20% smaller than MP3, and at better quality.

For those to whom numbers speak louder: I usually keep my audio files on CD-Rs.

  • All the MP3s are recorded at 128 kbit/s, 44.1 kHz, joint stereo, on 640MB discs. The file sizes generally vary between 2.9 and 4.7MB (with the exception of all those 11MB J.S. Bach files), and there are around 150-170 files per CD.
  • For VQF, at 96 kbit/s, 44.1 kHz, joint stereo, sizes vary between 2.1 and 3.7MB, and there are between 220 and 240 files per CD... at better quality than MP3!
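These numbers hang together, as a quick calculation shows. The sketch assumes MB means 10^6 bytes and takes the middle of each quoted size range as the average file size; both are assumptions, since the comment does not say. The helper names are hypothetical.

```python
def file_size_mb(kbps: int, seconds: float) -> float:
    """Size in MB (10**6 bytes) of a constant-bit-rate file."""
    return kbps * 1000 / 8 * seconds / 1_000_000


def files_per_disc(disc_mb: float, avg_file_mb: float) -> int:
    """How many files of the given average size fit on a disc."""
    return int(disc_mb // avg_file_mb)


mp3_avg = (2.9 + 4.7) / 2  # midpoint of the quoted MP3 range
vqf_avg = (2.1 + 3.7) / 2  # midpoint of the quoted VQF range
mp3_per_cd = files_per_disc(640, mp3_avg)
vqf_per_cd = files_per_disc(640, vqf_avg)
# For the same 4-minute track, 96 kbit/s is about 25% smaller than 128 kbit/s.
size_saving = 1 - file_size_mb(96, 240) / file_size_mb(128, 240)
```

The results (168 and 220 files) fall inside the 150-170 and 220-240 ranges quoted above.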

I don't know whether it never broke through because of all the media noise surrounding MP3, but you'd better give it a try, trust me.


Quidquid latine dictum sit, altum videtur.