Since I’m into music, a growing trend often comes up: music is sold digitally and on vinyl. Sometimes I’ll hear people mistakenly call the vinyl trend “retro” or “trendy” or “hip” or whatever. But if you actually ask someone why they prefer records, they’ll probably tell you the sound quality is better.
I thought I’d do a series on lossless compression and try to keep everything to general concepts or examples. Let’s start with the terminology. First, media files can be large, and back in the day when computers didn’t have basically infinite space, compression was an important tool for reducing the size of a media file.
Compression is basically an algorithm that takes a file and makes it smaller. The most obvious method for doing this is lossy compression. This just means you lose information. The goal of such an algorithm is to only lose information that is “unimportant” and “won’t be noticed.”
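To make "losing information" concrete, here is a minimal sketch of about the simplest lossy scheme I can think of: throwing away the low 8 bits of every 16-bit sample. The test tone, its amplitude, and the variable names are all made up for illustration; no real audio format works exactly like this.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
# Hypothetical 16-bit recording: one second of a 440 Hz tone.
original = np.round(30000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Lossy step: keep only the top 8 bits of every sample.
compressed = (original >> 8).astype(np.int8)

# Best-effort reconstruction: the discarded low bits are gone for good.
restored = compressed.astype(np.int16) << 8

print(original.nbytes, compressed.nbytes)     # bytes before and after
print(np.max(np.abs(original - restored)))    # largest per-sample error
```

The data is half the size, and the restored samples are close to the originals but not equal to them. That gap is the "loss."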
A far more surprising method of compression is called lossless. At first it seems paradoxical. How can you make the file size smaller, but not lose any information? Isn’t the file size basically the information? We won’t get to this in this post. Teaser for next time!
Now let’s talk about why people don’t like lossy compressed audio files. There is one quick and dirty thing you can do to immediately lose information and reduce the size of an audio file. This is dynamic range (DR) compression.
Think of a soundwave. The amplitude basically determines how loud it is. You can literally compress the wave to have a smaller amplitude without changing any other musical qualities. But this is terrible! One of the most important parts of music is the DR. A moving, soaring climax will not have the same effect if the entire build up to it is the same loudness.
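Here is a rough sketch of that squashing, assuming a very naive compressor that acts on each sample independently (real compressors track a smoothed loudness envelope). The `threshold` and `ratio` values, the test signal, and the use of crest factor (peak over RMS) as a stand-in for DR are all my own assumptions; the DR numbers in album databases are measured differently.

```python
import numpy as np

def compress_dynamic_range(samples, threshold=0.25, ratio=4.0):
    """Shrink the part of each sample's magnitude above `threshold` by `ratio`."""
    magnitude = np.abs(samples)
    over = np.maximum(magnitude - threshold, 0.0)
    squashed = np.minimum(magnitude, threshold) + over / ratio
    return np.sign(samples) * squashed

def crest_factor_db(samples):
    """Peak-to-RMS ratio in decibels, a rough proxy for dynamic range."""
    peak = np.max(np.abs(samples))
    rms = np.sqrt(np.mean(samples ** 2))
    return 20.0 * np.log10(peak / rms)

# Hypothetical test signal: a 440 Hz tone, quiet for one second and then loud.
fs = 8000
t = np.arange(2 * fs) / fs
envelope = np.where(t < 1.0, 0.1, 0.9)
signal = envelope * np.sin(2 * np.pi * 440 * t)

squashed = compress_dynamic_range(signal)
# Make-up gain: restore the original peak so the whole track sounds louder.
squashed *= np.max(np.abs(signal)) / np.max(np.abs(squashed))

print(crest_factor_db(signal), crest_factor_db(squashed))
```

The second number comes out smaller than the first: the quiet build-up has been pulled up toward the climax, which is exactly the effect described above.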
This is such a controversial compression technique that many people switch to vinyl purely for DR reasons. There is a whole searchable online database of albums where you can find out the DR and whether it is considered good, acceptable, or bad. Go search for your favorite albums. It is kind of fun to find out how much has been squashed out even in lossless CD format vs vinyl! (e.g. System of a Down’s Toxicity is DR 11 [acceptable] on vinyl and DR 6 [truly bad] on lossless CD).
The other most common lossy compression technique for audio is a bit more involved, but it actually changes the music, so it is worth thinking about. Let’s actually make a rough algorithm for doing this (there currently exist much better and subtler forms of the following, but it amounts to the same thing).
This is a bit of a silly example, but I went to http://www.wavsource.com to get a raw wav file to work with. I grabbed one of the first ones, an audio sample from the movie 2001: A Space Odyssey. Here is the data visualization of the sound waves and the actual clip:
One thing we can do is apply the Fast Fourier Transform (FFT). This will take these sound waves and get rid of the time component. Normally you’ll want to make a “moving window,” so you keep track of some time. For example, we can see that from 0.5 sec to 1.5 sec is one “packet.” We should probably transform that first, then move to the next.
The FFT leaves us just with the frequencies that occur and how loud they are. I did this with Python’s scipy.fftpack:
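The "moving window" idea can be sketched as follows. This is only an illustration under my own assumptions: the window size, the hop, the Hann taper, and the two-tone test signal are all made up, and I use NumPy's `np.fft.rfft` here rather than the routine used on the real clip.

```python
import numpy as np

def windowed_spectra(samples, window_size=1024, hop=512):
    """Return one magnitude spectrum per overlapping window of `samples`."""
    taper = np.hanning(window_size)  # fades each window's edges to reduce leakage
    spectra = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = samples[start:start + window_size] * taper
        spectra.append(np.abs(np.fft.rfft(frame)))
    return np.array(spectra)

# Hypothetical signal: 440 Hz for the first second, 880 Hz for the second.
fs = 8192
t = np.arange(2 * fs) / fs
signal = np.where(t < 1.0,
                  np.sin(2 * np.pi * 440 * t),
                  np.sin(2 * np.pi * 880 * t))

spectra = windowed_spectra(signal)
freqs = np.fft.rfftfreq(1024, d=1.0 / fs)  # frequency of each bin in Hz
first_peak = freqs[np.argmax(spectra[0])]   # loudest frequency in the first window
last_peak = freqs[np.argmax(spectra[-1])]   # loudest frequency in the last window
print(first_peak, last_peak)
```

A single FFT over the whole two seconds would report both frequencies with no hint of which came first. Transforming window by window keeps that rough sense of time.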
    import matplotlib.pyplot as plt
    import scipy.fftpack as sfft
    import numpy as np
    from scipy.io import wavfile

    fs, data = wavfile.read('daisy.wav')
    # Rescale the 8-bit unsigned samples (0..255) to floats in [-1, 1)
    b = [(ele / 2**8.) * 2 - 1 for ele in data]
    c = sfft.fft(b)
    # For real input the second half of the spectrum mirrors the first
    d = len(c) // 2
    plt.plot(abs(c[:(d - 1)]), 'r')
    plt.show()

    # Crude compression: zero out every frequency whose magnitude is at most 50
    compressed = []
    for ele in c:
        if abs(ele) > 50:
            compressed.append(ele)
        else:
            compressed.append(0)
    compressed = np.asarray(compressed)
    plt.plot(abs(compressed[:(d - 1)]), 'r')
    plt.show()

    # Back to the time domain
    e = sfft.ifft(compressed)

Ignore the scales, which were changed just to make everything more visible and are not normalized. The most crude thing we could do is set a cutoff and just remove all frequencies that we assume will be inaudible anyway:
If we do this too much, we are going to destroy how natural the sound is. As I’ve explained before, all sounds occurring naturally have tons of subtle overtones. You often can’t explicitly hear these, so they will occur below the cutoff threshold. This will bring us towards a “pure” tone which will sound more synthetic or computer generated. This is probably why no one actually compresses this way. This example was just to give an idea of one way it could be done (to finish it off you can now just inverse FFT and write to wav).
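To see the overtone problem in isolation, here is a small sketch on a synthetic tone instead of the movie clip. The tone, its overtone levels, and the variable names are assumptions of mine; the cutoff of 50 is borrowed from the listing above and is just as arbitrary here.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs  # one second of audio

# Hypothetical "natural" tone: a 200 Hz fundamental plus two quiet overtones.
tone = (np.sin(2 * np.pi * 200 * t)
        + 0.01 * np.sin(2 * np.pi * 400 * t)
        + 0.005 * np.sin(2 * np.pi * 600 * t))

spectrum = np.fft.rfft(tone)
cutoff = 50  # arbitrary magnitude threshold
kept = np.where(np.abs(spectrum) > cutoff, spectrum, 0)

# irfft returns real samples directly, so there is no imaginary part to discard.
restored = np.fft.irfft(kept, n=len(tone))

# Map floats in [-1, 1) back to the unsigned 8-bit range of the source file.
as_uint8 = np.clip(np.round((restored + 1) / 2 * 2**8), 0, 255).astype(np.uint8)

print(np.count_nonzero(kept), "of", len(spectrum), "coefficients survive")
```

Only the fundamental survives the cutoff, so `restored` is a pure sine wave, which is the synthetic sound described above. To actually listen to it, something like `scipy.io.wavfile.write('out.wav', fs, as_uint8)` should do the job (the file name is just a placeholder).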
A slightly better compression technique would be to take short time intervals and multiply the spectrum by a bump function centered on the peak frequency. This will shrink all the extraneous frequencies without completely removing the robustness of the sound. This is how some lossy compression is actually done. There are other more fun things with wavelets which would take several posts to describe, and the goal is to get to lossless compression.
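Here is one way to read that idea as code. It is only my sketch, not how any real codec does it: the particular bump function, the `width_hz` and `floor` parameters, and the test frame are all assumptions chosen for illustration.

```python
import numpy as np

def bump(x):
    """Smooth bump: 1 at x = 0, falling to exactly 0 for |x| >= 1."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - x[inside] ** 2))
    return out

def soften_around_peak(frame, fs, width_hz=500.0, floor=0.1):
    """Attenuate frequencies far from the loudest one without zeroing them."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    peak_hz = freqs[np.argmax(np.abs(spectrum))]
    # Gain is 1 at the peak and decays smoothly to `floor` far away from it.
    gain = floor + (1.0 - floor) * bump((freqs - peak_hz) / width_hz)
    return np.fft.irfft(spectrum * gain, n=len(frame))

# Hypothetical short frame: a loud 440 Hz tone plus a quieter 2000 Hz one.
fs = 8192
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 2000 * t)
softened = soften_around_peak(frame, fs)

before = np.abs(np.fft.rfft(frame))
after = np.abs(np.fft.rfft(softened))
# Bins are 8 Hz apart here, so bin 55 is 440 Hz and bin 250 is 2000 Hz.
print(after[55] / before[55], after[250] / before[250])
```

The peak is left alone while the distant 2000 Hz component is turned down to a tenth of its level. Unlike the hard cutoff, nothing is deleted outright, so some of the texture of the sound survives.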
I hope that helps to see what lossy compression is, and that it can cause some serious harm when done without care. With care, you will still lose enough sound quality that many music aficionados avoid mp3 and digital downloads completely in favor of vinyl.
Next time we’ll tackle the seemingly paradoxical concept of lossless compression.