Ever wonder how it is possible to take a photo with your 8 megapixel camera, see the file referred to as a 24 megaByte file, and then send it to someone by email as an attachment of only around 1 megaByte? How can you squeeze 24 megaBytes down to 1 megaByte and not mess up the image?
The whole topic of compression is really important, even these days when bandwidth and disk space are relatively cheap, because the visual objects we create, like photos and movies, keep getting larger as camera sensor resolution gets better and better. We thought it would be useful to do a blog post about compression basics, since some digital asset management systems manage mostly photos, and we’ll use JPEG still photo compression as our example.
You have an 8 megapixel camera, so the first question is: how do we start off with a 24 megapixel image? Pretty much everyone shoots color photos, and a color photo has 3 “planes” of data, one for the red content of your picture, one for blue, and one for green, and when the three planes are overlapped you get your color photo. Using some fancy math (which we’ll get into in a future post) the exposed image from the 8 megapixel sensor is used to create three 8 megapixel “planes” of the image, one red, one blue, one green, and so the output of the camera is a 24 megapixel image (and since each pixel in a plane takes 1 Byte of data, it is a 24 megaByte file; we will clarify that in a future post as well!).
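If you like to see the arithmetic spelled out, here is a quick back-of-the-envelope calculation in Python. The variable names are just ours, for illustration, using the same assumptions as above (3 planes, 1 Byte per pixel per plane):

```python
# Back-of-the-envelope size of an uncompressed photo.
sensor_megapixels = 8          # an 8 megapixel camera
planes = 3                     # one red, one blue, one green plane
bytes_per_pixel_per_plane = 1  # 1 Byte (256 levels) per pixel per plane

uncompressed_megabytes = sensor_megapixels * planes * bytes_per_pixel_per_plane
print(uncompressed_megabytes)  # 24, the 24 megaByte file mentioned above
```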
So we start off with a 24 megaByte file, which is pretty large, and we’d like to squeeze it down. About 20 years ago a consortium of technology companies got together and came up with the JPEG (Joint Photographic Experts Group) standard for image compression (before that, every vendor used its own proprietary compression, and you could not email a photo to a friend and have them view it unless they used exactly the same software you did). Let’s look at how JPEG works.
Imagine a horizontal photo showing your head and shoulders against a white background. Break it into 8 x 8 pixel squares (there would be about 125,000 of those squares in the whole image, so each is a tiny part) so that the whole image is a mosaic of those squares. Now save the top left square of the image exactly (in this case the top left square would be all white pixels of the white background). Now take the next 8 x 8 pixel square, just to the right of the first one, and this time just save the difference between it and the first square. In this case it is just another square of white, so there is no difference, and we can just save a short code to tell us the difference is nothing. As we proceed to the right there are a whole lot of these white squares, so we only save little codes describing how they differ, and save a huge amount of space in our compressed file!
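For the programmers in the audience, here is a tiny Python sketch of that square-by-square idea. It is a toy illustration of the description above, not the real JPEG pipeline, and the function name is made up:

```python
import numpy as np

BLOCK = 8  # JPEG works on 8 x 8 pixel squares

def encode_blocks_naively(plane):
    """Toy version of the idea above: store the first square exactly,
    then store only how each later square differs from the one before it.
    'plane' is a 2-D array of pixel values (one color plane)."""
    height, width = plane.shape
    previous = None
    output = []
    for y in range(0, height, BLOCK):
        for x in range(0, width, BLOCK):
            block = plane[y:y + BLOCK, x:x + BLOCK]
            if previous is None:
                output.append(("exact", block.copy()))    # first square: save it all
            elif np.array_equal(block, previous):
                output.append(("same",))                  # a short "no difference" code
            else:
                output.append(("diff", block - previous)) # save only the differences
            previous = block
    return output

# A mostly white 16 x 32 test strip: all the "same" codes are nearly free to store.
white = np.full((16, 32), 255, dtype=np.int16)
codes = encode_blocks_naively(white)
print([c[0] for c in codes])   # 'exact' once, then seven cheap 'same' codes
```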
Eventually we will reach the end of the top row of these squares, move down a row, and someplace near the middle we will get to the first square with some hairs in it (in my case gray hairs). Now all of a sudden our JPEG algorithm (rules) will say “Hey, this square is very different from the preceding square, so let’s save a bunch of information about the differences and then start saving the differences of the next bunch of squares from it”. And this way we will work our way through the hair, saving difference codes, and then get back to more white on the other side, where the differences approach zero again.
This same process of coding squares that are essentially identical to the preceding squares will save us space in my blue shirt, for example, so we can squeeze even more redundancy out of the data. Now and then we will get to a square with a LOT of differences from the preceding one, and we will have to save a lot of its data (we call that high frequency data: think of a crowd in a grandstand, or a crisp shot of my hair), but mostly we will hit squares with low frequency data. And on the other end, when you go to expand the file back out to look at it, the viewing software understands JPEG too, so it knows how to interpret our stream of data and recreate the original.
It turns out there is even a quality control knob you can use to adjust the threshold the algorithm uses to decide whether adjacent squares are identical or not (it actually analyzes each square with a formula called a Discrete Cosine Transform and compares those results). You can set the knob for absolute maximum quality, and it turns out that pretty much any natural image can be compressed 2.7 to 1 and be exactly identical to the original (“lossless compression”). Or you can set it to excellent quality and get an average of 18.5 to 1 compression, and the data you “lose” will not be discernible to the human eye. Or you can say “all that matters is a tiny file, and the quality can suffer” and get much higher levels of compression, but the image will start to get blocky because you can “see” the 8 x 8 pixel squares. You set the JPEG quality to match your desired output use.
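For the curious, here is a rough sketch of that Discrete Cosine Transform step on a single 8 x 8 square, using SciPy (assuming you have it installed). The flat quantization step and the quality scaling below are our simplified stand-ins for the carefully tuned tables real JPEG uses; they are only here to show how the knob trades detail for space:

```python
import numpy as np
from scipy.fft import dctn, idctn   # 2-D Discrete Cosine Transform and its inverse

def compress_block(block, quality):
    """Sketch of the DCT + quantization idea on one 8 x 8 square.
    'quality' runs from about 1 (tiny file, blocky) to 100 (best quality).
    The single flat step size here is a simplification: real JPEG uses an
    8 x 8 table of step sizes, one per frequency."""
    coefficients = dctn(block.astype(float), norm="ortho")
    step = max(1.0, 50.0 * (101 - quality) / 100.0)  # coarser steps at low quality
    quantized = np.round(coefficients / step)         # most high-frequency terms become 0
    return quantized, step

def decompress_block(quantized, step):
    """Undo the toy quantization and transform back to pixels."""
    return idctn(quantized * step, norm="ortho")

# A smooth (low frequency) 8 x 8 square: almost every coefficient quantizes to zero.
smooth = np.tile(np.linspace(100, 110, 8), (8, 1))
q, step = compress_block(smooth, quality=75)
print(int(np.count_nonzero(q)), "of 64 coefficients survive")
print(float(np.abs(decompress_block(q, step) - smooth).max()))  # error of only a level or two
```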
JPEG is actually a two-pass algorithm: the first pass does the analysis of the 8×8 squares using the DCT (Discrete Cosine Transform), then a second pass (Huffman coding) finds the most common “difference codes” and represents them with the shortest bit patterns, thus saving a bit more space.
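Here is a small, generic textbook Huffman coder in Python to show the idea; the “difference codes” in the example stream are made up, and real JPEG uses its own standardized tables:

```python
import heapq
import itertools
from collections import Counter

def huffman_codes(symbols):
    """Give the most common symbols the shortest bit patterns."""
    counts = Counter(symbols)
    tie = itertools.count()  # unique tie-breaker so the heap never compares dicts
    heap = [(count, next(tie), {sym: ""}) for sym, count in counts.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # only one distinct symbol: give it a 1-bit code
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        lo_count, _, lo = heapq.heappop(heap)   # two least common subtrees...
        hi_count, _, hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo.items()}   # ...get merged under one new bit
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (lo_count + hi_count, next(tie), merged))
    return heap[0][2]

# Made-up stream of difference codes: "none" dominates, like the white background squares.
stream = ["none"] * 90 + ["small-diff"] * 8 + ["big-diff"] * 2
codes = huffman_codes(stream)
print(codes)  # the very common 'none' code comes out shortest, just one bit
bits = sum(len(codes[s]) for s in stream)
print(bits, "bits instead of", len(stream) * 8, "bits at one byte per code")
```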
By the way, the use of the 8×8 squares is why JPEG is not a good solution for compressing type on a page: any diagonal line, like the foot of this “R”, will look like a stair step if it has to be represented by 8×8 squares!
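A toy way to see that stair-step effect in code: take a thin diagonal line and replace every 8 x 8 square with its average value. That is a crude stand-in for what very aggressive quantization does to the detail inside each square, not what JPEG literally does, but it shows why the diagonal turns blocky:

```python
import numpy as np

BLOCK = 8

# A 16 x 16 toy "letter stroke": white background with a thin black diagonal line.
image = np.full((16, 16), 255.0)
for i in range(16):
    image[i, i] = 0

# Stand-in for extreme compression: keep only the average of each 8 x 8 square.
blocky = np.empty_like(image)
for y in range(0, 16, BLOCK):
    for x in range(0, 16, BLOCK):
        blocky[y:y + BLOCK, x:x + BLOCK] = image[y:y + BLOCK, x:x + BLOCK].mean()

def show(picture):
    for row in picture:
        print("".join("#" if v < 128 else "+" if v < 250 else "." for v in row))

show(image)    # a crisp thin diagonal
print()
show(blocky)   # two flat gray squares: the diagonal has turned into stair steps
```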
The conclusion: using some clever math, with some understanding of how humans view images thrown in, we can take really excellent photos and save them in a fraction of the space they would otherwise need, saving time and money!
Posted by David Tenenbaum
Photo by Mike Kullen