Thursday, September 04, 2008

Huffman coding

The first step in Huffman's approach is to create a series of source reductions by ordering the probabilities of the symbols under consideration and combining the lowest probability symbols into a single symbol that replaces them in the next source reduction.

The second step is to code each reduced source, starting with the smallest source and working back to original source.

Huffman coding is variable-length coding.

 character No. of occurrences probability e 3320 30.5119 t 2474 22.7369 o 1749 16.0739 h 1458 13.3995 l 1067 9.8061 p 547 5.0271 w 266 2.4446 Total 10881 100

result

 character Binary cod e 00 t 10 o 010 h 011 l 110 p 1110 w 1111

To decode the stream, start at the root of the encoding tree, and follow a left-branch for a 0, a right branch for a 1. When you reach a leaf, write the character stored at the leaf, and start again at the top of the tree. (Very slow, not used in practice)

If we have an image file like:
20 20 20 30 30
40 40 40 40 40
50 50 50 ......

We get:
40 (40%) ____________________0____
20 (24%) ___________10___ |_____1___
50 (23%) ___110___ |____1___ |
30 (16%) ___111___|--11-- |

The matlab implementation can be found in file exchange.