Random Noise: Huffman coding

The first step in Huffman's approach is to create a series of source reductions: order the symbols by probability, then combine the two lowest-probability symbols into a single symbol that replaces them in the next source reduction. Repeat until only two symbols remain.

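The second step is to code each reduced source, starting with the smallest source and working back to the original source. Each time a combined symbol is split back into its two parts, one part gets a 0 appended to the combined symbol's code and the other gets a 1.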

Huffman coding is a variable-length code: frequent symbols get short codewords and rare symbols get longer ones.
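
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation referred to later in this post: the function name huffman_codes is made up here, and a heap stands in for re-sorting the probabilities at every reduction. The letter counts are the ones tabulated in this post. Which branch gets 0 and which gets 1 is arbitrary, so the bit patterns may differ from the table further down, but the code lengths should agree.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a prefix code from a {symbol: weight} mapping (illustrative sketch)."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    # Each heap entry is (weight, tiebreaker, {symbol: partial codeword}).
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit to be representable.
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        # Source reduction: merge the two lowest-weight entries.
        w0, _, low = heapq.heappop(heap)
        w1, _, high = heapq.heappop(heap)
        # Coding step: prefix one group with 0 and the other with 1.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


# Letter counts taken from the table in this post.
counts = {"e": 3320, "t": 2474, "o": 1749, "h": 1458,
          "l": 1067, "p": 547, "w": 266}
codes = huffman_codes(counts)
for sym in counts:
    print(sym, codes[sym])
```

Merging dictionaries of partial codewords keeps the sketch short. A real encoder would more likely build an explicit tree and read the codes off it.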

character	No. of occurrences	probability (%)
e	3320	30.5119
t	2474	22.7369
o	1749	16.0739
h	1458	13.3995
l	1067	9.8061
p	547	5.0271
w	266	2.4446
Total	10881	100

Result

character	Binary code
e	00
t	10
o	010
h	011
l	110
p	1110
w	1111
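
As a quick sanity check on the table above, the following Python sketch verifies that no codeword is a prefix of another and computes the average number of bits per character. The helper names is_prefix_free and average_length are made up for this example. The counts and codes are copied from the two tables in this post.

```python
counts = {"e": 3320, "t": 2474, "o": 1749, "h": 1458,
          "l": 1067, "p": 547, "w": 266}
codes = {"e": "00", "t": "10", "o": "010", "h": "011",
         "l": "110", "p": "1110", "w": "1111"}


def is_prefix_free(codes):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # In sorted order, any prefix relation shows up between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def average_length(counts, codes):
    """Weighted mean codeword length in bits per symbol."""
    total = sum(counts.values())
    return sum(counts[s] * len(codes[s]) for s in counts) / total


print("prefix-free:", is_prefix_free(codes))
print("average bits per character:", round(average_length(counts, codes), 4))
```

The average comes out at about 2.54 bits per character, compared with the 3 bits a fixed-length code would need for seven symbols.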

To decode the stream, start at the root of the encoding tree and follow the left branch for a 0 and the right branch for a 1. When you reach a leaf, write out the character stored at the leaf and start again at the root. (Walking the tree one bit at a time is very slow; practical decoders typically use lookup tables instead.)
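
The tree walk described above might look like this in Python. It is a sketch only: the tree is represented as nested dictionaries keyed by "0" and "1", and the names build_tree and decode are made up for this example. The code table is the one from this post.

```python
codes = {"e": "00", "t": "10", "o": "010", "h": "011",
         "l": "110", "p": "1110", "w": "1111"}


def build_tree(codes):
    """Turn a {symbol: codeword} table into a nested-dict binary tree."""
    root = {}
    for sym, bits in codes.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})  # inner node
        node[bits[-1]] = sym               # leaf holds the character
    return root


def decode(bits, root):
    """Walk the tree bit by bit, emitting a character at every leaf."""
    out = []
    node = root
    for b in bits:
        node = node[b]
        if not isinstance(node, dict):  # reached a leaf
            out.append(node)
            node = root                 # start again at the top
    if node is not root:
        raise ValueError("bit stream ends in the middle of a codeword")
    return "".join(out)


tree = build_tree(codes)
encoded = "".join(codes[c] for c in "hello")
print(encoded)
print(decode(encoded, tree))
```

Because the code is prefix-free, the decoder never needs separators between codewords: reaching a leaf is the only signal that a character has ended.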

If we have an image file like:
20 20 20 30 30
40 40 40 40 40
50 50 50 ......

We get:

pixel value	probability	code
40	40%	0
20	24%	10
50	23%	110
30	16%	111
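
To get a feel for the saving, here is a small Python sketch that encodes only the ten pixel values written out above (the rest of the image is not shown, so it is left out) with the codes 40 → 0, 20 → 10, 50 → 110, 30 → 111. The comparison assumes the raw image stores 8 bits per pixel, which is an assumption and not something stated in this post. The function name encode is made up for the example.

```python
# Codes for the four pixel values, as given above.
codes = {40: "0", 20: "10", 50: "110", 30: "111"}

# Only the first two rows of the example image; the remainder is elided.
pixels = [20, 20, 20, 30, 30,
          40, 40, 40, 40, 40]


def encode(pixels, codes):
    """Concatenate the codeword of every pixel into one bit string."""
    return "".join(codes[p] for p in pixels)


bits = encode(pixels, codes)
raw_bits = 8 * len(pixels)  # assumption: 8 bits per pixel uncompressed
print(bits)
print(len(bits), "bits encoded vs", raw_bits, "bits raw")
```

These ten pixels are not representative of the whole image, so the ratio here says little about the full file. It only shows how the codes are applied.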

A MATLAB implementation can be found on the MATLAB Central File Exchange.

Thursday, September 04, 2008
