The first step in Huffman's approach is to create a series of source reductions by ordering the probabilities of the symbols under consideration and combining the lowest probability symbols into a single symbol that replaces them in the next source reduction.

The second step is to code each reduced source, starting with the smallest source and working back to original source.

Huffman coding is variable-length coding.

character | No. of occurrences | probability |

e | 3320 | 30.5119 |

t | 2474 | 22.7369 |

o | 1749 | 16.0739 |

h | 1458 | 13.3995 |

l | 1067 | 9.8061 |

p | 547 | 5.0271 |

w | 266 | 2.4446 |

Total | 10881 | 100 |

character | Binary cod |

e | 00 |

t | 10 |

o | 010 |

h | 011 |

l | 110 |

p | 1110 |

w | 1111 |

To decode the stream, start at the root of the encoding tree, and follow a left-branch for a 0, a right branch for a 1. When you reach a leaf, write the character stored at the leaf, and start again at the top of the tree. (Very slow, not used in practice)

If we have an image file like:

20 20 20 30 30

40 40 40 40 40

50 50 50 ......

We get:

40 (40%) ____________________0____

20 (24%) ___________10___ |_____1___

50 (23%) ___110___ |____1___ |

30 (16%) ___111___|--11-- |

The matlab implementation can be found in file exchange.

## No comments:

## Post a Comment