Niedermayer.ca
Published on Niedermayer.ca (https://niedermayer.ca)

Home > Code > Huffman Encoding Tables

Huffman Encoding Tables

Huffman Encoding trees are a loss-less compression schema that work very well with text or other datasets where elements are frequently repeated. A binary tree is constructed where each node and leaf represents a symbol. The address to a node or leaf is the sequential labels of the edges to that node. Because more frequently used symbols are referenced higher on the tree, they have a shorter address. In this way compression is possible.

Because Huffman Encoding is arbitrary, the encoding table must be transmitted with the compressed document making it inefficient for very small documents or documents with a large number of symbols (like images). The key to the encoding scheme is that each node address can be parsed in an unambiguous manner from the data stream.

The program (written in C++ and using templated classes) involves reading a probability table (as a text file) and then constructing a Huffman encoding tree for file compression.

This project contains three files:

  • symbol.h The header file containing the tree definitions
  • symbol.cpp The source code for the methods and implementations used to build the encoding tree
  • huffman.cpp The main procedure to read the data initiate the tree and output the results.

Usage:

Normally the first step in encoding a source file is to take a pass and count all instances of ASCII or Unicode characters--or bytes--in the file and create a frequency table. This frequency table is then converted to a Huffmann tree. This project omits the first step, leaving that relatively trivial activity to a user to implement on his or her own. This code expects to receive an input file with an ASCII character as the first character of a line, followed by a space, then a floating point number representing the probability of occurrence of the character in a file and ending with a newline code.

  • Log in [1] to post comments
Language: 
C++
Archive: 
Package icon huffman.zip [2]

Huffman main procedure

Project Name: 
Huffman Encoding Trees [3]
  • Log in [4] to post comments
Filename: 
huffman.cpp
Language: 
C++
Summary: 

This program reads a text file of the format: "c0.ffffN" where "c" is a single ASCII character, "0.ffff" is a floating point number, and N is a new-line character.

The program creates a binary tree, called the "List Tree", of Symbol objects for each line/character in the file. Once all lines have been read, and the binary tree constructed, the Huffman table is generated.

This process involves: searching the List Tree for the lowest probability symbol, marking this symbol as read, and then moving it to a new tree called the Huffman Tree. Once all symbols have been marked are moved in this manner, the new tree is outputted to a second file, and the program terminates.

Code: 
#include <assert.h> #include <new.h> #include <stdlib.h> #include <fstream.h> #include <iostream.h> #include <iomanip.h> #include <string.h> #include "Symbol.h" /*=========================================================== * Name: Daryle Niedermayer * Date: November 1, 1997 * Desc: This program reads a text file of the format: "c0.ffffN" where "c" is a single ASCII character, "ffff" is a floating point number, and N is a new- line character. The program creates a binary tree, called the "List Tree", of Symbol objects for each line/character in the file. Once all lines have been read, and the binary tree constructed, the Huffman table is generated. This process involves: searching the List Tree for the lowest probability symbol, marking this symbol as read, and then moving it to a new tree called the Huffman Tree. Once all symbols have been marked are moved in this manner, the new tree is outputted to a second file, and the program terminates. ==============================================================*/ int main() { int Count=0; //Count of symbols to process char Infile[40]; //Name of Input file to read char Outfile[40]; //Name of Output file to write char Buffer[80]; char CurrentCharacter; //Values for the current read line float CurrentProb; Tree List; //Create the ListTree Tree Huffman; //Create the Huffman Table ifstream INPUT; //Declare input and output filestreams ofstream OUTPUT; /*============================================= == SET UP FILE STREAMS FOR READING AND WRITING =============================================*/ cout << "\nEnter an input filename: "; cin >> Infile; cout << "\nEnter an output filename: "; cin >> Outfile; INPUT.open(Infile,ios::in); OUTPUT.open(Outfile,ios::out); /*============================================= == BEGIN CREATING LIST TREE =============================================*/ while (INPUT.eof() == false) { INPUT.getline(Buffer,80,'\n'); CurrentCharacter=Buffer[0]; CurrentProb=atof(Buffer+1); istream &flush(); List.insertSymbol(CurrentCharacter,CurrentProb); } /*============================================= == MOVE SYMBOLS TO HUFFMAN TREE & ENCODE =============================================*/ List.moveSymbols(&Huffman); Huffman.assignBits(); Huffman.printTree(); /*============================================= == WRITE ENCODING TABLE FILE =============================================*/ Huffman.writeTable(&OUTPUT); return 0; }

Symbol header file

Project Name: 
Huffman Encoding Trees [3]
  • Log in [5] to post comments
Filename: 
symbol.h
Language: 
C++
Summary: 

This header file defines the classes, properties and method prototypes for the Huffman tree and symbol tables.

Code: 
#ifndef _SYMBOL_ #define _SYMBOL_ #include <assert.h> class Tree; class Symbol { friend class Tree; private: char Character; Symbol *Parent, *LeftChild, *RightChild; bool Mark; float Probability; char CodeBit[16]; int Depth; public: Symbol(); Symbol(char,float); ~Symbol(); }; //========================================== class Tree { public: Tree(); ~Tree(); void insertSymbol(char, float); int showLength(); void displayTree(); int findLow(); int markSymbol(); void moveSymbols(Tree*); displaySymbol(); void printTree(Symbol*); void printTree(void); void assignBits(); void assignBits(Symbol*, char*); void writeTable(ofstream*); void writeTable(ofstream*,Symbol*); protected: Symbol *Root, *PtrCurrent; int Count; int Depth; Symbol* removeSymbol(); int BranchCounter; void insertSymbol(Symbol*,Symbol*); void insertSymbol(Symbol*); }; #include "Symbol.cpp" #endif

Symbol code file

Project Name: 
Huffman Encoding Trees [3]
  • Log in [6] to post comments
Filename: 
symbol.cpp
Language: 
C++
Code: 
//======================================================== // CHARACTER METHODS //======================================================== Symbol::Symbol() { Parent = NULL; LeftChild = NULL; RightChild = NULL; Probability=0; Character='\0'; Mark = false; CodeBit[0] = '\0'; Depth = 1; } //========================================== Symbol::Symbol(char InputChar, float InputProb) { Parent = NULL; LeftChild = NULL; RightChild = NULL; Mark = false; CodeBit[0] = '\0'; Depth = 1; Character = InputChar; Probability = InputProb; } //========================================== Symbol::~Symbol() {;} //======================================================== // TREE METHODS //======================================================== Tree::Tree() { Root = NULL; Count=0; PtrCurrent=Root; Depth=0; BranchCounter=0; } //========================================== Tree::~Tree() { //Delete all Character objects before //Destruction } //======================================================== //INSERTSYMBOL IS USED BY THE LIST TREE TO CONSTRUCT //A LINKED LIST OF SYMBOLS. FOR THIS PURPOSE, IT USES //THE LEFTCHILD POINTERS TO POINT TO THE NEXT ENTRY AND //PARENT POINTERS TO POINT TO THE LAST ENTRY void Tree::insertSymbol(char MyCharacter, float MyProb){ Symbol *Ptr = new Symbol(MyCharacter,MyProb); assert (Ptr !=NULL); Ptr->LeftChild = Root; Root->Parent = Ptr; Root = Ptr; Count++; //Watch for special case of an empty tree if (Root == NULL) { Root = PtrCurrent = Ptr; return; } return; } //========================================== //THIS OVERLOADED METHOD IS USED TO INSERT A SYMBOL //OBJECT INTO THE HUFFMAN TREE void Tree::insertSymbol(Symbol *Symbol1, Symbol *Symbol2) { Symbol *Nanny = new Symbol(); Nanny->LeftChild = Symbol1; Nanny->RightChild = Symbol2; Symbol1->Parent = Symbol2->Parent = Nanny; Nanny->Probability = Symbol1->Probability + Symbol2->Probability; BranchCounter++; if (BranchCounter==1) Root=Nanny; if (BranchCounter>=2) { Symbol *NewRoot=new Symbol(); NewRoot->LeftChild=Root; NewRoot->RightChild=Nanny; Nanny->Parent = Root->Parent = NewRoot; NewRoot->Probability = Root->Probability + Nanny->Probability; Root=NewRoot; } } //========================================== //THIS OVERLOADED METHOD IS USED TO INSERT A SYMBOL //OBJECT INTO THE HUFFMAN TREE void Tree::insertSymbol(Symbol *Symbol1) { Symbol *NewRoot = new Symbol; NewRoot->LeftChild = Root; NewRoot->RightChild = Symbol1; Root->Parent = Symbol1->Parent = NewRoot; NewRoot->Probability = Symbol1->Probability + Root->Probability; BranchCounter++; Root=NewRoot; } //========================================== int Tree::showLength(void){ return Count; } //========================================== void Tree::displayTree() { for (Symbol *Ptr=Root; Ptr; Ptr=Ptr->LeftChild) cout << Ptr->Character << " " << Ptr->Probability << " "; cout << endl; } //========================================== //FINDLOW FINDS THE LOWEST UNMARKED SYMBOL IN THE LIST. //IF ALL SYMBOLS ARE USED UP (I.E. MARKED), IT RETURNS //-1. OTHERWISE IT RETURNS 0 int Tree::findLow() { Symbol *Ptr = Root; bool FoundFlag = false; float LowValue = 1; while (Ptr != NULL) { if ((Ptr->Probability<=LowValue) && (Ptr->Mark==false)) { FoundFlag = true; PtrCurrent=Ptr; LowValue = Ptr->Probability; } Ptr=Ptr->LeftChild; } if (FoundFlag == true) return 0; else return -1; } //========================================== int Tree::markSymbol(){ if (PtrCurrent == NULL) return -1; else { PtrCurrent->Mark = true; return 0; } } //========================================== Symbol* Tree::removeSymbol(){ Symbol *Tmp; if (Root == NULL) //Is the list empty? return 0; if ((PtrCurrent->Mark != true) || (PtrCurrent == NULL)) return 0; //EXTRICATE SYMBOL FROM LIST TREE Tmp = PtrCurrent; if (PtrCurrent != Root) PtrCurrent->LeftChild->Parent = PtrCurrent->Parent; else { PtrCurrent->LeftChild->Parent=NULL; Root=PtrCurrent->LeftChild; } PtrCurrent->Parent->LeftChild = PtrCurrent->LeftChild; PtrCurrent->Parent=PtrCurrent->LeftChild = NULL; PtrCurrent = NULL; return Tmp; } //========================================== void Tree::moveSymbols(Tree *HuffmanTree){ Symbol *Symbol1=NULL, *Symbol2=NULL; int Length=this->showLength(); while (Length > 1) { if (findLow()) { Symbol1=NULL; cerr << "Error: Lowest value not found" << endl; } else { markSymbol(); Symbol1=removeSymbol(); Length--; } if (findLow()){ Symbol2=NULL; cerr << "Error: Lowest value not found" << endl; } else{ markSymbol(); Symbol2=removeSymbol(); Length--; } HuffmanTree->insertSymbol(Symbol1,Symbol2); } if (Length == 1) { if (!findLow()) cerr << "Error: Last element not found!" << endl; markSymbol(); Symbol1=removeSymbol(); Length--; HuffmanTree->insertSymbol(Symbol1); } return; } //========================================== void Tree::printTree(){ Symbol *Current=Root; if (Root != NULL) { //Print Header cout << "Char Probability Code Location Neighbours" \ << endl; printTree(Root->LeftChild); printTree(Root->RightChild); cout << hex; cout <<setiosflags(ios::fixed | ios::showpoint | ios::internal); cout << setiosflags(ios::unitbuf | ios::right); cout << "\""<< setw(1) << Current->Character<<"\""; if (Current->Character=='\0') cout << " "; cout << setiosflags(ios::left); cout << setw(12) << setprecision(7) << Current->Probability << " "; cout << setw(17) << setfill(' ') << Current->CodeBit; cout << " " << hex << Current; cout << " " << hex << Current->Parent << " " << Current->LeftChild; cout << " " << Current->RightChild << dec << endl; } return; } //========================================== void Tree::printTree(Symbol *Current){ if (Current != NULL) { printTree(Current->LeftChild); printTree(Current->RightChild); cout << hex; cout <<setiosflags(ios::fixed | ios::showpoint | ios::internal); cout << setiosflags(ios::unitbuf | ios::right); cout << "\""<< setw(1) << Current->Character<<"\""; if (Current->Character=='\0') cout << " "; cout << setiosflags(ios::left); cout << setw(12) << setprecision(7) << Current->Probability << " "; cout << setw(17) << setfill(' ') << Current->CodeBit; cout << " " << hex << Current; cout << " " << hex << Current->Parent << " " << Current->LeftChild; cout << " " << Current->RightChild << dec << endl; } return; } //========================================== void Tree::assignBits(){ if (Root != NULL) { assignBits(Root->LeftChild,"0"); assignBits(Root->RightChild,"1"); } return; } //========================================== void Tree::assignBits(Symbol *Current, char *BitSet){ char BitBuffer[16]; BitBuffer[0]='\0'; strcat(BitBuffer,BitSet); if (Current != NULL) { strcpy(Current->CodeBit,BitSet); assignBits(Current->LeftChild,strcat(BitBuffer,"0")); BitBuffer[0]='\0'; strcat(BitBuffer,BitSet); assignBits(Current->RightChild,strcat(BitBuffer,"1")); } return; } //========================================== void Tree::writeTable(ofstream *OUTPUT){ Symbol *Current=Root; if (Root !=NULL) { if (Current->Character!='\0'){ *OUTPUT << Current->Character << " " << Current->CodeBit << endl; } writeTable(OUTPUT,Current->LeftChild); writeTable(OUTPUT,Current->RightChild); } } //========================================== void Tree::writeTable(ofstream *OUTPUT,Symbol* Current){ if (Current !=NULL) { if (Current->Character!='\0') *OUTPUT << Current->Character << " " << Current->CodeBit << endl; this->writeTable(OUTPUT,Current->LeftChild); this->writeTable(OUTPUT,Current->RightChild); } }

Source URL:https://niedermayer.ca/code/huffman

Links
[1] https://niedermayer.ca/user/login?destination=node/229%23comment-form [2] https://niedermayer.ca/sites/default/files/huffman_0.zip [3] http://www.niedermayer.ca/node/229 [4] https://niedermayer.ca/user/login?destination=node/239%23comment-form [5] https://niedermayer.ca/user/login?destination=node/240%23comment-form [6] https://niedermayer.ca/user/login?destination=node/241%23comment-form