If a Huffman tree has x levels, read the node values on the bottom
level, lx, from left to right. When you have exhausted level lx,
move up to level lx-1 and repeat the process. Continue until you
reach the root node (and have thus read the value of every node in
the tree).
You should notice an interesting pattern - the values always remain
the same or increase. This is called the sibling property of
Huffman trees. It tells us that, given a node n, its sibling s(n)
is the node immediately to the right of n on the same level. If n
is the rightmost node on its level, s(n) is the leftmost node one
level up (the next node in reading order). If v(n) is the value of
node n, we know that v(s(n)), the value of the sibling, will be
greater than or equal to v(n). If you understand this property,
understanding the rest of this algorithm is a breeze.
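As a small, made-up example (not one drawn from any particular text),
consider a Huffman tree built from the counts A=1, B=2, C=3:

              (6)
             /   \
          (3)     C:3
         /   \
      A:1     B:2

Reading the bottom level left to right gives 1, 2; the next level up
gives 3, 3; and the root gives 6. The full sequence 1, 2, 3, 3, 6
never decreases, just as the sibling property promises.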
One of the drawbacks of static Huffman compression algorithms is the
need to transmit the character frequency tally with the compressed
text. While an intelligently encoded table adds only about 250 bytes
(on average) to a compressed image, it would be nice to get rid of it
altogether. This seems impossible: without the table's data the
decompression routine would not know the structure of the Huffman
tree used to encode the compressed data, and therefore could not
ascertain the proper codeword-to-character mappings.
However, adaptive Huffman compression algorithms overcome the need to
store the character counts by beginning with a mostly empty tree.
They then build up and fill in the tree as they go. The Huffman trees
generated by adaptive algorithms are dynamic, meaning they change in
structure as the statistical tendencies of the text change. The
compression function adds each new character encountered to the tree.
The algorithm does not, however, compress the character the first time
it is seen. Instead the character is passed on to the compressed text
as is (in fact, a special flag character is prepended to it, as we
will discuss in the next paragraph). When the compression system sees
a character already present in the tree, it places that character's
current code in the output stream, then increments the character's
count by one and adjusts the tree accordingly (emitting before
updating keeps the compressor's tree in step with the decompressor's).
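To make the division of labor concrete, here is a sketch of the
compression loop in C. Every helper function and the ESCAPE constant
are hypothetical names invented for this illustration; the tree and
bit-level I/O machinery are assumed to exist elsewhere.

#include <stdio.h>

/* Hypothetical helpers standing in for the tree and bit-level I/O;
 * these names are invented and come from no particular library. */
int  in_tree(int c);        /* is c already a leaf in the tree?      */
long code_for(int sym);     /* current codeword for sym              */
void emit_code(long code);  /* write a codeword to the bitstream     */
void emit_byte(int c);      /* write a raw 8-bit literal             */
void add_to_tree(int c);    /* create a new leaf for c               */
void update_tree(int sym);  /* bump sym's count and restore the
                               sibling property                      */

#define ESCAPE 256          /* made-up symbol outside the byte range */

void compress(FILE *in)
{
    int c;
    while ((c = getc(in)) != EOF)
    {
        if (in_tree(c))
        {
            emit_code(code_for(c));      /* known character: its code */
        }
        else
        {
            emit_code(code_for(ESCAPE)); /* flag: literal byte next   */
            emit_byte(c);                /* pass the character as is  */
            add_to_tree(c);              /* later occurrences coded   */
        }
        update_tree(c);                  /* update after coding, so
                                            the decoder can mirror it */
    }
}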
The complementary decompression routine operates in much the same
manner - when a new character is encountered it is added to the
dynamic Huffman tree but otherwise left alone. When a code is found
it is decoded and the weight of the corresponding character is
increased. This increase may cause the tree to be reorganized.
Plain (non-compressed) characters and (compressed) codewords are
distinguished in the compressed stream by use of a flag or escape
symbol. This symbol, when encountered, signifies that the next byte
is a literal character; all other data is assumed to be codes. The
escape symbol is one of the few items present in the initial Huffman
tree. When the decompression subroutine runs across the code for the
escape symbol it immediately reads a byte from the input, adds it to
the tree, and sends it to the output. Note that I am not talking
about the escape character here (ASCII 27) but rather a made-up
symbol that has a node on the Huffman tree. This symbol, of course,
produces no output in the uncompressed stream - its sole function is
to convey a message from the compression routine to the decompression
routine about the character following it in the stream.
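The matching decompression loop, under the same assumptions as the
compression sketch above (read_code and read_byte are likewise
invented names, and add_to_tree and update_tree are the same
hypothetical helpers):

/* Sketch of the decompression loop.  read_code() walks the tree bit
 * by bit to the next symbol; read_byte() reads a raw literal.
 * End-of-stream handling (e.g. a dedicated EOF symbol) is omitted. */
int read_code(FILE *in);
int read_byte(FILE *in);

void decompress(FILE *in, FILE *out)
{
    for (;;)
    {
        int sym = read_code(in);    /* decode the next codeword       */
        if (sym == ESCAPE)
        {
            sym = read_byte(in);    /* next byte is a literal         */
            add_to_tree(sym);       /* grow the tree exactly as the
                                       compressor did                 */
        }
        putc(sym, out);             /* the escape itself emits nothing */
        update_tree(sym);           /* keep the two trees in step     */
    }
}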
The complicated part of this algorithm, as you might expect, is the
tree manipulation. It is unreasonable to reconstruct the entire
Huffman tree every time a symbol is added to it. But recall from the
opening paragraph of this section that all Huffman trees must obey the
sibling property.
Imagine the steps involved in incrementing a Huffman node's weight:
first, since all characters are stored at the leaves of the tree, we
add one to the count of a leaf. We must now make sure the tree still
follows the sibling property. If our incremented leaf, i, has a
larger value than its sibling, the sibling rule is broken.
When the sibling property has been broken, matters can sometimes be
fixed by swapping the offending incremented node i with its sibling.
This will not always work, though. Imagine that there is a node n
with value 5. Its immediate sibling is node o, with value 5 also.
The immediate sibling of o is node p, whose value is also 5. (That
is, there are three leaf nodes in a row, all with value 5.) Now we
increment node n to 6. The sibling property is broken because n now
has a larger value than its sibling, o (which is still 5). Swapping
the two would not help because n is also larger than o's sibling, p.
The proper way to restore the sibling property when it has been
violated by an increment is to loop over the siblings, starting with
the immediate sibling of the incremented node. Continue to loop while
the value of the nodes encountered stays the same, and break out of
the loop when the value changes. Then swap the incremented node with
the last node in the run of equal values:
...
node[i].value++;                     /* bump the leaf's count         */
if (node[i].value > node[i+1].value) /* sibling property violated?    */
{
    val = node[i+1].value;           /* value shared by the run       */
    j = i + 1;
    while (node[j].value == val)     /* scan past the run of equal
                                        values                        */
    {
        j++;
    }
    swap(node[i], node[j-1]);        /* swap with the last node in
                                        the run                       */
}
...
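Applied to the three-node example from earlier: the values 5, 5, 5
become 6, 5, 5 after the increment, the while loop scans past the run
of 5s, and the swap with the last of them produces 5, 5, 6, which
satisfies the sibling property once more.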
The above code assumes, of course, that we can traverse along siblings
by simply moving to adjacent positions in a node array. Any practical
implementation of an adaptive Huffman algorithm must therefore make it
cheap to find the sibling of a given node many times in a row.
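One way to arrange that - a minimal sketch of one possible layout, not
something prescribed above - is to keep every node in a single array
sorted by value, so that the sibling of node[i] in the reading order
from the start of this section is simply node[i+1]:

/* One possible array layout (an illustrative sketch, not the only
 * design).  Nodes are kept sorted by value, so node[i+1] is the
 * sibling of node[i]. */
struct huff_node
{
    long value;    /* occurrence count (weight)            */
    int  parent;   /* index of parent node, -1 at the root */
    int  left;     /* index of left child, -1 for a leaf   */
    int  right;    /* index of right child, -1 for a leaf  */
    int  symbol;   /* character stored here, if a leaf     */
};

/* 257 possible leaves (256 byte values plus the escape symbol)
   give at most 2*257 - 1 nodes in total. */
struct huff_node node[2 * 257 - 1];

With this layout the sibling scan is a plain walk along the array; the
only extra bookkeeping when two nodes are swapped is patching the
parent and child indices that point at them.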