Search
Next: 0.10.1 Run-Length Encoding Up: Algorithm Archive (Work in Previous: 0.9 Data Encryption Algorithms

0.10 Data Compression Algorithms

The goal of data compression is to represent a some set of information as space efficiently as possible. A data compression code is a mapping between some set of source messages and a set of codewords. A source message does not have to be and is not usually an entire string being compressed. Rather, it is the set of symbols or strings into which the data being compressed is partitioned for processing. These basic units may be single symbols from the source string's alphabet, or they may be strings of such symbols. The process of converting from a source stream into a coded (hopefully compressed) message is called encoding while the inverse operation is called decoding. A lossless encoding method is one in which the process of decompression results in no loss of original data whereas lossy encoding method is one in which the original data cannot be fully recovered.

Codes can be of the types block-block or variable-variable. Codes of the block-block variety operate on static, fixed-length codewords and source messages while their counterparts operate on dynamic length codewords and source messages. One example of a block-block type code is the ASCII code. It is of the block-block variety because it maps fixed-length source messages (characters) into fixed-length codewords (their ASCII values).

Because variable-variable type codes produce codewords that do not have a fixed length, when processing the output of a variable-variable code it is vital to be able to differentiate between codewords in the stream. Fixed length codewords are easy to distinguish due to their regular spacing pattern. However, we do not have this luxury when dealing with variable-variable codes.

The sequence of codewords or source messages in a stream is called an ensemble.

A coding function is called distinct if its mapping from source messages to codewords is one-to-one. Such a code is called uniquely decodable if every codeword is recognizable even when immersed in a stream of other codewords. A uniquely decodable code is known as a prefix code if it has the property that no codeword in the code is a prefix of any other codeword.

Data compression schemes are said to be either static or dynamic (or adaptive). A static function is one in which the mapping from the input source messages to the set of codewords is fixed before the data compression begins. In such a system a given message is always represented by the same codeword regardless of where it appears in the ensemble. In contrast, a dynamic (or adaptive) algorithm may change the mapping for a particular source message during the compression process.

Scott Gasch
1999-07-09