Adaptive Delta Coding Discussion and Implementation

by Michael Dipperstein

My experience with Rice (Golomb) coding and a challenge that a student sent me had me thinking about creating a library that uses a small number of bits to encode the differences between consecutive symbols, but it didn't seem useful or all that interesting. It became interesting and useful after I attended BodyNets 2009 and saw a presentation on "Adaptive Lossless Compression in Wireless Body Area Sensor Networks". The presenter concluded that adaptive delta encoding was a code space and time efficient method for compressing data sampled from heart and gait monitors. So with renewed enthusiasm, I threw together a quick hack that implements adaptive delta encoding.

As with my other compression implementations, my intent is to publish an easy to follow ANSI C implementation of the adaptive delta coding algorithm. Anyone familiar with ANSI C and the adaptive delta coding algorithm should be able to follow and learn from my implementation. There's a lot of room for improvement of compression ratios, speed, and memory usage, but this project is about learning and sharing, not perfection.

Information on downloading my source code appears under Actual Software below.

The rest of this page discusses adaptive delta coding and the results of my efforts so far.

Algorithm Overview

Delta encoding represents a stream of symbols as the differences between each symbol and the symbol before it. This will result in compression for any data stream where it takes fewer bits to represent the differences between consecutive symbols than it does to represent the symbols themselves.

If you happen to have data that is well suited to delta encoding (like audio or heart monitor data), you will find the simplicity of this algorithm attractive. However, delta encoding is not well suited for all data types.

Encoding

Delta coding is fairly straightforward. The first symbol of a delta
encoded stream is always written out unencoded. After that, every
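symbol is replaced with a code word holding the difference between it and the previous symbol. A difference that does not fit in the code word is written as an escape symbol followed by the unencoded symbol. The delta encoding algorithm does not specify an escape symbol. I
used the smallest signed value that can be represented in the code word.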
Based on the discussion above, encoding input consists of the following steps:
That's it. I told you it was straightforward.

Decoding

Decoding isn't any harder than encoding. Since the first symbol of a delta encoded stream is always written out unencoded, it can be read back and written without modification. After that, read in a code word. If it is the escape symbol, read the unencoded symbol that follows it; otherwise add the difference to the previous symbol to recover the current symbol. Continue this process until the end of file is reached.

Based on the discussion above, decoding encoded input consists of the following steps:
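As an illustration only (this is not the code from the downloadable archives), the encoding and decoding discussion above can be sketched in ANSI C. The sketch is non-adaptive and rests on several assumptions that are not from the original text: 8-bit symbols, a fixed 4-bit code word (`CODE_BITS`), in-memory buffers instead of files, and a decoder that is told the symbol count rather than detecting EOF. The names `put_bits`, `get_bits`, `delta_encode`, and `delta_decode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed parameters for this sketch (not taken from the library). */
#define CODE_BITS   4                              /* fixed code word size */
#define CODE_MASK   ((1u << CODE_BITS) - 1u)
#define ESCAPE_CODE (1u << (CODE_BITS - 1))        /* 1000b: smallest signed value */
#define MAX_DELTA   ((1 << (CODE_BITS - 1)) - 1)   /* +7 */
#define MIN_DELTA   (-MAX_DELTA)                   /* -7; -8 is the escape */

/* Append the low `count` bits of `value`, most significant bit first. */
static void put_bits(unsigned char *buf, size_t *bitpos,
                     unsigned value, int count)
{
    int i;
    for (i = count - 1; i >= 0; i--) {
        size_t byte = *bitpos / 8;
        int shift = 7 - (int)(*bitpos % 8);
        if (shift == 7) {
            buf[byte] = 0;                         /* starting a fresh byte */
        }
        buf[byte] |= (unsigned char)(((value >> i) & 1u) << shift);
        (*bitpos)++;
    }
}

/* Read `count` bits, most significant bit first. */
static unsigned get_bits(const unsigned char *buf, size_t *bitpos, int count)
{
    unsigned value = 0;
    while (count-- > 0) {
        size_t byte = *bitpos / 8;
        int shift = 7 - (int)(*bitpos % 8);
        value = (value << 1) | ((unsigned)(buf[byte] >> shift) & 1u);
        (*bitpos)++;
    }
    return value;
}

/* Encode `len` symbols into `out`; returns the number of bits written.
 * `out` must hold at least len * (CODE_BITS + 8) bits. */
size_t delta_encode(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t bitpos = 0;
    size_t i;
    assert(in != NULL && out != NULL);
    if (len == 0) {
        return 0;
    }
    put_bits(out, &bitpos, in[0], 8);              /* first symbol unencoded */
    for (i = 1; i < len; i++) {
        int delta = (int)in[i] - (int)in[i - 1];
        if (delta >= MIN_DELTA && delta <= MAX_DELTA) {
            put_bits(out, &bitpos, (unsigned)delta & CODE_MASK, CODE_BITS);
        } else {
            put_bits(out, &bitpos, ESCAPE_CODE, CODE_BITS);
            put_bits(out, &bitpos, in[i], 8);      /* unencoded symbol */
        }
    }
    return bitpos;
}

/* Decode exactly `count` symbols from `in` into `out`. */
void delta_decode(const unsigned char *in, size_t count, unsigned char *out)
{
    size_t bitpos = 0;
    size_t i;
    assert(in != NULL && out != NULL);
    if (count == 0) {
        return;
    }
    out[0] = (unsigned char)get_bits(in, &bitpos, 8);
    for (i = 1; i < count; i++) {
        unsigned code = get_bits(in, &bitpos, CODE_BITS);
        if (code == ESCAPE_CODE) {
            out[i] = (unsigned char)get_bits(in, &bitpos, 8);
        } else {
            /* sign-extend the code word back into a difference */
            int delta = (code & ESCAPE_CODE) ? (int)code - (1 << CODE_BITS)
                                             : (int)code;
            out[i] = (unsigned char)(out[i - 1] + delta);
        }
    }
}
```

With these assumed parameters, the seven symbols 100, 102, 101, 101, 200, 203, 196 encode to 40 bits instead of 56: five small differences cost 4 bits each, while the jump from 101 to 200 costs an escape plus a full symbol (12 bits). Passing the symbol count to the decoder sidesteps the end-of-file question, which a file-based implementation has to handle explicitly.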
Implementation Issues

Adapting to Delta Ranges

If you choose a value for the code word size (

I'm not aware of any literature that provides details on different adaptation algorithms, so I invented my own. My goals for the algorithm and its implementation were for it to be code space and time efficient, and easy to modify. It also had to produce results that were improvements over the results of the non-adaptive algorithm.

I incorporated an overflow counter to determine if the code word
size ( The source code used to adjust the code word size is all contained
in the file

Handling End-of-File (EOF)

The EOF is of particular importance, because it is likely that an encoded file will not contain a whole number of bytes. Most file systems require that files be stored in bytes, so it's likely that encoded files will have spare bits. If you don't know
where the

There are at least three solutions to the
I chose the third option for my implementation. I write an escape symbol at the end of an encoded file. The decoder expects an unencoded symbol to follow the escape, but the file can always be ended in fewer bits than it takes to write an unencoded symbol, so the decoder will get an EOF before it gets the symbol it was expecting.

Effectiveness

So how well does delta coding work for generic data? Not very well at all. Delta coding only works well when consecutive symbols have similar values. The greater the difference between two consecutive symbols, the more bits are required to encode them.

One way to improve the compression obtained by delta coding on generic data is to apply a reversible transformation on the data that reduces the average value of a symbol as well as the difference between consecutive symbols. The Burrows-Wheeler Transform (BWT) with Move-To-Front (MTF) encoding is such a transform.

I used the
Calgary Corpus
as a data set to test the effectiveness of delta coding compared
to my implementations of Rice coding,
Huffman coding and
LZSS. The executive summary is that
even with the help of BWT and MTF, delta coding couldn't match the
compression ratios of Huffman coding or LZSS. Rice coding with BWT and
MTF even has better compression ratios if the proper value for
The results of my test appear in the following tables:
Actual Software

I am releasing my implementations of adaptive delta encoding and decoding under the LGPL. At this time I only have two revisions of the code to offer. As I add enhancements or fix bugs, I will post them here as newer versions. The larger the version number, the newer the version. I will retain the older versions for historical reasons. I recommend that most people download the newest version unless there is a compelling reason to do otherwise.

Each version is contained in its own zipped archive which includes the source files and brief instructions for building an executable. None of the archives contain executable programs. A copy of the archives may be obtained by clicking on the links below.
Portability

All the source code that I have provided is written in strict ANSI C. I would expect it to build correctly on any machine with an ANSI C compiler. I have tested the code compiled with gcc on Linux and mingw on Windows XP.

The software makes no assumptions about the endianness of the machine that it is executing on. However, it does make some assumptions about the size of data types. The software uses the #if and #error pre-processor directives to check for violations of my assumptions at compile time.

If you have any further questions or comments, you may contact me by e-mail. My e-mail address is: mdipper@alumni.engr.ucsb.edu

For more information on compression algorithms, visit DataCompression.info.
Last updated on December 30, 2009