Rice (Golomb) Coding
Encoding Discussion and Implementation
by Michael Dipperstein

I had to scratch yet another compression itch. I know very little about compressing wave audio files, but some questions I received made me want to find out slightly more. It turns out that it's possible to convert wave files into a format where the bulk of the data can be represented by low value numbers. Once that is done, Rice Encoding (a special case of Golomb Coding) can be applied to reduce the bits required to represent the lower value numbers.

Rice's algorithm seemed easy to implement, so I felt compelled to give it a try. I was right (sometimes that happens). A rough cut of the algorithm didn't take long to write. If you're going to try your own implementation, you'll need a way to read and write individual bits; the rest is not that big of a deal. I already have a bitfile library that handles the reading and writing of bits, so I was left with the stuff that is not a big deal.

As with my other compression implementations, my intent is to publish an easy to follow ANSI C implementation of the Rice encoding and decoding algorithms. Anyone familiar with ANSI C and Rice (or Golomb) Encoding should be able to follow and learn from my implementation. I'm sure that there's room for improvement of compression ratios, speed, and memory usage, but this project is about learning and sharing, not perfection.

Click here for a link to my Rice source.

The rest of this page discusses Rice Encoding and my implementation.

Algorithm Overview

Given a constant M, any symbol S can be represented as a quotient (Q) and a remainder (R), where S = Q × M + R.

If S is small relative to M, then Q will also be small. Rice encoding is designed to reduce the number of bits required to represent symbols where Q is small.

Rather than representing both Q and R as binary values, Rice encoding represents Q as a unary value and R as a binary value.

For those not familiar with unary notation, a value N may be represented by N 1s followed by a 0. For example, 3 = 1110 and 5 = 111110.

Note: The following is true for binary values, if log2(M) = K, where K is an integer:
1. Q = S >> K (S right shifted by K bits)
2. R = S & (M - 1) (S bitwise ANDed with M - 1)
3. R can be represented using K bits

Encoding

Rice coding is fairly straightforward. Given a bit length, K, compute the modulus M = 2^K. Then do the following for each symbol S:
1. Write out S >> K in unary.
2. Write out S & (M - 1) in binary, using K bits.

That's it. I told you it was straightforward.

Decoding

Decoding isn't any harder than encoding. As with encoding, given a bit length, K, compute the modulus M = 2^K. Then do the following for each encoded symbol:
1. Determine Q by counting the number of 1s before the first 0.
2. Determine R by reading the next K bits as a binary value.
3. Write out S as Q × M + R.
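The encoding and decoding described above can be sketched roughly as follows. This is only an illustration, not the code from my archives: the function names are hypothetical, symbols are assumed to be bytes, the unary quotient is assumed to be written as 1s terminated by a 0 ahead of the remainder bits, and the bits are kept as '0'/'1' characters for readability instead of being packed with a bitfile library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch only: these function names are hypothetical and are
 * not the author's API.  Bits are stored as '0'/'1' characters so the codes
 * are easy to read; a real implementation would pack them into bytes.
 */

/*
 * Write the Rice code for byte s with bit length k (0 <= k <= 7) to out as
 * a NUL-terminated string: the quotient in unary, then k remainder bits.
 * out must hold at least (255 >> k) + k + 2 characters.
 * Returns the number of code bits written.
 */
size_t rice_encode_symbol(unsigned char s, unsigned int k, char *out)
{
    unsigned int q, r, i;
    size_t n = 0;

    assert(k < 8);
    q = (unsigned int)s >> k;                 /* quotient:  S >> K      */
    r = (unsigned int)s & ((1u << k) - 1u);   /* remainder: S & (M - 1) */

    for (i = 0; i < q; i++)                   /* unary: q ones ...      */
    {
        out[n++] = '1';
    }
    out[n++] = '0';                           /* ... then a single zero */

    for (i = k; i > 0; i--)                   /* k remainder bits, MSB first */
    {
        out[n++] = ((r >> (i - 1)) & 1u) ? '1' : '0';
    }

    out[n] = '\0';
    return n;
}

/*
 * Decode one symbol from the NUL-terminated '0'/'1' string bits, starting
 * at index *pos.  On success, stores the symbol in *s, advances *pos past
 * the code, and returns 1.  Returns 0 (leaving *pos unchanged) if the
 * remaining bits do not contain a complete code.
 */
int rice_decode_symbol(const char *bits, size_t *pos, unsigned int k,
                       unsigned char *s)
{
    size_t p = *pos;
    unsigned int q = 0, r = 0, i;

    assert(k < 8);

    while (bits[p] == '1')                    /* count 1s before the first 0 */
    {
        q++;
        p++;
    }
    if (bits[p] != '0')
    {
        return 0;                             /* ran out before the terminator */
    }
    p++;

    for (i = 0; i < k; i++)                   /* read k remainder bits */
    {
        if (bits[p] != '0' && bits[p] != '1')
        {
            return 0;                         /* ran out mid-remainder */
        }
        r = (r << 1) | (unsigned int)(bits[p] - '0');
        p++;
    }

    *s = (unsigned char)((q << k) | r);       /* S = Q * M + R */
    *pos = p;
    return 1;
}
```

For example, encoding the byte 18 with K = 4 gives Q = 1 and R = 2, so the code is 10 followed by 0010, or 100010 (6 bits instead of 8). A byte of 255 with the same K gives Q = 15, which costs 16 unary bits before the 4 remainder bits are even written.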
Effectiveness

So how well does Rice coding work for generic data? Not very well at all. Rice coding only works well when symbols are encoded with small values of Q, the unary-coded quotient.

One way to improve the compression obtained by Rice coding on generic data is to apply a reversible transformation to the data that reduces the average value of a symbol. The Burrows-Wheeler Transform (BWT) with Move-To-Front (MTF) encoding is such a transform.

I used the Calgary Corpus as a data set to test the effectiveness of Rice coding compared to my implementations of Huffman coding and LZSS. The executive summary is that even with the help of BWT and MTF, Rice coding couldn't match the compression ratios of Huffman coding or LZSS. However, BWT and MTF allowed Rice coding to actually reduce the size of the data sets. The results of my test appear in the following table:
Implementation Issues

Size of Symbol

Rice's algorithm does not place any restrictions on the size of symbols being encoded. However, the size of the encoded symbols must be known when the algorithm is actually applied to data. My implementation of Rice's algorithm only encodes bytes. This places an additional restriction on the bit length that may be used.

Handling End-of-File (EOF)

The EOF is of particular importance, because it is likely that an encoded file will not have a number of bits that is an integral multiple of the byte size. Most file systems require that files be stored in bytes, so it's likely that encoded files will have spare bits. If you don't know where the EOF is, the spare bits may be decoded as an extra symbol.

There are at least three solutions to the EOF problem.
I chose the third option for my implementation. My decoder will not produce another symbol until it sees the 0 indicating the end of the unary portion followed by all K bits of the binary portion, where K is the bit length.

Further Information

Further discussion of Rice Encoding may be found at Wikipedia and Monkey's Audio - a fast and powerful lossless audio compressor.

Actual Software

I am releasing my implementations of Rice encoding/decoding under the LGPL. At this time I only have one revision of the code to offer. As I add enhancements or fix bugs, I will post them here as newer versions. The larger the version number, the newer the version. I will retain the older versions for historical reasons. I recommend that most people download the newest version unless there is a compelling reason to do otherwise.

Each version is contained in its own zipped archive, which includes the source files and brief instructions for building an executable. None of the archives contain executable programs. A copy of the archives may be obtained by clicking on the links below.
Portability

All the source code that I have provided is written in strict ANSI C. I would expect it to build correctly on any machine with an ANSI C compiler. I have tested the code compiled with gcc on Linux and mingw on Windows XP.

The software makes no assumptions about the endianness of the machine that it is executing on. However, it does make some assumptions about the size of data types. The software makes use of the #if and #error pre-processor directives as an attempt to check for violations of my assumptions at compile time.

If you have any further questions or comments, you may contact me by e-mail. My e-mail address is: mdipper@alumni.engr.ucsb.edu

For more information on Rice and other compression algorithms, visit DataCompression.info.
Last updated on June 7, 2008