|
Index O'Stuff
Home Compression Arithmetic CodingBurrows-Wheeler Transform Huffman Coding LZSS Coding LZW Coding Rice (Golomb) Coding Run Length Encoding Misc. Programming School ProjectsThesis Project crypt(3) Source Hamming Codes Bit Manipulation Libraries Square Root Approximation Library Sort Library Trailing Space Trimmer and Tab Remover Command Line Option Parser Humor Dictionary O'Modern TermsThe Ten Commandments of C Style Other Stuff TOPS Success StoryFree Win32 Software External Links SPAN (Spay/Neuter Animal Network)Los Angeles Pet Memorial Park Mirrors
Pages at dipperstein.comPages at geocities Obligatory Links
|
ANSI C and C++ Bit Manipulation LibrariesFor some reason I keep finding myself dabbling in the worlds of compression and encryption. I'm not an expert in either of these areas, nor do I aspire to become one. It's just something that catches my interest from time to time. On computers, both compression and encryption usually take bit patterns with a given meaning and translate them to other patterns intended to have the same meaning. This typically means having to read, write, and manipulate arbitrary groups of bits. To save myself from reinventing the wheel every time I played with another compression or encryption algorithm, I developed two libraries: one for bitwise file reading and writing (bitfile), and the other for manipulating arbitrary length arrays of bits (bitarray). More recently I was asked to modify my LZSS implementation for a SEGA Genesis without a file system, so I developed a bitwise array reading and writing library (arraystream). Arraystream is very similar to bitfile, with the major exception being that it operates on arrays. I originally wrote the bitfile and bitarray libraries in ANSI C, because I used C for my compression algorithms. However, these libraries were one of just a few things that I ever wrote (I've written a lot) where I thought that I could do a better job with C++. Now I provide both ANSI C and C++ implementations of my bitfile (C, C++) and bitarray (C, C++) libraries. For the time being, the arraystream library is exclusively available in C. I am publishing both of these libraries under the GNU LGPL in hopes that they will be of use to other people. The rest of this page discusses each of my libraries. Michael Dipperstein Bitfile LibraryImplementationThe bitfile library is provides a wrapper around the ANSI C file I/O functions. Every bit file is referenced by a structure which includes a FILE pointer, a character to buffer bits, and count of the number of bits in the character buffer. The arraystream library uses a similar structure, replacing the FILE pointer with a pointer to an array of unsigned characters and an array index. Arraystream operations are analogous to bitfile operations in almost all respects and will not be discussed further. Reading bits from a bitfile works as follows:
To read an additional bit, repeat the process from Step 3. Once all bits are read from the character buffer (the count equals 0) the process starts over from Step 1. Writing bits to a bitfile works as follows:
Repeat the process from Step 1 for each additional bit. Once 8 bits have been written to the character buffer, the buffer is written to the underlying file and the bit count is set to 0. I have incorporated some short cuts that bypass the character buffer in the functions that read/write characters or bytes. PortabilityAll the source code that I have provided is written in strict ANSI C. I would expect it to build correctly on any machine with an ANSI C compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP. The code assumes that there are 8 bits in characters that can be written to files. I believe that library will function in file systems using characters larger than 8 bits, but the file sizes will be suboptimal. The bitfile library includes routines which read and write an arbitrary number of bits from integer types: short, int, long, .... These routines are dependant on the endianess of the machines they run on. The bitfile library should handle the endianess issues correctly but I don't have access to big endian hardware. The big endian functions were tested using simulated big endian integers. UsageRather than writing lengthy man pages for each of the functions in the bitfile library, I have taken a cheap cop-out. The bitfile source includes detailed headers preceding each function, and I have included a file named sample.c which demonstrates the usage of each function in the bitfile library. DownloadA zipped archive containing the ANSI C source for my bitfile library may be downloaded by clicking here. My source has been released under the GNU LGPL. A zipped archive containing the ANSI C source for my arraystream library may be downloaded by clicking here. My arraystream source has also been released under the GNU LGPL. Bitfile ClassImplementationThe bitfile class makes use of (but does not inherit from) the ifstream and ofstream classes. Every bit file object contains an ifstream pointer, an ofstream pointer, a character to buffer bits, and count of the number of bits in the character buffer. The bitfile class implements reading and writing, the just like the ANSI C bitfile library does. There was no need to reinvent the wheel here. PortabilityAll the source code that I have provided is written in ISO C++. The code uses the iostream, fstream, ifstream, and ofstream libraries. I would expect it to build correctly on any machine with an ISO C++ compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP. The code assumes that there are 8 bits in characters that can be written to files. I believe that library will function in file systems using characters larger than 8 bits, but the file sizes will be suboptimal. The bitfile class includes methods which read and write an arbitrary number of bits from integer types: short, int, long, .... These methods are dependant on the endianess of the machines they run on. The bitfile class should handle the endianess issues correctly but I don't have access to big endian hardware. The big endian methods were tested using simulated big endian integers. UsageRather than writing lengthy man pages for each of the methods in the bitfile class, I have taken a cheap cop-out. The bitfile source includes detailed headers preceding each function, and I have included a file named sample.cpp which demonstrates the usage of each method in the bitfile class. DownloadA zipped archive containing the C++ source for my bitfile class may be downloaded by clicking here. My source has been released under the GNU LGPL. Bitarray LibraryImplementationThe bitarray library provides a collection of functions that create and operate on arrays of bits. Bitarrays may be of any size and are implemented as arrays of unsigned char. Bit 0 is the most significant bit (MSB) of unsigned char 0, and the last bit is the least significant (non-spare) bit (LSB) of the last unsigned char.
The array data is contained inside structure which includes a count of the number of bits in the array, and a pointer to the memory storing the array. Since arrays may be of arbitrary size, the memory storing the array is dynamically allocated using the malloc function. The implementation of the bitarray structure is hidden from routines linked to the library. The data in bitarrays may only be accessed through function calls. I have written the bitarray library so that functions requiring multiple bit arrays (such as BitArrayAnd), will not do anything if they are given arrays of differing sizes to operate on. PortabilityAll the source code that I have provided is written in strict ANSI C. I would expect it to build correctly on any machine with an ANSI C compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP. The library includes the function BitArrayDump, which is intended for debug and assumes that unsigned chars are 8 bits. This function can easily be written to support any specific size unsigned character. Writing BitArrayDump to handle arbitrary size unsigned char seems more difficult than it is worth to me. Especially since I only have access to machines with 8 bit unsigned chars. UsageRather than writing lengthy man pages for each of the functions in the bitarray library, I have taken a cheap cop-out. The bitarray source includes detailed headers preceding each function, and I have included a file named sample.c which demonstrates the usage of each function in the bitarray library. DownloadA zipped archive containing the ANSI C source for my bitarray library may be downloaded by clicking here. My source has been released under the GNU LGPL. Bitarray ClassImplementationThe bitarray class provides a an object that contains an array of bits and methods that operate on the array. Bitarrays may be of any size and are implemented as a std::vector of unsigned char. Since std::vector is used, your compiler must include the standard template library. Bit arrays are implemented such that bit 0 is the most significant bit (MSB)of the unsigned char in vector index 0. The last bit is the least significant (non-spare) bit (LSB) of the last unsigned char in the vector.
The vector storing the array data is a member of the bitarray class. It is allocated at time of construction and may not be resized. The class also contains a member variable indicating the number of bits in the array. The bitarray class includes overrides for all comparison, bitwise assignments (eg. &=), increment and decrement, and bitwise operators. Operators requiring two bit array operands only work when both arrays contain the same number of bits. Bitwise operators that return bitarrays (eg. &) return values, not actual bitarray objects. With native arrays, square brackets ([]) may be used to either obtain the value of an array element 1, or a pointer to an array location 2.
case 1: Unfortunately I have not found a way to overload square brackets ([]) to behave both ways. Consequently square brackets ([]) returns a bit value and parenthesis (()) returns a class that behaves as a pointer to a bit in the array. The class returned by parenthesis (()) may only be used for assigning bit values. PortabilityAll the source code that I have provided is written in ISO C++. The code uses the iostream, and the vector standard template library. I would expect it to build correctly on any machine with an ISO C++ compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP. UsageRather than writing lengthy man pages for each of the methods in the bitarray class, I have taken a cheap cop-out. The bitarray source includes detailed headers preceding each function, and I have included a file named sample.cpp which demonstrates the usage of each method in the bitarray class. DownloadA zipped archive containing the C++ source for my bitarray class may be downloaded by clicking here. My source has been released under the GNU LGPL. My latest implementations of Huffman, LZSS, LZW, and arithmetic encoding all provide additional examples of how to use the C version of these libraries. If you still have any questions or comments feel free to e-mail me at mdipper@alumni.engr.ucsb.edu. You might also want to visit DataCompression.info for additional information on all things related to compression.
|
Home
Last updated on June 8, 2008