Index O'Stuff

Home

Compression

Arithmetic Coding
Burrows-Wheeler Transform
Huffman Coding
LZSS Coding
LZW Coding
Run Length Encoding

Misc. Programming

School Projects
Thesis Project
crypt(3) Source
Hamming Codes
Bit Manipulation Libraries
Square Root Approximation Library
Sort Library
Trailing Space Trimmer and Tab Remover
Command Line Option Parser

Humor

Dictionary O'Modern Terms
The Ten Commandments of C Style

Other Stuff

TOPS Success Story
Free Win32 Software

External Links

SPAN (Spay/Neuter Animal Network)
Los Angeles Pet Memorial Park

Mirrors
Pages at dipperstein.com
Pages at Sonic.net
Pages at geocities

Obligatory Links
NoteTab Credits
Quanta Credits
Linux User Number

Visitors Since I Started Counting:
Hit Counter

ANSI C and C++ Bit Manipulation Libraries

For some reason I keep finding myself dabbling in the worlds of compression and encryption. I'm not an expert in either of these areas, nor do I aspire to become one. It's just something that catches my interest from time to time.

On computers, both compression and encryption usually take bit patterns with a given meaning and translate them to other patterns intended to have the same meaning. This typically means having to read, write, and manipulate arbitrary groups of bits. To save myself from reinventing the wheel every time I played with another compression or encryption algorithm, I developed two libraries: one for bitwise file reading and writing (bitfile), and the other for manipulating arbitrary length arrays of bits (bitarray).

More recently I was asked to modify my LZSS implementation for a SEGA Genesis without a file system, so I developed a bitwise array reading and writing library (arraystream). Arraystream is very similar to bitfile, with the major exception being that it operates on arrays.

I originally wrote the bitfile and bitarray libraries in ANSI C, because I used C for my compression algorithms. However, these libraries were one of just a few things that I ever wrote (I've written a lot) where I thought that I could do a better job with C++. Now I provide both ANSI C and C++ implementations of my bitfile (C, C++) and bitarray (C, C++) libraries. For the time being, the arraystream library is exclusively available in C.

I am publishing both of these libraries under the GNU LGPL in hopes that they will be of use to other people.

The rest of this page discusses each of my libraries.

Michael Dipperstein
mdipper@alumni.engr.ucsb.edu


Bitfile Library

Implementation

The bitfile library is provides a wrapper around the ANSI C file I/O functions. Every bit file is referenced by a structure which includes a FILE pointer, a character to buffer bits, and count of the number of bits in the character buffer.

The arraystream library uses a similar structure, replacing the FILE pointer with a pointer to an array of unsigned characters and an array index. Arraystream operations are analogous to bitfile operations in almost all respects and will not be discussed further.

Reading bits from a bitfile works as follows:

Step 1. Read a character from the underlying file and store it in the buffer character.
Step 2. Set the count of bits in the buffer character to 8.
Step 3. Report the least significant bit (lsb) in the buffer as the bit read.
Step 4. Shift the buffer right by one bit.
Step 5. Decrement the count of bits in the buffer.

To read an additional bit, repeat the process from Step 3. Once all bits are read from the character buffer (the count equals 0) the process starts over from Step 1.

Writing bits to a bitfile works as follows:

Step 1. Left shift the buffer character by one.
Step 2. Set the least significant bit (lsb) of the character buffer to the bit being written.
Step 3. Increment the count of bits in the character buffer.

Repeat the process from Step 1 for each additional bit. Once 8 bits have been written to the character buffer, the buffer is written to the underlying file and the bit count is set to 0.

I have incorporated some short cuts that bypass the character buffer in the functions that read/write characters or bytes.

Portability

All the source code that I have provided is written in strict ANSI C. I would expect it to build correctly on any machine with an ANSI C compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP.

The code assumes that there are 8 bits in characters that can be written to files. I believe that library will function in file systems using characters larger than 8 bits, but the file sizes will be suboptimal.

The bitfile library includes routines which read and write an arbitrary number of bits from integer types: short, int, long, .... These routines are dependant on the endianess of the machines they run on. The bitfile library should handle the endianess issues correctly but I don't have access to big endian hardware. The big endian functions were tested using simulated big endian integers.

Usage

Rather than writing lengthy man pages for each of the functions in the bitfile library, I have taken a cheap cop-out. The bitfile source includes detailed headers preceding each function, and I have included a file named sample.c which demonstrates the usage of each function in the bitfile library.

Download

A zipped archive containing the ANSI C source for my bitfile library may be downloaded by clicking here. My source has been released under the GNU LGPL.

A zipped archive containing the ANSI C source for my arraystream library may be downloaded by clicking here. My arraystream source has also been released under the GNU LGPL.

Bitfile Class

Implementation

The bitfile class makes use of (but does not inherit from) the ifstream and ofstream classes. Every bit file object contains an ifstream pointer, an ofstream pointer, a character to buffer bits, and count of the number of bits in the character buffer.

The bitfile class implements reading and writing, the just like the ANSI C bitfile library does. There was no need to reinvent the wheel here.

Portability

All the source code that I have provided is written in ISO C++. The code uses the iostream, fstream, ifstream, and ofstream libraries. I would expect it to build correctly on any machine with an ISO C++ compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP.

The code assumes that there are 8 bits in characters that can be written to files. I believe that library will function in file systems using characters larger than 8 bits, but the file sizes will be suboptimal.

The bitfile class includes methods which read and write an arbitrary number of bits from integer types: short, int, long, .... These methods are dependant on the endianess of the machines they run on. The bitfile class should handle the endianess issues correctly but I don't have access to big endian hardware. The big endian methods were tested using simulated big endian integers.

Usage

Rather than writing lengthy man pages for each of the methods in the bitfile class, I have taken a cheap cop-out. The bitfile source includes detailed headers preceding each function, and I have included a file named sample.cpp which demonstrates the usage of each method in the bitfile class.

Download

A zipped archive containing the C++ source for my bitfile class may be downloaded by clicking here. My source has been released under the GNU LGPL.


Bitarray Library

Implementation

The bitarray library provides a collection of functions that create and operate on arrays of bits. Bitarrays may be of any size and are implemented as arrays of unsigned char. Bit 0 is the most significant bit (MSB) of unsigned char 0, and the last bit is the least significant (non-spare) bit (LSB) of the last unsigned char.

Example: An array of 20 bits (0 through 19) with 8 bit unsigned chars requires 3 unsigned chars (0 through 2) to store all the bits.
 
        char       0       1         2
              +--------+--------+--------+
              |        |        |        |
              +--------+--------+--------+
        bit     01234567 8911111111111XXXX
                          012345 6789
        

The array data is contained inside structure which includes a count of the number of bits in the array, and a pointer to the memory storing the array. Since arrays may be of arbitrary size, the memory storing the array is dynamically allocated using the malloc function.

The implementation of the bitarray structure is hidden from routines linked to the library. The data in bitarrays may only be accessed through function calls.

I have written the bitarray library so that functions requiring multiple bit arrays (such as BitArrayAnd), will not do anything if they are given arrays of differing sizes to operate on.

Portability

All the source code that I have provided is written in strict ANSI C. I would expect it to build correctly on any machine with an ANSI C compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP.

The library includes the function BitArrayDump, which is intended for debug and assumes that unsigned chars are 8 bits. This function can easily be written to support any specific size unsigned character. Writing BitArrayDump to handle arbitrary size unsigned char seems more difficult than it is worth to me. Especially since I only have access to machines with 8 bit unsigned chars.

Usage

Rather than writing lengthy man pages for each of the functions in the bitarray library, I have taken a cheap cop-out. The bitarray source includes detailed headers preceding each function, and I have included a file named sample.c which demonstrates the usage of each function in the bitarray library.

Download

A zipped archive containing the ANSI C source for my bitarray library may be downloaded by clicking here. My source has been released under the GNU LGPL.

Bitarray Class

Implementation

The bitarray class provides a an object that contains an array of bits and methods that operate on the array. Bitarrays may be of any size and are implemented as a std::vector of unsigned char. Since std::vector is used, your compiler must include the standard template library.

Bit arrays are implemented such that bit 0 is the most significant bit (MSB)of the unsigned char in vector index 0. The last bit is the least significant (non-spare) bit (LSB) of the last unsigned char in the vector.

Example: An array of 20 bits (0 through 19) with 8 bit unsigned chars requires a vector with 3 unsigned chars (0 through 2) to store all the bits.
 
        char       0       1         2
              +--------+--------+--------+
              |        |        |        |
              +--------+--------+--------+
        bit     01234567 8911111111111XXXX
                          012345 6789
        

The vector storing the array data is a member of the bitarray class. It is allocated at time of construction and may not be resized. The class also contains a member variable indicating the number of bits in the array.

The bitarray class includes overrides for all comparison, bitwise assignments (eg. &=), increment and decrement, and bitwise operators. Operators requiring two bit array operands only work when both arrays contain the same number of bits.

Bitwise operators that return bitarrays (eg. &) return values, not actual bitarray objects.

With native arrays, square brackets ([]) may be used to either obtain the value of an array element 1, or a pointer to an array location 2.

case 1:
if (array[index] == value) ...

case 2:
array[index] = value;

Unfortunately I have not found a way to overload square brackets ([]) to behave both ways. Consequently square brackets ([]) returns a bit value and parenthesis (()) returns a class that behaves as a pointer to a bit in the array. The class returned by parenthesis (()) may only be used for assigning bit values.

Portability

All the source code that I have provided is written in ISO C++. The code uses the iostream, and the vector standard template library. I would expect it to build correctly on any machine with an ISO C++ compiler. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows 98 and XP.

Usage

Rather than writing lengthy man pages for each of the methods in the bitarray class, I have taken a cheap cop-out. The bitarray source includes detailed headers preceding each function, and I have included a file named sample.cpp which demonstrates the usage of each method in the bitarray class.

Download

A zipped archive containing the C++ source for my bitarray class may be downloaded by clicking here. My source has been released under the GNU LGPL.


My latest implementations of Huffman, LZSS, LZW, and arithmetic encoding all provide additional examples of how to use the C version of these libraries. If you still have any questions or comments feel free to e-mail me at mdipper@alumni.engr.ucsb.edu.

You might also want to visit DataCompression.info for additional information on all things related to compression.

DataCompression.Info

Home
Last updated on January 26, 2008