For some reason I keep finding myself dabbling in the worlds of compression and encryption. I'm not an expert in either of these areas, nor do I aspire to become one. It's just something that catches my interest from time to time.

On computers, both compression and encryption usually take bit patterns with a given meaning and translate them to other patterns intended to have the same meaning. This typically means having to read, write, and manipulate arbitrary groups of bits. To save myself from reinventing the wheel every time I played with another compression or encryption algorithm, I developed two libraries: one for bitwise file reading and writing (bitfile), and the other for manipulating arbitrary length arrays of bits (bitarray).

Some time ago I was asked to modify my LZSS implementation so it could be used on a SEGA Genesis without a file system, so I developed a bitwise array reading and writing library (arraystream). Arraystream is very similar to bitfile, with the major exception being that it operates on arrays.

I originally wrote the bitfile and bitarray libraries in ANSI C, because I used C for my compression algorithms. However, these libraries were among the few things I've written (and I've written a lot) where I thought that I could do a better job with C++. So I developed C++ implementations of my bitfile and bitarray libraries. For the time being, the arraystream library is available only in C.

Recently (since 2008 or so) I've been using Python when I've needed quick hacks to do something on a PC (as opposed to an embedded system). After searching PyPI, the Python Package Index, I noticed there were a number of Python packages that were similar to my bitarray library, but there wasn't anything that provided all the functionality of my bitfile library. I fixed that by writing a pure Python version of my bitfile library too.

I am publishing all of these libraries under the GNU LGPL in hopes that they will be of use to other people.

The rest of this page discusses each of my libraries.

Michael Dipperstein
mdipper@alumni.engr.ucsb.edu


Bitfile Libraries

Implementation

Each version of the bitfile library provides a wrapper around the language's native file I/O. The ANSI C version uses file I/O functions and every bitfile is referenced by a structure which includes a FILE pointer.

The arraystream library uses a similar structure, replacing the FILE pointer with a pointer to an array of unsigned characters and an array index. Arraystream operations are analogous to bitfile operations in almost all respects and will not be discussed further.

The C++ version of the bitfile library makes use of (but does not inherit from) the ifstream and ofstream classes. Every bit file object contains an ifstream pointer and ofstream pointer.

The Python version implements a class containing a Python file object.

In addition to a reference to a native file, each library includes an 8-bit buffer and a counter that tracks the number of bits in the 8-bit buffer. The C and C++ versions of the bitfile library use an unsigned char for the 8-bit buffer.

Reading Bits

Reading bits from a bitfile works as follows:
Step 1. Read a byte from the underlying file and store it in the 8-bit buffer.
Step 2. Set the count of bits in the buffer to 8.
Step 3. Report the most significant bit (msb) in the buffer as the bit read.
Step 4. Shift the buffer left by one bit.
Step 5. Decrement the count of bits in the buffer.

To read an additional bit, repeat the process from Step 3. Once all bits are read from the 8-bit buffer (the count equals 0) the process starts over from Step 1.
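
As a rough illustration, the reading steps might be sketched in Python as follows. This is not the bitfile library's code: the names BitReader and get_bit are made up for the example, and returning None at end of file is an assumption. The sketch hands back the most significant bit of the buffer first, which is the order that matches bits stored by shifting left and filling the lsb.

```python
import io


class BitReader:
    """Hypothetical msb-first bit reader (illustration only)."""

    def __init__(self, stream):
        self.stream = stream  # any binary file-like object
        self.buffer = 0       # the 8-bit buffer
        self.count = 0        # number of unread bits in the buffer

    def get_bit(self):
        """Return the next bit (0 or 1), or None at end of file."""
        if self.count == 0:
            data = self.stream.read(1)           # Step 1: refill the buffer
            if not data:
                return None                      # assumption: None signals EOF
            self.buffer = data[0]
            self.count = 8                       # Step 2
        bit = (self.buffer >> 7) & 1             # Step 3: report the msb
        self.buffer = (self.buffer << 1) & 0xFF  # Step 4: shift left, keep 8 bits
        self.count -= 1                          # Step 5
        return bit


# 0xB1 is 10110001 in binary.
reader = BitReader(io.BytesIO(b"\xb1"))
bits = [reader.get_bit() for _ in range(8)]
print(bits)  # [1, 0, 1, 1, 0, 0, 0, 1]
```

Because io.BytesIO stands in for the file here, the same sketch works on an in-memory buffer without touching a file system.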

Writing Bits

Writing bits to a bitfile works as follows:
Step 1. Left shift the 8-bit buffer by one bit.
Step 2. Set the least significant bit (lsb) of the 8-bit buffer to the value of the bit being written.
Step 3. Increment the count of bits in the 8-bit buffer.

Repeat the process from Step 1 for each additional bit. Once 8 bits have been written to the 8-bit buffer, the buffer is written to the underlying file and the bit count is set to 0.
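
The writing steps can be sketched the same way. Again, this is only an illustration under stated assumptions: BitWriter and put_bit are hypothetical names, not the library's API, and the sketch does not show how a partially filled buffer is flushed when the file is closed.

```python
import io


class BitWriter:
    """Hypothetical bit writer following the three steps above (illustration only)."""

    def __init__(self, stream):
        self.stream = stream  # any binary file-like object
        self.buffer = 0       # the 8-bit buffer
        self.count = 0        # number of bits currently in the buffer

    def put_bit(self, bit):
        self.buffer = (self.buffer << 1) & 0xFF  # Step 1: shift left, keep 8 bits
        self.buffer |= bit & 1                   # Step 2: set the lsb
        self.count += 1                          # Step 3
        if self.count == 8:
            # The buffer is full: write it out and start over.
            self.stream.write(bytes([self.buffer]))
            self.buffer = 0
            self.count = 0


out = io.BytesIO()
writer = BitWriter(out)
for bit in [1, 0, 1, 1, 0, 0, 0, 1]:
    writer.put_bit(bit)
written = out.getvalue()
print(written.hex())  # b1
```

The first bit written ends up in the most significant position of the byte, so a reader has to take bits from the msb end to get them back in the same order.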

I have incorporated some shortcuts that bypass the 8-bit buffer in the functions that read/write characters or bytes.

Usage

Rather than writing lengthy man pages for each of the functions in the bitfile library, I have taken a cheap cop-out. The bitfile source includes detailed headers preceding each function. The Python version of the bitfile library includes comments in docstring format.

I have also included a file named sample.[c|cpp|py] which demonstrates the usage of each function in the bitfile library and serves as a test to verify the correctness of the code.

Download

An archive containing the source for each bitfile library may be downloaded by clicking on the links below. My source has been released under the GNU LGPL.

Language    Archive
ANSI C      bitfile-0.8.zip, arraystream-0.2.zip
ISO C++     bitfile_cpp-0.8.zip
Python      bitfile-0.2.tar.gz

Bitarray Library

Implementation

The ANSI C bitarray library provides a collection of functions that create and operate on arrays of bits. The ISO C++ bitarray library provides a class with methods that perform similar functions.

Bitarrays may be of any size and are implemented as arrays of unsigned char. Bit 0 of the most significant unsigned char (char 0) is the most significant bit (msb) of the bit array. The last (non-spare) bit of the last unsigned char is the least significant bit (lsb).

Example:
An array of 20 bits (0 through 19) with 8 bit unsigned chars requires 3 unsigned chars (0 through 2) to store all the bits.

char       0        1        2
      +--------+--------+--------+
      |        |        |        |
      +--------+--------+--------+
bit    01234567 89111111 1111XXXX
                  012345 6789
    

The array data is contained inside a structure/class which includes a count of the number of bits in the array, and a pointer to the memory storing the array. Since arrays may be of arbitrary size, the memory storing the array is dynamically allocated on the heap.
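
To make the layout concrete, here is a minimal Python sketch of a bit array that stores a bit count alongside its storage and addresses bits from the most significant end of each char. It is not the library's code: the class and method names are hypothetical, a bytearray stands in for the dynamically allocated memory, and mapping the leftmost position in the diagram to mask 0x80 is an assumption based on the description above.

```python
class BitArray:
    """Hypothetical bit array: bit 0 is the msb of char 0 (illustration only)."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        # Round up to whole 8-bit chars: 20 bits need 3 chars.
        self.data = bytearray((num_bits + 7) // 8)

    def _locate(self, bit):
        """Return (char index, mask) for a bit number."""
        if not 0 <= bit < self.num_bits:
            raise IndexError("bit index out of range")
        return bit // 8, 0x80 >> (bit % 8)

    def set_bit(self, bit):
        index, mask = self._locate(bit)
        self.data[index] |= mask

    def clear_bit(self, bit):
        index, mask = self._locate(bit)
        self.data[index] &= ~mask & 0xFF

    def test_bit(self, bit):
        index, mask = self._locate(bit)
        return (self.data[index] & mask) != 0


bits = BitArray(20)
bits.set_bit(0)    # msb of char 0
bits.set_bit(19)   # last non-spare bit of char 2
print(len(bits.data), bits.data.hex())  # 3 800010
```

The four spare bits at the end of char 2 are never addressed, because _locate rejects any bit number of 20 or more.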

The C++ bitarray class overloads bitwise operators (&, |, ^, ...), providing the expected results on bitarray objects. The C bitarray library provides functions (BitArrayAnd, BitArrayOr, BitArrayXor, ...) for similar functionality.

I have written the bitarray library so that functions and methods requiring multiple bit arrays (such as BitArrayAnd or &) do nothing if they are given arrays of differing sizes to operate on.
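
The size check might look something like the following Python sketch. The function name and its parameters are made up for illustration and do not reflect the signature of BitArrayAnd. It also assumes the result is stored in the destination array.

```python
def bit_array_and(dest, dest_bits, src, src_bits):
    """AND src into dest in place; do nothing if the bit counts differ.

    dest and src are bytearrays, and dest_bits and src_bits are their sizes
    in bits. All names here are hypothetical.
    """
    if dest_bits != src_bits:
        return  # differing sizes: leave dest untouched
    for i in range(len(dest)):
        dest[i] &= src[i]


a = bytearray([0xF0, 0xFF])
b = bytearray([0x3C, 0x0F])
bit_array_and(a, 16, b, 16)
print(a.hex())  # 300f

c = bytearray([0x00])
bit_array_and(a, 16, c, 8)  # sizes differ, so a is unchanged
print(a.hex())  # 300f
```

Doing nothing silently keeps the call simple, but the caller gets no indication that the operation was skipped, so sizes are worth checking before the call.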

With native arrays, square brackets ([]) may be used either to obtain the value of an array element (case 1) or to obtain a pointer to an array location (case 2).

case 1:
if (array[index] == value) ...

case 2:
array[index] = value;

Unfortunately I have not found a way to do anything close to this with bitarrays in C.

The C++ implementation does not overload square brackets ([]) to behave both ways. Instead, square brackets ([]) return a bit value and parentheses (()) return an object of a class that behaves as a pointer to a bit in the array. The object returned by parentheses (()) may only be used for assigning bit values.

Usage

Rather than writing lengthy man pages for each of the functions in the bitarray library, I have taken a cheap cop-out. The bitarray source includes detailed headers preceding each function, and I have included a file named sample.[c|cpp] which demonstrates the usage of each function in the bitarray library.

Portability

All the source code that I have provided is written in strict ANSI C or ISO C++. I would expect it to build correctly on any machine with ANSI C/ISO C++ compilers. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows XP.

The library includes debugging routines that dump the array contents to a display. These routines assume that unsigned chars are 8 bits. They could easily be rewritten to support any specific size of unsigned char, but making them handle an arbitrary size seems more difficult than it is worth to me, especially since I only have access to machines with 8-bit unsigned chars.

Download

An archive containing the source for each bitarray library may be downloaded by clicking on the links below. My source has been released under the GNU LGPL.

Language    Archive
ANSI C      bitarray-0.3.zip
ISO C++     bitarray_cpp-0.4.zip

My latest implementations of Huffman, LZSS, LZW, and arithmetic encoding all provide additional examples of how to use the C version of these libraries. If you still have any questions or comments, feel free to e-mail me at mdipper@alumni.engr.ucsb.edu.

You might also want to visit DataCompression.info for additional information on all things related to compression.

Last updated on October 4, 2010