Bit Manipulation Libraries
For some reason I keep finding myself dabbling in the worlds of compression and encryption. I'm not an expert in either of these areas, nor do I aspire to become one. It's just something that catches my interest from time to time.
On computers, both compression and encryption usually take bit patterns with a given meaning and translate them to other patterns intended to have the same meaning. This typically means having to read, write, and manipulate arbitrary groups of bits. To save myself from reinventing the wheel every time I played with another compression or encryption algorithm, I developed two libraries: one for bitwise file reading and writing (bitfile), and the other for manipulating arbitrary length arrays of bits (bitarray).
Some time ago I was asked to modify my LZSS implementation so it could be used on a SEGA Genesis without a file system, so I developed a bitwise array reading and writing library (arraystream). Arraystream is very similar to bitfile, with the major exception being that it operates on arrays.
I originally wrote the bitfile and bitarray libraries in ANSI C, because I used C for my compression algorithms. However, these libraries were among the few things I have ever written (and I've written a lot) where I thought that I could do a better job with C++. So I developed C++ implementations of my bitfile and bitarray libraries. For the time being, the arraystream library is exclusively available in C.
Recently (since 2008 or so) I've been using Python when I've needed quick hacks to do something on a PC (vs an embedded system). After searching PyPI, the Python Package Index, I noticed there were a number of Python packages that were similar to my bitarray library, but there wasn't anything that provided all the functionality of my bitfile library. I had to fix that problem by providing a pure Python version of my bitfile library too.
I am publishing all of these libraries under the GNU LGPL in hopes that they will be of use to other people.
The rest of this page discusses each of my libraries.
Each version of the bitfile library provides a wrapper around the
language's native file I/O. The ANSI C version uses the standard file I/O
functions, and every bitfile is referenced by a structure which includes
a FILE pointer.
The arraystream library uses a similar structure, replacing the
FILE pointer with a pointer to an array of unsigned characters
and an array index. Arraystream operations are analogous to bitfile
operations in almost all respects and will not be discussed further.
The C++ version of the bitfile library makes use of (but does not
inherit from) the standard file stream classes.
Every bit file object contains an
ifstream pointer and an ofstream pointer.
The Python version implements a class containing a Python file object.
In addition to a reference to a native file, each library includes
an 8-bit buffer and a counter responsible for tracking the number of
bits in the 8-bit buffer. The C and C++ versions of the bitfile
library use an
unsigned char for the 8-bit buffer.
Reading bits from a bitfile works as follows:
Step 1. Read a byte from the underlying file and store it in the 8-bit buffer.
Step 2. Set the count of bits in the buffer to 8.
Step 3. Report the most significant bit (msb) in the buffer as the bit read.
Step 4. Shift the buffer left by one bit.
Step 5. Decrement the count of bits in the buffer.
To read an additional bit, repeat the process from Step 3. Once all bits are read from the 8-bit buffer (the count equals 0) the process starts over from Step 1.
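The read procedure can be sketched roughly as follows. This is a minimal illustration, not the bitfile library's code: the names BitReader, getBit, and readAllBits are hypothetical, and the sketch reads from any std::istream rather than holding an ifstream pointer. It takes bits from the most significant end of the buffer, so they come back in the same order the write procedure stores them.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical reader (not the bitfile API): hands out one bit at a
// time from a std::istream through an 8-bit buffer.
class BitReader {
public:
    explicit BitReader(std::istream& in) : in_(in), buffer_(0), count_(0) {}

    // Returns 0 or 1, or EOF once the underlying stream is exhausted.
    int getBit() {
        if (count_ == 0) {
            const int c = in_.get();              // refill the 8-bit buffer
            if (c == std::char_traits<char>::eof()) return EOF;
            buffer_ = static_cast<unsigned char>(c);
            count_ = 8;                           // eight unread bits
        }
        const int bit = (buffer_ >> 7) & 1;       // report the msb
        buffer_ = static_cast<unsigned char>(buffer_ << 1);  // expose the next bit
        --count_;
        return bit;
    }

private:
    std::istream& in_;
    unsigned char buffer_;   // the 8-bit buffer
    int count_;              // number of unread bits in buffer_
};

// Convenience helper for experiments: returns the bits of `bytes`
// as a string of '0' and '1' characters, in the order they are read.
std::string readAllBits(const std::string& bytes) {
    std::istringstream in(bytes);
    BitReader reader(in);
    std::string bits;
    int bit;
    while ((bit = reader.getBit()) != EOF) {
        assert(bit == 0 || bit == 1);
        bits += static_cast<char>('0' + bit);
    }
    return bits;
}
```

Reading the single byte 0xA5 through readAllBits yields the string 10100101.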
Writing bits to a bitfile works as follows:
Step 1. Left shift the 8-bit buffer by one bit.
Step 2. Set the least significant bit (lsb) of the 8-bit buffer to the value of the bit being written.
Step 3. Increment the count of bits in the 8-bit buffer.
Repeat the process from Step 1 for each additional bit. Once 8 bits have been written to the 8-bit buffer, the buffer is written to the underlying file and the bit count is set to 0.
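The write procedure can be sketched in the same spirit. Again this is only an illustration under stated assumptions: BitWriter, putBit, flush, and packBits are hypothetical names, the sketch writes to any std::ostream, and padding a final partial byte with zero bits is an assumption of the sketch, since the text above does not say how leftover bits are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical writer (not the bitfile API): collects bits in an
// 8-bit buffer and emits a byte whenever eight bits have accumulated.
class BitWriter {
public:
    explicit BitWriter(std::ostream& out) : out_(out), buffer_(0), count_(0) {}

    void putBit(int bit) {
        // Shift left, then place the new bit in the lsb.
        buffer_ = static_cast<unsigned char>((buffer_ << 1) | (bit & 1));
        if (++count_ == 8) {
            out_.put(static_cast<char>(buffer_));  // buffer full: write it out
            buffer_ = 0;
            count_ = 0;
        }
    }

    // Assumption of this sketch: pad a partial byte with zero bits.
    void flush() {
        while (count_ != 0) putBit(0);
    }

private:
    std::ostream& out_;
    unsigned char buffer_;   // the 8-bit buffer
    int count_;              // number of bits currently in buffer_
};

// Convenience helper for experiments: packs a string of '0' and '1'
// characters into bytes, first character first.
std::string packBits(const std::string& bits) {
    std::ostringstream out;
    BitWriter writer(out);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        writer.putBit(bits[i] == '1');
    }
    writer.flush();
    return out.str();
}
```

Because each new bit enters at the lsb and earlier bits move left, the first bit written ends up as the msb of the byte that reaches the file.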
I have incorporated some short cuts that bypass the 8-bit buffer in the functions that read/write characters or bytes.
Rather than writing lengthy man pages for each of the functions in the bitfile library, I have taken a cheap cop-out. The bitfile source includes detailed headers preceding each function. The Python version of the bitfile library includes comments in docstring format.
I have also included a file named sample.[c|cpp|py] which demonstrates the usage of each function in the bitfile library and serves as a test to verify the correctness of the code.
An archive containing the source for each bitfile library may be downloaded by clicking on the links below. My source has been released under the GNU LGPL.
ANSI C: bitfile-0.9.zip, arraystream-0.2.zip
The ANSI C bitarray library provides a collection of functions that create and operate on arrays of bits. The ISO C++ bitarray library provides a class with methods that perform similar functions.
Bitarrays may be of any size and are implemented as arrays of
unsigned char. Bit 0 of the most significant
unsigned char (char 0) is the most significant bit (msb) of
the bit array. The last (non-spare) bit of the last
unsigned char is the least significant bit (lsb).
For example, an array of 20 bits (0 through 19) stored in 8-bit
unsigned chars requires 3 unsigned chars (0 through 2) to store all the bits.
char     0        1        2
     +--------+--------+--------+
     |        |        |        |
     +--------+--------+--------+
bit   01234567 89111111 1111XXXX
                 012345 6789
The array data is contained inside a structure/class which includes a count of the number of bits in the array, and a pointer to the memory storing the array. Since arrays may be of arbitrary size, the memory storing the array is dynamically allocated on the heap.
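The bit numbering and storage described above can be sketched as a few index calculations. This is an illustration only, assuming 8-bit unsigned chars; the function names are hypothetical, and std::vector stands in for the dynamically allocated memory so the sketch stays short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of 8-bit unsigned chars needed to hold numBits bits.
std::size_t charsNeeded(std::size_t numBits) {
    return (numBits + 7) / 8;
}

// Index of the unsigned char that holds a given bit.
std::size_t charIndex(std::size_t bit) {
    return bit / 8;
}

// Mask selecting a bit within its char. Bit 0 of the array is the
// msb of char 0, so the mask starts at 0x80 and moves right.
unsigned char bitMask(std::size_t bit) {
    return static_cast<unsigned char>(0x80u >> (bit % 8));
}

void setBit(std::vector<unsigned char>& data, std::size_t bit) {
    assert(charIndex(bit) < data.size());
    data[charIndex(bit)] |= bitMask(bit);
}

bool testBit(const std::vector<unsigned char>& data, std::size_t bit) {
    assert(charIndex(bit) < data.size());
    return (data[charIndex(bit)] & bitMask(bit)) != 0;
}
```

For the 20-bit example, charsNeeded(20) is 3, and bit 19 lives in char 2 under the mask 0x10, leaving the four low bits of char 2 spare.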
The C++ bitarray class overloads bitwise operators (&, |, ^, ...), providing the expected results on bitarray objects. The C bitarray library provides functions (BitArrayAnd, BitArrayOr, BitArrayXor, ...) for similar functionality.
I have written the bitarray library so that functions and methods requiring multiple bit arrays (such as BitArrayAnd or &), will not do anything if they are given arrays of differing sizes to operate on.
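A minimal sketch of that behavior, assuming 8-bit unsigned chars: the Bits structure and this operator&= are hypothetical stand-ins, not the library's class. When the sizes differ, the left operand is simply left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bit array: a bit count plus the chars that store the bits.
struct Bits {
    std::size_t numBits;
    std::vector<unsigned char> data;

    explicit Bits(std::size_t n) : numBits(n), data((n + 7) / 8, 0) {}
};

// AND src into dest. If the arrays differ in size, do nothing.
Bits& operator&=(Bits& dest, const Bits& src) {
    if (dest.numBits != src.numBits) {
        return dest;                      // size mismatch: leave dest as it was
    }
    assert(dest.data.size() == src.data.size());
    for (std::size_t i = 0; i < dest.data.size(); ++i) {
        dest.data[i] &= src.data[i];
    }
    return dest;
}
```

Silently doing nothing keeps the operator syntax simple, at the cost that the caller gets no indication that the sizes did not match.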
Unfortunately I have not found a way to make square brackets both read and assign individual bits of a bitarray in C.
In C++ it's not possible to overload square brackets ([])
to behave both ways. Consequently square brackets ([])
return a bit value and parentheses
(()) return a class that
behaves as a pointer to a bit in the array. The class returned by
parentheses (()) may only be used for assigning bit values.
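The split between the two operators can be sketched as follows. This is an illustration, not the library's class: BitVec and BitRef are hypothetical names, and 8-bit unsigned chars are assumed. Square brackets hand back a plain bool, while parentheses hand back a small object that remembers which char and mask to modify and accepts only assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper returned by operator(): refers to one bit and
// supports assignment only.
class BitRef {
public:
    BitRef(unsigned char& cell, unsigned char mask)
        : cell_(cell), mask_(mask) {}

    void operator=(bool value) {
        if (value) {
            cell_ |= mask_;                                // set the bit
        } else {
            cell_ &= static_cast<unsigned char>(~mask_);   // clear the bit
        }
    }

private:
    unsigned char& cell_;   // the char holding the bit
    unsigned char mask_;    // selects the bit within that char
};

// Hypothetical bit array with the two access styles.
class BitVec {
public:
    explicit BitVec(std::size_t n) : numBits_(n), data_((n + 7) / 8, 0) {}

    // Read access: returns the value of the bit.
    bool operator[](std::size_t bit) const {
        assert(bit < numBits_);
        return (data_[bit / 8] & mask(bit)) != 0;
    }

    // Write access: returns an object that can be assigned to.
    BitRef operator()(std::size_t bit) {
        assert(bit < numBits_);
        return BitRef(data_[bit / 8], mask(bit));
    }

private:
    static unsigned char mask(std::size_t bit) {
        return static_cast<unsigned char>(0x80u >> (bit % 8));
    }

    std::size_t numBits_;
    std::vector<unsigned char> data_;
};
```

With this sketch, v(3) = true; sets bit 3 and v[3] then reads it back as true.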
Rather than writing lengthy man pages for each of the functions in the bitarray library, I have taken a cheap cop-out. The bitarray source includes detailed headers preceding each function, and I have included a file named sample.[c|cpp] which demonstrates the usage of each function in the bitarray library.
All the source code that I have provided is written in strict ANSI C or ISO C++. I would expect it to build correctly on any machine with ANSI C/ISO C++ compilers. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows XP.
The library includes routines intended for debugging which dump
the array contents to a display. These routines assume that
unsigned chars are 8 bits. They could easily be
rewritten to support any specific size of unsigned char. Writing
the dump routines to handle an arbitrary size
unsigned char seems
more difficult than it is worth to me, especially since I only have
access to machines with 8-bit unsigned chars.
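To show where the 8-bit assumption enters, here is a minimal sketch of a dump routine. It is not the library's routine: dumpHex is a hypothetical name, and it returns a string of two hex digits per unsigned char instead of printing to a display. Two hex digits cover exactly 8 bits, which is why a different char size would need a different routine.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical dump helper: renders each unsigned char as two
// uppercase hex digits, separated by spaces. Assumes 8-bit chars.
std::string dumpHex(const std::vector<unsigned char>& data) {
    assert(CHAR_BIT == 8);   // two hex digits only cover 8 bits
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i != 0) {
            out += ' ';
        }
        out += digits[(data[i] >> 4) & 0x0F];   // high nibble
        out += digits[data[i] & 0x0F];          // low nibble
    }
    return out;
}
```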
An archive containing the source for each bitarray library may be downloaded by clicking on the links below. My source has been released under the GNU LGPL.
My latest implementations of Huffman, LZSS, LZW, and arithmetic encoding all provide additional examples of how to use the C version of these libraries. If you still have any questions or comments feel free to e-mail me at email@example.com .
You might also want to visit DataCompression.info for additional information on all things related to compression.