TinyCDB - a Constant DataBase

by Michael Tokarev, mjt+cdb {at} tls {dot} msk {dot} ru.


Introduction

TinyCDB is a very fast and simple package for creating and reading constant databases, a data structure introduced by Dan J. Bernstein in his cdb package. It may be used to speed up searches in a sequence of (key,value) pairs with a very large number of records. One example is indexing a big list of users, where a search would otherwise require a linear scan of a large /etc/passwd file; there are many other such tasks. Its usage and API are similar to those found in BerkeleyDB, gdbm and the traditional *nix dbm/ndbm libraries, and it is to a great extent compatible with the cdb-0.75 package by Dan Bernstein.

CDB is a constant database, that is, it cannot be updated at runtime, only rebuilt. Rebuilding is an atomic operation (the new file is written under a temporary name and then renamed over the old one) and is very fast, much faster than with many other similar packages. Once created, a CDB file may be queried, and a query takes very little time to complete.

Programming interface

The library (linked with -lcdb) provides two interfaces: a create interface, used to create a CDB file, and a query interface, which comes in two variants. A program using any of these routines should #include the <cdb.h> header file, which holds all required definitions of the library. More information, with a detailed description of every routine, is available in the manual page included in the TinyCDB package.

TinyCDB is different from Dan's cdb-0.75 in the following ways:

Create interface

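The create interface is built around struct cdb_make, an opaque type. Creating a CDB file takes the following sequence of actions: open a new file, initialize the structure with cdb_make_start(), add each record with cdb_make_add(), and call cdb_make_finish() to write out the index. The finished file can then be renamed over the old one.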

Query interface

There are two variants of the query interface: one as found in cdb-0.75, and another as found in earlier versions of cdb (cdb-0.6x).

Query interface 1

This interface is built around struct cdb, which is opaque to the application. It is designed to be efficient for many queries; for a single query, the second variant may be more efficient. A lookup opens the file for reading and calls cdb_init(), then cdb_find() to locate the key, cdb_datapos() and cdb_datalen() to get the position and length of the value, cdb_read() to fetch it, and finally cdb_free().

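To enumerate all values associated with a given key, a struct cdb_find is initialized with cdb_findinit(), and cdb_findnext() is called until it returns 0; after each successful call, cdb_datapos() and cdb_datalen() describe the current value.
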
Query interface 2

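Another, simpler query interface is suitable for a single query. It provides two routines that work with a single file descriptor opened for reading: cdb_seek(), which finds a key and positions the descriptor at its value, and cdb_bread(), which reads a given number of bytes from it.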

Format of CDB file

To be written. Meanwhile, consult Dan Bernstein's cdb manual.
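In short, per Bernstein's cdb documentation: a file starts with a 2048-byte header of 256 (position, slot count) pairs, one per hash table, all stored as 32-bit little-endian integers; the records follow, each a 4-byte key length, a 4-byte data length, the key and the data; the 256 hash tables come last, each slot a (hash, record position) pair, with twice as many slots as records in that table and position 0 marking an empty slot. The pure-C sketch below (no TinyCDB needed; all function names are hypothetical) shows how a key's hash selects its table and the slot where probing starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* cdb hash: start at 5381, then for each byte h = ((h << 5) + h) ^ c,
 * with all arithmetic modulo 2^32. */
uint32_t cdb_hash_sketch(const unsigned char *buf, size_t len)
{
    uint32_t h = 5381;
    while (len--)
        h = ((h << 5) + h) ^ *buf++;
    return h;
}

/* The low 8 bits of the hash pick one of the 256 header entries. */
unsigned cdb_table_index(uint32_t h)
{
    return h & 255;
}

/* The remaining bits pick the starting slot in a table of `nslots` slots;
 * a reader probes forward from there, wrapping around, until it finds a
 * matching record or an empty slot. */
unsigned cdb_start_slot(uint32_t h, unsigned nslots)
{
    assert(nslots > 0);
    return (h >> 8) % nslots;
}
```

For example, the one-byte key "a" hashes to 177604, so its record is indexed in table 196 and, in a 4-slot table, probing starts at slot 1.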

Terms of usage

The code is in the public domain, that is, you may do anything you want with it.

Download

The latest version is 0.78, released 11 May 2012, and can be found here. On systems using the RedHat Package Manager (rpm), it can be built into an installable .rpm package with the -tb option. On a Debian GNU/Linux system, the preferred way to install it is from the standard apt repository. For other versions of the package and pre-built rpms, look here.


Enjoy. Michael Tokarev, mjt+cdb {at} tls {dot} msk {dot} ru.