[rbldnsd] ipv6 support: beginnings of an ip6trie dataset

Mon Apr 8 19:02:02 MSK 2013

On Mon, Apr 08, 2013 at 09:27:21AM +0400, Michael Tokarev wrote:
> Ok. We have a rather busy period for some reason not under our control,
> so it took a bit too long.

I figured as much.  No problem.

> I merged everything from you into the main branch, added a few very
> minor changes on top, and tested the whole thing a bit - mainly on
> several different architectures.
> 
> In particular, I ran - just the trie selftest! - on armhf, armel,
> sparc and powerpc machines, and it all worked fine.  I had no way
> to run python tests on there, unfortunately, so if we want to
> perform more serious testing, it is still in the TODO list of course.

I would guess that the trie self-test by itself is sufficient to
expose architecture-dependent bugs in btrie.c.  (I'm a bit surprised
that none showed up.)

The python tests serve more as overall integration tests, and tests
of the changes I made in the rbldnsd_*.c dataset drivers.  If it's
a pain to run them on all architectures, I wouldn't worry about it too
much.  (That said, with proper attention to Build-Depends:, it should
be possible, without much trouble, to get them to run as part of the
debian build process.)

> Now I want to perform a few tests for loading speed, memory usage
> and query speed, comparing various options.  For that I want to
> grab some real DNSBL data (such as CBL and Spamhaus), because, as
> it turned out, "random" data tend to show different results than
> reality :)  I want to make these tests this week.

Very good. Let me know the results.

I think the memory efficiency of btrie.c should only get better when
the data is less random (i.e. more clustered data => greater fill
ratio in the TBM nodes).  It will be interesting to see whether that
bears out in practice.

> Also, it'd be interesting to instrument the code to perform some
> stats wrt reallocs and "wasted" memory, I also want to do it this
> week.

There's already some of that in place.  The status line that is logged
after a dataset is loaded has these fields:

  ents - total number of entries (prefixes) in the dataset
  tbm  - number of TBM nodes in the trie
  lc   - number of LC nodes in the trie
  mem  - total memory obtained from mp_alloc() (in kbytes)
  free - total number of bytes in the internal free pool
  waste- number of bytes "wasted" due to the fact that we round
         allocation sizes up to a multiple of sizeof(node_t).
         (For each TBM node with a data array of odd length,
         we "waste" one void* worth of space.)

So (free + waste) gives the total number of bytes which were allocated
(from mp_alloc) but are not actually used in the trie.
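
To make the arithmetic concrete, here is a rough sketch (not code from
btrie.c).  The type sketch_node_t and both function names are made up
for illustration.  It assumes a node is two pointers wide, which is what
"one void* wasted per odd-length array" implies, and that one kbyte is
1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for node_t: assumed to be two pointers wide. */
typedef struct { void *slot[2]; } sketch_node_t;

/* Bytes lost when an array of n_ptrs pointers is rounded up to a
 * whole number of node-sized units. */
static size_t rounding_waste(size_t n_ptrs)
{
    size_t want = n_ptrs * sizeof(void *);
    size_t unit = sizeof(sketch_node_t);
    size_t got = (want + unit - 1) / unit * unit;  /* round up to a multiple of unit */
    return got - want;
}

/* Fraction of the memory obtained from the allocator that is not in
 * use by the trie.  "mem" is reported in kbytes (assumed 1024 bytes
 * each), while "free" and "waste" are reported in bytes. */
static double unused_fraction(size_t mem_kbytes, size_t free_bytes,
                              size_t waste_bytes)
{
    assert(mem_kbytes > 0);
    return (double)(free_bytes + waste_bytes)
           / ((double)mem_kbytes * 1024.0);
}
```

With these assumptions, an array of three pointers wastes one pointer's
worth of space and an array of four wastes nothing.  The unit conversion
matters when reading the status line, since mem and (free + waste) are
not reported in the same units.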

Additionally, if one defines BTRIE_DEBUG_ALLOC at compile time, a
histogram of hunk sizes will be kept by btrie's internal allocator.
A table which includes the number of free and allocated hunks of each
size will be dumped to stdout after the dataset is loaded.
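
For anyone unfamiliar with the idea, a per-size histogram of this kind
can be sketched as below.  This is only an illustration of the general
technique, not the bookkeeping btrie.c actually does: every name here
(sketch_alloc_stats, hunk_allocated, hunk_freed, dump_stats,
SKETCH_MAX_HUNK) is hypothetical, and the table layout is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_MAX_HUNK 8  /* hypothetical: largest tracked hunk size, in node units */

/* Per-size counters: hunks currently in use and hunks sitting in the free pool. */
struct sketch_alloc_stats {
    size_t n_alloc[SKETCH_MAX_HUNK + 1];
    size_t n_free[SKETCH_MAX_HUNK + 1];
};

/* Record that a hunk of the given size was handed out.  If it was
 * reused from the free pool, it no longer counts as free. */
static void hunk_allocated(struct sketch_alloc_stats *st, size_t size, int reused)
{
    assert(size >= 1 && size <= SKETCH_MAX_HUNK);
    if (reused) {
        assert(st->n_free[size] > 0);
        st->n_free[size]--;
    }
    st->n_alloc[size]++;
}

/* Record that a hunk of the given size was returned to the free pool. */
static void hunk_freed(struct sketch_alloc_stats *st, size_t size)
{
    assert(size >= 1 && size <= SKETCH_MAX_HUNK);
    assert(st->n_alloc[size] > 0);
    st->n_alloc[size]--;
    st->n_free[size]++;
}

/* Print one row per size that has any hunks; return the number of rows. */
static int dump_stats(const struct sketch_alloc_stats *st, FILE *out)
{
    int rows = 0;
    fprintf(out, "%8s %8s %8s\n", "size", "alloc", "free");
    for (size_t sz = 1; sz <= SKETCH_MAX_HUNK; sz++) {
        if (st->n_alloc[sz] == 0 && st->n_free[sz] == 0)
            continue;
        fprintf(out, "%8zu %8zu %8zu\n", sz, st->n_alloc[sz], st->n_free[sz]);
        rows++;
    }
    return rows;
}
```

A large free count at some size, alongside a small allocated count,
would point at hunks that were released by reallocs and never reused.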

If you have further questions, or would like help on anything, let me
know.

Cheers,
Jeff