[rbldnsd] ipv6 support: beginnings of an ip6trie dataset

Tue Mar 19 12:26:14 MSK 2013

13.03.2013 01:44, Jeff Dairiki wrote:
> On Sat, Mar 02, 2013 at 11:26:55AM -0800, Jeff Dairiki wrote:
[]
> Okay, it took longer than two or three days, but I've completely
> redone the trie implementation (several times, at this point...) I
> think it was worth it. The current implementation uses less memory
> than the previous one by roughly a factor of three.  For details on
> the implementation, see the essay-like comments at the top of btrie.c.
> 
> The code is at:
> 
>    https://github.com/dairiki/rbldnsd
> 
> The actual code is in the 'ipv6' branch:
> 
>    https://github.com/dairiki/rbldnsd/tree/ipv6
> 
> If you don't want to deal with the git, you can fetch a tarball from:
> 
>    https://github.com/dairiki/rbldnsd/tarball/ipv6/rbldnsd.tar.gz

This is really _excellent_ stuff.  I never really bothered to
understand how these tries work, I just borrowed the trie
implementation from somewhere else, still not understanding
the inner workings.  To my shame.

But your code looks very good.  I have a few minor comments,
but these are just that - minor, the core thing is good.

Now looking at it all, maybe we should change all the types
used in the code to always be in network byte order, the
same way your trie works?  I mean, the ip4addr_t &friends?

On the other hand, how do you think, if we change the trie
prefix to use words instead of bytes?  I *guess* using native
word comparisons instead of memcmp should work better?

But I'm not sure how significant this speed difference is,
to start with.  When I wrote rbldnsd initially, computers
were much slower, and it was really important to get every
CPU cycle out of it, -- for example, my mirror of CBL&DSBL&SH,
which were delivering about 10000PPS (packets per second)
was run on a PIII-class 600MHz celeron CPU.  Nowadays it
should be much much faster...

> Some notes, in order of decreasing significance:
> 
> Untested on Big-Endian Systems
> ------------------------------
> 
> The new trie code does some hairy bit-field packing.  As of yet, it
> has only been tested in an Intel, little-endian environment.  I've
> attempted to ensure that it should work on big-endian machines, but
> that is currently untested.  (Please send me reports of success or
> failure.)  (Also see 'tests' below.)

In a few days I'll test on several architectures, including big and
little endian and 32&64 bits.  These are Debian porter boxes.

> Other Changes
> -------------
> 
> I've also plugged the new trie code into the ip4trie (and acl)
> datasets.  For the ip4trie, with 64k random 32-bit IP4 addresses, the
> new code uses less memory by a factor of roughly three.  That means
> that the ip4trie dataset (containing only 32-bit addresses) now uses
> only 50% more memory than the ip4set dataset.

With this, maybe we should just drop all other ip4* things altogether,
how do you think?  Maybe ip4tset can be kept, but ip4set (with its
very confusing semantic) may go.  Ip4tset is okay with ip4addr_t
being in network byte order too (it does not matter how to order
things there).  With nowadays memory sizes it does not matter anymore
that new ip4trie uses 50% more memory than ip4set.

Did you try comparing _speed_ of these?  Especially the load speed?
What is very nice about ip4*set is their very fast loading, due to
inline qsort implementation and native comparisons.  Ip4trie was quite
a bit slower here.

Lookup speed shouldn't be an issue here, or at least not LARGE issue.
But dataset loading time is important.

> The ACL dataset now allows both IP4 and IP6 entries.
> 
> The NEWS and debian/changelog files, and the rbldnsd.8 man page have
> been updated to reflect the changes I've made.  Refer to them for
> slightly more detailed descriptions of what has been done.

There's no need to update debian/changelog really.  It is specific
to debian and doesn't even belong to the main source tarball.  But
that doesn't hurt either :)

> Tests
> -----
> 
> There are a number of tests for the new trie code.  Some are written
> in C; there are also now some python scripts which test rbldnsd by
> running it as a foreground daemon and sending it DNS queries.  Both
> the C self-tests and the python tests provide near 100% coverage of
> the trie insertion and lookup code. (The test coverage of the "zone
> dump" code is spotty at present.)

I'm not sure we really need this "zone dump" for the larger datasets
anymore.  The v6 datasets may just print some comment saying the
operation isn't supported, and that's all.  This whole code was more
or less just a compatibility hack, when rbldnsd was new, so it was
easy for a DNSBL to provide master-format data for secondaries who
is running BIND, so it was possible to combine rbldnsd and BIND
nameservers for one zone.  I don't know how relevant this is today,
plus the fact that rbldnsd does not support any way of zone transfers..
dunno.

> You can run the tests like this:
> 
>   ./configure
>   make
>   make check
> 
> To run the python test scripts, you'll need to have python 2.6 or 2.7
> installed, as well as the pydns library (which in debian is in the
> python-dns package.)
> 
> The new trie code also makes use of run-time assertions (or "sanity
> checks").  By default these are disabled, to compile them in, run
> configure with the --enable-asserts argument.

Yeah, that's an example of minor things which I don't like -- this
your game with asserts.  You redefine standard meaning of DEBUG/NDEBUG
slightly, and make it confusing a bit.  Especially with your own
assert-like constructs.

Another such example is the "realloc" code on top of mempool, -- maybe
it is better to use real malloc/realloc here, and use mempool just for
some memory objects whcih dont need realloc?  The idea of mempool was
to simplify memory deallocation when the object isn't needed anymore,
to free it all very fast in one go without worrying about memory leaks.
But it really wasn't designed for realloc.  If we can walk over whole
tree freeing all objects while we go, that should be better than to
hack realloc on top of mempool, I think, but sure thing I may be wrong.

Now, the question is: are you okay to integrate this stuff into main
code?  We've a few difference in licensing terms here, which may be
problematic a bit.

Thank you very much for all this hard work.  I really apprecate it!

/mjt