[rbldnsd] Removing dups

Wed, 02 Apr 2003 17:26:14 +0400

furio ercolessi wrote:
> Hi all,
> 
> I just subscribed, lurked into the archive and saw that the issue
> of duplicates has been discussed.
> 
> This is how we handle it:
> 
> ... | sort -n -t. -k 1,1 -k 2,2 -k 3,3 -k 4,4 | aggregate -i prefix | ...

Still, I don't see a good reason to do this.  Any preprocessing takes
additional time and makes things harder to use.  Even if there are many
dups in the input, both memory usage and query time will be comparable
to the case where the dups have been removed.  This is especially true
with rbldnsd 0.80, which allows zones to be constructed from several
files.  For example, here is how I reconstruct the osirusoft data here:

  rbldnsd ... \
    dialups.relays.osirusoft.com:ip4set:osirus.dialups \
    spews.relays.osirusoft.com:ip4vset:osirus.spews \
    socks.relays.osirusoft.com:ip4vset:osirus.socks \
    ...
    relays.osirusoft.com:ip4set:osirus.dialups \
    relays.osirusoft.com:ip4vset:osirus.spews \
    relays.osirusoft.com:ip4vset:osirus.socks \
    ...
    relays.osirusoft.com:generic:osirus.misc \
    ...

So every data file is used twice: once in its own subzone and once in
the combined relays.osirusoft.com zone.  Also, I duplicate all the zones
(and several others too) in a single zone that is used to block email
(so that only one DNS query is required instead of many).

Note that sort here requires quite a few CPU cycles and a lot of memory -
much more than is necessary to sort a list of IP addresses, since sort
deals with text, not with 4-byte values.
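
To illustrate the text-versus-binary point, here is a minimal sketch in
Python (not code from rbldnsd; the helper names ip_to_int, int_to_ip and
sort_addresses are made up for this example).  It converts each
dotted-quad address to a single 32-bit integer and sorts those, which
gives numeric order without the per-field key options a text sort needs:

```python
import ipaddress

def ip_to_int(text):
    """Convert a dotted-quad IPv4 string to one 32-bit integer."""
    return int(ipaddress.IPv4Address(text))

def int_to_ip(value):
    """Convert a 32-bit integer back to dotted-quad text."""
    return str(ipaddress.IPv4Address(value))

def sort_addresses(lines):
    """Sort IPv4 addresses numerically by comparing integers, not text."""
    return [int_to_ip(n) for n in sorted(ip_to_int(s) for s in lines)]

sample = ["10.0.0.2", "9.255.255.255", "10.0.0.10", "10.0.0.2"]

# Numeric order: 9.x sorts before 10.x, and .2 before .10.
print(sort_addresses(sample))

# Plain text order, for comparison: "10.0.0.10" sorts before "10.0.0.2".
print(sorted(sample))
```

Each comparison in the integer sort is a single machine-level compare,
while a text sort has to split and parse fields over and over.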

BTW, how about a final pass in rbldnsd, after sorting, to remove dups?  I'm
just a bit too lazy to implement that properly... ;)
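
A rough sketch of what such a pass could look like, in Python for
brevity.  This is only an illustration of the idea, not how rbldnsd does
or would do it, and remove_adjacent_dups is a hypothetical name.  Once
the entries are sorted, equal values sit next to each other, so one
linear scan is enough:

```python
def remove_adjacent_dups(sorted_values):
    """Return a new list with consecutive duplicates dropped.

    Assumes the input is already sorted, so equal values are adjacent.
    """
    result = []
    for value in sorted_values:
        # Keep a value only if it differs from the last one kept.
        if not result or result[-1] != value:
            result.append(value)
    return result

# Addresses as 32-bit integers (10.0.0.1, 10.0.0.2, 10.0.0.10 with repeats).
addresses = sorted([167772162, 167772161, 167772162, 167772170, 167772161])
print(remove_adjacent_dups(addresses))
```

The scan is O(n) on top of the sort, so its cost is small next to the
sort itself.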

[]
> "aggregate" is at http://www.vergenet.net/linux/aggregate/
> 
> Hope it helps

Yes!..  I'm trying to teach rbldnsd to understand net ranges... ;)

/mjt