[rbldnsd] Re: rbldnsd-0.995 and duplicate entries
Sami Farin
safari-rbldnsd at safari.iki.fi
Thu Oct 6 22:20:00 MSD 2005
On Thu, Oct 06, 2005 at 08:55:40PM +0400, Michael Tokarev wrote:
> [Cc'ing rbldnsd mailinglist, for future reference.
> If you want to discuss rbldnsd, please use the mailinglist.
> Thanks]
>
> Sami Farin wrote:
> >hi.
> >
> >I wanted to sort and remove duplicates from my list of
> >dynamic hosts, which I serve with rbldnsd.
> >There are some duplicates and some stupidity
> >like listing a /16 netblock by listing all of the 256
> >/24 netblocks. First I thought I could modify rbldnsd
> >to spit out human-readable text file with -D option
> >(dumphumanzone()) but I noticed rbldnsd does not remove
> >duplicated entries, when the network sizes differ:
> >for example if both 82.22 and 82.22.33 are listed, rbldnsd does
> >not remove 82.22.33. That causes extra /24 to be listed
> >in the stats:
> >
> >2005-10-06 17:53:21.915621500 rbldnsd:
> >ip4set:easynet.nl/rbldnsd.dynablock.txt: 20051006 145254:
> >e32/24/16/8=579103/319512/2071/0
> >
> >and it also gives too many useless (duplicated) lines with -d option.
> >82.22 and 82.22.33 have the same TXT record, so there's no point
> >in including 82.22.33.
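[A minimal sketch of the duplicate removal described above, in Python with the standard ipaddress module; `parse_entry` and `drop_subsumed` are hypothetical names, not part of rbldnsd. It expands rbldnsd-style truncated octet prefixes and drops any entry already covered by a broader one:]

```python
import ipaddress

def parse_entry(entry):
    """Expand a truncated octet prefix (e.g. '82.22') into a CIDR
    network: 1 octet -> /8, 2 -> /16, 3 -> /24, 4 -> /32."""
    octets = entry.split(".")
    prefix_len = 8 * len(octets)
    padded = ".".join(octets + ["0"] * (4 - len(octets)))
    return ipaddress.ip_network(f"{padded}/{prefix_len}")

def drop_subsumed(entries):
    """Keep only entries not covered by a broader (shorter-prefix)
    entry.  Sorting by (address, prefix length) guarantees a parent
    network is always seen before any of its subnets."""
    nets = sorted((parse_entry(e) for e in entries),
                  key=lambda n: (int(n.network_address), n.prefixlen))
    kept = []
    for net in nets:
        if not any(net.subnet_of(parent) for parent in kept):
            kept.append(net)
    return kept

print(drop_subsumed(["82.22", "82.22.33", "10.0.0.1"]))
# 82.22.33/24 is dropped because 82.22/16 already covers it
```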
>
> Looks like you somewhat misunderstood what rbldnsd is for.
I know what it's for.
...
> Note also that master-format data dump (-d) was implemented
> as a side feature, like a quick hack, sort of, without much
> optimisations, because I don't think bind is suitable to
I hoped I could do my own quick hack and sort out that
file once and for all (it only needs doing once)...
> serve large DNSBL zones (or else why rbldnsd exists?), and
> for small zones, it's basically irrelevant whether the data
> is compact/optimized or not, as long as it is correct.
>
> >When this bug gets fixed, I start coding the -D feature...
>
> Thanks. I consider it to be somewhat rude to name something a
> bug without understanding it.
In my example there was only one TXT record I like to give out.
But I can call it a feature I don't like, then :-O
> >Any free tips and tricks? -d is line-oriented, but -D should
> >print sequential /16 /24 and /32 networks in one line,
> >e.g.
> >80.237.53.225-230
> >instead of
> >80.237.53.225
> >80.237.53.226
> >80.237.53.227
> >80.237.53.228
> >80.237.53.229
> >80.237.53.230
>
> If you want to create an optimizer for network ranges (I think
> there are several already available, but I haven't looked), you'd
I know verge.net.au's aggregate, but for the example shown
above it would give this:
80.237.53.225/32
80.237.53.226/31
80.237.53.228/31
80.237.53.230/32
and when I convert that back to a "range"-type list:
80.237.53.225 - 80.237.53.225
80.237.53.226 - 80.237.53.227
80.237.53.228 - 80.237.53.229
80.237.53.230 - 80.237.53.230
I'd rather edit and maintain human-readable data.
Those example /31s and /32s are easy, but if there are loads
of them to maintain by hand, it's no fun.
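[The range conversion shown above can be sketched in Python with the standard ipaddress module; `cidrs_to_ranges` is a hypothetical helper, not a real tool. It collapses aggregate(1)-style CIDR output into contiguous first-last address ranges:]

```python
import ipaddress

def cidrs_to_ranges(cidrs):
    """Collapse a sorted list of CIDR blocks into contiguous
    [first, last] address ranges, merging blocks whose first
    address directly follows the previous block's last address."""
    nets = sorted(ipaddress.ip_network(c) for c in cidrs)
    ranges = []
    for net in nets:
        first, last = net.network_address, net.broadcast_address
        if ranges and int(first) == int(ranges[-1][1]) + 1:
            ranges[-1][1] = last          # extend the previous range
        else:
            ranges.append([first, last])  # start a new range
    return [(str(a), str(b)) for a, b in ranges]

# The four aggregate(1) blocks from the message collapse to one range:
print(cidrs_to_ranges(["80.237.53.225/32", "80.237.53.226/31",
                       "80.237.53.228/31", "80.237.53.230/32"]))
# [('80.237.53.225', '80.237.53.230')]
```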
> better use somewhat different data structures internally, at least
> not the same as in ip4set. Well, if you want to write optimal
> master-format dump, ip4set structures may be of some use... For
> CIDR optimisation, take a look at ip4trie - it's optimized for
> single-IP lookups, but with minor modification it can be used
> for range sets optimisation (but the whole structure is quite
> trivial anyway).
>
> You can take some infrastructure from rbldnsd, like address
> parsing etc, but that's basically it - such optimizer should
> be a separate application, without all the DNS/network/etc
> baggage of rbldnsd. IMHO of course.
Surely. I just happen to maintain a list whose format rbldnsd eats,
and I haven't found an optimizer for the ip4set format yet.
If I hack up a -D option which spits out CIDRs, maybe I can
pipe it to a modified aggregate which makes saner "ranges".
Yes, I'll probably do that, if nobody knows of a program
which optimizes the file the way I want...
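[For the CIDR-collapsing half of that pipeline, Python's standard library already does what aggregate(1) does. A small sketch, using only the stdlib ipaddress module (the example networks are made up):]

```python
import ipaddress

# ipaddress.collapse_addresses merges adjacent and overlapping
# networks into a minimal CIDR list, like aggregate(1) does:
# the subsumed /24 vanishes and the two adjacent /25s become a /24.
nets = [ipaddress.ip_network(n) for n in
        ["82.22.33.0/24", "82.22.0.0/16",
         "192.0.2.0/25", "192.0.2.128/25"]]
collapsed = list(ipaddress.collapse_addresses(nets))
print(collapsed)
# [IPv4Network('82.22.0.0/16'), IPv4Network('192.0.2.0/24')]
```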
> >I guess this is the hardest part of the coding.
> >I am still studying the code to see where to start.
>
> I think the hardest part is to try to mix-n-match two quite
> different tasks in one application, basing them on the same
> data structures.
>
> And speaking of dynablock lists, I have several separate
> points, most important of which is:
>
> Usually, if you maintain such a list, you know where each
> entry come from and why (using appropriate comments etc),
> so that it will be possible to find what's wrong after
> receiving a complaint and so on. When you optimize it,
> you lose this info, and make maintenance of the list more
> difficult. For such a list, maintenance is the #1 problem,
> efficiency/optimisation of data does not really matter -
Now, when it's not optimized, there can be many matches
for any one IP address, so it turns into a maintenance problem.
The file available via rsync for the zone dul.dnsbl.sorbs.net
is 139284 lines; optimized[1], it is 86975 lines.
But I don't know their "master" input file format -- maybe
they have secret comments for each line.
[1] Not a perfect optimization: the lines not starting
with ! and the lines starting with ! were aggregated separately,
and 86975 is the sum of the two.
> tools such as rbldnsd will do their best to keep it running
> without spending many CPU cycles or much memory - +/- several
> milliseconds on each reload and several kilobytes of
> memory is nothing compared to maintenance costs...
That's right. I don't care much about a couple of kilobytes
of extra memory used, and I am happy with rbldnsd,
but not with the maintenance of the data.
> Or something like that, anyway.