[rbldnsd] enhanced dnset

David Landgren david at landgren.net
Sun Nov 27 13:13:20 MSK 2005


And Sami Farin did write:
> On Sat, Nov 26, 2005 at 12:07:17AM +0300, Michael Tokarev wrote:
> ....
> 
>>>if you don't want to give TXT record, use
>>>dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
>>>(PCRE pattern is always the last field, delimited by ':').
>>>
>>>Does this sound sane?
>>>Free tips'n'tricks?
>>
>>Well, I don't think it's necessary to invent new complicated syntax
>>for such stuff.  Plain regexps (maybe modified a bit to be easier
>>for domains, where a dot (.) is commonly used) are just fine, i.e.,
> 
> 
> I mentioned PCRE because I have used it before... 
> (I added PCRE support for qmail).
> But one pcre_exec takes only around 3000 CPU cycles for
> the patterns like those mentioned in this email.
> 
> A 2GHz CPU could do about 666666 of those a second.

I have written a Perl module (Regexp::Assemble [1]) that will allow you 
to take an arbitrary number of individual regexps and compile them down 
into one expression. You could then feed that single regexp to 
pcre/rbldnsd for use.

I do this already for DNS hostnames for residential addresses, like 
1-2-3-4.bogo.example.com. I currently have about 3500 distinct regexps 
(the above example would become \d+-\d+-\d+-\d+\.bogo\.example\.com) 
which compile down into a single pattern of about 80 kb.

You do have to recompile pcre and up the LINK_SIZE define if you want to 
play around with big patterns, though.
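
To give a rough idea of the "many patterns, one expression" approach, here 
is a minimal sketch in Python rather than Perl. It only joins the patterns 
into one alternation; Regexp::Assemble goes further and merges the parts the 
patterns have in common, which this sketch does not attempt. The second and 
third patterns and the names assemble() and is_residential() are made up for 
illustration.

```python
import re

# Hypothetical per-ISP patterns; only the first comes from the example above.
patterns = [
    r"\d+-\d+-\d+-\d+\.bogo\.example\.com",
    r"dsl-\d+\.pool\.example\.net",
    r"dhcp\d+\.example\.org",
]

def assemble(pats):
    """Join the patterns into a single alternation (naive, no prefix merging)."""
    return re.compile("|".join(f"(?:{p})" for p in pats))

combined = assemble(patterns)

def is_residential(hostname):
    # fullmatch() anchors the combined pattern at both ends of the hostname.
    return combined.fullmatch(hostname.lower()) is not None
```

With this, is_residential("1-2-3-4.bogo.example.com") is true and 
is_residential("mail.example.com") is false, using one compiled pattern 
instead of a loop over all of them.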

Later,
David

1. http://search.cpan.org/dist/Regexp-Assemble/

> 
>>instead of
>>
>>  dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
>>
>>it's sufficient to use just
>>
>> \d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
>>
>>It's trivial to parse the regexp and extract a fixed ending part
>>(.dls.net in this example).
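
A minimal sketch of that suffix extraction, in Python for brevity. It is an 
assumption about how one might do it, not how rbldnsd does or would do it. 
The function name fixed_suffix() is made up, and the parser only understands 
simple patterns: anything containing an alternation yields an empty suffix.

```python
import re

# One regex token: an escaped character, a {m,n} quantifier, or any single char.
TOKEN = re.compile(r"\\.|\{[^}]*\}|.", re.S)
QUANTIFIERS = set("*+?")

def fixed_suffix(pattern):
    """Return the literal trailing part of `pattern`, cut to whole labels."""
    if "|" in pattern:
        return ""                    # be conservative with alternations
    literals = []                    # per token: the literal char, or None
    for tok in TOKEN.findall(pattern):
        if tok[0] == "\\":
            # \. or \- is a literal; \d, \w and friends are character classes
            literals.append(None if tok[1].isalnum() else tok[1])
        elif tok in QUANTIFIERS or tok[0] == "{":
            if literals:
                literals[-1] = None  # a quantified atom is not fixed text
            literals.append(None)
        elif tok.isalnum() or tok == "-":
            literals.append(tok)
        else:
            literals.append(None)    # ., [, ], (, ), ^, $ and so on
    tail = []
    for ch in reversed(literals):
        if ch is None:
            break
        tail.append(ch)
    text = "".join(reversed(tail))
    # Keep only whole labels: drop everything up to and including the first dot.
    dot = text.find(".")
    return text[dot + 1:] if dot != -1 else ""
```

For the pattern above this gives "dls.net"; for something like .*dsl.* it 
gives an empty string, which is the "sorts into the top level" case.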
> 
> 
> Okay, that sounds sane.
> 
> 
>>I had a working prototype of similar stuff a long time ago (it will
>>not compile anymore as rbldnsd changed since), using shell-style
>>wildcards (?*[]) instead of regexps, with highly optimized matcher.
>>But the problem was -- it wasn't deterministic in speed.  I know
>>which stuff people will try to use (shell-style):
>>
>>  *dsl*
>>  *[0-9][0-9][0-9]*
>>
>>etc.  Ie, everything will sort into top-level domain, without
>>any suffix whatsoever. 
> 
> 
> This I do not like.
> 
> 
>>Which, besides the speed issue, has another
>>problem: what to do if a name matches *several* patterns like
>>that?  Do we want to invent a "weight" for a regexp/pattern
>>(like, more wildcard characters = less weight etc) and try
>>to match every pattern we have, choosing the "best" one, or
>>pick a random (which?) one?  (Well, here, another approach
>>can be used: "Order Matters", ie, first match found wins.)
>>
>>What I'd really like to see, and I already mentioned that
>>(probably even in the TODO file) is some sort of "finite
>>automata" implementation, like the one used in tools like
>>lex or re2c, but run-time (as opposed to compile-time)
>>changeable.  For some reason I wasn't able to find such
>>a library anywhere on the 'net...
>>
>>This approach guarantees near-constant response time for
>>any number of (complex) expressions, and it will solve
>>"which match to choose" problem as well (longest match
>>wins).
>>
>>Yet there are more (albeit small and probably non-real-life)
>>issues.  Like, what to do with domain labels containing
>>some "funny" characters like dots or \0s.  Note that a
>>domain name isn't really a string of characters, it's a
> 
> 
> Yes.  Maybe it's best just to send NXDOMAIN (+SOA if possible)
> in those cases.
> 
> 
>>structured entity (sequence of labels), pretty similar to
>>filenames; and shell-style wildcards normally do not work
>>"across" directory separators: /some/where/file does not
>>match against /some*file.
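
The filename analogy can be sketched as label-by-label matching, here in 
Python with the standard fnmatch module. This is only an illustration of the 
idea: match_labels() is a made-up name, and it assumes ordinary names where 
no label contains a dot or a \0.

```python
from fnmatch import fnmatchcase

def match_labels(pattern, name):
    """Shell-style match where a wildcard never crosses a label boundary."""
    pat_labels = pattern.lower().split(".")
    name_labels = name.lower().rstrip(".").split(".")
    if len(pat_labels) != len(name_labels):
        return False
    # Each pattern label is matched against exactly one name label.
    return all(fnmatchcase(label, glob)
               for glob, label in zip(pat_labels, name_labels))
```

So *dsl*.example.net matches adsl-12.example.net but not 
adsl.pool.example.net, just as /some*file does not match /some/where/file.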
>>
>>To summarize: it isn't difficult to add support of regexps
>>into rbldnsd, but usually, that will be at least O(N) complexity,
>>where N is the number of entries found in the data file.  Which
> 
> 
> How so?  If you get query for
> 66-117-164-146.dls.net
> then rbldnsd has to do these lookups for our new and nice dnsetenh:
> net
> dls.net
> 
> that makes two.
> (Maybe dnsetenh could have configurable max number of labels,
> in case someone wants to DoS (CPU-time-usage) it.)
> 
> I thought rbldnsd could do the lookup for each label...
> so that you could match *dhcp*.edu, for example,
> without having to find every edu domain.
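
The per-suffix lookup described above might look like the following sketch 
(Python, not rbldnsd code). The table layout, the MAX_LABELS cap and the 
function names are assumptions for illustration; patterns are filed under 
their fixed suffix and only the patterns under a matching suffix are tried.

```python
import re

# Hypothetical in-memory table: fixed suffix -> patterns filed under it.
TABLE = {
    "dls.net": [re.compile(r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net")],
    "edu": [re.compile(r".*dhcp.*\.edu")],
}
MAX_LABELS = 4  # assumed cap on suffix lookups per query

def candidate_suffixes(name, max_labels=MAX_LABELS):
    """Proper suffixes of `name`, shortest first, at most `max_labels` of them."""
    labels = name.lower().rstrip(".").split(".")
    depth = min(len(labels) - 1, max_labels)
    return [".".join(labels[-n:]) for n in range(1, depth + 1)]

def lookup(name):
    """Return (suffix, pattern) for the first pattern that matches, else None."""
    query = name.lower().rstrip(".")
    for suffix in candidate_suffixes(query):
        for pat in TABLE.get(suffix, ()):
            if pat.fullmatch(query):
                return suffix, pat.pattern
    return None
```

For 66-117-164-146.dls.net the candidate suffixes are net and dls.net, so 
two table lookups, and the number of regexps actually run depends on how 
many are filed under those suffixes rather than on the size of the whole 
data file.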
> 
> 
>>I don't like...
>>
>>/mjt
> 
> 


-- 
"It's overkill of course, but you can never have too much overkill."


