[rbldnsd] enhanced dnset

Sat Nov 26 00:55:12 MSK 2005

On Sat, Nov 26, 2005 at 12:07:17AM +0300, Michael Tokarev wrote:
....
> >if you don't want to give TXT record, use
> >dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
> >(PCRE pattern is always the last field delimited by ':').
> >
> >Does this sound sane?
> >Free tips'n'tricks?
> 
> Well, I don't think it's necessary to invent new complicated syntax
> for such stuff.  Plain regexps (maybe modified a bit to be more easy
> for domains where a dot (.) is commonly used) are just fine, ie,

I mentioned PCRE because I have used it before... 
(I added PCRE support for qmail).
But one pcre_exec takes only around 3000 CPU cycles for
the patterns like those mentioned in this email.

At that rate a 2GHz CPU could do about 666666 of those a second.
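
The arithmetic behind that figure, as a small Python sketch. Both inputs are the rough numbers from this mail (a 2GHz clock, about 3000 cycles per pcre_exec), not measurements, and the function name is made up for illustration:

```python
# Back-of-envelope estimate only: both constants are the approximate
# figures quoted in this mail, not benchmark results.
CPU_HZ = 2_000_000_000      # 2 GHz clock
CYCLES_PER_MATCH = 3000     # rough cost of one pcre_exec call


def matches_per_second(cpu_hz: int, cycles_per_match: int) -> int:
    """Upper bound on matches per second on one core, ignoring all other work."""
    return cpu_hz // cycles_per_match


if __name__ == "__main__":
    print(matches_per_second(CPU_HZ, CYCLES_PER_MATCH))
```

This is only a ceiling: a real server also spends cycles on parsing the packet and building the reply.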

> instead of
> 
>   dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
> 
> it's sufficient to use just
> 
>  \d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
> 
> It's trivial to parse the regexp and extract a fixed ending part
> (.dls.net in this example).

Okay, that sounds sane.
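
A minimal sketch of how that extraction might look, in Python. This is not rbldnsd code: fixed_suffix is a hypothetical helper, it only understands the simple subset of regexp syntax used in this thread, and it conservatively gives up (returns an empty string) on grouping or alternation:

```python
import re

# Split a pattern into atoms: escaped char, [...] class, {m,n} quantifier,
# or any single character.
_TOKEN = re.compile(r"\\.|\[(?:\\.|[^\]\\])*\]|\{[^}]*\}|.", re.S)


def fixed_suffix(pattern: str) -> str:
    """Return the literal domain suffix the pattern must end with, or ""."""
    if any(c in pattern for c in "()|"):
        return ""  # groups/alternation: no single fixed ending (conservative)
    literal = []           # trailing run of literal characters seen so far
    saw_wildcard = False   # True once any non-literal atom was seen
    for tok in _TOKEN.findall(pattern):
        if tok in ("^", "$"):
            continue       # anchors do not change the suffix
        if len(tok) == 2 and tok[0] == "\\" and not tok[1].isalnum():
            literal.append(tok[1])      # escaped punctuation such as \.
        elif len(tok) == 1 and (tok.isalnum() or tok in "-_"):
            literal.append(tok)         # ordinary hostname character
        else:
            literal.clear()             # \d, [...], ., *, +, ?, {m,n}, ...
            saw_wildcard = True
    suffix = "".join(literal).lower()
    if not saw_wildcard:
        return suffix                   # whole pattern is a plain name
    # Keep only whole labels: drop everything up to the first dot.
    dot = suffix.find(".")
    return suffix[dot + 1:] if dot >= 0 else ""


if __name__ == "__main__":
    print(fixed_suffix(r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net"))
```

A pattern like `.*dsl.*` yields an empty suffix here, which is exactly the "everything sorts into the top level" case discussed below.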

> I had a working prototype of similar stuff a long time ago (it will
> not compile anymore as rbldnsd changed since), using shell-style
> wildcards (?*[]) instead of regexps, with highly optimized matcher.
> But the problem was -- it wasn't deterministic in speed.  I know
> which stuff people will try to use (shell-style):
> 
>   *dsl*
>   *[0-9][0-9][0-9]*
> 
> etc.  Ie, everything will sort into top-level domain, without
> any suffix whatsoever. 

This I do not like.

> Which, besides the speed issue, has another
> problem: what to do if a name matches *several* patterns like
> that?  Do we want to invent a "weight" for a regexp/pattern
> (like, more wildcard characters = less weight etc) and try
> to match every pattern we have, choosing the "best" one, or
> pick a random (which?) one?  (Well, here, another approach
> can be used: "Order Matters", ie, first match found wins.)
> 
> What I'd really like to see, and I already mentioned that
> (probably even in the TODO file) is some sort of "finite
> automata" implementation, like the one used in tools like
> lex or re2c, but run-time (as opposed to compile-time)
> changeable.  For some reason I wasn't able to find such
> a library anywhere on the 'net...
> 
> This approach guarantees near-constant response time for
> any number of (complex) expressions, and it will solve
> "which match to choose" problem as well (longest match
> wins).
> 
> Yet there are more (albeit small and probably non-real-life)
> issues.  Like, what to do with domain labels containing
> some "funny" characters like dots or \0s.  Note that a
> domain name isn't really a string of characters, it's a

Yes.  Maybe it's best just to send NXDOMAIN (+SOA if possible)
in those cases.

> structured entity (sequence of labels), pretty similar to
> filenames; and shell-style wildcards normally do not work
> "across" directory separators: /some/where/file does not
> match against /some*file.
> 
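
To illustrate the label-by-label idea, here is a rough Python sketch. match_labels is a hypothetical name, and it assumes the name is already in dotted text form, so it cannot represent labels that themselves contain dots or \0s:

```python
from fnmatch import fnmatchcase


def match_labels(name: str, pattern: str) -> bool:
    """Match a domain name against a shell-style pattern one label at a time.

    A wildcard never spans a dot, the same way a shell glob does not
    span "/" in a path.
    """
    name_labels = name.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(name_labels) != len(pattern_labels):
        return False
    return all(fnmatchcase(n, p) for n, p in zip(name_labels, pattern_labels))


if __name__ == "__main__":
    print(match_labels("66-117-164-146.dls.net", "*.dls.net"))
    print(match_labels("a.b.dls.net", "*.dls.net"))
    # Plain string matching, by contrast, lets "*" run across dots:
    print(fnmatchcase("a.b.dls.net", "*.dls.net"))
```

With this rule a pattern needs one pattern label per name label, so `*.dls.net` no longer matches `a.b.dls.net`.
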
> To summarize: it isn't difficult to add support of regexps
> into rbldnsd, but usually, that will be at least O(N) complexity,
> where N is the number of entries found in the data file.  Which

How so?  If you get a query for
66-117-164-146.dls.net
then rbldnsd has to do these lookups for our new and nice dnsetenh:
net
dls.net

that makes two.
(Maybe dnsetenh could have configurable max number of labels,
in case someone wants to DoS (CPU-time-usage) it.)

I thought rbldnsd could do the lookup for each label...
so that you could match *dhcp*.edu, for example,
without having to find every edu domain.
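
Roughly what I have in mind, sketched in Python rather than C. Everything here is hypothetical (ZONE, MAX_LABELS, candidate_suffixes and lookup are made-up names, and the table layout is an assumption, not rbldnsd's data structure): patterns are stored under their fixed suffix, and a query only tries the patterns filed under its own suffixes.

```python
import re

MAX_LABELS = 4  # hypothetical cap on suffixes tried per query (DoS guard)

# Hypothetical in-memory table: fixed suffix -> patterns, in file order.
ZONE = {
    "dls.net": [re.compile(r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net")],
    "edu": [re.compile(r".*dhcp.*\.edu")],
}


def candidate_suffixes(qname, max_labels=MAX_LABELS):
    """Proper suffixes of qname, shortest first, at most max_labels of them."""
    labels = qname.lower().rstrip(".").split(".")
    upto = min(len(labels) - 1, max_labels)
    return [".".join(labels[-n:]) for n in range(1, upto + 1)]


def lookup(qname, zone=ZONE):
    """Return the first pattern that matches qname, or None."""
    name = qname.lower().rstrip(".")
    for suffix in candidate_suffixes(name):
        for rx in zone.get(suffix, ()):   # only patterns filed under this suffix
            if rx.fullmatch(name):
                return rx.pattern
    return None


if __name__ == "__main__":
    print(candidate_suffixes("66-117-164-146.dls.net"))
    print(lookup("66-117-164-146.dls.net"))
    print(lookup("pc-dhcp-7.cs.example.edu"))
```

The number of table lookups depends on the number of labels in the query, not on the number of entries in the data file. The regexp work is limited to the patterns sharing a suffix with the query, so it only degrades to O(N) when most patterns have no fixed suffix at all.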

> I don't like...
> 
> /mjt

--