[rbldnsd] enhanced dnset

Sat Nov 26 00:07:17 MSK 2005

Sami Farin wrote:
> one feature I am going to add into rbldnsd is PCRE or similar 
> support into dnset (probably dstype named as "dnsetenh" or something).
> 
> I maintain dynamic/dhcp etc IP address list and list of dynamic/generic
> domains.  []

> So, I thought I could modify rbldnsd this way:
> in dnsetenh configuration file I give
> dls.net optionaltextrecordhere:\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
> 
> when rbldnsd gets query about 216-145-224-209.dls.net.dnsetenhdomainhere
> it finds that dls.net has TXT record "optionaltextrecordhere" and
> PCRE pattern "\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net"
> and it tries PCRE match for "216-145-224-209.dls.net".
> Pattern matches and rbldnsd gives out the TXT record optionaltextrecordhere.
> If it does not match, it gives out NXDOMAIN.
> 
> if you don't want to give TXT record, use
> dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
> (PCRE pattern is always the last field delimited by ':').
> 
> Does this sound sane?
> Free tips'n'tricks?

Well, I don't think it's necessary to invent a new complicated syntax
for such stuff.  Plain regexps (maybe modified a bit to be easier to
use for domains, where a dot (.) is so common) are just fine, i.e.,
instead of

   dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net

it's sufficient to use just

  \d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net

It's trivial to parse the regexp and extract the fixed ending part
(.dls.net in this example).
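
To illustrate what "extract a fixed ending part" could look like,
here is a rough Python sketch.  It is not rbldnsd code: the function
name fixed_suffix and its rules are my own assumptions, and it is
deliberately conservative (it gives up on alternation and stops at
the first thing that is not an obvious literal).

```python
def fixed_suffix(pattern):
    """Return the literal tail of a regexp, trimmed to a label boundary.

    Hypothetical helper, not part of rbldnsd.  Returns "" when no
    fixed suffix can be determined safely.
    """
    if "|" in pattern:
        # With alternation the tail of the text is not the tail of
        # every branch, so refuse to guess.
        return ""
    tokens = []  # list of (character, is_literal)
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # "\." is a literal dot; "\d", "\w" etc. are classes.
            tokens.append((nxt, not nxt.isalnum()))
            i += 2
        else:
            # Letters, digits and "-" stand for themselves; anything
            # else is treated as a metacharacter.
            tokens.append((c, c.isalnum() or c == "-"))
            i += 1
    tail = []
    for ch, is_literal in reversed(tokens):
        if not is_literal:
            break
        tail.append(ch)
    suffix = "".join(reversed(tail))
    # Keep only whole labels: start at the first dot of the tail.
    dot = suffix.find(".")
    return suffix[dot:] if dot >= 0 else ""


print(fixed_suffix(r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net"))
# prints ".dls.net"
```

The returned suffix could then serve as the key under which the
pattern is stored, so the key does not have to be written out by hand.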

I had a working prototype of similar stuff a long time ago (it will
not compile anymore, as rbldnsd has changed since), using shell-style
wildcards (?*[]) instead of regexps, with a highly optimized matcher.
But the problem was -- it wasn't deterministic in speed.  I know
which stuff people will try to use (shell-style):

   *dsl*
   *[0-9][0-9][0-9]*

etc.  I.e., everything will sort into the top-level domain, without
any suffix whatsoever.  Which, besides the speed issue, has another
problem: what to do if a name matches *several* patterns like
that?  Do we want to invent a "weight" for a regexp/pattern
(like, more wildcard characters = less weight etc) and try
to match every pattern we have, choosing the "best" one, or
pick a random (which?) one?  (Well, here, another approach
can be used: "Order Matters", i.e., the first match found wins.)
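
A minimal Python sketch of the "Order Matters" rule, using the
standard fnmatch module for shell-style wildcards.  The pattern list
and the TXT strings are made up for illustration, and first_match is
a hypothetical name.

```python
from fnmatch import fnmatchcase

# Hypothetical data: (shell-style pattern, TXT value), in file order.
PATTERNS = [
    ("*dsl*", "generic dsl name"),
    ("*[0-9][0-9][0-9]*", "name with three digits in a row"),
]


def first_match(name, patterns):
    """Return the TXT value of the first pattern matching name."""
    for pattern, txt in patterns:
        if fnmatchcase(name, pattern):
            return txt
    return None  # nothing matched


# This name matches both patterns; the earlier entry wins.
print(first_match("adsl-123.example.net", PATTERNS))
# prints "generic dsl name"
```

Note that this is a linear scan over all patterns, so it settles the
"which match" question but does nothing for the speed issue.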

What I'd really like to see, and I already mentioned that
(probably even in the TODO file) is some sort of "finite
automata" implementation, like the one used in tools like
lex or re2c, but run-time (as opposed to compile-time)
changeable.  For some reason I wasn't able to find such
a library anywhere on the 'net...

This approach guarantees near-constant response time for
any number of (complex) expressions, and it will solve
the "which match to choose" problem as well (longest match
wins).

Yet there are more (albeit small and probably non-real-life)
issues.  Like, what to do with domain labels containing
some "funny" characters like dots or \0s.  Note that a
domain name isn't really a string of characters, it's a
structured entity (a sequence of labels), pretty similar to
filenames; and shell-style wildcards normally do not work
"across" directory separators: /some/where/file does not
match against /some*file.
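
One way to get the filename-like behaviour is to match label by
label instead of matching the dotted text.  The Python sketch below
assumes the name has already been decoded into a list of labels (so
a label may itself contain a dot); match_labels is a hypothetical
name, and requiring the same number of labels on both sides is my
assumption, not a settled rule.

```python
from fnmatch import fnmatchcase


def match_labels(name_labels, pattern_labels):
    """Match a domain name against a pattern, one label at a time.

    Both arguments are sequences of labels, leftmost label first.
    A wildcard never matches across a label boundary, just as
    "/some*file" does not match "/some/where/file".
    """
    if len(name_labels) != len(pattern_labels):
        return False
    return all(
        fnmatchcase(label.lower(), pat.lower())
        for label, pat in zip(name_labels, pattern_labels)
    )


pattern = ["*-*-*-*", "dls", "net"]
print(match_labels(["216-145-224-209", "dls", "net"], pattern))  # True
print(match_labels(["a", "b", "dls", "net"], ["*", "dls", "net"]))  # False
# A single label that contains a dot is still one label:
print(match_labels(["a.b", "dls", "net"], ["*", "dls", "net"]))  # True
```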

To summarize: it isn't difficult to add regexp support
to rbldnsd, but usually a lookup will then cost at least O(N),
where N is the number of entries found in the data file.  Which
I don't like...
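
For patterns that do have a fixed suffix, the cost can be kept down
by keying them on that suffix, as in the original proposal where
dls.net is the key.  A rough Python sketch follows; the SuffixIndex
class and its methods are hypothetical and say nothing about how
rbldnsd stores its data.  A lookup tries each suffix of the queried
name (one dictionary probe per label) and runs only the patterns
filed under a suffix that is actually present.

```python
import re


class SuffixIndex:
    """Patterns grouped by their fixed domain suffix (illustrative)."""

    def __init__(self):
        self._by_suffix = {}

    def add(self, suffix, pattern, txt=""):
        entry = (re.compile(pattern), txt)
        self._by_suffix.setdefault(suffix.lower(), []).append(entry)

    def lookup(self, name):
        """Return the TXT value of a matching entry, or None."""
        name = name.lower().rstrip(".")
        labels = name.split(".")
        # Try the longest suffix first: a.b.c, then b.c, then c.
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            for regex, txt in self._by_suffix.get(suffix, ()):
                if regex.fullmatch(name):
                    return txt
        return None  # the caller would answer NXDOMAIN


idx = SuffixIndex()
idx.add("dls.net",
        r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net",
        "optionaltextrecordhere")
print(idx.lookup("216-145-224-209.dls.net"))
# prints "optionaltextrecordhere"
```

Patterns like *dsl* have no suffix to file them under, so they would
still have to be tried one by one.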

/mjt