[rbldnsd] enhanced dnset
David Landgren
david at landgren.net
Sun Nov 27 13:13:20 MSK 2005
And Sami Farin did write:
> On Sat, Nov 26, 2005 at 12:07:17AM +0300, Michael Tokarev wrote:
> ....
>
>>>if you don't want to give TXT record, use
>>>dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
>>>(PCRE pattern is always the last field delimited by ':').
>>>
>>>Does this sound sane?
>>>Free tips'n'tricks?
>>
>>Well, I don't think it's necessary to invent new complicated syntax
>>for such stuff. Plain regexps (maybe modified a bit to be easier to use
>>for domain names, where a dot (.) is common) are just fine, ie,
>
>
> I mentioned PCRE because I have used it before...
> (I added PCRE support for qmail).
> But one pcre_exec takes only around 3000 CPU cycles for
> the patterns like those mentioned in this email.
>
> A 2GHz CPU could do about 666666 of those a second.
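The figure above can be checked with a back-of-the-envelope calculation. This is only a sketch: it assumes one pcre_exec call costs exactly 3000 cycles and that nothing else competes for the CPU, and the names are made up for illustration.

```python
CPU_HZ = 2_000_000_000     # 2 GHz clock, as in the quoted estimate
CYCLES_PER_MATCH = 3000    # assumed cost of one pcre_exec call

def matches_per_second(cpu_hz, cycles_per_match):
    """Whole number of matches that fit into one second of CPU time."""
    return cpu_hz // cycles_per_match

print(matches_per_second(CPU_HZ, CYCLES_PER_MATCH))
```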
I have written a Perl module (Regexp::Assemble [1]) that will allow you
to take an arbitrary number of individual regexps and compile them down
into one expression. You could then feed that single regexp to
pcre/rbldnsd for use.
I do this already for DNS hostnames for residential addresses, like
1-2-3-4.bogo.example.com. I currently have about 3500 distinct regexps
(the above example would become \d+-\d+-\d+-\d+\.bogo\.example\.com)
which compile down into a single pattern of about 80 kb.
You do have to recompile pcre and up the LINK_SIZE define if you want to
play around with big patterns, though.
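To give a rough idea of what "one expression" means, here is a naive sketch in Python rather than Perl. It only joins the patterns with alternation; Regexp::Assemble merges the patterns rather than merely joining them, so treat this as an illustration of the idea, not of the module. The second pattern and the function name are hypothetical.

```python
import re

def combine(patterns):
    """Join many regexps into one anchored alternation."""
    # Wrap each pattern in a non-capturing group so that a '|' inside
    # one pattern cannot leak into its neighbours.
    body = "|".join("(?:%s)" % p for p in patterns)
    return re.compile(r"\A(?:%s)\Z" % body)

# Only the first pattern comes from the text above; the second is
# invented for the example.
patterns = [
    r"\d+-\d+-\d+-\d+\.bogo\.example\.com",
    r"dsl-\d+\.pool\.example\.net",
]
combined = combine(patterns)

print(bool(combined.match("1-2-3-4.bogo.example.com")))
print(bool(combined.match("mail.example.com")))
```

A plain alternation like this still tries the branches one after another, which is why merging common parts is worth the effort with thousands of patterns.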
Later,
David
1. http://search.cpan.org/dist/Regexp-Assemble/
>
>>instead of
>>
>> dls.net :\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
>>
>>it's sufficient to use just
>>
>> \d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net
>>
>>It's trivial to parse the regexp and extract a fixed ending part
>>(.dls.net in this example).
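A minimal sketch of that extraction, in Python and under simplifying assumptions: the pattern is lowercase, contains no alternation, and contains no escaped backslashes. The function name is hypothetical and this is not how rbldnsd does anything today.

```python
import string

# Characters treated as literal hostname characters in this sketch.
LITERAL = set(string.ascii_lowercase + string.digits + "-")

def fixed_domain(pattern):
    """Return the fixed trailing domain of a simple regexp, or ''."""
    if "|" in pattern:            # alternation: no single fixed ending
        return ""
    out = []
    i = len(pattern) - 1
    while i >= 0:
        ch = pattern[i]
        escaped = i > 0 and pattern[i - 1] == "\\"
        if ch == "." and escaped:
            out.append(".")       # '\.' is a literal dot
            i -= 2
        elif ch in LITERAL and not escaped:
            out.append(ch)        # plain hostname character
            i -= 1
        else:
            break                 # class, quantifier, group, bare '.', ...
    tail = "".join(reversed(out))
    if i < 0:
        return tail               # the whole pattern was literal
    # Keep only whole labels: drop everything up to the first dot.
    dot = tail.find(".")
    return tail[dot + 1:] if dot >= 0 else ""

print(fixed_domain(r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net"))
```

A pattern such as `.*dsl.*` yields an empty string here, which is exactly the "sorts into the top level" case.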
>
>
> Okay, that sounds sane.
>
>
>>I had a working prototype of similar stuff long time ago (it will
>>not compile anymore as rbldnsd changed since), using shell-style
>>wildcards (?*[]) instead of regexps, with highly optimized matcher.
>>But the problem was -- it wasn't deterministic in speed. I know
>>which stuff people will try to use (shell-style):
>>
>> *dsl*
>> *[0-9][0-9][0-9]*
>>
>>etc. Ie, everything will sort into top-level domain, without
>>any suffix whatsoever.
>
>
> This I do not like.
>
>
>>Which, besides the speed issue, has another
>>problem: what to do if a name matches *several* patterns like
>>that? Do we want to invent a "weight" for a regexp/pattern
>>(like, more wildcard characters = less weight etc) and try
>>to match every pattern we have, choosing the "best" one, or
>>pick a random (which?) one? (Well, here, another approach
>>can be used: "Order Matters", ie, first match found wins.)
>>
>>What I'd really like to see, and I already mentioned that
>>(probably even in the TODO file) is some sort of "finite
>>automata" implementation, like the one used in tools like
>>lex or re2c, but run-time (as opposed to compile-time)
>>changeable. For some reason I wasn't able to find such
>>a library anywhere on the 'net...
>>
>>This approach guarantees near-constant response time for
>>any number of (complex) expressions, and it will solve
>>"which match to choose" problem as well (longest match
>>wins).
>>
>>Yet there are more (albeit small and probably non-real-life)
>>issues. Like, what to do with domain labels containing
>>some "funny" characters like dots or \0s. Note that a
>>domain name isn't really a string of characters, it's a
>
>
> Yes. Maybe it's best just to send NXDOMAIN (+SOA if possible)
> in those cases.
>
>
>>structured entity (sequence of labels), pretty similar to
>>filenames; and shell-style wildcards normally do not work
>>"across" directory separators: /some/where/file does not
>>match against /some*file.
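The label-wise behaviour described above can be sketched in Python by matching one label at a time, so that a wildcard never crosses a dot. This is only an illustration; `label_match` is a hypothetical name, and labels containing dots or \0 are not handled.

```python
from fnmatch import fnmatchcase

def label_match(pattern, name):
    """Shell-style match where wildcards stay inside a single label."""
    p_labels = pattern.lower().split(".")
    n_labels = name.lower().split(".")
    if len(p_labels) != len(n_labels):
        return False              # a wildcard cannot absorb a whole label
    return all(fnmatchcase(n, p) for p, n in zip(p_labels, n_labels))

print(label_match("*dhcp*.example.edu", "lab-dhcp-3.example.edu"))
print(label_match("some*file", "some.where.file"))
```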
>>
>>To summarize: it isn't difficult to add support of regexps
>>into rbldnsd, but usually, that will be at least O(N) complexity,
>>where N is the number of entries found in the data file. Which
>
>
> How so? If you get query for
> 66-117-164-146.dls.net
> then rbldnsd has to do these lookups for our new and nice dnsetenh:
> net
> dls.net
>
> that makes two.
> (Maybe dnsetenh could have configurable max number of labels,
> in case someone wants to DoS (CPU-time-usage) it.)
>
> I thought rbldnsd could do the lookup for each label...
> so that you could match *dhcp*.edu, for example,
> without having to find every edu domain.
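The per-label lookup described above might look roughly like this in Python. It is a sketch under assumptions: patterns are stored in buckets keyed by their fixed trailing domain, only proper suffixes of the query name are tried, and `max_labels` stands in for the configurable limit mentioned above. All names are hypothetical.

```python
import re

def suffixes(name, max_labels=4):
    """Yield 'net', 'dls.net', ... for a query name, shortest first."""
    labels = name.lower().rstrip(".").split(".")
    # Proper suffixes only: the full name itself is not a bucket key.
    for n in range(1, min(len(labels) - 1, max_labels) + 1):
        yield ".".join(labels[-n:])

def lookup(index, name, max_labels=4):
    """Return the first pattern that matches name, or None."""
    name = name.lower().rstrip(".")
    for suffix in suffixes(name, max_labels):
        for rx in index.get(suffix, ()):
            if rx.fullmatch(name):
                return rx.pattern
    return None

# Buckets keyed by fixed trailing domain.
index = {
    "dls.net": [re.compile(r"\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dls\.net")],
    "edu": [re.compile(r".*dhcp.*\.edu")],
}

print(list(suffixes("66-117-164-146.dls.net")))
print(lookup(index, "66-117-164-146.dls.net"))
```

The cost per query is then bounded by the number of labels tried plus the size of the buckets that are hit, not by the total number of entries in the data file.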
>
>
>>I don't like...
>>
>>/mjt
>
>
--
"It's overkill of course, but you can never have too much overkill."