[rbldnsd] regular expression support for rbldnsd

David Landgren david at landgren.net
Fri Aug 14 15:00:43 MSD 2009


Steven Champeon wrote:
> on Wed, Aug 12, 2009 at 06:21:11PM +0200, Per Jessen wrote:
>> Steven Champeon wrote:
>>
>>> We have a patch against rbldnsd 0.996b that provides support for
>>> regular expression-based fast lookups of HELO and PTR strings, in an
>>> rbldnsd zone, that return our classifications for hostnames for use in
>>> scoring or blocking bot-originated email.
>> Interesting idea.  We have a list of such patterns which is evaluated by
>> Postfix.  I can't immediately see if a DNS-based solution instead would
>> improve things.   
> 
> It depends on whether your list is short or long; sendmail handled
> inline regex maps just fine until we hit around 10K-15K, at which point
> it became a matter of avoiding the hassle of recompiling the .cf file
> every time there was an update. The DNSBL approach simplified the process
> of managing updates tremendously. I've had reports that Postfix with a
> policy daemon works rather well, but again you're just shifting the load
> from one server to another, and the policy daemon needs to have a local
> copy of the patterns, etc. Exim, at least anecdotally, fell over quite
> hard when dealing with large flat files containing the patterns.
> 
> We used to use a set of "compact" (left-anchored) hostname-only (not
> including domain) patterns for a while, but there were too many idiot
> setups sending mail from hosts named "^host[0-9]+\." and the like, so
> we have stuck with just fully qualified patterns and "right anchor"
> strings (such as "dynamic.example.net"), but we're thinking of even
> abandoning the latter as we see an occasional mail server set up for
> use by residential customers, for example, that uses the residential
> keyword/token as part of its name :-/

Coming late into the conversation here, it's summer...

You really want to go with left-anchored hostnames. You might want to 
look at my Perl library Regexp::Assemble. The idea is that you don't 
care which one of 45000 patterns matched, just that one of them did. So 
you assemble them all into a mega-gigantic single pattern and let the 
regexp engine loose on that.

   http://search.cpan.org/dist/Regexp-Assemble/

(You build the pattern off-line, and feed the compiled result to rbldnsd 
afterwards).
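
A rough Python sketch of the assembly idea follows. It is not 
Regexp::Assemble's actual algorithm or API, only a simplified 
illustration: patterns are split into tokens, merged into a trie on 
their shared leading tokens, and emitted as one left-anchored pattern. 
The sample patterns and the names tokenize, build_trie, emit and 
assemble are hypothetical, and the tokenizer assumes a small subset of 
regex syntax (no groups, no alternation, no {m,n} or lazy quantifiers).

```python
import re

# One token = an escaped char, a character class, or one plain char,
# optionally followed by a single +, * or ? quantifier.
# Assumption: patterns use only this small subset of regex syntax.
TOKEN = re.compile(r"(?:\\.|\[[^\]]*\]|[^\\\[()|{}+*?])[+*?]?")

END = ""  # trie key meaning "a pattern ends here"; never a real token


def tokenize(pattern):
    """Split a simple pattern into tokens; reject unsupported syntax."""
    tokens, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported pattern: {pattern!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


def build_trie(patterns):
    """Merge all patterns into a trie keyed on their leading tokens."""
    trie = {}
    for pattern in patterns:
        node = trie
        for tok in tokenize(pattern):
            node = node.setdefault(tok, {})
        node[END] = {}
    return trie


def emit(node):
    """Render a trie node as a regex fragment."""
    branches = [tok + emit(node[tok]) for tok in sorted(node) if tok != END]
    if not branches:
        return ""
    if len(branches) > 1 or END in node:
        body = "(?:" + "|".join(branches) + ")"
    else:
        body = branches[0]
    # If a shorter pattern ends at this node, the remainder is optional.
    return body + "?" if END in node else body


def assemble(patterns):
    """Build one left-anchored pattern from many (done off-line)."""
    return re.compile("^" + emit(build_trie(patterns)))


# Hypothetical hostname patterns, written without their own "^".
PATTERNS = [
    r"host[0-9]+\.",
    r"host-[0-9]+\.",
    r"dsl-[0-9]+\.",
    r"dhcp[0-9]+\.",
    r"dyn[0-9]+\.",
]

combined = assemble(PATTERNS)
print(combined.pattern)
print(bool(combined.match("host123.example.net")))
print(bool(combined.match("mail.example.org")))
```

The point of the trie is that shared prefixes such as "host" appear 
once in the result, so the engine never retries the same leading 
characters for each of the original patterns.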

This has some interesting and desirable properties. First, legitimate 
hosts tend to fail extremely quickly: if they are named 'mail*', 'mx*' 
or 'relay*', the pattern will fail after examining the first two or 
three characters of the hostname. Secondly, left-anchored means no 
backtracking: once the engine has exhausted the possibilities at a 
given point in the pattern, it fails.
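
To make the fast-fail behaviour concrete, here is a small Python model 
that walks a trie of literal, left-anchored prefixes and reports how 
many characters of the hostname were looked at before a decision was 
reached. This is only a model under stated assumptions: a real regex 
engine is not a trie walker, the prefixes below are made up for 
illustration, and the names build_prefix_trie and probe are 
hypothetical.

```python
TERMINAL = None  # trie key meaning "a listed prefix ends here"


def build_prefix_trie(prefixes):
    """Store literal hostname prefixes in a character trie."""
    root = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[TERMINAL] = True
    return root


def probe(trie, hostname):
    """Return (matched, number_of_characters_examined)."""
    node = trie
    for i, ch in enumerate(hostname):
        if TERMINAL in node:
            return True, i          # a full prefix has been consumed
        if ch not in node:
            return False, i + 1     # no branch for this character: fail now
        node = node[ch]
    return TERMINAL in node, len(hostname)


# Hypothetical prefixes standing in for a much larger pattern set.
PREFIXES = ["host-", "dsl-", "dhcp", "dyn", "ppp", "pool-",
            "cable", "modem", "resnet"]
trie = build_prefix_trie(PREFIXES)

for name in ["mail.example.org", "mx1.example.org",
             "relay.example.org", "dsl-42.isp.example"]:
    print(name, probe(trie, name))
```

With a left anchor there is only one starting position to try, so a 
miss costs a few characters no matter how many prefixes are stored.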

You will need to recompile PCRE and raise various compile-time #defines 
to allow the engine to handle patterns 100k+ characters long.

Hoping this is of interest to you,
David

--
I lit it up with an engine, now it's rolling for you.

