[rbldnsd] regular expression support for rbldnsd
David Landgren
david at landgren.net
Fri Aug 14 15:00:43 MSD 2009
Steven Champeon wrote:
> on Wed, Aug 12, 2009 at 06:21:11PM +0200, Per Jessen wrote:
>> Steven Champeon wrote:
>>
>>> We have a patch against rbldnsd 0.996b that provides support for
>>> regular expression-based fast lookups of HELO and PTR strings, in an
>>> rbldnsd zone, that return our classifications for hostnames for use in
>>> scoring or blocking bot-originated email.
>> Interesting idea. We have a list of such patterns which is evaluated by
>> Postfix. I can't immediately see if a DNS-based solution instead would
>> improve things.
>
> It depends on whether your list is short or long; sendmail handled
> inline regex maps just fine until we hit around 10K-15K, at which point
> it became a matter of avoiding the hassle of recompiling the .cf file
> every time there was an update. The DNSBL approach simplified the process
> of managing updates tremendously. I've had reports that Postfix with a
> policy daemon works rather well, but again you're just shifting the load
> from one server to another, and the policy daemon needs to have a local
> copy of the patterns, etc. Exim, at least anecdotally, fell over quite
> hard when dealing with large flat files containing the patterns.
>
> We used to use a set of "compact" (left-anchored) hostname-only (not
> including domain) patterns for a while, but there were too many idiot
> setups sending mail from hosts named "^host[0-9]+\." and the like, so
> we have stuck with just fully qualified patterns and "right anchor"
> strings (such as "dynamic.example.net"), but we're thinking of even
> abandoning the latter as we see an occasional mail server set up for
> use by residential customers, for example, that uses the residential
> keyword/token as part of its name :-/
Coming late into the conversation here, it's summer...

You really want to go with left-anchored hostnames. You might want to
look at my Perl library Regexp::Assemble. The idea is that you don't
care which one of 45000 patterns matched, just that one of them did. So
you assemble them all into a mega-gigantic single pattern and let the
regexp engine loose on that.
http://search.cpan.org/dist/Regexp-Assemble/
(You build the pattern off-line, and feed the compiled result to rbldnsd
afterwards).
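
To make the "one big pattern" idea concrete, here is a minimal sketch in Python. It is not the code of Regexp::Assemble, which is a Perl module that accepts real regular expressions. This toy version assumes the inputs are plain literal hostname prefixes, and the function name `assemble` and the sample prefixes are made up for illustration. It merges the prefixes into a trie and emits a single left-anchored alternation in which shared leading characters appear only once.

```python
import re


def assemble(literals):
    """Merge literal prefixes into one left-anchored regex via a trie."""
    trie = {}
    for word in literals:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # empty key marks "a prefix ends here"

    def emit(node):
        # A node holding only the end marker contributes nothing further.
        if "" in node and len(node) == 1:
            return ""
        branches = [re.escape(ch) + emit(child)
                    for ch, child in sorted(node.items()) if ch != ""]
        optional = "" in node  # a shorter prefix also ends at this node
        if len(branches) == 1 and not optional:
            return branches[0]
        body = "(?:" + "|".join(branches) + ")"
        return body + "?" if optional else body

    return "^" + emit(trie)


# Hypothetical sample prefixes, purely for illustration.
pattern = re.compile(assemble(["dsl-", "dhcp-", "dyn-", "dynamic-"]))
print(pattern.pattern)
print(bool(pattern.match("dynamic-10-0-0-1.example.net")))
print(bool(pattern.match("mail.example.org")))
```

Because every alternative shares the leading "d", the assembled pattern tests that character once instead of once per original pattern, and the same holds at every deeper branch point.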
This has some interesting and desirable properties. Legitimate hosts
tend to fail extremely quickly. If they are named 'mail*', 'mx*' or
'relay*' the pattern will fail after examining the first two or three
characters of the hostname. Secondly, left-anchored means no
backtracking. Once the engine has exhausted the possibilities at a given
point in the pattern, it fails.
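
The early-failure behaviour can be illustrated with a hand-rolled trie walk that counts how many characters it looks at before giving up. This is only a sketch of the principle, in Python, and not how PCRE or rbldnsd work internally. It assumes literal prefixes rather than real patterns, and the names `build_trie`, `probe` and the sample prefixes are hypothetical.

```python
def build_trie(prefixes):
    """Build a nested-dict trie; the key None marks the end of a prefix."""
    trie = {}
    for prefix in prefixes:
        node = trie
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[None] = True
    return trie


def probe(trie, hostname):
    """Walk the trie from the left; return (matched, chars_examined)."""
    node = trie
    examined = 0
    for ch in hostname:
        if None in node:          # a complete prefix has already matched
            return True, examined
        examined += 1
        if ch not in node:        # no alternative left at this position
            return False, examined
        node = node[ch]
    return None in node, examined


# Hypothetical prefixes, for illustration only.
trie = build_trie(["dsl-", "dhcp-", "dyn-", "dynamic-", "ppp", "pool-"])
print(probe(trie, "mail.example.org"))        # (False, 1)
print(probe(trie, "pop3.example.com"))        # (False, 3)
print(probe(trie, "dynamic-45.example.net"))  # (True, 8)
```

A hostname whose first character starts none of the listed prefixes is rejected after a single comparison, however many prefixes the list contains.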
You will need to recompile PCRE and raise various compile-time #defines
to allow the engine to handle 100k+ pattern lengths.
Hoping this is of interest to you,
David
--
I lit it up with an engine, now it's rolling for you.