[rbldnsd] $TIMESTAMP - is per dataset expiry possible?

Paul McClure paul1 at px1.1812net.net
Thu Apr 27 07:26:09 MSD 2006


Let me make my case in more detail for how and why I wish to use $TIMESTAMP on
a per-dataset basis or perhaps, as you say, per "SUBdataset". Bear with me here.
Take your time with this. Feel free to ignore me if you wish, because this is
going to be long-winded.

I currently run a mailserver that uses blacklists and whitelists. My setup
includes rblsmtpd and rbldnsd in conjunction with BIND 9. It's a standard
configuration for this ... something like what is described here:
http://njabl.org/rsync.html . I like both rbldnsd and rblsmtpd *a lot*. These
are robust programs and they work *really* well, as you know.

Now, I want to start using a greylist scheme. In its simplest form, a greylist
rejects the first attempt to deliver email from a source listed on an RBL.
However, it accepts the mail on subsequent attempts. The rationale is that
mass spam mailing methods generally try to deliver an email only once, but
legitimate email servers will keep trying until successful. Fine. I see
two problems with this very simple greylist configuration: Firstly, a spammer
simply needs to modify the method slightly to defeat the greylist -- just send
two copies of every email and both or one will be accepted. Secondly, this
simple greylist is ineffective against some virus-generated mail traffic. For
example, I often see one ip address hammering my server maybe 100 times per
minute for 2 or 3 minutes. My blacklists usually stop these, but a standard
greylist would reject only the very first attempt (unless used in conjunction
with blacklists).
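
The simple form above can be sketched in a few lines (a toy illustration only
-- the function, the in-memory state, and the retry window are all mine, not
anything from rblsmtpd):

```python
import time

# First-seen times per ip address; a real implementation would persist this.
first_seen = {}

def greylist_check(ip, window=3600):
    """Return True to accept, False to temp-fail (451) the delivery.

    Rejects the first attempt from an ip; accepts retries made within
    `window` seconds of the first attempt.
    """
    now = time.time()
    if ip not in first_seen:
        first_seen[ip] = now
        return False          # first attempt: temporary failure
    return now - first_seen[ip] <= window

# A spammer defeats this by simply sending every message twice:
assert greylist_check("203.0.113.7") is False   # first copy rejected
assert greylist_check("203.0.113.7") is True    # second copy accepted
```

Note how the second call sails through, which is exactly the weakness described
above.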

Here's the modified greylist scheme I am going to build: After the first hit
on the RBL being used as a greylist, the ip address gets blacklisted for a
short time (maybe 5 minutes); then, after the blacklist entry expires, that ip
address remains whitelisted for a longer time (maybe 24 hours). Here's how I
propose to accomplish this: a hit on the RBL will immediately cause the ip
address to be added to *both* a blacklist and a whitelist. The ip address will
somehow "expire" from the blacklist in a short time (5 minutes), but will
remain on the whitelist for a much longer time (24 hours). In the rblsmtpd
configuration, the blacklist would be listed ahead of the whitelist (usually
you see whitelists ahead of blacklists, but not for this scheme). So, the
rblsmtpd configuration would look something like this (an excerpt):

           rblsmtpd \
                  -r local-blacklist \
                  -a local-whitelist \
                  -g dnsbl.sorbs.net \

Notice the -g option. What is that? It does not exist ... yet. I don't want to
monkey with any of the existing rblsmtpd functionality, so I am thinking of
simply adding another option which will be a modified version of the -r
option. The -g option will send a temporary 451 fail, just like -r, but it
will also write the ip address to both the local-blacklist and
local-whitelist. Then, I was going to have the -g option tag the TXT record on
each list with an expiry time (5 minutes for BL, 24 hours for WL). I could
write this code relatively easily. However, there needs to be some method to
clear out the expired records from the local-blacklist and local-whitelist.
Perhaps a cron job running some script. Or I could modify the -r and -a
options to ignore expired records. That would work too, but the lists would
grow and grow and probably still require some system to clear them regularly
(cron job?). These solutions are not very elegant.
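
To make the proposed ordering and writes concrete, here is a toy model of how
one connection would flow through -r, -a, and the hypothetical -g (everything
here -- the helper name, the set-based lists, the return strings -- is
invented for illustration, not rblsmtpd code):

```python
def decide(ip, local_blacklist, local_whitelist, greylist_dnsbl):
    """Model the option order: -r local-blacklist -a local-whitelist -g dnsbl.

    Returns "451" (temporary failure), "accept", or "pass" (no list matched).
    """
    if ip in local_blacklist:          # -r: temp-fail while blacklisted
        return "451"
    if ip in local_whitelist:          # -a: accept once only the WL entry remains
        return "accept"
    if ip in greylist_dnsbl:           # proposed -g: temp-fail and list the ip
        local_blacklist.add(ip)        # would expire after ~5 minutes
        local_whitelist.add(ip)        # would expire after ~24 hours
        return "451"
    return "pass"

bl, wl, rbl = set(), set(), {"198.51.100.9"}
assert decide("198.51.100.9", bl, wl, rbl) == "451"   # first hit: greylisted
assert decide("198.51.100.9", bl, wl, rbl) == "451"   # still blacklisted
bl.clear()                                            # simulate 5-minute expiry
assert decide("198.51.100.9", bl, wl, rbl) == "accept"
```

The blacklist being checked *before* the whitelist is what makes the short
reject window work; once only the whitelist entry is left, mail is accepted.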

Then ... when looking at rbldnsd, I stumbled on the $TIMESTAMP directive. I
started to think maybe this could accomplish half of what I needed. Perhaps,
rbldnsd already contained the tools needed to handle the expiry of records
(namely, $TIMESTAMP). In this case, all I would need to do to rblsmtpd is add
the -g option to write a new dataset (or SUBdataset?) to the local-blacklist
and local-whitelist, complete with $TIMESTAMP-type expiry times. Then rbldnsd
would handle the expiries without me needing to write any code for this. I
suppose the local white- and blacklists will periodically need to be refreshed
to get rid of expired datasets -- a cron job rotating old files out, I guess.
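
For the write path, the -g handler could then just emit a small datafile per
list and let rbldnsd handle the expiry (a sketch; I am assuming a
"$TIMESTAMP created expires" argument form with unix times here, which should
be checked against rbldnsd(8) before relying on it):

```python
import time

def write_greylist_entry(path, ip, lifetime, txt="greylisted"):
    """Write a tiny rbldnsd ip4set datafile for one ip with an expiry stamp.

    The $TIMESTAMP argument format below is an assumption to verify against
    the rbldnsd documentation, not a confirmed syntax.
    """
    now = int(time.time())
    with open(path, "w") as f:
        f.write("$TIMESTAMP %d %d\n" % (now, now + lifetime))
        f.write("%s :127.0.0.2:%s\n" % (ip, txt))

write_greylist_entry("local-blacklist.d", "192.0.2.15", 300)     # 5 minutes
write_greylist_entry("local-whitelist.d", "192.0.2.15", 86400)   # 24 hours
```

A cron job would then only need to rotate out files whose expiry has passed,
rather than parse and rewrite individual records.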

I sense you are unlikely to indulge me here, but I'll take a shot at it anyway
;) . Ideally, I'd like another directive, say the $DTIMESTAMP directive.
$DTIMESTAMP would apply to individual datasets, just like $TTL can, for
example. Unlike $TIMESTAMP, an expired $DTIMESTAMP would never prevent an
entire zone from loading. $DTIMESTAMP would apply just to its own dataset.
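
To illustrate, a hypothetical combined datafile might look like this (I am
guessing at the combined-dataset $DATASET syntax from memory, so take it with
a grain of salt; the only point is where the imaginary $DTIMESTAMP line would
sit, scoping the expiry to one subdataset):

```
$DATASET ip4set local-blacklist
$DTIMESTAMP 20060427120000 +300      # hypothetical: expires this subdataset only
192.0.2.15 :127.0.0.2:greylisted

$DATASET ip4set local-whitelist
$DTIMESTAMP 20060427120000 +86400
192.0.2.15 :127.0.0.2:greylisted
```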

haha, how's that? Am I naive to think you would do this for me? Seriously, I
think others would find valuable uses for this functionality. If I try to code
it myself it will be a terrible hack job. Don't make me do that ;-P

Cheers.
Paul


Michael Tokarev wrote:

> Paul McClure wrote:
>> From a recent thread:
>> "The $TIMESTAMP directive works per-datafile, not per-dataset or per-zone."
>
> This is not exactly correct.  See below.
>
>> Would it be possible to make this work per dataset, or at least the option to
>> do so?
>
> The thing is.  If *any* part of a *zone* is expired/invalid/unavailable, the
> *whole* zone becomes unavailable.  Yes, I can think of situations where a
> partially loaded zone is useful (if it only contains "blacklisted" entries),
> but I'd better not risk (with additional confusing options etc.) rejecting
> some valid email (as the primary role of rbldnsd is to be used together with
> a mailserver) instead of letting some spam in.
>
> Right now rbldnsd does not know the role of the data it is missing, whether
> it's used as a black- or white-list, whether it has some "exclusions", etc.
> In case of data expiration, it knows at least whether there were some
> exclusions, but still does not know whether it's a white- or black-list or
> something entirely different.  Maybe it is possible to specify all this, but
> it becomes way too clumsy and non-deterministic, I'm afraid.  BTW, this
> whole 'timestamp' option is already too clumsy because of possible local
> clock differences.
>
> So I prefer to keep the logic simple: if any part of the data is
> unavailable, for whatever reason, then so are all the zones where this data
> is used, and rbldnsd starts returning SERVFAIL to all queries for those
> zones.  Simple and clear, and with a reason.
>
> And referring to the above "not entirely correct" statement -- as you see,
> the expire time is really "per zone", with the smallest timestamp chosen,
> not per dataset or datafile or...
>
>> I want to leverage off of rbldnsd for a greylisting method in
>> qmail/rblsmtpd.
>> I won't provide details, but my problem is solved if I can use $TIMESTAMP per
>> dataset within the same datafile (using the "combined" dataset type,
>> obviously).
>
> Ahh.. this is about a "SUBdataset", not a "normal dataset".  Quite a strange
> setup I'd say -- looks like you're trying to solve the wrong problem using
> the wrong methods... ;)  To be fair, this combination (subdataset
> expiration) is something I didn't think of, and at first glance it looks
> like it might be a useful feature...
> But I'd still not do it.  You can expire a "subzone" by pulling it off the
> combined dataset.  Depending on your usage scenario, of course... ;)
>
> But either way, maybe it's better to look at modifying the client software
> (for rbldnsd, rblsmtpd is a client, right? :) to do all the calculations
> internally, based on the data returned by rbldnsd?  Or even drop rbldnsd
> entirely and use e.g. some mysql backend to store the data... I dunno...
>
> /mjt
>
> _______________________________________________
> rbldnsd mailing list
> rbldnsd at corpit.ru
> http://www.corpit.ru/mailman/listinfo/rbldnsd
>
