[rbldnsd] RFC: Data expire support

Michael Tokarev mjt at tls.msk.ru
Sat Dec 31 21:45:28 MSK 2005


Amos Jeffries wrote:
> Comments after each point.
> 
> ----- Original Message ----- From: "Michael Tokarev" <mjt at tls.msk.ru>
> 
> <snip people complaining>
> 
>>
>> So, after several small discussions here and there, we come across
>> this simple idea: a data file may contain an 'expire marker', ie,
>> a timestamp indicating when the data becomes invalid.
>>
>> Yes there is `expire' field in the SOA record, which is not used
>> currently, but I'm for another way to specify this expire time,
>> because of several reasons:
>>
> <snip good points, I agree with all>
> 
> Additionally, with a separate created timestamp in the zone file and a 
> TTL set for each dataset you could now actually start decrementing the 
> various SOA times as some of them are supposed to. Starting from the 
> initial created timestamp not from file reload or anything arbitrary to 
> the local system.

Well, we *still* depend on the local system's time a lot here, because we
don't know the real relation between the timestamp(s) specified in zone
datafiles and the local time.

I see your point.  When data comes from different sources, each can have
its own timestamp and expire time, and even the TTL value may be set for
each individual record...  For this, I'd add another option in a $TIMESTAMP
record, like 'proposed-ttl' or something like that, which should be BEFORE
the expire time.  Expire time is when the data really is invalid.

But note that the other fields in the SOA record are useless for anything
but NAMED and AXFR, i.e. zone transfers.  The only useful values are SERIAL
and MINTTL; the rest are (correctly) ignored by all non-AXFR clients (and we
don't have any AXFR clients by definition).

>> So, I think it's best to introduce another $-special directive, like
>> $EXPIRE, to specify expire time.
>>
>> The only (probably quite serious) problem here is that local time may
>> be inaccurate/wrong.
> 
> Not that serious from a maintainer's viewpoint, if you prohibit/error on 
> zones created in the future. Zones expiring early only cause a 
> default-accept in the end-systems, which in this case is ironically better 
> than default-deny.

Think about exclusions... ;)

I don't want to go the "if there's not a single 'exclude' entry in the file,
we will treat it differently" route.  It would be non-deterministic.

> <snip explanation of source for time checks>
> 
>> So, after some more thoughts, current version of this feature looks like
>> this:
>>
>>  o there's a new directive introduced,
>>      $TIMESTAMP when-created when-expires [...probably some more fields]
>>    where both timestamps are either in unix time_t format (seconds since
>>    Epoch) or in form yyyy-mm-dd[:hh24[:mi[:ss]]] (hours, minutes and
>>    seconds are optional), in GMT/UTC.
> 
> UTC please.
> GMT has english daylight savings problems in some systems.
> UNIX time_t gets unwieldy for manual maintainers (yes we are still out 
> there, at least for some zones).

Ok.  In reality it makes no difference -- moreover, I think I will only check
the date (yyyy-mm-dd), not the time (hh:mi:ss)... unless there's a demand to
expire in less than a day... ;) (Ok ok, there sure may be such a demand).

>> When-created must be the current
>>    timestamp, and when-expires must be a time when the data will expire
>>    and must be reloaded.  When-created is here in order to detect the
>>    situation when local time is set to the past, before when-created.
> 
> And has other useful benefit side-effects as well. (SOA cycling, etc.)
> 
>>  o data expiration is checked during a time when rbldnsd checks for
>>    data updates (normally every minute, controlled by -c option).
>>    So if you disabled automatic checking for new data (-c0), it will
>>    not verify whether the data has expired during runtime, only during
>>    reloads when data has actually changed, which is sorta pointless.
> 
> Does anyone set -c0 for CPU savings? if not, the timer could still check 
> expiry, just not file times.

-c option controls the timer interval. With -c0, no timer is started at all.

>>  o (optional, I like it but it's somewhat clumsy in implementation,
>>    when there are several timestamps per dataset) when expire time
>>    is about to come (eg, 90% of time between 'created' and 'expire'
>>    has passed), rbldnsd produces a warning on reload.
> 
> If it's seriously clumsy, leave it for later. It might be useful, but we 
> don't know for sure yet.
> If the zone compiler ever gets done, this should probably be a feature 
> of that.
> 
>>  o If there are several expire dates per dataset/zone, the smallest one
>>    will be used.
> 
> Likewise, multiple created lines the _latest_ will need to be used.
> So that mixed zones 1hr + 48hr data will be based on the latest 1hr 
> data, not the older 48hr by chance ordering.

Hmm.
Now I'm confused.
/me *thinks*

> more below on whole process implications....
> 
>>  o there's an option introduced, to tell rbldnsd to ignore all those
>>    'expire' directives, in order to be able to force it to run "in case
>>    of emergency" (current time is wrong, etc).  I dislike introducing
>>    this option, but unfortunately there WILL be a lot of demand for it
>>    once major dnsbls start using this feature and people upgrade
>>    their rbldnsd installs to the version supporting this stuff.
>>
>>  o Or maybe, an ability to turn this feature on/off per-dataset, by
>>    specifying `$TIMESTAMP 0 0'.  Also ugly but demanding...
> 
> - not in the same format as a timestamp which is expected.
> 
> How so on the 'WILL be a lot of demand for it'?
> When the data is old it is invalid and should not be served. End of story.
> Even re-purposing the data doesn't change that.
> A responsible mirror will adhere to that, an irresponsible one is what 
> this feature is designed to avoid.
> 
> The only use I could see for keeping old files is someone compiling list 
> stats and that doesn't involve rbldnsd.

Other usages are possible too.  Also, while eg cbl or dsbl *official*
nameservers are *required* to enforce expire time accurately, a local
cache used for a slightly different purpose (like scoring, not
outright rejection) may have less strict rules.  For example, one-day-old
dsbl data on an official dsbl nameserver is not acceptable, while
even several-days-old dsbl data on a local system still IS acceptable.
I think anyway.

Another use is to force normal operation when the local time is
incorrect.

Yet another use is - as you mentioned - to keep old/historic data
and be able to compare it with current data.  I used rbldnsd for
this very purpose in the past, lacking any other 'better' tools ;)
For example, to see which records have been added, I queried every
IP listed in 'current' data against 'old' data, and printed those
which do not exist in 'old' data.  Yes there are tools like diff
but diff is useless on a *set* (as opposed to an ordered list), and
dsbl data is a set, and sorting it (with the good old sort utility) is
almost impossible on at least some of my systems.

> If the $TIMESTAMP <created> <expires> indicates that data is invalid. 
> Are you planning on stopping the file load? or just dumping all the 
> entries until another $TIMESTAMP indicates a section of valid data?
> 
> I would like to see that behaviour. With rather than the _absolute_ 
> smallest expiry time being used, the smallest for the valid data served.
> 
> Back to my mixed zone example:
>    The 1hr data has failed to regenerate but the 48hr is still valid. 
> Will it dump the old 1hr data but still serve the 48hr IPs saying 24 
> hours to expiry?

No no no no.  No partially-loaded data please.  Oh well.
We either load it all, OR do not load at all (resulting in SERVFAIL in
all queries).  See also above, my comment about exclusions.

I think of this new 'expire' mechanism as a safety measure.
Ie, if there's a problem updating the data, don't propagate it to
other places (like rejecting email etc) - it's time to fix the
first problem (with updating), instead of trying to work around it
in some non-deterministic, hard-to-understand way.  IMHO of course.

[..]

Ok.

I don't think it's currently possible to derive SOA and TTL values
from file timestamps.

What I want is something like:

  new directive `$TIMESTAMP when-created when-expires' (when-expires
  may be a relative time, like +3d).

  a command-line option to set 'local time delta' (see below).
  Set it to, say, 10m or 1h by default.

  when loading a data file, process every $TIMESTAMP line and:

   o compare when-created with local time - when-created should not
     be greater than now.  If it is, warn or better abort loading.

   o compare when-expires with current time - if when-expires is less
     than current time, declare the file 'expired'.

   o when doing the above comparisons, take 'local time delta' into
     account, and allow `now' to be 'a bit' (up to the 'local time delta')
     less than when-created, or 'a bit' greater than when-expires.

   o (probably) compare file timestamp with when-created, and require
     the two to be the same (+/- local time delta again).  Abort loading
     if not.
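
  The checks above could be sketched roughly as follows.  This is only an
  illustration in Python, not rbldnsd code: the function name
  check_timestamp, the string results and the 600-second default delta are
  all assumptions made for the example.

```python
def check_timestamp(created, expires, now, delta=600, mtime=None):
    """Classify one $TIMESTAMP line; all times are seconds since the Epoch.

    Hypothetical helper: names and return values are illustrative only.
    """
    # when-created must not be later than now, give or take the delta
    if created > now + delta:
        return "future"
    # optional check: file mtime should match when-created within the delta
    if mtime is not None and abs(mtime - created) > delta:
        return "mtime-mismatch"
    # when-expires earlier than now (minus the delta) means the file expired
    if expires < now - delta:
        return "expired"
    return "ok"
```

  With delta=600, a file created at 1000 and expiring at 2000 is still 'ok'
  at now=2500, and becomes 'expired' once now exceeds 2600.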

  As usual, if there was an error loading any part of a dataset, all zones
  based on it will be non-functional (returning SERVFAIL).

  The $TIMESTAMP directive works per-datafile, not per-dataset or per-zone.
  This is the main reason we can't use real $SOA stuff.

  Each zone has when-expires value computed as a minimum of all when-expires
  values from each datafile of each dataset the zone is based on.  On every
  reload-check cycle, current time is compared with that when-expires timestamp
  and if (again, with `local time delta' taken into account) the zone is expired,
  it will be marked as such and will stop processing queries, returning SERVFAIL
  (with proper logging of course).  On next reload (if we detect a file has changed),
  per-zone timestamps are recalculated and all 'expired' zones which are ok now
  are released.
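
  A rough sketch of that per-zone bookkeeping, in Python for brevity.  The
  data layout (a dict mapping a zone name to the when-expires values of its
  datafiles, with None for files that have no $TIMESTAMP) and the function
  names are assumptions for the example, not how rbldnsd stores things.

```python
def zone_expires(datafile_expiries):
    """Minimum when-expires over all datafiles of a zone, or None if no
    datafile carries a $TIMESTAMP (hypothetical helper)."""
    known = [e for e in datafile_expiries if e is not None]
    return min(known) if known else None


def recheck_zones(zones, now, delta=600):
    """Run on every reload-check cycle: map each zone to 'ok' or 'SERVFAIL'.

    zones: dict of zone name -> list of per-datafile when-expires values.
    """
    status = {}
    for name, expiries in zones.items():
        limit = zone_expires(expiries)
        # expired only when now is past when-expires by more than the delta
        expired = limit is not None and now > limit + delta
        status[name] = "SERVFAIL" if expired else "ok"
    return status
```

  Because the status is recomputed from scratch on each call, a zone that
  was expired is released again as soon as its datafiles carry a later
  when-expires.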

  If datafilecheck is disabled (-c0), no zone expiration takes place.

  Timestamps are specified as unix_time, or as yyyy-mm-dd-hh-mi-ss UTC
  (or GMT?  I mean, to use gmtime() and friends), with ':' accepted in
  place of '-', and [-hh-mi-ss] part is optional.  When-expires may be
  relative, starting with a plus sign (+), with optional unit specifier
  like +2d, +10h etc.

  It's probably a good thing to omit seconds altogether here.
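
  To make the proposed syntax concrete, here is a small Python sketch of a
  parser for one timestamp field.  Several details are assumptions that the
  proposal leaves open: the function name parse_timestamp, the unit letters
  s/m/h/d/w, seconds as the default unit, relative values being counted
  from when-created, and allowing the time part to stop after the hour or
  minute.

```python
import re
from datetime import datetime, timezone

# Assumed unit letters; the proposal only gives examples like +2d and +10h.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_timestamp(text, base=None):
    """Return seconds since the Epoch (UTC) for one $TIMESTAMP field."""
    if text.startswith("+"):
        # relative form, e.g. +3d; base is assumed to be when-created
        if base is None:
            raise ValueError("relative time needs a base time")
        m = re.fullmatch(r"\+([0-9]+)([smhdw]?)", text)
        if not m:
            raise ValueError("bad relative time: %r" % text)
        return base + int(m.group(1)) * _UNITS[m.group(2) or "s"]
    if re.fullmatch(r"[0-9]+", text):
        # plain unix time
        return int(text)
    # yyyy-mm-dd[-hh[-mi[-ss]]], with ':' accepted in place of '-'
    parts = re.split(r"[-:]", text)
    if not 3 <= len(parts) <= 6 or not all(p.isdigit() for p in parts):
        raise ValueError("bad timestamp: %r" % text)
    year, month, day, *rest = map(int, parts)
    hour, minute, second = (rest + [0, 0, 0])[:3]
    moment = datetime(year, month, day, hour, minute, second,
                      tzinfo=timezone.utc)
    return int(moment.timestamp())
```

  For example, parse_timestamp("+3d", base=created) gives created plus
  259200 seconds, and "2005-12-31:21:45" parses the same as
  "2005-12-31-21-45".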

Or something like that anyway.

Any objections to this?

Also, how about comparing file timestamp with $TIMESTAMP's when-created value --
does it look ok?

Thanks.

/mjt

