[rbldnsd] Re: SIGSEGV in rbldnsd

Michael Tokarev mjt at corpit.ru
Mon Oct 17 11:36:55 MSD 2005


[Cc'ing the rbldnsd mailing list.  Short summary of the issue:
Chris encountered a crash in rbldnsd running under FreeBSD
when it hits the per-process memory limit.  As it turned out,
the problem is in the final realloc() after data load, when
rbldnsd shrinks the data arrays to their final size, which
is smaller than the allocated amount.  FreeBSD's realloc()
may fail (return NULL) even if newsize is smaller than
oldsize in a realloc(oldptr, newsize) call -- and this
final realloc is the only place in rbldnsd where I didn't
verify the realloc() return value, assuming it will always
be != NULL.  As a result, rbldnsd dereferences a NULL
pointer and crashes...  Another point is that under
memory pressure, routines like syslog(), which also
require some memory to do their work, fail as well, hence
no diagnostics are available.
]
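
For illustration, a minimal sketch of that failure pattern (not the
actual rbldnsd code; the sizes here are made up):

#include <stdlib.h>

int main(void) {
    size_t allocated = 8 * 1024 * 1024;   /* over-allocated array */
    size_t needed = 4 * 1024 * 1024 + 1;  /* entries actually used */
    int *arr = malloc(allocated * sizeof(*arr));
    if (!arr) return 1;

    /* shrinking realloc() -- assumed to succeed, result unchecked */
    arr = realloc(arr, needed * sizeof(*arr));
    arr[0] = 42;    /* SIGSEGV here if realloc() returned NULL */

    free(arr);
    return 0;
}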

Chris Gabe wrote:
> 
[]
> Freebsd 4.10.  Note that there are configuration options to the stock  
> freebsd malloc() that force realloc to always re-allocate; the  
> reasoning being to defragment and ultimately reduce total memory  
> usage.  I'm not sure that option is enabled here, but it does say the  
> assumption cannot be made.

Ok, I got it, looking at the FreeBSD malloc.c source.
The thing is -- realloc() is equivalent to

   newptr = malloc(newsize);
   memcpy(newptr, oldptr, newsize < oldsize ? newsize : oldsize);
   free(oldptr);

in "too many" cases, in particular whenever newsize differs from
oldsize by more than a page size, which is true in this case.
So, with the FreeBSD malloc implementation, under memory pressure,
it's not possible to *reclaim* unused memory by reallocating
large memory chunk(s) to be smaller.  I consider this a bug
in the FreeBSD malloc implementation, but that's an entirely
different story...

[]
> One suggestion (maybe rbldnsd already does this), on the original  
> malloc, allocate a bit extra.  This way a 100MB zone won't be wasted  
> forever when a few ip's are added to the zone file and realloc forces  a 
> larger alloc.

Well, this is an interesting and quite old question -- which
"algorithm" to use for memory allocation for large arrays
of "unknown" size.  The standard technique is to double the size
each time we hit the number of entries allocated and a new entry
is needed -- this is what's used in rbldnsd.  I also implemented
an "optimisation": on the next reload of the same dataset,
rbldnsd remembers the previous size of the array and allocates that
many entries in the first go, to reduce the number of reallocations.
And you're right, this technique sometimes requires a quite large
amount of memory to be allocated and "wasted" during reloads.
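
For illustration, a minimal sketch of this doubling-plus-hint scheme
(simplified, not the actual rbldnsd code; entry_t and the names are
made up):

#include <stdlib.h>

typedef struct { unsigned addr, flags; } entry_t; /* placeholder */

/* Make room for one more entry: start from the size remembered
 * from the previous reload, then double each time we hit the cap. */
static entry_t *grow(entry_t *arr, size_t count,
                     size_t *allocated, size_t hint) {
    size_t n;
    entry_t *new;

    if (count < *allocated)
        return arr;                  /* still room, nothing to do */
    if (*allocated)
        n = *allocated * 2;          /* double on each hit */
    else
        n = hint ? hint : 1024;      /* first alloc: use the hint */
    new = realloc(arr, n * sizeof(*new));
    if (!new)
        return NULL;                 /* old arr is still valid */
    *allocated = n;
    return new;
}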

Two scenarios:

"Classical" double-size-each-time, with, say, 4M+1 entries --
rbldnsd will realloc(8M) to store the last entry, and right
after that it'll realloc(4M+1) - so, during reload, it'll
waste memory needed for (4M-1) entries.

And with my "optimisation", the same happens when the next reload
hits 4M+2 entries: rbldnsd will allocate the same 4M+1,
and reallocate that to (4M+1)*2, with a similar shrinking realloc
right after loading...

Note this is contiguous memory of quite large size: for 4M
entries on a 32-bit platform, we need (for ip4set) 4M*8 = 32
megabytes of memory -- for comparison, dsbl.org's main data
is 7.3M entries now; on a 64-bit platform, the memory requirement
is twice that.

And I'm not sure how to "beautify" this algorithm.  The number of
realloc()s *must* stay "small enough", because between two calls
to realloc() we allocate other chunks of memory to store
other things (like txt templates), and the more reallocs we
perform, the more fragmented memory will be, and the less
chance we will have to find a large contiguous chunk of memory.
So a "less aggressive" reallocation technique (compared to
doubling the size on each realloc -- say, a constant increment
each time) is as risky as the aggressive approach: doubling
reaches 4M entries in at most a couple dozen reallocations,
while a fixed 64K-entry increment needs 64 of them, each one
another chance to fragment the heap.

Another way is to try to guess the final dataset size based
on the sizes of the files in the dataset and the current "percent
loaded".  Perhaps this is what I'm after, finally, but it
will require changes in many places of the code -- to
compute and pass that "current percentage" into the loading
functions and use it there.
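
A rough sketch of such a guess (hypothetical -- it assumes the
loader tracks how many bytes of the dataset files it has consumed):

/* Extrapolate the final entry count from how many entries the
 * bytes parsed so far have produced.  bytes_total is the sum of
 * the dataset's file sizes. */
static size_t estimate_final(size_t entries_so_far,
                             size_t bytes_done, size_t bytes_total) {
    double per_byte;

    if (!bytes_done)
        return 0;                   /* nothing parsed yet */
    per_byte = (double)entries_so_far / (double)bytes_done;
    /* extrapolate to the whole dataset, with ~10% slack so a
     * slightly low guess doesn't force another mid-load realloc */
    return (size_t)(per_byte * (double)bytes_total * 1.1);
}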

And as a quick fix, before I implement the above,
I'll add some "rounding" to the "next allocation hint",
so that instead of using 4M+1 next time, it'll use some
larger value like 4M+(something more)...  Or remember
the last power of two used and start from there, not from
the really required number of entries -- which would be even
simpler.
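
The last variant amounts to rounding the hint up to a power of two
(a sketch; the minimum value is arbitrary):

static size_t round_hint(size_t needed) {
    size_t n = 1024;                /* arbitrary minimum */
    while (n < needed)
        n *= 2;
    return n;
}

A reload that grows from 4M+1 to 4M+2 entries then starts at 8M
right away, instead of reallocating to (4M+1)*2 mid-load.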

Anyone have other ideas? ;-P

Note this whole problem with "difficult" allocations only
shows up on FreeBSD so far... ;)

>> instead of:
>> #define SHRINK_ARRAY(type, arr, needed, allocated)      \
>>   if ((allocated) > (needed)) {                         \
>>      (arr) = trealloc(type, (arr), (needed));           \
>>      (allocated) = (needed);                            \
>>   }
>>
>> try something like:
>> #define SHRINK_ARRAY(type, arr, needed, allocated)      \
>>   if ((allocated) > (needed)) {                         \
>>      type *a = trealloc(type, (arr), (needed));         \
>>      if (!a) { \
>>        free(arr); \
>>        (arr) = NULL; \
>>        (needed) = (allocated) = 0; \
>>        oom(); \
>>      } \
>>      else {
>>        (allocated) = (needed);                          \
>>        (arr) = a; \
>>      } \
>>   }

And this macro definitely needs to be fixed too, with
an extra check, to be something like

#define SHRINK_ARRAY(type, arr, needed, allocated)      \
   if ((allocated) > (needed)) {                         \
      type *new = trealloc(type, (arr), (needed));       \
      if (new) {                                         \
        (arr) = new;                                     \
        (allocated) = (needed);                          \
      }                                                  \
   }
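
Note that, unlike the variant above, this one simply keeps the old
(larger) array if the shrinking realloc() fails -- which is safe,
since realloc() leaves the original block intact on failure, and
the shrink is only an optimisation anyway.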

[]
The following is for the mailing list readers, to show where
the original problem came from...  Keep reading!.. ;-P

>>> somewhere else, it didn't load or connect up to satisfy that   
>>> assumption.  The full invocation is
>>> rbldnsd -p /var/run/rbldnsd.pid -r /var/config/dnsbl -w rbldnsd -b
>>> 127.0.0.1 -b 213.52.227.83 -c 1m -s stats
>>> dnsbl3.dnsbl.borderware.com:ip4set:spam.dnsbl.sorbs.net,smtp.dnsbl.sorbs.net,safe.dnsbl.sorbs.net,problems.dnsbl.sorbs.net
>>> dnsbl1.dnsbl.borderware.com:ip4set:sbl,xbl
>>> dnsbl2.dnsbl.borderware.com:ip4set:dnsbl.njabl.org.auto
>>> dul.dnsbl.borderware.com:ip4set:dul.dnsbl.sorbs.net,dulx
>>> dnsbl12.dnsbl.borderware.com:ip4set:sbl,xbl,dnsbl.njabl.org.auto
>>> dnsbl123.dnsbl.borderware.com:ip4set:sbl,xbl,dnsbl.njabl.org.auto,spam.dnsbl.sorbs.net,smtp.dnsbl.sorbs.net,safe.dnsbl.sorbs.net,problems.dnsbl.sorbs.net
>>
>> Ohhhhhhhhhh...
>>
>> Don't do that.  You're wasting *alot* of memory
>> for no good reason.
>>
>> Instead of:
>>
>>  bl1:ip4set:l1,l2
>>  bl2:ip4set:l2,l3
>>  ...
>>
>> use:
>>
>>  bl1:ip4set:l1 bl1:ip4set:l2
>>  bl2:ip4set:l2 bl2:ip4set:l3
>>  ...
> 
> DOH!  RTFM!  I noticed the distinction in the man page, but thought I  
> had read the opposite sense; that seemed to be a logical assumption  to 
> me, that combining them in one ip4set *saved* on space.  Silly.
> 
> So, I guess the boost to 1GB memory limit isn't going to be  required.  
> Thanks.
> 
>> *if* semantics(*) of this variant is ok with you.  This way,
>> each of the l1, l2, l3 etc will be loaded only once.
>>
>> (*) the difference are "excluded entries", if any.  Well, it's
>> possible to add "global" exclusions too, specifying additional
>> file for every dataset, like
>>
>>  bl1:ip4set:ex,l1 bl1:ip4set:ex,l2
>>  bl2:ip4set:ex,l2 bl2:ip4set:ex,l3
>>  ...
>>
>> (where the file 'ex' specifies your exclusions and maybe your
>> local metadata like $NS and $SOA entries).
> 
> I don't use exclusions, though the thought had occurred, and it may  be 
> required yet.  Thanks for that pointer.  If we need it, "ex" will  
> probably be small so that won't be an issue.
> 
>> For huge lists, it's better to specify separate dataset per
>> "data source", instead of combining all sources in one large
>> dataset -- even if you want to get only one zone.  This way,
>> it's more maintainable, and requires less resources to (re)load
>> data.  YMMV ofcourse ;)
> 

Ok.  Note please that, no matter how we try to optimize
the memory allocation strategy in rbldnsd, if we don't have enough
memory, it will not work.  With your datasets, at least with
the way you combined all the data the first time, you need a large
amount of memory.

I'll release a new version in a few days, probably trying out
some patches first.  Expect a followup to this email shortly... ;)

And oh, Chris, thank you very much for your excellent bug reports,
with all the information necessary to diagnose the problem.  *Much*
appreciated -- I wish all bug/problem reports were always like
yours... ;)

/mjt

