[rbldnsd] Memory allocation questions

Michael Tokarev mjt at corpit.ru
Thu Mar 31 16:40:43 MSD 2005


Amos Jeffries wrote:
[]
>> When loading the data, rbldnsd can't know how many
>> entries it will need for the large arrays.  So it
>> starts with some number of entries (say, 16), and
>> grows the array (realloc), doubling the current count
>> each time it needs more entries.  So, for example,
>> for 4M entries it will allocate an array of size 4M*8
>> (or *16 for 64bits), and for 4M+1 it will be 8M.
>>
>> When the whole file is read, rbldnsd does a final
>> realloc, reclaiming the extra memory.
> 
> <snip discussion of present allocation method and fragmentation.>
>>
>> Well..  I thought we may run into this problem sooner or later,
>> and here we go, it seems.
>>
>> I *guess* you aren't running linux (nothing wrong with that
>> really ;) -- because on linux and with the default malloc(),
>> things don't work the way described above.  Instead, for
>> larger objects, malloc() uses mmap(), which in turn uses
>> the power of virtual memory to avoid the above fragmentation
>> entirely.
> 
> I am thinking we have reached a point where it becomes necessary to 
> check the file size before allocating memory. If you know how much data 
> is being loaded, you can estimate a figure close to the final memory usage.
> 
> The algorithm could then be: open the file, check the file length, allocate 
> n bytes, and on array exhaustion allocate n+n/2 bytes (or similar).
> Slightly tricky due to the extra lookup of the file length, but much 
> reduced memory usage for large zones.
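
[As a rough illustration of the scheme Amos describes -- a sketch only;
the names and the bytes-per-entry estimate are assumptions, not rbldnsd
code:

  #include <sys/stat.h>
  #include <stddef.h>

  /* Guess an initial entry count from the file size; fall back to a
     small default if the file can't be examined. */
  static size_t initial_guess(const char *fname, size_t bytes_per_entry) {
    struct stat st;
    if (stat(fname, &st) != 0 || st.st_size <= 0)
      return 16;
    return (size_t)st.st_size / bytes_per_entry + 1;
  }

  /* On array exhaustion, grow by n + n/2 instead of doubling. */
  static size_t next_size(size_t n) {
    return n + n / 2;
  }
]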

The problem here isn't the fact that it allocates space for 8M entries
when only 4M+1 are needed, but the fragmentation - the part of my email
you [snipped].  The extra memory for those 4M-1 entries is never touched,
so on modern systems it's not as bad (it is still counted towards
memory limits, of course).
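
For reference, a minimal sketch of the doubling-growth-plus-final-shrink
strategy quoted above (hypothetical names, not rbldnsd's actual code):

  #include <stdlib.h>

  struct entries {
    void  *arr;     /* the growing array */
    size_t count;   /* entries in use */
    size_t alloc;   /* entries allocated */
  };

  /* Make room for one more entry, doubling the allocation when full. */
  static int entries_grow(struct entries *e, size_t entrysize) {
    if (e->count >= e->alloc) {
      size_t newalloc = e->alloc ? e->alloc * 2 : 16;
      void *p = realloc(e->arr, newalloc * entrysize);
      if (!p) return 0;
      e->arr = p;
      e->alloc = newalloc;
    }
    return 1;
  }

  /* After the whole file is read: shrink to the exact size used. */
  static void entries_finish(struct entries *e, size_t entrysize) {
    void *p = realloc(e->arr, e->count * entrysize);
    if (p || !e->count)
      e->arr = p;
    e->alloc = e->count;
  }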

In this worst case (4M+1 entries, or any other power of two plus "some
more"), AND with the memory fragmentation I described, it will need 3x
more memory, of which only about 2/3 will actually be written to.
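
To put numbers on that (using the 8-byte entry size from the quoted
example): growing from 4M to 8M entries allocates a new 64MB block while,
in the fragmentation scenario, the freed 32MB block is left as a hole the
allocator cannot reuse for the larger request.  Peak usage is thus about
96MB for a ~32MB working set (3x), and of those 96MB only the old 32MB
plus the ~32MB actually filled in the new block - roughly 2/3 - is ever
written.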

The proposed solution helps with the 2x memory increase, but does not
help with the 3x one.  Also, it isn't quite trivial to guess the real
memory needs (even approximately) from the file size without actually
reading it -- it might be just a list of IP addresses (a lot of them), or
a few lines with long TXT templates (which are repeated multiple
times), or it may have a lot of comments...

What really helps with this 3x increase is a different malloc
implementation, as I already mentioned in my email.  I also
mentioned that linux with its default malloc() does not suffer
from this problem, thanks to its careful use of mmap() for large
allocations.  There should be similar malloc implementations
for other systems too - maybe it's worth trying one.
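
On glibc this threshold can even be tuned explicitly; a small,
glibc-specific sketch (the 64KB value is only an example):

  #include <malloc.h>   /* glibc */

  static void tune_malloc(void) {
    /* Ask glibc's malloc to serve allocations of 64KB and larger via
       mmap(), so large arrays live in their own mappings instead of
       leaving holes in the heap when they are freed or grown. */
    mallopt(M_MMAP_THRESHOLD, 64 * 1024);
  }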

Or rbldnsd may try to use mmap(/dev/zero) directly, which is even
more interesting because rbldnsd knows which objects will grow
large and which won't.  I don't know whether it's easy to implement
that in a portable way...
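
A minimal sketch of that direct approach, here using anonymous mappings
and Linux's mremap() to grow in place (on systems without MAP_ANONYMOUS
an fd for /dev/zero could be passed instead; hypothetical names, not
actual rbldnsd code):

  #define _GNU_SOURCE          /* for mremap() */
  #include <sys/mman.h>
  #include <stddef.h>

  /* Map an initial anonymous region for an array expected to grow large. */
  static void *bigarray_create(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
  }

  /* Grow the region; mremap() relinks the pages instead of copying them,
     so the old+new peak of the realloc() scheme never materializes.
     (Linux-only - other systems would need munmap()+mmap() or a large
     up-front reservation.) */
  static void *bigarray_grow(void *p, size_t oldsize, size_t newsize) {
    void *np = mremap(p, oldsize, newsize, MREMAP_MAYMOVE);
    return np == MAP_FAILED ? NULL : np;
  }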

/mjt

