[rbldnsd] Memory allocation questions

Michael Tokarev mjt at tls.msk.ru
Thu Mar 17 23:29:57 MSK 2005


Scott Knight wrote:
> Hello,
> 
> I am trying to get a better understanding of how
> rbldnsd allocates memory – both on initial startup and
> during a reload.

The allocation "algorithm" is very simple and isn't
much different between the initial load of the data and
subsequent reloads.

There are two different "classes" of memory being
allocated for ip4set.  One is a "TOC": large arrays
holding the IP addresses (4 bytes each) and pointers to
the resulting A+TXT strings (another 4 bytes each on a
32-bit platform).  The other large "class" is the strings
themselves: many small objects.

When loading the data, rbldnsd can't know how many
entries it will need for the large arrays.  So it
starts with some number of entries (say, 16) and
grows the array (realloc), doubling the current count
each time it needs more room.  So, for example, for
4M entries it will allocate an array of 4M*8 bytes
(or *16 on 64 bits), and for 4M+1 entries the array
grows to 8M entries.

When the whole file has been read, rbldnsd does a final
realloc(), reclaiming the extra memory.

When performing a reload, it (after freeing all the
memory previously allocated for the data) starts with
the previous number of records instead of 16 (as in the
example above).  If after the reload the number of
entries is smaller, the extra memory will be free()'d.
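
Here is a minimal sketch of that growth scheme (my own
illustration, not the actual rbldnsd code; the names
grow_array()/trim_array() and the entry layout are made
up for the example):

  #include <stdlib.h>

  struct entry { unsigned addr; const char *rr; };  /* ~8 bytes on 32 bits */

  /* Grow the array by doubling, starting from `hint` entries
   * (16 on the initial load, the previous count on a reload). */
  static struct entry *grow_array(struct entry *arr, unsigned *allocated,
                                  unsigned needed, unsigned hint)
  {
    unsigned n = *allocated ? *allocated : hint;
    while (n < needed)
      n *= 2;                             /* double until large enough */
    if (n != *allocated) {
      struct entry *p = realloc(arr, n * sizeof(*arr));
      if (!p) return NULL;                /* out of memory */
      arr = p;
      *allocated = n;
    }
    return arr;
  }

  /* After the whole file has been read, trim the unused tail. */
  static struct entry *trim_array(struct entry *arr, unsigned *allocated,
                                  unsigned used)
  {
    struct entry *p;
    if (!used || used == *allocated)
      return arr;                         /* nothing to trim */
    p = realloc(arr, used * sizeof(*arr));
    if (p) *allocated = used;
    return p ? p : arr;
  }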

This is for the large objects - the arrays.  Strings are
allocated as usual (pretty much like doing one malloc() for
every new string - a bit more complicated than that, but
close).

That's basically it.

The only additional complication, unique to ip4set, is
that it "expands" single entries into multiple entries:
e.g. one /32 listing occupies a single entry in the /32
array, but one /31 occupies two /32 entries, and so on up
to /25, which is expanded into 128 /32 entries.  Likewise,
one /24 entry occupies one element of the /24 array, a /23
two, a /17 128 elements of the /24 array.. and the same
goes for the /16 and /8 arrays.  So if you have a lot of
"bad" entries like /25, /26, or /17, /18 etc, rbldnsd may
use more memory than your original data file would suggest.
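
As a rough illustration (again my own sketch, not the
actual rbldnsd code), the number of array elements a single
CIDR entry expands into can be computed like this:

  /* Each ip4set array stores entries at a fixed granularity:
   * /32, /24, /16 or /8.  A prefix goes into the finest array
   * that is still coarser than or equal to it, and expands into
   * 2^(granularity - prefix_len) elements there. */
  static unsigned expanded_entries(unsigned prefix_len) {
    unsigned granularity = prefix_len > 24 ? 32 :
                           prefix_len > 16 ? 24 :
                           prefix_len >  8 ? 16 : 8;
    return 1u << (granularity - prefix_len);
  }
  /* expanded_entries(32) == 1, expanded_entries(31) == 2,
     expanded_entries(25) == 128, expanded_entries(17) == 128 */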

This is different with ip4trie - if you have a large
number of large netblocks that are not "byte-aligned" (the
SORBS DUHL list or SPEWS data are good examples), it may be
a good idea to try ip4trie instead of ip4set.

In ip4trie, each entry is represented as-is, occupying an
IPv4 address, an IPv4 netmask, two "left" and "right"
pointers and a pointer to the resulting A+TXT string - on a
32-bit platform that's 2*4 + 2*4 + 4 = 20 bytes (plus the
string, of course), and it isn't "expanded" as in ip4set.
Extra internal nodes are still allocated - the "reloaded"
log line will tell you how many.
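
For illustration only (a guess based on the sizes above,
not the actual rbldnsd structure), such a node would look
roughly like:

  #include <stdint.h>

  struct trie_node {
    uint32_t addr;            /* IPv4 address, 4 bytes */
    uint32_t mask;            /* IPv4 netmask, 4 bytes */
    struct trie_node *left;   /* 4 bytes on a 32-bit platform */
    struct trie_node *right;  /* 4 bytes on a 32-bit platform */
    const char *rr;           /* pointer to the A+TXT string, 4 bytes */
  };                          /* 20 bytes total on 32 bits */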

> Our setup is as follows:
> 
> We use the ip4set format to load two separate zones. 
> One zone is very small (~150K) and the other is very
> large (i.e. > 400MB).  The majority of the entries are
> individual, not consecutive IPs (/32's).

Oh, it is indeed large.

> We also have the “-f” option specified so that rbldnsd
> continues processing requests during a reload.

-f tells rbldnsd to fork a child during reloads; it has
no effect at all on the allocation strategy or on
per-process memory limits/usage.  Sure, the combined memory
usage of the two processes may be up to 2x that of one
process (near the end of a reload, each process has its own
copy of the data in memory - one old and one new), but that
doesn't apply to each process individually.

> The system has 1GB of physical RAM and 1GB of swap.

What OS is it?  If it has a mallinfo() routine, you will
see most of the memory-usage information in syslog
(Linux has one; FreeBSD does not).
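
For reference, here's a minimal sketch of the kind of
numbers such a report is based on (glibc's mallinfo();
rbldnsd's own log format may differ):

  #include <malloc.h>
  #include <stdio.h>

  int main(void) {
    struct mallinfo mi = mallinfo();
    /* arena  = bytes obtained from the OS via sbrk()
     * hblkhd = bytes obtained via mmap()
     * uordblks / fordblks = bytes in use / free inside the arena */
    printf("arena=%d mmap=%d inuse=%d free=%d\n",
           mi.arena, mi.hblkhd, mi.uordblks, mi.fordblks);
    return 0;
  }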

> We initially ran into an OS limitation where by a
> single process was limited on the amount of memory it
> could allocate.  This limit was set to 512MB, and a
> 445 MB zone file was running out memory while loading.
> We have gotten past that obstacle by increasing that
> OS limit.
> 
> My questions revolve around the amount of memory
> allocated however.  For example, once that 445MB file
> was loaded completely, the amount of memory being used
> dropped down to ~137MB.  So on initial startup, does
> it allocate much more memory than might be needed and
> then give it back after it’s fully loaded?

That's.. interesting.  I already described how it loads
the data, growing the arrays in powers of two and trimming
the "extra" memory at the end.  But I can't explain such a
large difference between "while loading" and after.  How do
you know it dropped to 137MB -- how do you measure memory
usage?  Maybe the tool you used and the operating system's
per-process memory limits count memory differently?

Note that on modern systems with a modern malloc()
implementation, malloc() uses mmap(/dev/zero) or equivalent
for large chunks of memory..  Maybe your measurement tool
just does not take mmap'ed memory into account?

> Also, how does this differ on a reload?  The example I
> have is that I took an ~800MB file and put in place
> while rbldnsd was already running (with the 445MB file
> in memory).  The reload took place successfully and
> the ~800MB was loaded.  However, I stopped the process
> and restarted it, and the ~800MB file caused rbldnsd
> to run out of memory.

Well.. I can see how that may happen.  It is called
"fragmentation".  Suppose we're loading the large data set
for the first time, without knowing how many entries we
will need.  The memory layout is as follows:

  [...large array ...][string][string]...[string]

Now, at the very end of the file, we run out of allocated
array space and have to realloc the "large array"..  and
the memory layout now looks like:

  .. unused memory ...[string]...[string][..2x larger array..]

And at this point, at the 4M+1'th entry, we reach the end
of the file.  So total memory usage is almost 3x the
required size (12M entries' worth).
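
In concrete numbers, with the 8-byte entries from above:
the stranded old array is 4M*8 = 32MB, the new one is
8M*8 = 64MB, so the process holds roughly 96MB of address
space for data that, after the final trim, needs only a
little over 32MB.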

But in contrast, during a reload, rbldnsd frees everything
up and allocates that "2x larger array" in one go, so we
don't have that "unused memory" at the beginning.

Well..  I thought we might run into this problem sooner or
later, and here we are, it seems.

I *guess* you aren't running Linux (nothing wrong with
that really ;) -- because on Linux with the default
malloc(), things don't work the way described above.
Instead, for larger objects, malloc() uses mmap(), which in
turn uses the virtual memory system to avoid the above
fragmentation entirely.
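
If you do end up with glibc, the point at which malloc()
switches to mmap() is the M_MMAP_THRESHOLD tunable.  A
small glibc-specific illustration (not something rbldnsd
does itself):

  #include <malloc.h>

  int main(void) {
    /* glibc only: allocations above this threshold are served
     * with mmap() instead of the sbrk() heap (128 kB by default),
     * so realloc() of a huge array does not strand heap space. */
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);
    return 0;
  }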

I have also noticed that on FreeBSD the rbldnsd process
grows over time (I've seen it reach twice its initial
size), while I never noticed that on Linux or Solaris.

All that is to say -- it looks like the default malloc()
implementation on your OS isn't optimal for the sort of
load rbldnsd produces.  Maybe there's an alternative
implementation available too -- something like
  LIBS=-lmalloc ./configure ...
may work.

> So, basically, we are trying to find our “limit” and
> trying to plan the system accordingly, but any insight
> into the allocation process would be most helpful.

/mjt

