<DIV>Michael,</DIV>
<DIV> </DIV>
<DIV>Thank you for the detailed response. The OS we are running is indeed FreeBSD, so that explains some things based on your comments below.</DIV>
<DIV> </DIV>
<DIV>For anyone else who runs into this: the parameter we had to change to raise the allowable memory per process was MAXDSIZ (the maximum data segment size), which defaults to 512MB. You can change it in the kernel configuration, or you can set it as a tunable in /boot/loader.conf (i.e. kern.maxdsiz="&lt;size in bytes&gt;"). You can verify that the limit increased by running "ulimit -a".</DIV>
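<DIV> </DIV>
<DIV>The same limit can also be read from a script instead of "ulimit -a". The following is only a minimal sketch: it assumes Python is available on the machine (rbldnsd itself does not need it) and uses the standard resource module; the helper names data_segment_limit and describe are made up for illustration.</DIV>

```python
import resource

def data_segment_limit():
    """Return the (soft, hard) data segment limit of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_DATA)

def describe(limit):
    """Format one limit value; RLIM_INFINITY means no limit is set."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d MB" % (limit // (1024 * 1024))

soft, hard = data_segment_limit()
print("data segment soft limit:", describe(soft))
print("data segment hard limit:", describe(hard))
```

<DIV>The soft limit is the one a process actually runs into; the hard limit is the ceiling the soft limit can be raised to without root privileges.</DIV>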
<DIV> </DIV>
<DIV>The rest of the information is very helpful as well...</DIV>
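<DIV> </DIV>
<DIV>The doubling and fragmentation arithmetic from the message quoted below can be sketched in a few lines of Python. The starting size of 16 entries and the 8 bytes per entry (32-bit platform) are the figures from that message; the function names are hypothetical, and the "hole plus new array" model is the simplified picture described there, not a measurement of what rbldnsd or the FreeBSD malloc() really does.</DIV>

```python
M = 1024 * 1024
ENTRY_BYTES = 8  # 4-byte IP address + 4-byte pointer on a 32-bit platform

def capacity_for(n_entries, start=16):
    """Capacity reached by doubling from `start` until n_entries fit."""
    cap = start
    while n_entries > cap:
        cap *= 2
    return cap

def worst_case_entries(n_entries, start=16):
    """Address space (in entries) tied up if the last realloc cannot grow
    in place: the old array is left behind as a hole next to the new one."""
    cap = capacity_for(n_entries, start)
    previous = cap // 2 if cap > start else 0
    return previous + cap

for n in (4 * M, 4 * M + 1):
    cap = capacity_for(n)
    worst = worst_case_entries(n)
    print("entries:", n,
          "capacity:", cap // M, "M entries",
          "worst case:", worst // M, "M entries =",
          worst * ENTRY_BYTES // M, "MB")
```

<DIV>For 4M+1 entries this gives a capacity of 8M entries and 12M entries (96MB at 8 bytes each) of address space tied up, against roughly 32MB actually needed, which is the "almost 3x" case from the message.</DIV>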
<DIV> </DIV>
<DIV>Thank you again,</DIV>
<DIV> </DIV>
<DIV>SK<BR><BR></DIV>
<BLOCKQUOTE class=replbq style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid"><BR>Date: Thu, 17 Mar 2005 23:29:57 +0300<BR>From: Michael Tokarev &lt;MJT@TLS.MSK.RU&gt;<BR>Subject: Re: [rbldnsd] Memory allocation questions<BR>To: rbldnsd@corpit.ru<BR>Message-ID: &lt;4239E8C5.10106@tls.msk.ru&gt;<BR>Content-Type: text/plain; charset=windows-1252; format=flowed<BR><BR>Scott Knight wrote:<BR>> Hello,<BR>> <BR>> I am trying to get a better understanding of how<BR>> rbldnsd allocates memory – both on initial startup and<BR>> during a reload.<BR><BR>The allocation "algorithm" is very simple and isn't<BR>much different for initial load of data and subsequent<BR>reloads.<BR><BR>There are two different "classes" of memory being<BR>allocated for ip4set: one is a "TOC", large arrays<BR>to hold the IP addresses (4 bytes) and pointers to<BR>the resulting A+TXT strings (another 4 bytes on 32bit<BR>platform); and another large "class" is the strings,<BR>many small
objects.<BR><BR>When loading the data, rbldnsd can't know how many<BR>entries it will need for the large arrays. So it<BR>starts with some number of entries (say, 16), and<BR>grows the array (realloc) multiplying the current count<BR>by 2 each time it needs more entries. So, for example,<BR>for 4M entries it will allocate an array of size 4M*8<BR>(or *16 for 64-bit), and for 4M+1 entries the array will hold 8M entries.<BR><BR>When the whole file is read, rbldnsd does a final realloc,<BR>reclaiming extra memory.<BR><BR>When performing reloads, it (after freeing all the<BR>memory it allocated for the data previously) starts<BR>with the previous number of records instead of 16 (as in<BR>the example above). If after the reload the number of entries is<BR>smaller, extra memory will be free()'d.<BR><BR>This is for large objects - arrays. Strings are allocated<BR>as usual (pretty much like doing one malloc() for every new<BR>string - a bit more complicated than that but close).<BR><BR>That's basically it.<BR><BR>The only additional
complication which is unique to ip4set<BR>is that it "expands" single entries to multiple entries,<BR>e.g., one /32 listing occupies a single entry in the /32 array,<BR>but one /31 occupies two /32 entries, and so on up to /25<BR>which is expanded into 128x/32. And so on, one /24 entry<BR>occupies one array element in the /24 array, /23 - two, /17 -<BR>128 in /24. The same holds for /16 and /8. So, if you have<BR>a lot of "bad" entries like /25, /26, or /17, /18 etc,<BR>rbldnsd may use more memory than your original data file.<BR><BR>This is different for ip4trie - if you have a large amount of<BR>large netblocks not "byte-aligned" (a good example is the SORBS<BR>DUHL list or SPEWS data), it may be a good idea to try<BR>ip4trie instead of ip4set.<BR><BR>In ip4trie, each entry is represented as it is, occupying<BR>IPv4 address, IPv4 netmask, two "left" and "right" pointers<BR>and a pointer to the resulting A+TXT string - on a 32-bit platform<BR>it is 2*4 + 2*4 + 4 = 20 bytes (plus the string of course),<BR>and
it isn't "expanded" as in ip4set. Extra nodes are<BR>still being allocated - the "reloaded" line will tell you<BR>how many.<BR><BR>> Our setup is as follows:<BR>> <BR>> We use the ip4set format to load two separate zones. <BR>> One zone is very small (~150K) and the other is very<BR>> large (i.e. > 400MB). The majority of the entries are<BR>> individual, not consecutive IPs (/32's).<BR><BR>Oh, it is indeed large.<BR><BR>> We also have the “-f” option specified so that rbldnsd<BR>> continues processing requests during a reload.<BR><BR>-f tells rbldnsd to fork a child during reloads; it has<BR>no effect at all on the allocation strategy and per-process<BR>memory limits/usage. Sure, the total memory usage of the two<BR>processes may be up to 2x that of one process (when at the<BR>end of reload, both processes have their own copies of<BR>the data in memory - one old and one new), but not for<BR>every process individually.<BR><BR>> The system has 1GB of physical RAM
and 1GB of swap.<BR><BR>What OS is it? If it has a mallinfo() routine, you will<BR>see most information about memory usage in syslog<BR>(Linux has one; FreeBSD does not).<BR><BR>> We initially ran into an OS limitation whereby a<BR>> single process was limited in the amount of memory it<BR>> could allocate. This limit was set to 512MB, and a<BR>> 445 MB zone file was running out of memory while loading.<BR>> We have gotten past that obstacle by increasing that<BR>> OS limit.<BR>> <BR>> My questions revolve around the amount of memory<BR>> allocated however. For example, once that 445MB file<BR>> was loaded completely, the amount of memory being used<BR>> dropped down to ~137MB. So on initial startup, does<BR>> it allocate much more memory than might be needed and<BR>> then give it back after it’s fully loaded?<BR><BR>That's.. interesting. I already described how it loads<BR>the data, by reallocating size for arrays in powers of<BR>two, and
truncating "extra" memory at the end. But I<BR>can't explain this large difference between "while loading" and<BR>after. How do you know it dropped to 137MB -- how do you<BR>measure memory usage? Maybe the tool you used and the<BR>operating system per-process memory limits operate<BR>differently?<BR><BR>Note that on modern systems with a modern malloc() implementation,<BR>malloc() uses mmap(/dev/zero) or equivalent for large chunks<BR>of memory. Maybe your measurement tool just does not take<BR>mmap'ed memory into account?<BR><BR>> Also, how does this differ on a reload? The example I<BR>> have is that I took an ~800MB file and put it in place<BR>> while rbldnsd was already running (with the 445MB file<BR>> in memory). The reload took place successfully and<BR>> the ~800MB was loaded. However, I stopped the process<BR>> and restarted it, and the ~800MB file caused rbldnsd<BR>> to run out of memory.<BR><BR>Well.. I can see how it may happen. It is called "fragmentation".<BR>Suppose
we're loading the large data the first time, without<BR>knowing how many entries we will need. Memory layout is<BR>as follows:<BR><BR>[...large array ...][string][string]...[string]<BR><BR>Now, at the very end of file, we run out of allocated array<BR>size and have to realloc the "large array".. and the memory<BR>layout now looks like:<BR><BR>.. unused memory ...[string]...[string][..2x larger array..]<BR><BR>And at this point, at 4M+1 entry, we reached the end of the<BR>file. So total memory usage is almost 3x (12M entries) of<BR>the required size.<BR><BR>But in contrast, during reload, rbldnsd frees everything up,<BR>and allocates that "2x larger array" in one go, so we now<BR>don't have that "unused memory" at the beginning.<BR><BR>Well.. I thought we may run into this problem sooner or later,<BR>and here we go, it seems.<BR><BR>I *guess* you aren't running linux (nothing wrong with that<BR>really ;) -- because on linux and with default malloc(),<BR>things don't work the way as
per above. Instead, for<BR>larger objects, malloc() uses mmap(), which in turn uses<BR>the power of virtual memory to avoid the above fragmentation<BR>entirely.<BR><BR>I also noticed that on FreeBSD, the rbldnsd process grows with time<BR>(I've seen up to twice the initial size), while I never<BR>noticed that on Linux or Solaris.<BR><BR>All that is to say -- it looks like the default malloc() implementation<BR>on your OS isn't optimal for the sort of load rbldnsd shows.<BR>Maybe there's an alternative implementation available too --<BR>something like<BR>LIBS=-lmalloc ./configure ...<BR>may work.<BR><BR>> So, basically, we are trying to find our “limit” and<BR>> trying to plan the system accordingly, but any insight<BR>> into the allocation process would be most helpful.<BR><BR>/mjt<BR><BR><BR></BLOCKQUOTE><p>
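<DIV>The ip4set expansion rule and the ip4trie node size given in the quoted message above can be put into a short Python sketch for rough planning. This only restates those figures (8 bytes per ip4set array element and 20 bytes per ip4trie entry on a 32-bit platform); the function names are hypothetical, prefixes shorter than /8 are left out because the message does not cover them, and strings and the extra ip4trie nodes are not counted.</DIV>

```python
IP4SET_ENTRY_BYTES = 8    # 4-byte address + 4-byte pointer (32-bit platform)
IP4TRIE_NODE_BYTES = 20   # 2*4 + 2*4 + 4, as in the quoted message

def ip4set_entries(prefix_len):
    """Array elements one CIDR block occupies after ip4set expansion.

    The block goes into the next of the /8, /16, /24, /32 arrays at or
    below its size and is expanded into that many elements.
    """
    if prefix_len not in range(8, 33):
        raise ValueError("sketch only covers /8 through /32")
    array_prefix = ((prefix_len + 7) // 8) * 8
    return 2 ** (array_prefix - prefix_len)

def ip4set_bytes(prefix_len):
    """Array bytes used by one block in ip4set (strings not counted)."""
    return ip4set_entries(prefix_len) * IP4SET_ENTRY_BYTES

for prefix in (32, 31, 25, 24, 23, 17):
    print("/%d:" % prefix,
          ip4set_entries(prefix), "ip4set elements,",
          ip4set_bytes(prefix), "bytes in ip4set vs",
          IP4TRIE_NODE_BYTES, "bytes in ip4trie")
```

<DIV>A single /32 is cheaper in ip4set (8 bytes against 20), while a /25 or /17 costs 128 elements there, which is why ip4trie is suggested for data with many netblocks that are not byte-aligned.</DIV>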