lots of useless recvfrom() calls

Lennert Buytenhek buytenh at wantstofly.org
Tue May 8 12:12:54 MSK 2018


On Mon, May 07, 2018 at 05:37:34PM +0300, Michael Tokarev wrote:

> > I am using the patch below to avoid one useless recvfrom() call (that
> > returns EAGAIN) for every call to dns_ioevent().  Under low load, where
> > every call to dns_ioevent() processes one DNS reply, this saves half of
> > the total number of recvfrom() calls.
> 
> Is it really a problem under low load? If the load is low, it doesn't
> really matter how many recvfrom() system calls are made, I think.

'low load' here is 'a low expected number of reply packets per POLLIN
cycle', but even for a large number of queries per second and multiple
parallel queries, I would not expect there to be more than one reply
packet per POLLIN cycle very often.

Is it a 'problem' per se, well, probably not, since it's not a
correctness issue.  But it looks ugly in strace, and the difference in
(system) CPU time used is visible in a microbenchmark (kPTI woohoo),
and it's basically just stealing some CPU cycles from more useful tasks.

(Unfortunately, my only benchmark is synthetic, but I've attached it
below.  It needs ivykis, which should be in your distro, and iv_udns
which I CCd you on.  Run it as:

	$ ./bench <parallel_queries> <seconds_to_run_for>

)


> > Since I use level-triggered event notification, it's fine if
> > dns_ioevent() doesn't process all packets pending for this dns_ctx,
> > because if the fd is still active when dns_ioevent() returns, it
> > will be called again at some point.
> 
> Note that here, under high load, there will be several extra system
> calls to check if the new data is pending.

Even under a large query load, I would not expect multiple replies per
poll cycle, unless perhaps you are on a very slow machine, much slower
than your resolver, and doing DNS queries from a single thread is the
only thing your application does.


> > Also, looping until EAGAIN makes your application somewhat susceptible
> > to being DoSed.  (The event polling framework I use spends a lot of
> > effort to make sure that all event sources are handled fairly, and that
> > no one event source can starve out processing of other event sources.)
> 
> This is a good point. Maybe limiting number of packets to process to some
> "reasonable" number, such as 5? :) Still, edge-triggering notification is
> interesting here.
> 
> > If udns should work with edge-triggered event notification frameworks,
> > then perhaps we could add a new function, say, dns_ioevent_single(),
> > that only processes a single packet for this dns_ctx, and then have
> > dns_ioevent() just call the _single() version in a loop until EAGAIN,
> > or something like that?
> 
> A new function sounds good.
> 
> Also, if the number of system calls really is a concern, maybe using
> recvmmsg() instead of recvfrom(), on linux, will be a good idea.

Sounds good!  But it's probably worth microbenchmarking to see how
much slower recvmmsg() is compared to recvfrom() for the case where
there is one reply, I'll do this if you don't beat me to it.


> > diff -up ./udns_resolver.c.orig ./udns_resolver.c
> > --- ./udns_resolver.c.orig	2018-05-04 15:52:58.368856384 +0300
> > +++ ./udns_resolver.c	2018-05-04 16:11:48.159375836 +0300
> > @@ -976,7 +976,6 @@ void dns_ioevent(struct dns_ctx *ctx, ti
> >  
> >    if (!now) now = time(NULL);
> >  
> > -again: /* receive the reply */
> >  
> >    slen = sizeof(sns);
> >    r = recvfrom(ctx->dnsc_udpsock, (void*)pbuf, ctx->dnsc_udpbuf,
> > @@ -1197,6 +1196,8 @@ again: /* receive the reply */
> >    }
> >    goto again;
> >  
> > +again: /* receive the reply */
> > +  dns_request_utm(ctx, now);
> >  }
> 
> So you're effectively cancelling the loop, forcing it to be a single-step.
> 
> Admittedly, error handling here in my version is also "interesting", -
> it actually does not exist at all :)

I think this is fair. :)  Things like ICMP unreachables can probably
just be ignored, and there's not much else you can do here anyway?
Are there fatal errors which would cause infinite error returns for
recvfrom() for a UDP socket?  (For TCP, obviously, but for UDP?)


> How about adding a function, say, udns_handle_reply(ctx, buf, buflen),
> so that all the functionality of receiving data and handling errors will
> belong to the caller?

Sounds good to me, but then will there be a way to signal to udns to
not allocate ctx->dnsc_pbuf?  (Or will it only allocate it on the first
use of a udns function that does recv*() itself?)

(Or just put it on the stack, this is not the kernel. ;-) )


> Also I think it's a good idea to use recvmmsg(), - at the time when
> udns library has been written there were no such syscall, now it is
> widespread already, and is easy to use. Receive, say, 5 packets at
> once, and loop if we received all 5. Again, if it is edge-triggered
> notification, we should absorb everything, so the loop should be
> cycling until there's really nothing to proceed (think about fairness),
> and in case of level-triggering this is not necessary.
> 
> Maybe I can export all 3 of them - current eat-everything function
> (maybe with switching to recvmmsg() to optimize things), a new function
> (or a new setting/flag of a ctx - in which case there will be no need
> for a separate function) that process just one packet, and actual packet
> processing routine without the receiving part.

Sounds good to me!

Thanks!


Cheers,
Lennert


=== bench.c
#include <stdio.h>
#include <stdlib.h>
#include <iv.h>
#include <netinet/in.h>
#include <udns.h>
#include "iv_udns.h"

static struct iv_timer benchmark_timer;
static int stop_bench;
static int completed;

static void done(void *_dummy)
{
	stop_bench = 1;
}

static void submit(void);

static void qfn(struct dns_ctx *ctx, void *result, void *data)
{
	completed++;

	if (result != NULL) {
		free(result);
	} else {
		int status;

		status = dns_status(ctx);
		printf("error %d[%s]\n", status, dns_strerror(status));
	}

	if (!stop_bench)
		submit();
}

static void submit(void)
{
	dns_submit_p(iv_udns_ctx(), "www.bbc.com", DNS_C_IN, DNS_T_A,
		     0, &dns_parse_a4, qfn, NULL);
}

int main(int argc, char *argv[])
{
	int parallel;
	int duration;
	struct timeval start;
	int i;
	struct timeval end;
	long long usec;

	parallel = 16;
	if (argc > 1)
		sscanf(argv[1], "%d", &parallel);
	if (parallel < 0)
		parallel = 0;

	duration = 10;
	if (argc > 2)
		sscanf(argv[2], "%d", &duration);
	if (duration < 1)
		duration = 1;

	iv_init();

	IV_TIMER_INIT(&benchmark_timer);
	iv_validate_now();
	benchmark_timer.expires = iv_now;
	benchmark_timer.expires.tv_sec += duration;
	benchmark_timer.handler = done;
	iv_timer_register(&benchmark_timer);

	gettimeofday(&start, NULL);

	for (i = 0; i < parallel; i++)
		submit();

	iv_main();

	gettimeofday(&end, NULL);

	iv_deinit();

	usec = 1000000 * (end.tv_sec - start.tv_sec) +
		(end.tv_usec - start.tv_usec);

	printf("%d %f\n", parallel, (1000000.0 * completed) / usec);

	return 0;
}


More information about the udns mailing list