[Avcheck] what if avp hangs

Michael Tokarev mjt@tls.msk.ru
Tue, 28 Aug 2001 13:26:35 +0400


Piotr Klaban wrote:
> 
> Hi,
> 
> Do you have any solution at your sites for the situation
> when AVP (or drweb etc.) hangs? I have found one file
> that causes the AVP child process to exit with SIGBUS.
> This is a development AVP release and I have contacted the
> right man.

Oh, ma, how many bugs it contains already, and how many
are still to be written... ;)  Btw, look in the avcheck
archives: you'll find some "interesting" avpdaemon straces
and explanations, together with a link to another "42.zip"
mailbomb that caused it to crash in almost the same way as
in your case.

>   But the problem exists - avcheck waits for the reply,
> and is stopped with the message:
> 
> deferred (Command time limit exceeded: "/var/spool/avp/avcheck")

Do you by any chance have soft_bounce=yes set in main.cf?
If so, any bounce will be turned into a deferral...

> next messages with:
> 
> deferred (temporary failure. Command output: avcheck: \
>  unable to connect to antivirus daemon: No such file or directory )

Wow, that's something new:  I have never seen the *main*
avpdaemon crash, only its children.

> Restarting avp and issuing 'postfix flush' does not help because
> the problematic message is sent again to avcheck/avp
> and hangs the viruscheck process again.
> 
> I see the following solutions:
> 
> - 2 or more AVP daemons (redundancy), something like
>   several databases in e.g. postfix' mysql.cf file;
> - avp sources would be altered, avcheck would
>   know that something is wrong with AVP checking phase,
>   then mail should be dropped/copied with message,
>   that mail could not be parsed by avp;

Umm...  *both* of these points, yes?  I see no reason to have
only one of them, only both at the same time.  Ok.

In theory, it is relatively simple to "know" that something
is wrong with the av daemon.  The signs are: a disconnect
(which will not happen with avp, thanks to the "clever" things
it does), a set of return codes (not just any unexpected one,
but a limited set), and a (reasonable) timeout.  I added the
timeout "feature" to avcheck-0.3 just for this purpose -- to
detect the situation you described (before postfix reports
"command time limit exceeded").  Currently, avcheck turns this
timeout into EX_TEMPFAIL (unlike postfix's limit, which causes
mail to bounce).  Maybe this is worse than a bounce, I don't know.
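
To make the timeout idea concrete, here is a minimal sketch in
Python (not avcheck's actual code).  The function name ask_daemon,
the Unix socket path, and the one-request/one-reply protocol are
all assumptions made up for illustration; only the idea of turning
a timeout or a failed connection into EX_TEMPFAIL comes from the
text above.

```python
import socket
import time

EX_OK = 0
EX_TEMPFAIL = 75  # value of EX_TEMPFAIL in sysexits.h


def ask_daemon(sock_path, request, timeout=30.0):
    """Send one request to a scanner daemon; return (exit_code, reply).

    Hypothetical protocol: write the request, half-close, then read
    until the daemon closes the connection.  A timeout, a connection
    error or an empty reply is reported as EX_TEMPFAIL.
    """
    deadline = time.monotonic() + timeout  # one budget for the whole exchange
    chunks = []
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            s.sendall(request)
            s.shutdown(socket.SHUT_WR)
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise socket.timeout("scanner deadline exceeded")
                s.settimeout(remaining)
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Covers socket.timeout, a missing socket file, refused connections.
        return EX_TEMPFAIL, b""
    reply = b"".join(chunks)
    if not reply:
        # The daemon (or its child) went away without answering.
        return EX_TEMPFAIL, b""
    return EX_OK, reply
```

The deadline covers the whole exchange rather than each read, so
a daemon that trickles out a byte now and then cannot keep the
caller waiting past the limit.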

About a redundant av daemon -- looking at the "architecture"
of the two daemons I have here (avp and drweb), this should
be unneeded: the main parent daemon only accepts connections
and forks for every request, so in theory it should not
crash.  After all, there is cron etc. that can
monitor its presence.
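
A presence check of that kind could look roughly like the
following Python sketch, meant to be called from a cron job.
The names daemon_alive and watchdog, the socket path, and the
restart callback are hypothetical; how the daemon is actually
restarted depends on the installation.

```python
import socket


def daemon_alive(sock_path, timeout=5.0):
    """Return True if something accepts connections on sock_path."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
        return True
    except OSError:
        # No socket file, nobody listening, or the connect timed out.
        return False


def watchdog(sock_path, restart):
    """Call restart() when the daemon is not reachable.

    Returns True if a restart was triggered, False otherwise.
    """
    if daemon_alive(sock_path):
        return False
    restart()
    return True
```

Note that this only checks that the parent accepts connections;
it says nothing about a child that hangs on one particular mail.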

Concerning mails that can't be processed by the antivirus --
well, this one is an interesting question.  Again, look
in the archives (and on amavis too) -- I posted a list
of problems around this some time ago to both lists.
I think it's time to invent another handler for
mails that can't be processed for some reason, just
like the `infected' one called for mails with viruses now.
I can't decide myself whether to reject/bounce/drop/whatever
such mails.  Note that e.g. drweb (which currently
runs here on our servers) has some other controls
around this issue, like max decompressed file size,
max compression ratio, max archive depth, and internal
timeouts -- and when any of those is hit it will return
the corresponding error code.  Currently, all those codes
are treated as EX_TEMPFAIL by avcheck.  I know this is bad,
but I can't know what other admins will prefer to do
with such mails...
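
One possible shape for such a handler, as a Python sketch only:
scanner results are first sorted into a few classes, and the
admin picks what happens to the "unscannable" class.  The class
names, the action names and the function choose_action are all
invented here; real avp/drweb return codes would have to be
mapped onto these classes by the caller.

```python
# Hypothetical result classes, not real scanner return codes.
CLEAN = "clean"
INFECTED = "infected"
UNSCANNABLE = "unscannable"  # size, ratio or depth limit hit, internal timeout
FAILURE = "failure"          # daemon unreachable or unknown return code

# Actions an admin might choose for mail the scanner could not process.
UNSCANNABLE_ACTIONS = ("tempfail", "reject", "drop", "deliver")


def choose_action(result, unscannable_action="tempfail"):
    """Map a scan result class to the action the filter should take."""
    if unscannable_action not in UNSCANNABLE_ACTIONS:
        raise ValueError("unknown action: %r" % (unscannable_action,))
    if result == CLEAN:
        return "deliver"
    if result == INFECTED:
        return "infected"          # the existing handler for viruses
    if result == UNSCANNABLE:
        return unscannable_action  # admin's choice; default keeps today's behaviour
    return "tempfail"              # anything else: try again later
```

With the default, unscannable mail is still deferred as it is
now; an admin who prefers otherwise changes one setting instead
of the code.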

Regards,
 Michael.