Hi,

        I asked on this list a few days ago about qmail-remote processes
hanging in read(). It appears that this has been reported before, but no
conclusion seems to have been reached.

        The problem appears to be in timeoutread(), which uses select() to
wait for data before calling read() so that read() should never block. For
whatever reason, under heavy load this fails and the read() call blocks
anyway. The TCP connection stays in the ESTABLISHED state, so the process
never terminates, and every stuck process reduces the number of concurrent
remote deliveries available.
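
        For anyone unfamiliar with the technique, here is a rough sketch of
the select()-then-read() pattern. It is not qmail's timeoutread(); the name
timed_read() and its signature are hypothetical, and it assumes an ordinary
blocking descriptor. Note that select() only reports readiness at the moment
it returns, and the following read() is still a blocking call. The Linux
select(2) manual page, for example, documents that a socket reported as
readable can nevertheless block on a subsequent read().

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, NOT qmail's timeoutread(): wait up to timeout_sec
   seconds for fd to become readable, then call read() on it.
   Returns bytes read, or -1 with errno set (ETIMEDOUT on timeout). */
ssize_t timed_read(int fd, void *buf, size_t len, long timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int r;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r == -1)
        return -1;              /* select() itself failed; errno is set */
    if (r == 0) {
        errno = ETIMEDOUT;      /* nothing arrived within the timeout */
        return -1;
    }
    /* fd was reported readable, but on a blocking descriptor this read()
       can still wait indefinitely if that readiness was spurious. */
    return read(fd, buf, len);
}
```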

        One suggestion (from MarkD) was to set an alarm with a large value, so
that SIGALRM terminates the process. That would work (qmail would see the
qmail-remote process as crashed and try the delivery again), but I don't
particularly like this method. For one thing, it could cut off a large
message that is legitimately being sent over a slow connection.
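
        A minimal sketch of the alarm approach, assuming the timer is armed
once near the start of a delivery (arm_delivery_deadline() is a
hypothetical name, not anything in qmail-remote.c). Because alarm() sets a
single wall-clock deadline for the whole process, it cannot distinguish a
hung read() from a slow but healthy transfer, which is the drawback
described above.

```c
#include <signal.h>
#include <unistd.h>

/* Hypothetical sketch, not code from qmail-remote.c: arm a one-shot
   timer. SIGALRM's default action terminates the process, so after
   limit_sec seconds the process dies whatever it is doing, whether it is
   stuck in read() or still busy sending a large message. */
void arm_delivery_deadline(unsigned int limit_sec)
{
    signal(SIGALRM, SIG_DFL);   /* ensure the default (terminate) action */
    alarm(limit_sec);           /* deliver one SIGALRM after limit_sec s */
}
```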

        Another solution, which I have been trying over the last few days, is
to turn on TCP keepalives for the socket. Once the connection has been idle
for a fixed period (usually two hours or more, depending on the system),
the kernel sends keepalive probes, and if the remote host does not answer
them the connection is closed. The read() call then returns an error, just
as if the remote host had dropped the connection, and qmail-remote
terminates normally.
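
        The essential change is a single setsockopt() call. Below is a
slightly more defensive sketch of the same call, using a hypothetical helper
name, enable_keepalive(), that also reads the option back with getsockopt()
to confirm it is set. It assumes a TCP socket created with
socket(AF_INET, SOCK_STREAM, 0).

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of qmail: switch on SO_KEEPALIVE for fd
   and read it back. Returns 0 if keepalives are enabled, -1 if the option
   could not be set or read back. */
int enable_keepalive(int fd)
{
    int on = 1;
    int val = 0;
    socklen_t len = sizeof val;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == -1)
        return -1;
    /* Some systems return 1, others the option's bit value; any
       nonzero value means the option is on. */
    return val ? 0 : -1;
}
```

        SO_KEEPALIVE only switches probing on; the idle time before the first
probe is normally a system-wide setting (on Linux, for example, the
net.ipv4.tcp_keepalive_time sysctl, which defaults to 7200 seconds), which
is why the timeout is measured in hours.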

        It all seems to be working, so if anyone else is having the same
problem, you may like to try this fix too. I've included a patch for
qmail-remote.c below; it's not exactly beautiful code, but it works for me.

        Good luck,

                Richard

*** qmail-1.03/qmail-remote.c   Mon Jun 15 11:53:16 1998
--- qmail-1.03.patched/qmail-remote.c   Fri Aug  3 14:34:27 2001
***************
*** 338,344 ****
    int flagallaliases;
    int flagalias;
    char *relayhost;
!  
    sig_pipeignore();
    if (argc < 4) perm_usage();
    if (chdir(auto_qmail) == -1) temp_chdir();
--- 338,345 ----
    int flagallaliases;
    int flagalias;
    char *relayhost;
!   int s_opt;
! 
    sig_pipeignore();
    if (argc < 4) perm_usage();
    if (chdir(auto_qmail) == -1) temp_chdir();
***************
*** 415,420 ****
--- 416,423 ----
      if (smtpfd == -1) temp_oserr();
   
      if (timeoutconn(smtpfd,&ip.ix[i].ip,(unsigned int) port,timeoutconnect) == 0) {
+       s_opt=1;
+       setsockopt(smtpfd,SOL_SOCKET,SO_KEEPALIVE,&s_opt,sizeof(int));
        tcpto_err(&ip.ix[i].ip,0);
        partner = ip.ix[i].ip;
        smtp(); /* does not return */
