Dear friends,

Nadav, my son, sent the enclosed message to the Apache users mailing list and drew a blank. I am resending it here, hoping that the top Apache gurus participating in the discussions here may offer some insight. We are really puzzled by the described behavior.

Best,
Zvi.

-- 
Dr. Zvi Har'El mailto:[EMAIL PROTECTED]
Department of Mathematics tel:+972-54-227607
Technion - Israel Institute of Technology fax:+972-4-8324654
http://www.math.technion.ac.il/~rl/ Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
Thursday, 25 Shevat 5762, 7 February 2002, 1:39PM

---------- Forwarded message ----------
Date: Sun, 20 Jan 2002 13:28:09 +0200
From: Nadav Har'El <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: unexplained phenomenon: hanging apache processes
Resent-Date: Thu, 7 Feb 2002 11:00:23 +0200
Resent-From: [EMAIL PROTECTED]
Resent-To: "Zvi Har'El" <[EMAIL PROTECTED]>

Recently I stumbled upon a puzzling problem while trying to measure Apache's performance using Microsoft's WAS (Web Application Stress Tool) or Radview's WebLOAD. I was wondering whether anyone has noticed this phenomenon or can suggest an explanation, and whether this might be a bug in Apache (or Linux), or something else.

The problem is that something in the measurement client, the server OS (I tried Linux 2.2.16, 2.2.19 and 2.4.3), Apache (I tried 1.3.20), or mod_ssl (I tried both with and without it) causes one or more of the httpd processes to "hang", blocking on a read of client input that will never come. Such a process remains blocked for 300 seconds (or whatever period the Apache "Timeout" directive defines), so over time more and more processes can get hung. In WebLOAD measurements with very high loads, starting Apache with 250 processes, we consistently got them all blocked after roughly 10 minutes, at which point Apache's throughput obviously dropped to zero. But the problem is even easier to reproduce when Apache is limited (with MaxClients) to a smaller number of processes: for example, with 9 processes WebLOAD will hang them all in a few minutes, and with 2 processes Microsoft WAS will hang them both (if you configure it to do SSL requests) almost immediately.

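The hang described above can be imitated without WAS or WebLOAD by a client that opens a connection, sends only part of a request, and then goes silent. What follows is a minimal sketch under stated assumptions, not a reproduction of what those tools actually do: the function names serve_once and stalled_client_demo are hypothetical, the "server" is a single Python thread standing in for one httpd child, and its read timeout stands in for the Apache "Timeout" directive (shortened from 300 seconds so the demo finishes quickly).

```python
import socket
import threading
import time

def serve_once(listener, read_timeout, result):
    """Accept one connection and try to read a full request header, roughly as
    a single httpd child would; record how the read ended. (Hypothetical helper.)"""
    conn, _ = listener.accept()
    conn.settimeout(read_timeout)  # stands in for Apache's Timeout directive
    data = b""
    try:
        # Keep reading until the blank line that ends an HTTP request header.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                result["outcome"] = "client closed"
                return
            data += chunk
        result["outcome"] = "request complete"
    except socket.timeout:
        result["outcome"] = "timed out waiting for client"
    finally:
        conn.close()

def stalled_client_demo(read_timeout=1.0):
    """Run one worker against a client that stalls mid-request.
    Returns (outcome, seconds the worker stayed blocked)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free local port
    listener.listen(1)
    result = {}
    worker = threading.Thread(target=serve_once,
                              args=(listener, read_timeout, result))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    # Incomplete request: the terminating blank line is never sent, and the
    # connection is neither closed nor reset while the worker waits.
    client.sendall(b"GET / HTTP/1.0\r\n")
    start = time.monotonic()
    worker.join()  # the worker stays blocked until its read timeout fires
    elapsed = time.monotonic() - start
    client.close()
    listener.close()
    return result["outcome"], elapsed

if __name__ == "__main__":
    print(stalled_client_demo(read_timeout=1.0))
```

In this sketch the worker is unavailable for the whole timeout period, which is the same effect as an httpd process tied up for 300 seconds; with only a few workers, a handful of such stalled connections would be enough to occupy them all.
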
I tried several experiments to understand what is going on, but so far I have not been able to fully explain it, so I was hoping that someone else has noticed this problem and can shed some light on it (or just say "I've seen it too!"). For example, when I test Apache with a single process (httpd -X) using MS-WAS and SSL requests, and run a sniffer to see what is going on, I see the first request handled perfectly, but already on the second request the client (MS-WAS) suddenly stops sending proper SSL protocol data in the middle of a session, so the server (Apache) hangs on read. The client sends no RST or anything else indicating that it wants to close the session.

This could have been dismissed as a bug in MS-WAS or in the Windows it runs on, were it not for the fact that, as I said, I also see a similar problem with Radview's WebLOAD, and that both of these tools seem (at least as far as I have heard) to be rather respected in the industry.

So, has anyone else ever noticed such a problem? Can anyone perhaps shed some light on it?

Thanks in advance,
Nadav.

-- 
Nadav Har'El | Sunday, Jan 20 2002, 7 Shevat 5762
[EMAIL PROTECTED] |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |Don't be irreplaceable. If you can't be
http://nadav.harel.org.il |replaced, you can't be promoted.