We are seeing a very intermittent problem where a request hangs in the server 
for exact multiples of 30 secs ( we have seen hangs of 30 – 180 secs). This 
usually involves 2 threads on the same pid hanging. It is in the server, or 
maybe between the server and mod_jk, but it is happening in the server tier. At 
first we suspected connection pool/connection re-use in mod_jk so we eventually 
disabled both. Connections are now created and dismantled for every request. We 
can see the packets coming in at time t and then the request from apache being 
forwarded to the app server tier at t+n*30. We see no issues with sockets 
establishing connections, indeed we see the hang before the initial socket SYN 
is sent. So we now believe that it is not a network issue

This is not a busy server, and so unlikely related to load. vmstat and sar both 
show nothing going on on the system when this hang occurs. Garbage collection 
is also not an issue.

With apache logging on level trace8 for the tid with the problem, we see (*** 
marks removed text) the clear delay between request/response.

[Wed Sep 25 10:51:00.188705 2013] [http:trace4] [pid 28203:tid 140532468623104] 
http_request.c(305): [client ***:43707]   Connection: keep-alive, referer: ****
[Wed Sep 25 10:51:00.188712 2013] [http:trace4] [pid 28203:tid 140532468623104] 
http_request.c(305): [client ***:43707]   If-Modified-Since: Wed, 11 Sep 2013 
18:34:32 GMT, ****
[Wed Sep 25 10:51:00.188720 2013] [http:trace4] [pid 28203:tid 140532468623104] 
http_request.c(305): [client ***:43707]   If-None-Match: 
W/\\"2610-1378924472000\\", referer: ****
[Wed Sep 25 10:51:00.188727 2013] [http:trace4] [pid 28203:tid 140532468623104] 
http_request.c(305): [client ***:43707]   Cache-Control: max-age=0, referer: ***
[Wed Sep 25 10:51:00.188734 2013] [http:trace4] [pid 28203:tid 140532468623104] 
http_request.c(305): [client ***:43707]   X-Forwarded-For: ****
[Wed Sep 25 10:51:00.188767 2013] [rewrite:trace2] [pid 28203:tid 
140532468623104] mod_rewrite.c(468): [client ***:43707] 206.108.42.133 - -**** 
init rewrite engine with requested uri ***
[Wed Sep 25 10:51:00.188784 2013] [rewrite:trace1] [pid 28203:tid 
140532468623104] mod_rewrite.c(468): [client ***:43707] 206.108.42.133 - - 
[****][rid#7fd030072280/initial] pass through ***, referer: ***
[Wed Sep 25 10:51:00.188812 2013] [authz_core:debug] [pid 28203:tid 
140532468623104] mod_authz_core.c(828): [client ***:43707] AH01628: 
authorization result: granted (no directives), referer: ****
[Wed Sep 25 10:51:00.188821 2013] [core:trace3] [pid 28203:tid 140532468623104] 
request.c(227): [client ***:43707] request authorized without authentication by 
access_checker_ex hook: ***
[Wed Sep 25 10:51:00.188841 2013] [charset_lite:trace3] [pid 28203:tid 
140532468623104] mod_charset_lite.c(211): [client***:43707] uri: *** file: *** 
method: 0 imt: text/css flags:  (null)->(null), referer: ***
[Wed Sep 25 10:51:00.188852 2013] [charset_lite:debug] [pid 28203:tid 
140532468623104] mod_charset_lite.c(219): [client ***:43707] AH01448: 
incomplete configuration: src unspecified, dst unspecified, referer: ***
[Wed Sep 25 10:52:35.693776 2013] [core:trace6] [pid 28172:tid 140532468623104] 
core_filters.c(525): [client ***:10778] core_output_filter: flushing because of 
FLUSH bucket
[Wed Sep 25 10:53:30.202645 2013] [deflate:trace1] [pid 28203:tid 
140532468623104] mod_deflate.c(490): [client ***:43707] Not compressing very 
small response of 0 bytes, referer: 
*<https://connect.intuit.com/view/3e52c5bc-5f0d-46c7-89dc-2799ecd7266e?cta=viewinvoicenow>**
[Wed Sep 25 10:53:30.202686 2013] [http:trace3] [pid 28203:tid 140532468623104] 
http_filters.c(963): [client ***:43707] Response sent with status 304, 
headers:, referer: ***


mod_jk logging at debug: We see evidence of the hang there but no smoking gun. 
Again just a delay of some multiple of 30 second.

Obviously the multiples of 30 seconds should be a clue (feels like a timeout?) 
but have not yet been able to tie it out with anything.

We are running on VMWare – apache server 2.4.4 (yes, I know we are going to 
upgrade to the latest….)

My question is whether anyone else has seen this issue before?


Reply via email to