Hi Christophe,

Well, I still haven't found the root cause of my problem, but here are some
things that helped me make it happen less often.

I checked the limit for open files on Linux; you can check yours with
ulimit -a. Here I set it to 4096.
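
If it helps, this is roughly how I check and raise it here (just a sketch;
the 4096 value and the "tomcat" user are only what I use on my box):

# Current per-process limit on open file descriptors
# (the "open files" line you also see in ulimit -a).
ulimit -n
# To raise it permanently I add lines like these to /etc/security/limits.conf
# (user name and value are just my example):
#   tomcat  soft  nofile  4096
#   tomcat  hard  nofile  4096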

How are the machines connected? Is it gigabit Ethernet?

Please show us all your configuration under /proc/sys/net/ipv4/; the most
important one is:

cat /proc/sys/net/ipv4/tcp_keepalive_time
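
Something like this dumps them all and shows how to change the keepalive on
the fly (the 600-second value below is only an example, not a recommendation):

# Dump every ipv4 setting so you can paste it in a reply.
sysctl -a 2>/dev/null | grep '^net.ipv4'
# The keepalive time in seconds (same value as the cat above).
cat /proc/sys/net/ipv4/tcp_keepalive_time
# To change it at runtime (example value only):
#   sysctl -w net.ipv4.tcp_keepalive_time=600
# and add "net.ipv4.tcp_keepalive_time = 600" to /etc/sysctl.conf to keep it
# after a reboot.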

But what had the most impact on performance was getting the JDBC driver
configuration right in the pool. I use MaxDB, and the driver has a problem:
when opening new physical connections it goes through a singleton, so we
cannot get connections in parallel (really in parallel, on multi-core
processors), and when these parallel attempts to get connections happen, we
end up with stuck threads. But note, maybe the problem isn't the driver;
it's only a suspect, since I don't have a solution for this yet.
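
One thing that helps me spot it is grepping the thread dumps for threads
sitting in the driver while opening connections, something like this (the
"getConnection" string is just my guess at what shows up for the MaxDB
driver; grep for whatever you actually see in your dumps):

# Count, per snapshot, how many lines of the thread dump mention getConnection;
# if the number grows across snapshots, threads are piling up waiting for
# connections.
for f in /mnt/logs/stack/stack*.txt; do
    echo "$f: $(grep -c getConnection "$f") lines mention getConnection"
done
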
Another suspect is that, for some strange reason, the socket still exists
on the Java side but the socket on Linux (the inode) no longer exists, and
until Java notices this the system is stuck until the timeout. I can't
confirm or simulate this, since it's really a rare case; for me it happened
only once and I have no way to prove it. I'm trying to track my problem
with the following script:

#!/bin/bash
# Snapshot of the Tomcat JVM: Java thread dump, native stacks and open file descriptors.
today=$(date +%Y%m%d%H%M%S)
out=/mnt/logs/stack/stack${today}.txt
# PID of the Tomcat Bootstrap process.
psId=$(/opt/java/jdk1.6.0_06/bin/jps | grep Bootstrap | cut -d' ' -f1)
/opt/java/jdk1.6.0_06/bin/jstack -l "$psId" > "$out"
echo "--- pstack ---" >> "$out"
pstack "$psId" >> "$out"
echo "--- lsof ---" >> "$out"
lsof >> "$out"
echo "--- ls -l /proc/${psId}/fd/ ---" >> "$out"
ls -l "/proc/${psId}/fd/" >> "$out"
echo "stack of process $psId saved to $out"

When users report a hang, I run this script manually, about ten times, and
then compare the outputs. IBM has a tool that makes the jstack output easier
to read; I don't remember the name right now, but tomorrow I will post the
link here.
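
In case it is useful, this is more or less how I take the repeated snapshots
and compare the two latest ones (a sketch; I'm assuming the script above is
saved as /mnt/scripts/stack-snapshot.sh, and the 30-second interval is just
my habit):

#!/bin/bash
# Take ten snapshots, one every 30 seconds, using the script above.
for i in $(seq 1 10); do
    /mnt/scripts/stack-snapshot.sh
    sleep 30
done
# Then diff the two most recent snapshots to see which threads stayed stuck.
latest=$(ls -t /mnt/logs/stack/stack*.txt | head -2)
diff $latest | less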

Let's see if we can share knowledge and win this fight ;)

regards

Clóvis

On Tue, Jul 1, 2008 at 12:23 PM, Christophe Fondacci <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> We have a problem with tomcat on our production server.
> This problem may be related to the one listed here :
> http://grokbase.com/profile/id:hNxqA0ZEdnD-6GYFRNs-iIkKEvF907FNWdczKYQ719Q
>
> Here it is :
> - We got 2 tomcat servers on 2 distinct machines.
> - 1 server is our application (let's call it A for Application server)
> - The other server is hosting solr (let's call it S for Solr server)
> - All servers are Tomcat 6.0.14 running on jdk 1.6.0_02-b05 on Linux
> (Fedora core 6)
> - Server A performs HTTP requests to server S in 2 ways:
>    > An HTTP GET (using Apache Commons HttpClient) with a URL like
> http://S:8080/solr/select/?q=cityuri%3AXEABDBFDDACCXmaidenheadXEABDBFDDACCX&facet=true&facet.field=price&fl=id&facet.sort=false&facet.mincount=1&facet.limit=-1
>    > An HTTP POST to http://S:8080/solr/select/ with a set of 12
> NameValuePair parameters
>
> When traffic is light on our server A, everything works great.
> When traffic is high on our server A (simulation of 40 simultaneous users
> with JMeter), some requests to our server S take more than 200 seconds. It
> happens randomly and we couldn't isolate a URL pattern: a URL can return
> in less than 500ms and the exact same URL can take 300s before returning...
>
> We performed deep JVM analysis (using JProfiler) to observe what was going
> on on the Solr server. When the problem occurs, we can see threads which
> are stuck with the following call stack:
>
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:700)
> at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:805)
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:619)
>
> Requests which return in 200s+ seem to spend almost all their time reading
> this input stream...
> The javadoc says parseRequestLine is used to parse the HTTP header. As I
> stated above, our URLs seem quite small, so I can't understand why it happens.
> The response from server S is very small as well.
>
> We are able to reproduce the problem with fewer than 40 threads, but it is
> more difficult to reproduce.
> As I said at the beginning, I have found a user who had a similar problem,
> but the mailing list thread does not give any solution...
>
> Does anyone have an idea of what is going on? Are there settings we can use
> to avoid this problem?
> I am out of ideas on what to try to fix this...
>
> Any help would be highly appreciated...thank you very much.
> Christophe.
>
>