Hi Christophe,

If you get more stuck threads in the future, try increasing your open
files limit to 4096 using the following commands:

ulimit -Hn 10240
ulimit -Sn 4096

To make the change persist across reboots, edit /etc/security/limits.conf
and add the following lines:

*       soft    nofile          4096
*       hard    nofile          10240
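
To check what a running Tomcat actually got, you can look at its process
directly (a quick check; the jps path below matches my install, and only
newer kernels expose /proc/<pid>/limits, older ones may not):

psId=`/opt/java/jdk1.6.0_06/bin/jps | grep Bootstrap | cut -d' ' -f1`
grep 'open files' /proc/${psId}/limits   # effective soft/hard limits
ls /proc/${psId}/fd | wc -l              # descriptors currently in use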

I have also changed my /etc/sysctl.conf with the following lines; maybe
one of them can help if you run into problems in the future:
# tuneTCP
net.ipv4.tcp_window_scaling=0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_mem=786432  1048576 1572864
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_max_syn_backlog=4096
net.core.wmem_max=8388608
net.core.rmem_max=8388608
net.ipv4.tcp_rmem=4096 87380 8388608
net.ipv4.tcp_wmem=4096 87380 8388608
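
To apply these without a reboot (assuming the lines above are already
saved in /etc/sysctl.conf):

sysctl -p /etc/sysctl.conf                # reload everything from the file
sysctl -w net.ipv4.tcp_fin_timeout=30     # or set a single key on the fly
cat /proc/sys/net/ipv4/tcp_fin_timeout    # verify the value took effect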

I followed some tips from the IBM Redpaper "Linux Performance and Tuning
Guidelines": http://www.redbooks.ibm.com/abstracts/REDP4285.html

Best regards,

Clóvis
On Mon, Jul 7, 2008 at 6:02 AM, Christophe Fondacci <
[EMAIL PROTECTED]> wrote:

> Hi Clovis,
>
> Thanks for your answers.
>
> The open files limit on our production servers is 1024.
> Here is the complete output of ulimit -a :
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> max nice                        (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 36352
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> max rt priority                 (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 36352
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> Our production servers are connected with gigabit Ethernet. However, the
> servers used to reproduce the problem have only 100 Mbps Ethernet. The
> problem occurs in both the test and production environments.
>
> Our TCP keepalive settings are:
> tcp_keepalive_time is 7200
> tcp_keepalive_intvl is 75
> tcp_keepalive_probes is 9
>
> I am monitoring thread call stacks using JProfiler, which displays the
> stack from my initial mail.
>
> I've tried the suggestion from Filip Hanik (maxKeepAliveRequests="1" in my
> Tomcat connector), but I was still able to reproduce the problem.
>
> Then I switched the connector to the NIO connector (I was previously using
> the default HTTP/1.1 connector). I was not able to reproduce the problem
> after hours of testing (it usually happens after 10-20 minutes of heavy
> load). I pushed the configuration change to one of our 4 production
> servers to monitor its effectiveness.
>
> So far we haven't had any problems on the server with the NIO connector
> after 5 days in production...
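>
> For reference, the change looks roughly like this in server.xml (a sketch
> only; the port, timeout and thread count below are placeholders, not our
> actual values), compared to the default protocol="HTTP/1.1" connector:
>
> <Connector port="8080"
>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>            connectionTimeout="20000"
>            maxThreads="200" />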
>
> Christophe.
>
>
>
>
> ----- Original Message ----- From: "Clovis Wichoski" <
> [EMAIL PROTECTED]>
> To: "Tomcat Users List" <users@tomcat.apache.org>
> Sent: Friday, July 04, 2008 4:17 AM
> Subject: Re: Tomcat bottleneck on InternalInputBuffer.parseRequestLine
>
>
> Hi Christophe,
>
> Well, I still haven't found the root cause of my problem, but here are
> some things that helped me keep it from occurring so frequently.
>
> I checked the limit for open files on Linux; you can check yours with
> ulimit -a. Here I set it to 4096.
>
> How are the machines connected? Is it gigabit Ethernet?
>
> Please show us all your configuration under /proc/sys/net/ipv4/; the most
> important setting is:
>
> cat /proc/sys/net/ipv4/tcp_keepalive_time
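>
> To dump the three keepalive settings in one go, a convenience one-liner
> (assuming the standard /proc layout):
>
> for f in /proc/sys/net/ipv4/tcp_keepalive_*; do echo "$f = `cat $f`"; done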
>
> But what had the biggest impact on performance was getting the JDBC
> driver configured correctly in the connection pool. I use MaxDB, and the
> driver has a problem: when it opens new physical connections it goes
> through a singleton, so connections cannot be obtained in parallel (truly
> in parallel, on multi-core processors), and when such parallel attempts
> to get connections occur, we get stuck threads. But note, maybe the
> problem isn't with the driver; it's just a suspect, since I don't have a
> solution for this.
> Another suspect is that, for some strange reason, the socket still exists
> on the Java side but the socket on Linux (the inode) no longer exists,
> and until Java notices this, the system is stuck until the timeout. I
> can't confirm or simulate this, since it is really a rare case; it has
> only happened once for me and I have no way to prove it. I'm trying to
> investigate my problem with the following script:
>
> #!/bin/bash
> # Snapshot the Tomcat JVM: Java thread dump, native stacks, and open files.
> today=`date +%Y%m%d%H%M%S`
> # Find the Tomcat (Bootstrap) process id
> psId=`/opt/java/jdk1.6.0_06/bin/jps | grep Bootstrap | cut -d' ' -f1`
> /opt/java/jdk1.6.0_06/bin/jstack -l $psId > /mnt/logs/stack/stack${today}.txt
> echo "--- pstack ---" >> /mnt/logs/stack/stack${today}.txt
> pstack $psId >> /mnt/logs/stack/stack${today}.txt
> echo "--- lsof ---" >> /mnt/logs/stack/stack${today}.txt
> lsof >> /mnt/logs/stack/stack${today}.txt
> echo "--- ls -l /proc/${psId}/fd/ ---" >> /mnt/logs/stack/stack${today}.txt
> ls -l /proc/${psId}/fd/ >> /mnt/logs/stack/stack${today}.txt
> echo "stack of process $psId saved to /mnt/logs/stack/stack${today}.txt"
>
> When users report a stuck thread, I run this script manually ten times
> and then compare the outputs (see the sketch after this paragraph for one
> way to automate that). IBM has a tool that makes it easier to read the
> jstack output; I don't remember the name right now, but tomorrow I will
> post the link here.
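>
> Something like the loop below could take the ten snapshots automatically
> (just a sketch; it assumes the script above was saved as
> /mnt/logs/stack/capture-stack.sh and made executable):
>
> #!/bin/bash
> # take ten snapshots, 30 seconds apart, while the threads are stuck
> for i in `seq 1 10`; do
>     /mnt/logs/stack/capture-stack.sh
>     sleep 30
> done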
>
> Let's see if we can share knowledge to win this fight ;)
>
> Regards,
>
> Clóvis
>
> On Tue, Jul 1, 2008 at 12:23 PM, Christophe Fondacci <
> [EMAIL PROTECTED]> wrote:
>
>> Hello all,
>>
>> We have a problem with Tomcat on our production server.
>> This problem may be related to the one listed here:
>> http://grokbase.com/profile/id:hNxqA0ZEdnD-6GYFRNs-iIkKEvF907FNWdczKYQ719Q
>>
>> Here it is:
>> - We have 2 Tomcat servers on 2 distinct machines.
>> - 1 server is our application (let's call it A for Application server).
>> - The other server is hosting Solr (let's call it S for Solr server).
>> - All servers are Tomcat 6.0.14 running on JDK 1.6.0_02-b05 on Linux
>> (Fedora Core 6).
>> - Server A performs HTTP requests to server S in 2 ways:
>>   > An HTTP GET (using Apache Commons HttpClient) with a URL like
>>
>> http://S:8080/solr/select/?q=cityuri%3AXEABDBFDDACCXmaidenheadXEABDBFDDACCX&facet=true&facet.field=price&fl=id&facet.sort=false&facet.mincount=1&facet.limit=-1
>>   > An HTTP POST to http://S:8080/solr/select/ with a set of 12
>> NameValuePairs
>>
>>
>> When traffic is light on our server A, everything works great.
>> When traffic is high on our server A (a simulation of 40 simultaneous
>> users with JMeter), some requests to our server S take more than 200
>> seconds. It happens randomly and we couldn't isolate a URL pattern: a URL
>> can return in less than 500ms and the exact same URL can take 300s before
>> returning...
>>
>> We performed a deep JVM analysis (using JProfiler) to observe what was
>> going on on the Solr server. When the problem occurs, we can see threads
>> which are stuck with the following call stack:
>>
>> at java.net.SocketInputStream.socketRead0(Native Method)
>> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>> at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:700)
>> at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366)
>> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:805)
>> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
>> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>> at java.lang.Thread.run(Thread.java:619)
>>
>> Requests which return in 200s+ seem to spend almost all their time
>> reading this input stream...
>> The Javadoc says parseRequestLine is used to parse the HTTP request line.
>> As I stated above, our URLs seem quite small, so I can't understand why
>> this happens. The response from server S is very small as well.
>>
>> We are able to reproduce the problem with fewer than 40 threads, but it
>> is more difficult to reproduce.
>> As I said at the beginning, I found a user who had a similar problem, but
>> the mailing list thread does not give any solution...
>>
>> Does anyone have an idea of what is going on? Are there settings we can
>> use to avoid this problem?
>> I am out of ideas on what to try to fix this...
>>
>> Any help would be highly appreciated... thank you very much.
>> Christophe.
>>
>>
>>
>>
>>
>
>
>
