Hello!
I think I have something for you to reproduce this issue. I managed to
create a test servlet which triggers the problem (every time) when I call it
on Linux kernel 2.4 with the NIO-connector (and causes no problem on kernel 2.6
or with the BIO-connector).

I have also created a VMware installation of CentOS 3.9 (kernel 2.4.21-50)
where this test case "works", and I'd be happy to send it to you. It's quite
big (~630 MB), so I'd rather not publish a link to it in public - can I send
you a private email with a download link to it?

The problem is triggered when the servlet calls Thread.sleep() during a request
(yes, I know all about calling sleep() in a servlet, but this is just to
demonstrate the issue, and it will probably happen in other situations as well).
When a thread is sleeping and more calls come in on the same connection,
the NIO-selector starts looping. The CPU goes back down to normal when ALL
the sleeping threads are done.

The servlet code is below.


How to repeat:
==============
Set up this servlet on a Linux system with kernel 2.4.
I don't know if my config makes any difference, but the server.xml I used is
also below.

Make a call to the servlet wherever you've installed it.
In my case it's http://192.168.0.134/test/NioBugTest

Hit the link "Call sleepy operation" a few times rapidly (3 times will do),
and the NIO-selector starts spinning.
All calls have to be made from the same browser window - it has something to
do with the keep-alive connection, I think; see the client sketch below.
If many simultaneous calls are made to "NioBugTest?do=sleep" from
_different_ browser windows, the problem does not occur!
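
If it helps, here is a rough sketch of a small client that approximates what
the browser does: it pushes a few keep-alive GET requests down one and the
same connection while the servlet threads are still sleeping. The host and
context path are of course specific to my setup, and I haven't polished this
code - adjust as needed.

import java.io.OutputStream;
import java.net.Socket;

public class NioBugClient {
  public static void main(String[] args) throws Exception {
    String host = (args.length > 0) ? args[0] : "192.168.0.134";  // adjust to your server
    Socket socket = new Socket(host, 80);
    try {
      OutputStream out = socket.getOutputStream();
      for (int i = 0; i < 3; i++) {
        // Same request the "Call sleepy operation" link makes
        String request = "GET /test/NioBugTest?do=sleep HTTP/1.1\r\n"
                       + "Host: " + host + "\r\n"
                       + "Connection: keep-alive\r\n\r\n";
        out.write(request.getBytes("ISO-8859-1"));
        out.flush();
        Thread.sleep(500);   // "a few times rapidly"
      }
      // Keep the connection open while the servlet threads sleep, then watch
      // the poller threads on the server (top / kill -3 <pid>).
      Thread.sleep(60000);
    } finally {
      socket.close();
    }
  }
}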




server.xml
===========
<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="80"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxHttpHeaderSize="8192"
               maxThreads="1000"
               enableLookups="false"
               redirectPort="8443"
               acceptCount="100"
               maxPostSize="4194304"
               connectionTimeout="10000"
               timeout="120000"
               disableUploadTimeout="false"
               pollerThreadCount="1"
               acceptorThreadCount="1"
               socket.processorCache="600"
               socket.keyCache="600"
               socket.eventCache="600"
               socket.tcpNoDelay="true"
               socket.soTimeout="10000"
    />

    <Engine name="Catalina" defaultHost="localhost">
       <Host name="test"
             appBase="/java/nio_bug_test"
             unpackWARs="true"
             deployOnStartup="false"
             backgroundProcessorDelay="5"
             autoDeploy="true">

          <Alias>localhost</Alias>

          <Context path="/test"
                   docBase="/java/nio_bug_test"
                   reloadable="false"
                   cachingAllowed="false"
                   swallowOutput="true" />

          <Context path="/"
                   docBase="/var/www/html"
                   cachingAllowed="true"
                   cacheMaxSize="20480"
                   cacheTTL="30000" />
       </Host>
    </Engine>                           
  </Service>
</Server>





NioBugTest.java
=================

import javax.servlet.http.*;
import java.io.PrintWriter;

final public class NioBugTest extends HttpServlet {
  // Number of requests currently being served (incremented on entry, decremented in finally)
  static int servedThreads = 0;

  public void doGet(HttpServletRequest req, HttpServletResponse res) {
    try {
      servedThreads++;

      res.setHeader("Cache-Control", "no-store, no-cache, must-revalidate");
      res.setHeader("Pragma", "no-cache");
      res.setHeader("Expires", "Tue, 01 Jan 1980 01:00:00 GMT");

      PrintWriter toClient = res.getWriter();

      String operation = req.getParameter("do");
      if ((operation == null) || (operation.length() == 0)) {
        toClient.println("Test NIO-problem in Linux kernel 2.4 with Tomcat<br><br>");
        toClient.println("<a href=NioBugTest?do=sleep>Call sleepy operation</a><br><br>");
        toClient.println("<a href=NioBugTest?do=check>Check number of served clients</a><br><br>");
      } else if (operation.equals("sleep")) {
        int secs = 30;
        try {
          secs = Integer.parseInt(req.getParameter("secs"), 10);
        } catch (Exception e) {}
        toClient.println("Going to sleep for " + Integer.toString(secs) + " seconds...");
        // Holding the worker thread here is what triggers the poller spin
        Thread.sleep(1000L * secs);
      } else if (operation.equals("check")) {
        toClient.println("Number of served threads=" + Integer.toString(servedThreads));
      } else {
        toClient.println("unknown operation");
      }

      toClient.close();
    } catch (Exception e) {
      System.err.println("Exception in doGet: " + e.toString());
    } finally {
      servedThreads--;
    }
  }
}





-----Original Message-----
From: Filip Hanik - Dev Lists [mailto:devli...@hanik.com] 
Sent: 13 January 2010 18:46
To: Tomcat Users List
Subject: Re: NIO-connector problems (excessive CPU-usage)

On 01/13/2010 10:25 AM, Tobias Lind wrote:
> Ok, so chances are that we will not encounter this issue if we upgrade to a
> newer kernel (and/or another machine - we are currently also thinking about
> upgrading the hardware)?
>
not necessarily. The bug was in the JDK and how it uses the kernel
libraries.

> It would be nice to see that others are using the NIO-connector on Linux in
> a productive environment without problems before we start upgrading...
> Anyone?
>
> Is there anything we can do to assist you in reproducing this issue?
>
In order to test a workaround for this issue, one would need to be able
to reproduce it.
So far, we've had a few bugzilla issues opened, but no one has actually
worked it to the point where it's reproducible.

> Maybe try any special build with extra logging, etc...
>
We know where the code is looping. The reproduction is not for seeing
the issue, it's for testing the workaround.
> It only occurs on our production machine, so we can't experiment too much,
> but as the issue is quite serious, we'd gladly help in any way we can.
>
same as everyone else. I would stick with the BIO connector in that case.
> It just jumped to >100% CPU just a minute ago, so it happens pretty
> frequently on our machine. I wish there was some kind of dump I could make
> and send you to help you :)
>
since you are in a prod environment, I'm sure you wouldn't be too
thrilled with experimental builds. What is happening in the JVM is that
Selector.select() returns instantly, but Selector.selectedKeys() returns
an empty list. And so the loop continues.
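
For illustration, the generic workaround that other NIO frameworks apply to
this JDK bug looks roughly like the sketch below: count select() calls that
return immediately with an empty selected-key set, and once the count passes
some threshold, rebuild the Selector by re-registering every key on a fresh
one. This is only a sketch of the technique, not Tomcat's code; the threshold
and the names are made up, and in a real poller this would also have to be
coordinated with the threads registering new keys.

import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class SpinGuard {
  private static final int SPIN_THRESHOLD = 512;    // arbitrary illustrative value

  // Select as usual, but replace the Selector if it starts spinning.
  static Selector selectAndGuard(Selector selector, long timeout) throws Exception {
    int emptySelects = 0;
    while (true) {
      long start = System.currentTimeMillis();
      int selected = selector.select(timeout);
      if (selected > 0) {
        return selector;                             // normal case: real work to do
      }
      long elapsed = System.currentTimeMillis() - start;
      if (elapsed >= timeout) {
        emptySelects = 0;                            // a genuine timeout, not a spin
      } else if (++emptySelects > SPIN_THRESHOLD) {
        return rebuild(selector);                    // select() keeps returning instantly
      }
    }
  }

  // Move every valid key to a brand-new Selector and close the broken one.
  private static Selector rebuild(Selector broken) throws Exception {
    Selector fresh = Selector.open();
    for (SelectionKey key : broken.keys()) {
      if (!key.isValid()) {
        continue;
      }
      SelectableChannel channel = key.channel();
      channel.register(fresh, key.interestOps(), key.attachment());
    }
    broken.close();                                  // cancels all keys on the old selector
    return fresh;
  }
}
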
> The two top-threads on this "top"-dump are the two
> "http-80-ClientPoller"-threads:
>
> 18:24:06  up 212 days,  6:18,  3 users,  load average: 2.66, 2.81, 2.29
> 647 processes: 641 sleeping, 6 running, 0 zombie, 0 stopped
> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>             total   26.6%    0.0%  111.8%   0.0%     6.4%   24.2%   30.4%
>             cpu00   15.4%    0.0%   47.6%   0.0%     4.9%   16.0%   15.8%
>             cpu01   11.3%    0.0%   64.2%   0.0%     1.5%    8.1%   14.6%
> Mem:  4031024k av, 3983960k used,   47064k free,       0k shrd,   35336k buff
>                     3080496k actv,  593696k in_d,   70952k in_c
> Swap:       0k av,       0k used,       0k free                 2334824k cached
>
>    PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>   6563 root      25   0  802M 802M  9144 R    90.1 20.3  36:28   0 java
>   6562 root      25   0  802M 802M  9144 S    15.4 20.3  28:48   1 java
>   6636 root      15   0  802M 802M  9144 S     2.1 20.3   2:05   0 java
> 29633 mysql     16   0  606M 606M  2352 S     1.5 15.4  14:09   1 mysqld
> 29634 mysql     16   0  606M 606M  2352 S     1.3 15.4  15:26   0 mysqld
>   6855 mysql     16   0  606M 606M  2352 S     0.9 15.4   0:10   1 mysqld
>   6964 mysql     15   0  606M 606M  2352 S     0.9 15.4   0:10   0 mysqld
> 11917 root      15   0  1896 1896   900 R     0.9  0.0   0:01   1 top
>   6546 root      16   0  802M 802M  9144 S     0.7 20.3   1:45   1 java
> 32744 mysql     15   0  606M 606M  2352 S     0.5 15.4   0:55   0 mysqld
>   6547 root      16   0  802M 802M  9144 S     0.5 20.3   1:45   0 java
>   6555 root      15   0  802M 802M  9144 S     0.5 20.3   0:11   0 java
>   6962 mysql     15   0  606M 606M  2352 S     0.5 15.4   0:10   1 mysqld
>   6576 root      16   0  802M 802M  9144 S     0.3 20.3   0:05   1 java
>   6578 root      15   0  802M 802M  9144 S     0.3 20.3   0:06   0 java
>   6751 mysql     15   0  606M 606M  2352 S     0.3 15.4   0:09   1 mysqld
>   6963 mysql     15   0  606M 606M  2352 S     0.3 15.4   0:13   0 mysqld
>   6997 root      15   0  802M 802M  9144 S     0.3 20.3   0:05   0 java
>
>
>
> "http-80-ClientPoller-1" daemon prio=10 tid=0x0833ac00 nid=0x19a3 runnable
> [0x4de5d000]
>     java.lang.Thread.State: RUNNABLE
>       at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
>       at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
>       at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
>       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>       - locked<0x53ae1cb8>  (a sun.nio.ch.Util$1)
>       - locked<0x53ae1ca8>  (a java.util.Collections$UnmodifiableSet)
>       - locked<0x548b0008>  (a sun.nio.ch.PollSelectorImpl)
>       at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>       at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1545)
>       at java.lang.Thread.run(Thread.java:619)
>
> "http-80-ClientPoller-0" daemon prio=10 tid=0x0833a400 nid=0x19a2 runnable
> [0x4deae000]
>     java.lang.Thread.State: RUNNABLE
>       at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
>       at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
>       at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
>       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>       - locked<0x53790010>  (a sun.nio.ch.Util$1)
>       - locked<0x53790000>  (a java.util.Collections$UnmodifiableSet)
>       - locked<0x548b0798>  (a sun.nio.ch.PollSelectorImpl)
>       at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>       at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1545)
>       at java.lang.Thread.run(Thread.java:619)
>
> /Tobias
>
>
> -----Original Message-----
> From: Filip Hanik - Dev Lists [mailto:devli...@hanik.com]
> Sent: 13 January 2010 16:13
> To: Tomcat Users List
> Subject: Re: NIO-connector problems (excessive CPU-usage)
>
> yes, the issue is known. However, we have not been able to create a use
> case for it, since I've never been able to reproduce it.
> One of the workarounds would be to close the selector, but that is a
> royal pain, since you'd then have to re-register all keys and you'd end
> up in a synchronization nightmare.
>
> Filip
>
> On 01/13/2010 07:57 AM, Tobias Lind wrote:
>
>> Hi!
>> We've been using Tomcat on Linux for a very long time (and the good old
>> JServe before it), and we recently started testing the NIO-connector instead
>> of the old blocking one. We are currently running the latest Tomcat v6.0.20.
>>
>> We have a pretty large website with quite a lot of traffic, and switching to
>> the NIO-connector gives us a VERY good performance boost! We also got rid of
>> problems with hanging connections, etc, so it was very promising.
>>
>> But it also gave us new headaches :/
>>
>> We were using IBM's JDK 6.0_7 (the latest), and on the first testing on our
>> production server, the CPU hit 100% (everything started and the site worked
>> though).
>>
>> We installed Sun's JDK 1.6.0_17 instead, and the CPU was constantly running
>> at ~20-30% even when the traffic to the site was quite low. In about 24
>> hours runtime, we also saw one occasion where the CPU went up to 100% and
>> never came down again (while no clients were actually running our server),
>> and it took a Tomcat-restart to get it "down" to 30% again.
>>
>> I started investigating, and found quite a lot of reports on problems with
>> NIO and the Selector looping out of control.
>> Here are some links to pages about this problem:
>> http://bugs.sun.com/view_bug.do?bug_id=6403933
>> http://bugs.sun.com/view_bug.do?bug_id=6525190
>> http://forums.sun.com/thread.jspa?threadID=5135128
>> http://issues.apache.org/jira/browse/DIRMINA-678
>>
>> A thread-dump showed that it's very likely to be this problem we are seeing.
>> These threads are taking much more CPU than expected (although on Sun's JDK
>> it seems a bit better than on IBM's), and when the system load jumped to
>> 100%, it was the "http-80-ClientPoller-0" that was behaving badly:
>>
>> "http-80-Acceptor-0" daemon prio=10 tid=0x0828d400 nid=0x7308 runnable
>> [0x4df19000]
>>    java.lang.Thread.State: RUNNABLE
>>      at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>      at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>      - locked <0x547f84c8> (a java.lang.Object)
>>      at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:1266)
>>      at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> "http-80-ClientPoller-1" daemon prio=10 tid=0x0825f400 nid=0x7307
runnable
>> [0x4df6a000]
>>
>>      java.lang.Thread.State: RUNNABLE
>>
>>                                at
sun.nio.ch.PollArrayWrapper.poll0(Native
>> Method)
>>
>>                                at
>> sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
>>
>>                                at
>> sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
>>
>>                                at
>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>
>>                                - locked<0x54941568>   (a
sun.nio.ch.Util$1)
>>
>>                                - locked<0x54941558>   (a
>> java.util.Collections$UnmodifiableSet)
>>
>>                                - locked<0x54941410>   (a
>> sun.nio.ch.PollSelectorImpl)
>>
>>                                at
>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>
>>                                at
>> org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1545)
>>
>>                                at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> "http-80-ClientPoller-0" daemon prio=10 tid=0x0831b400 nid=0x7306
runnable
>> [0x4dfbb000]
>>
>>      java.lang.Thread.State: RUNNABLE
>>
>>                                at
sun.nio.ch.PollArrayWrapper.poll0(Native
>> Method)
>>
>>                                at
>> sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
>>
>>                                at
>> sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
>>
>>                                at
>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>
>>                                - locked<0x54941758>   (a
sun.nio.ch.Util$1)
>>
>>                                - locked<0x54941748>   (a
>> java.util.Collections$UnmodifiableSet)
>>
>>                                - locked<0x54941610>   (a
>> sun.nio.ch.PollSelectorImpl)
>>
>>                                at
>> sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>
>>                                at
>> org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1545)
>>
>>                                at java.lang.Thread.run(Thread.java:619)
>>
>>
>> I'm sure this issue is well known to the Tomcat community and that it has
>> been discussed before, but I'd just like to know the current status on the
>> issue.
>>
>> The webpages I referred to above indicate that there are some workarounds
>> to this problem - are these workarounds implemented in Tomcat?
>>
>> Is there anything we can do to get it running?
>>
>> We'd REALLY like to use this connector as it's performing a lot better.
>> Even though the CPU-load is a lot higher, the clients on the site are served
>> a lot better it seems.
>> So running with 30-40% CPU could actually be ok for now, but when it also
>> jumps up to 100% and stays there, it's not possible to use...
>>
>> We are running on quite an old Linux system with dual CPUs:
>> Linux www.kamrat.com 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 19 01:53:23 GMT 2005
>> i686 i686 i386 GNU/Linux
>>
>> The issue seems to depend on the kernel, and there are reports that the
>> issue has been fixed for 2.6+ kernels in later JDKs, while others say that
>> the issue is still there (it also says that for the 2.4-kernels, the fix is
>> more complicated, and not yet implemented). I'm also thinking that a
>> slightly higher CPU-load with NIO may be normal on the 2.4-kernel because of
>> the polling-mechanism, but this seems a bit TOO high I think. And jumping to
>> 100% is certainly not normal...
>>
>> Does anyone know the status of this issue and how Tomcat is dealing with it?
>> Is there anything we can do/try?
>>
>> Are any of you using the NIO-connector on Tomcat 6.0.20 with the latest
>> JDK on Linux on a production site?
>> Are you seeing any of these issues?
>> If not, what kernel are you running?
>>
>> We'd like to figure out if upgrading to a newer kernel would help us...
>> But as it's our production machine, we would like to mess with it as little
>> as possible :)
>>
>> Here is our Connector config:
>>       <Connector port="80"
>>                  protocol="org.apache.coyote.http11.Http11NioProtocol"
>>                  maxHttpHeaderSize="8192"
>>                  maxThreads="1000"
>>                  enableLookups="false"
>>                  redirectPort="8443"
>>                  acceptCount="100"
>>                  maxPostSize="4194304"
>>                  connectionTimeout="10000"
>>                  timeout="120000"
>>                  disableUploadTimeout="false"
>>                  pollerThreadCount="2"
>>                  acceptorThreadCount="1"
>>                  pollTime="4000"
>>                  processCache="600"
>>                  socket.processorCache="600"
>>                  socket.keyCache="600"
>>                  socket.eventCache="600"
>>                  socket.tcpNoDelay="true"
>>                  socket.soTimeout="10000"
>>       />
>>
>>
>> (I've been fiddling around a bit with the NIO-specific parameters, but I
>> haven't seen any significant change in CPU-load)
>>
>>
>>
>> The startup-options are quite normal:
>> export CATALINA_OPTS="-Dfile.encoding=ISO-8859-1"
>>
>> export JAVA_OPTS="-Xms512m -Xmx1536m -server -Dfile.encoding=ISO-8859-1"
>>
>>
>>
>> There's nothing in the catalina-logs and the site is actually running quite
>> well despite the high CPU-load.
>>
>> I would be very thankful if anyone has any hints or info that may help us!
>>
>> Regards,
>> Tobias Lind, Sweden
>>
>> p.s. We also tried the APR-connector and it also made the CPU run at 100%+.
>> I didn't test it very thoroughly though...
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org


