Hrmmm, perhaps I should reboot using the non-SMP kernel and try it. I'll have to do that when I get back to the servers.
-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 2:04 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

uname -a
machine #1) Linux draco 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
machine #2) Linux scorpio 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux

java -version:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

same on both

-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:56 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

[EMAIL PROTECTED] bin]# uname -a
Linux rh9 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux
[EMAIL PROTECTED] bin]# java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 11:05 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm....are yours SMP servers? Could be something odd with synch if that is the case.

-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:01 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL
what VM (version and name) are you using?

Filip

-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Now that's really very strange. I am running RH9 and everything seems to go through just fine.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

The replication message ACK never gets back to the sender, so my web pages never load without that flag. I think it is only needed under Red Hat 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

>I don't seem to need the LD_ASSUME_KERNEL thing. What are the symptoms when
>it is required?
>
>-----Original Message-----
>From: [EMAIL PROTECTED]
>[mailto:[EMAIL PROTECTED]
>Sent: Friday, January 09, 2004 12:33 PM
>To: Tomcat Users List
>Subject: Re: tomcat 5.0.16 Replication
>
>Just tried the CVS head and everything works without any CPU going crazy!
>But only if LD_ASSUME_KERNEL is set to 2.4
>
>One more question for you Filip, is the useDirtyFlag working at all? It
>seems like even if it's set to true, the whole session gets replicated
>after each request. :(
>
>Jean-Philippe
>
>[EMAIL PROTECTED] wrote:
>
>>Hurray for Filip! :)
>>
>>I'll get the CVS head for the module today and test this out.
>>Happy to see that it got fixed that quickly!
>>
>>Thanks again and I'll let you know how it goes
>>
>>Jean-Philippe
>>
>>Filip Hanik wrote:
>>
>>>Jean-Philippe and Steve,
>>>I fixed the bug, and tried replication on RH9. Immediately it didn't
>>>work.
>>>The problem is that when RH9 tries to write the ACK back to the NIO
>>>socket, it never reaches the other node, and times out after a long time.
>>>
>>>I set LD_ASSUME_KERNEL=2.4 and it started to work
>>>
>>>Filip
>>>
>>>-----Original Message-----
>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 6:43 PM
>>>To: Tomcat Users List
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>ok guys,
>>>good news. The 100% cpu is totally my fault. I messed up on that one.
>>>I was registering OP_WRITE as an interest
>>>this is not good :)
>>>checking in the working code in 15 min, some more regression tests
>>>Filip
>>>
>>>-----Original Message-----
>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 2:54 PM
>>>To: Tomcat Users List
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>another code change was that I am now accepting keys for OP_READ and
>>>OP_WRITE. before it was only OP_READ,
>>>but for synchronous replication I need both.
>>>
>>>this is good info, I just got RH9 installed. will be trying it out
>>>this and next week.
>>>
>>>Filip
>>>
>>>-----Original Message-----
>>>From: [EMAIL PROTECTED]
>>>[mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 11:46 AM
>>>To: Tomcat Users List
>>>Subject: Re: tomcat 5.0.16 Replication
>>>
>>>The only change in the ReplicationListener class is the try/catch that
>>>was added.
>>>
>>>The code logic is the same. Weird enough. So it's probably elsewhere
>>>that something changed in the state of the SelectionKey.
>>>
>>>Jean-Philippe Bélanger
>>>
>>>Steve Nelson wrote:
>>>
>>>>I was just about to try this actually. I found through googling a lot of
>>>>people having problems with select with 1.4 and NIO with Redhat 9. They
>>>>were actually experiencing crashes though.
>>>>
>>>>To verify your results I just put a Thread.sleep(1); where you suggested
>>>>and I also see the jump in performance.
>>>>
>>>>Something must have changed in ReplicationListener that causes this
>>>>because the 5.0.16 version doesn't seem to have the problem. I'll see if
>>>>I can figure it out when I get back to where I can diff the files.
>>>>
>>>>-Steve
>>>>
>>>>-----Original Message-----
>>>>From: [EMAIL PROTECTED]
>>>>[mailto:[EMAIL PROTECTED]
>>>>Sent: Thursday, January 08, 2004 12:25 PM
>>>>To: Tomcat Users List
>>>>Subject: Re: tomcat 5.0.16 Replication
>>>>
>>>>More content for you Filip.
>>>>
>>>>I've checked and followed the code of the listen event in
>>>>ReplicationListener.java
>>>>
>>>>Here's what's happening:
>>>>
>>>>selector.select(timeout) -> returns immediately with one SelectionKey
>>>>available
>>>>
>>>>That key is not Acceptable and not Readable so it immediately skips those
>>>>IFs and loops back to the beginning.
>>>>
>>>>I've put traces and this is executed once every millisecond, hence the
>>>>100% load on the server.
>>>>Just to make sure, I've put a Thread.sleep(10) at the end of the loop
>>>>and the CPU dropped back to 0% and the replication still worked nicely
>>>>but probably a little slower because of the 10ms wait.
>>>>
>>>>I don't know much about those NIO packages but it seems like the
>>>>select(timeout) method shouldn't return a SelectionKey in that state
>>>>without any waiting.
>>>>
>>>>Let me know what you can dig from those.
>>>>
>>>>Jean-Philippe Bélanger
>>>>
>>>>[EMAIL PROTECTED] wrote:
>>>>
>>>>>Hi Filip.
>>>>>
>>>>>I did some profiling of 40mins of tomcat with and without a 2nd node
>>>>>up. here are the results with
>>>>>-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:
>>>>>
>>>>>Those numbers are cpu=times and not samples since the latter one freezes
>>>>>on my systems.
>>>>>So that list shows the time spent in each method.
>>>>>
>>>>>The major difference is the calls to the sun.nio.ch.PollArrayWrapper
>>>>>class. I don't know much about those NIO packages but 819000 calls in
>>>>>40 mins is a lot.
>>>>>The Socket Interface was called more than twice as often with 2 hosts
>>>>>as with a single one, which seems normal.
>>>>>
>>>>>Maybe this can help.
>>>>>If you need the complete hprof file I can send them to you.
>>>>>
>>>>>1 host in cluster:
>>>>>CPU TIME (ms) BEGIN (total = 19701) Thu Jan 8 10:00:59 2004
>>>>>rank   self  accum   count trace method
>>>>>   1 11.48% 11.48%      54    85 java.lang.Object.wait
>>>>>   2 11.46% 22.94%     117    86 java.lang.Object.wait
>>>>>   3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
>>>>>   4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
>>>>>   5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
>>>>>   6  7.37% 63.09%      28   495 java.lang.Object.wait
>>>>>   7  7.24% 70.34%      10   576 java.lang.Object.wait
>>>>>   8  4.57% 74.90%      90   716 java.lang.Thread.sleep
>>>>>   9  4.48% 79.38%       1   909 java.lang.Object.wait
>>>>>  10  4.48% 83.86%       1   908 java.lang.Object.wait
>>>>>  11  4.48% 88.34%      15   810 java.lang.Object.wait
>>>>>  12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
>>>>>  13  0.71% 93.52%       2   623 java.lang.Object.wait
>>>>>  14  0.56% 94.08%       2   706 java.lang.Object.wait
>>>>>  15  0.38% 94.46%       2   914 java.lang.Object.wait
>>>>>  16  0.24% 94.70%     775   913 java.lang.String.toCharArray
>>>>>  17  0.23% 94.93%       3   475 java.lang.Thread.sleep
>>>>>  18  0.16% 95.09%       2   472 java.lang.Object.wait
>>>>>  19  0.15% 95.24%       2   595 java.lang.Thread.sleep
>>>>>  20  0.15% 95.40%       2   586 java.lang.Thread.sleep
>>>>>  21  0.15% 95.55%       2   703 java.lang.Thread.sleep
>>>>>  22  0.15% 95.70%       2   476 java.lang.Thread.sleep
>>>>>  23  0.15% 95.85%       2   692 java.lang.Thread.sleep
>>>>>  24  0.12% 95.97%  218595   385 java.lang.CharacterDataLatin1.toLowerCase
>>>>>  25  0.12% 96.09%  218595   408 java.lang.Character.toLowerCase
>>>>>  26  0.11% 96.20%  218595   433 java.lang.CharacterDataLatin1.getProperties
>>>>>  27  0.10% 96.30%  210925   389 java.lang.String.equalsIgnoreCase
>>>>>  28  0.08% 96.38%  157259   387 java.lang.String.charAt
>>>>>  29  0.08% 96.46%       1   646 java.lang.Thread.sleep
>>>>>  30  0.08% 96.53%       1   634 java.lang.Thread.sleep
>>>>>  31  0.08% 96.61%       1   903 java.lang.Thread.sleep
>>>>>  32  0.08% 96.69%       1   714 java.lang.Thread.sleep
>>>>>  33  0.08% 96.76%       1   811 java.lang.Thread.sleep
>>>>>  34  0.08% 96.84%       1   715 java.lang.Thread.sleep
>>>>>
>>>>>2 hosts:
>>>>>CPU TIME (ms) BEGIN (total = 37247) Thu Jan 8 11:01:28 2004
>>>>>rank   self  accum   count trace method
>>>>>   1  9.56%  9.56%      52    85 java.lang.Object.wait
>>>>>   2  9.56% 19.12%      29    86 java.lang.Object.wait
>>>>>   3  9.30% 28.43%       3   267 java.lang.Object.wait
>>>>>   4  9.25% 37.68%    6644   224 java.lang.Thread.sleep
>>>>>   5  9.23% 46.91%   13116   215 java.net.PlainDatagramSocketImpl.receive
>>>>>   6  7.67% 54.58%       3   266 java.lang.Object.wait
>>>>>   7  5.90% 60.47%      39   847 java.lang.Object.wait
>>>>>   8  5.76% 66.24%      12   503 java.lang.Object.wait
>>>>>   9  3.90% 70.14%     145   975 java.lang.Thread.sleep
>>>>>  10  3.90% 74.04%       1  1174 java.lang.Object.wait
>>>>>  11  3.90% 77.94%       1  1173 java.lang.Object.wait
>>>>>  12  3.90% 81.84%      25   973 java.lang.Object.wait
>>>>>  13  3.90% 85.74%       1  1175 java.net.PlainSocketImpl.socketAccept
>>>>>  14  3.88% 89.62%  819692   214 sun.nio.ch.PollArrayWrapper.poll0
>>>>>  15  0.75% 90.37%       2   958 java.lang.Object.wait
>>>>>  16  0.28% 90.65%       2   457 java.lang.Object.wait
>>>>>  17  0.26% 90.91%       2  1181 java.lang.Object.wait
>>>>>
>>>>>Filip Hanik wrote:
>>>>>
>>>>>>I'll try to get an instance going today. Will let you know how it
>>>>>>goes
>>>>>>also, try asynchronous replication, does it still go to 100%?
>>>>>>
>>>>>>Filip
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Steve Nelson [mailto:[EMAIL PROTECTED]
>>>>>>Sent: Wednesday, January 07, 2004 12:08 PM
>>>>>>To: 'Tomcat Users List'
>>>>>>Subject: RE: tomcat 5.0.16 Replication
>>>>>>
>>>>>>Okay, did that, got this:
>>>>>>
>>>>>>BEGIN TO RECEIVE
>>>>>>SENT:Default 1
>>>>>>RECEIVED:Default 1 FROM /10.0.0.110:5555
>>>>>>SENT:Default 2
>>>>>>BEGIN TO RECEIVE
>>>>>>RECEIVED:Default 2 FROM /10.0.0.110:5555
>>>>>>SENT:Default 3
>>>>>>BEGIN TO RECEIVE
>>>>>>RECEIVED:Default 3 FROM /10.0.0.110:5555
>>>>>>SENT:Default 4
>>>>>>BEGIN TO RECEIVE
>>>>>>RECEIVED:Default 4 FROM /10.0.0.110:5555
>>>>>>
>>>>>>*shrug*
>>>>>>
>>>>>>BTW it didn't go to 100% CPU utilization before I started using the
>>>>>>code from CVS.
>>>>>>Of course the Manager would almost always time out before it would
>>>>>>receive the message.
>>>>>>
>>>>>>Now it gets the message right away, but maxes my machine out.
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>>>>Sent: Wednesday, January 07, 2004 1:58 PM
>>>>>>To: Tomcat Users List
>>>>>>Subject: RE: tomcat 5.0.16 Replication
>>>>>>
>>>>>>100% cpu can mean that you have a multicast problem, try to run
>>>>>>
>>>>>>java -cp tomcat-replication.jar MCaster
>>>>>>
>>>>>>download the jar from http://cvs.apache.org/~fhanik/
>>>>>>
>>>>>>Filip
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Steve Nelson [mailto:[EMAIL PROTECTED]
>>>>>>Sent: Wednesday, January 07, 2004 6:51 AM
>>>>>>To: '[EMAIL PROTECTED]'
>>>>>>Subject: tomcat 5.0.16 Replication
>>>>>>
>>>>>>I was having random problems with clustering when starting up. Mostly
>>>>>>it had to do with timing out when the manager was starting up. I built
>>>>>>the CVS version and it solved that problem. But it has caused
>>>>>>some serious performance problems.
>>>>>>
>>>>>>First a little background.
>>>>>>
>>>>>>I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9
>>>>>>and Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The
>>>>>>multicast packets are restricted to a crossover link between the
>>>>>>servers. There are 3 hosts in the server.xml, all with clustering set
>>>>>>up. They all function just fine.
>>>>>>
>>>>>>But.....the CPUs spike up to 100% if I start up both servers. I know
>>>>>>this didn't happen without the new catalina-cluster.jar. If I shut
>>>>>>down 1 server (doesn't matter which) everything returns to normal. But
>>>>>>when both are running both servers are at 100% CPU. I am trying to
>>>>>>profile it now, but I figured if someone has already experienced this
>>>>>>they could save me some time.
>>>>>>
>>>>>>Oh, and there isn't anything relevant in my logs. It's not throwing
>>>>>>millions of errors or something.
>>>>>>
>>>>>>-Steve Nelson
>>>>>>
>>>>>>---------------------------------------------------------------------
>>>>>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>>>For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>--
>>>Jean-Philippe Bélanger
>>>(514)228-8800 ext 3060
>>>111 Duke
>>>CGI

--
Jean-Philippe Bélanger
(514)228-8800 ext 3060
111 Duke
CGI
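
For anyone reading this thread in the archive later: the busy loop described above (select(timeout) returning at once with a key that is neither acceptable nor readable) is what one would expect when OP_WRITE is left in a key's interest set, since a socket with free send-buffer space is almost always writable. The following is only a rough sketch of that effect, not the catalina-cluster code. The class name WriteInterestSpin and the helper countWakeups are made up for illustration, and it uses newer Java APIs (try-with-resources, ServerSocketChannel.bind) than the 1.4.2 JVMs discussed here.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of Tomcat or catalina-cluster.
class WriteInterestSpin {

    /**
     * Registers an idle loopback connection with the given interest ops and
     * counts how many of the select(timeoutMs) calls return with a ready key.
     */
    public static int countWakeups(int interestOps, int iterations, long timeoutMs)
            throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            // Port 0 lets the OS pick any free port on the loopback interface.
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                accepted.register(selector, interestOps);

                int wakeups = 0;
                for (int i = 0; i < iterations; i++) {
                    // The client never sends anything, so the channel is never readable.
                    if (selector.select(timeoutMs) > 0) {
                        wakeups++;
                    }
                    // Clear the selected set so the next select() reports fresh readiness.
                    selector.selectedKeys().clear();
                }
                return wakeups;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int readOnly = countWakeups(SelectionKey.OP_READ, 5, 20);
        int readWrite = countWakeups(SelectionKey.OP_READ | SelectionKey.OP_WRITE, 5, 20);
        System.out.println("OP_READ only, wakeups:      " + readOnly);
        System.out.println("OP_READ|OP_WRITE, wakeups:  " + readWrite);
    }
}
```

With OP_READ alone every select() should sit out its timeout on the idle connection, while with OP_WRITE added every call should return immediately, which is the tight loop seen in the profile. The usual way around it is to add OP_WRITE to a key's interest set only while there is pending data to flush and to remove it again afterwards; whether that matches the fix that went into CVS is not stated in the thread.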