Now that's really strange. I am running RH9 and everything seems to go
through just fine.


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never gets back to the sender,
so my web pages never load without that flag.

I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

>I don't seem to need the LD_ASSUME_KERNEL setting. What are the symptoms when
>it is required?
>
>
>-----Original Message-----
>From: [EMAIL PROTECTED]
>[mailto:[EMAIL PROTECTED]
>Sent: Friday, January 09, 2004 12:33 PM
>To: Tomcat Users List
>Subject: Re: tomcat 5.0.16 Replication
>
>
>Just tried the CVS head and everything works without the CPU going crazy, but
>only if LD_ASSUME_KERNEL is set to 2.4.
>
>One more question for you, Filip: is the useDirtyFlag working at all? It
>seems like even if it's set to true, the whole session gets replicated
>after each request. :(
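>
>What I mean by the dirty flag is roughly the sketch below. It is only an
>illustration of the concept I expect, not Tomcat's actual session code, and
>the class and method names are made up:
>
>import java.util.HashMap;
>import java.util.Map;
>
>public class DirtyFlagSketch {
>
>    // Hypothetical session wrapper: writes mark the session dirty, reads do not.
>    static class TrackedSession {
>        private final Map attrs = new HashMap();
>        private boolean dirty = false;
>
>        void setAttribute(String name, Object value) {
>            attrs.put(name, value);
>            dirty = true;
>        }
>
>        Object getAttribute(String name) {
>            return attrs.get(name);
>        }
>
>        boolean isDirty() { return dirty; }
>        void clearDirty() { dirty = false; }
>    }
>
>    // Hypothetical end-of-request hook: only ship the session if it changed.
>    // If the flag were ignored, the whole session would go out after every
>    // request, which is the behaviour I am seeing.
>    static void afterRequest(TrackedSession session) {
>        if (session.isDirty()) {
>            System.out.println("would replicate the session now");
>            session.clearDirty();
>        }
>    }
>
>    public static void main(String[] args) {
>        TrackedSession s = new TrackedSession();
>        s.setAttribute("user", "jp");
>        afterRequest(s);   // would replicate
>        s.getAttribute("user");
>        afterRequest(s);   // would not
>    }
>}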
>
>Jean-Philippe
>
>[EMAIL PROTECTED] wrote:
>
>>Hurray for Filip! :)
>>
>>I'll get the CVS head for the module today and test this out.
>>Happy to see that it got fixed that quickly!
>>
>>Thanks again and I'll let you know how it goes
>>
>>Jean-Philippe
>>
>>Filip Hanik wrote:
>>
>>>Jean-Philippe and Steve,
>>>I fixed the bug and tried replication on RH9, and right away it didn't work.
>>>The problem is that when RH9 tries to write the ACK back to the NIO socket,
>>>it never reaches the other node and times out after a long time.
>>>
>>>I set LD_ASSUME_KERNEL=2.4 and it started to work
>>>
>>>Filip
>>>
>>>-----Original Message-----
>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 6:43 PM
>>>To: Tomcat Users List
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>
>>>OK guys, good news: the 100% CPU is totally my fault. I messed up on that
>>>one. I was registering OP_WRITE as an interest op, and that is not good :)
>>>I'm checking in the working code in 15 minutes, after some more regression
>>>tests.
>>>Filip
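>>>
>>>For anyone following along, the issue is roughly this (a sketch, not the
>>>actual cluster code): a connected channel is almost always ready for
>>>writing, so leaving OP_WRITE in the interest set makes select() return
>>>immediately on every pass.
>>>
>>>import java.nio.channels.Pipe;
>>>import java.nio.channels.SelectionKey;
>>>import java.nio.channels.Selector;
>>>
>>>public class WriteInterestSketch {
>>>    public static void main(String[] args) throws Exception {
>>>        Selector selector = Selector.open();
>>>        Pipe pipe = Pipe.open();                          // stand-in for the replication socket
>>>        pipe.sink().configureBlocking(false);
>>>        SelectionKey key = pipe.sink().register(selector, 0);
>>>
>>>        // With OP_WRITE in the interest set the channel is "ready" right away,
>>>        // even though there is nothing to send, so select() never blocks.
>>>        key.interestOps(SelectionKey.OP_WRITE);
>>>        System.out.println("ready keys: " + selector.selectNow());   // prints 1
>>>        selector.selectedKeys().clear();
>>>
>>>        // The usual fix: request OP_WRITE only while an ACK is queued and drop
>>>        // it again as soon as the write completes.
>>>        key.interestOps(0);
>>>        System.out.println("ready keys: " + selector.selectNow());   // prints 0
>>>
>>>        pipe.sink().close();
>>>        pipe.source().close();
>>>        selector.close();
>>>    }
>>>}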
>>>
>>>-----Original Message-----
>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 2:54 PM
>>>To: Tomcat Users List
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>
>>>Another code change is that I am now accepting keys for both OP_READ and
>>>OP_WRITE; before it was only OP_READ, but for synchronous replication I need
>>>both.
>>>
>>>This is good info. I just got RH9 installed and will be trying it out this
>>>and next week.
>>>
>>>Filip
>>>
>>>-----Original Message-----
>>>From: [EMAIL PROTECTED]
>>>[mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 11:46 AM
>>>To: Tomcat Users List
>>>Subject: Re: tomcat 5.0.16 Replication
>>>
>>>
>>>The only change in the ReplicationListener class is the try/catch that was
>>>added.
>>>
>>>The code logic is the same, weirdly enough, so something elsewhere probably
>>>changed the state of the SelectionKey.
>>>
>>>Jean-Philippe Bélanger
>>>
>>>Steve Nelson wrote:
>>>
>>>>I was just about to try this, actually. Through some googling I found a lot
>>>>of people having problems with select() under 1.4 NIO on Red Hat 9, though
>>>>they were actually experiencing crashes.
>>>>
>>>>To verify your results I just put a Thread.sleep(1) where you suggested, and
>>>>I also see the jump in performance.
>>>>
>>>>Something must have changed in ReplicationListener that causes this, because
>>>>the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure
>>>>it out when I get back to where I can diff the files.
>>>>
>>>>-Steve
>>>>
>>>>-----Original Message-----
>>>>From: [EMAIL PROTECTED]
>>>>[mailto:[EMAIL PROTECTED]
>>>>Sent: Thursday, January 08, 2004 12:25 PM
>>>>To: Tomcat Users List
>>>>Subject: Re: tomcat 5.0.16 Replication
>>>>
>>>>
>>>>More content for you Filip.
>>>>
>>>>I've checked and followed the code of the listen event in
>>>>ReplicationListener.java.
>>>>
>>>>Here's what's happening:
>>>>
>>>>selector.select(timeout) returns immediately with one SelectionKey available.
>>>>That key is not acceptable and not readable, so it immediately skips those
>>>>ifs and loops back to the beginning.
>>>>
>>>>I've put in traces, and this is executed once every millisecond, hence the
>>>>100% load on the server. Just to make sure, I put a Thread.sleep(10) at the
>>>>end of the loop and the CPU dropped back to 0%; replication still worked
>>>>nicely, though probably a little slower because of the 10ms wait.
>>>>
>>>>I don't know much about the NIO packages, but it seems like select(timeout)
>>>>shouldn't return a SelectionKey in that state without any waiting.
>>>>
>>>>Let me know what you can dig from those.
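>>>>
>>>>To make it concrete, the loop I'm describing has roughly this shape. This
>>>>is only a sketch, not the actual ReplicationListener code, and the setup is
>>>>purely illustrative: a key that is neither acceptable nor readable falls
>>>>through both ifs, and if select() keeps returning it the loop spins at 100%
>>>>CPU, which is what the sleep masks.
>>>>
>>>>import java.net.InetSocketAddress;
>>>>import java.nio.channels.SelectionKey;
>>>>import java.nio.channels.Selector;
>>>>import java.nio.channels.ServerSocketChannel;
>>>>import java.util.Iterator;
>>>>
>>>>public class ListenLoopSketch {
>>>>    public static void main(String[] args) throws Exception {
>>>>        Selector selector = Selector.open();
>>>>        ServerSocketChannel server = ServerSocketChannel.open();
>>>>        server.configureBlocking(false);
>>>>        server.socket().bind(new InetSocketAddress(0));   // any free port, purely illustrative
>>>>        server.register(selector, SelectionKey.OP_ACCEPT);
>>>>
>>>>        while (true) {
>>>>            selector.select(100);                          // the select(timeout) call in question
>>>>            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
>>>>            boolean didWork = false;
>>>>            while (it.hasNext()) {
>>>>                SelectionKey key = it.next();
>>>>                it.remove();                               // must remove, or the same key keeps coming back
>>>>                if (key.isAcceptable()) {
>>>>                    didWork = true;                        // accept and register the channel for OP_READ ...
>>>>                } else if (key.isReadable()) {
>>>>                    didWork = true;                        // read the session data and send the ACK ...
>>>>                } else {
>>>>                    key.cancel();                          // a key in neither state only spins the loop
>>>>                }
>>>>            }
>>>>            if (!didWork) {
>>>>                Thread.sleep(10);                          // the sleep workaround described above
>>>>            }
>>>>        }
>>>>    }
>>>>}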
>>>>
>>>>Jean-Philippe Bélanger
>>>>
>>>>[EMAIL PROTECTED] wrote:
>>>>
>>>>>Hi Filip.
>>>>>
>>>>>I did some profiling of 40 minutes of Tomcat with and without a 2nd node
>>>>>up. Here are the results with
>>>>>-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:
>>>>>Those numbers are cpu=times and not samples, since the latter freezes on my
>>>>>systems, so the list shows the time spent in each method.
>>>>>
>>>>>The major difference is the number of calls to the
>>>>>sun.nio.ch.PollArrayWrapper class. I don't know much about the NIO packages,
>>>>>but 819,000 calls in 40 minutes is a lot.
>>>>>The socket interface was called more than twice as often with 2 hosts as
>>>>>with a single one, which seems normal.
>>>>>
>>>>>Maybe this can help.
>>>>>If you need the complete hprof files I can send them to you.
>>>>>
>>>>>1 host in cluster:
>>>>>CPU TIME (ms) BEGIN (total = 19701) Thu Jan  8 10:00:59 2004
>>>>>rank   self  accum   count trace method
>>>>>1 11.48% 11.48%      54    85 java.lang.Object.wait
>>>>>2 11.46% 22.94%     117    86 java.lang.Object.wait
>>>>>3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
>>>>>4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
>>>>>5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
>>>>>6  7.37% 63.09%      28   495 java.lang.Object.wait
>>>>>7  7.24% 70.34%      10   576 java.lang.Object.wait
>>>>>8  4.57% 74.90%      90   716 java.lang.Thread.sleep
>>>>>9  4.48% 79.38%       1   909 java.lang.Object.wait
>>>>>10  4.48% 83.86%       1   908 java.lang.Object.wait
>>>>>11  4.48% 88.34%      15   810 java.lang.Object.wait
>>>>>12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
>>>>>13  0.71% 93.52%       2   623 java.lang.Object.wait
>>>>>14  0.56% 94.08%       2   706 java.lang.Object.wait
>>>>>15  0.38% 94.46%       2   914 java.lang.Object.wait
>>>>>16  0.24% 94.70%     775   913 java.lang.String.toCharArray
>>>>>17  0.23% 94.93%       3   475 java.lang.Thread.sleep
>>>>>18  0.16% 95.09%       2   472 java.lang.Object.wait
>>>>>19  0.15% 95.24%       2   595 java.lang.Thread.sleep
>>>>>20  0.15% 95.40%       2   586 java.lang.Thread.sleep
>>>>>21  0.15% 95.55%       2   703 java.lang.Thread.sleep
>>>>>22  0.15% 95.70%       2   476 java.lang.Thread.sleep
>>>>>23  0.15% 95.85%       2   692 java.lang.Thread.sleep
>>>>>24  0.12% 95.97%  218595   385 java.lang.CharacterDataLatin1.toLowerCase
>>>>>25  0.12% 96.09%  218595   408 java.lang.Character.toLowerCase
>>>>>26  0.11% 96.20%  218595   433 java.lang.CharacterDataLatin1.getProperties
>>>>>27  0.10% 96.30%  210925   389 java.lang.String.equalsIgnoreCase
>>>>>28  0.08% 96.38%  157259   387 java.lang.String.charAt
>>>>>29  0.08% 96.46%       1   646 java.lang.Thread.sleep
>>>>>30  0.08% 96.53%       1   634 java.lang.Thread.sleep
>>>>>31  0.08% 96.61%       1   903 java.lang.Thread.sleep
>>>>>32  0.08% 96.69%       1   714 java.lang.Thread.sleep
>>>>>33  0.08% 96.76%       1   811 java.lang.Thread.sleep
>>>>>34  0.08% 96.84%       1   715 java.lang.Thread.sleep
>>>>>
>>>>>2 hosts:
>>>>>CPU TIME (ms) BEGIN (total = 37247) Thu Jan  8 11:01:28 2004
>>>>>rank   self  accum   count trace method
>>>>>1  9.56%  9.56%      52    85 java.lang.Object.wait
>>>>>2  9.56% 19.12%      29    86 java.lang.Object.wait
>>>>>3  9.30% 28.43%       3   267 java.lang.Object.wait
>>>>>4  9.25% 37.68%    6644   224 java.lang.Thread.sleep
>>>>>5  9.23% 46.91%   13116   215 java.net.PlainDatagramSocketImpl.receive
>>>>>6  7.67% 54.58%       3   266 java.lang.Object.wait
>>>>>7  5.90% 60.47%      39   847 java.lang.Object.wait
>>>>>8  5.76% 66.24%      12   503 java.lang.Object.wait
>>>>>9  3.90% 70.14%     145   975 java.lang.Thread.sleep
>>>>>10  3.90% 74.04%       1  1174 java.lang.Object.wait
>>>>>11  3.90% 77.94%       1  1173 java.lang.Object.wait
>>>>>12  3.90% 81.84%      25   973 java.lang.Object.wait
>>>>>13  3.90% 85.74%       1  1175 java.net.PlainSocketImpl.socketAccept
>>>>>14  3.88% 89.62%  819692   214 sun.nio.ch.PollArrayWrapper.poll0
>>>>>15  0.75% 90.37%       2   958 java.lang.Object.wait
>>>>>16  0.28% 90.65%       2   457 java.lang.Object.wait
>>>>>17  0.26% 90.91%       2  1181 java.lang.Object.wait
>>>>>
>>>>>Filip Hanik wrote:
>>>>>
>>>>>>I'll try to get an instance going today and will let you know how it goes.
>>>>>>Also, try asynchronous replication: does it still go to 100%?
>>>>>>
>>>>>>Filip
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Steve Nelson [mailto:[EMAIL PROTECTED]
>>>>>>Sent: Wednesday, January 07, 2004 12:08 PM
>>>>>>To: 'Tomcat Users List'
>>>>>>Subject: RE: tomcat 5.0.16 Replication
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>Okay, did that and got this:
>>>>>>
>>>>>>BEGIN TO RECEIVE
>>>>>>SENT:Default 1
>>>>>>RECEIVED:Default 1 FROM /10.0.0.110:5555
>>>>>>SENT:Default 2
>>>>>>BEGIN TO RECEIVE
>>>>>>RECEIVED:Default 2 FROM /10.0.0.110:5555
>>>>>>SENT:Default 3
>>>>>>BEGIN TO RECEIVE
>>>>>>RECEIVED:Default 3 FROM /10.0.0.110:5555
>>>>>>SENT:Default 4
>>>>>>BEGIN TO RECEIVE
>>>>>>RECEIVED:Default 4 FROM /10.0.0.110:5555
>>>>>>
>>>>>>*shrug*
>>>>>>
>>>>>>BTW, it didn't go to 100% CPU utilization before I started using the code
>>>>>>from CVS. Of course, the Manager would almost always time out before it
>>>>>>would receive the message.
>>>>>>
>>>>>>Now it gets the message right away, but it maxes my machine out.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>>>>Sent: Wednesday, January 07, 2004 1:58 PM
>>>>>>To: Tomcat Users List
>>>>>>Subject: RE: tomcat 5.0.16 Replication
>>>>>>
>>>>>>
>>>>>>100% CPU can mean that you have a multicast problem; try running
>>>>>>
>>>>>>java -cp tomcat-replication.jar MCaster
>>>>>>
>>>>>>Download the jar from http://cvs.apache.org/~fhanik/
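>>>>>>
>>>>>>A minimal multicast check looks roughly like the sketch below (this is not
>>>>>>the actual MCaster source; the group and port are placeholders, so use the
>>>>>>mcastAddr and mcastPort from your own <Cluster> config). Run it on both
>>>>>>nodes at the same time; seeing only your own greeting, or a timeout, points
>>>>>>to a multicast problem:
>>>>>>
>>>>>>import java.net.DatagramPacket;
>>>>>>import java.net.InetAddress;
>>>>>>import java.net.MulticastSocket;
>>>>>>
>>>>>>public class McastCheck {
>>>>>>    public static void main(String[] args) throws Exception {
>>>>>>        InetAddress group = InetAddress.getByName("228.0.0.4");   // placeholder group
>>>>>>        int port = 45564;                                         // placeholder port
>>>>>>        MulticastSocket socket = new MulticastSocket(port);
>>>>>>        socket.joinGroup(group);
>>>>>>        socket.setSoTimeout(5000);   // a SocketTimeoutException here means nothing arrived
>>>>>>
>>>>>>        byte[] out = ("hello from " + InetAddress.getLocalHost()).getBytes();
>>>>>>        socket.send(new DatagramPacket(out, out.length, group, port));
>>>>>>
>>>>>>        // Print the next few packets seen on the group; with both nodes running
>>>>>>        // this, you should see the other machine's greeting as well as your own.
>>>>>>        for (int i = 0; i < 5; i++) {
>>>>>>            byte[] in = new byte[1024];
>>>>>>            DatagramPacket packet = new DatagramPacket(in, in.length);
>>>>>>            socket.receive(packet);
>>>>>>            System.out.println("RECEIVED: "
>>>>>>                    + new String(packet.getData(), 0, packet.getLength())
>>>>>>                    + " FROM " + packet.getSocketAddress());
>>>>>>        }
>>>>>>
>>>>>>        socket.leaveGroup(group);
>>>>>>        socket.close();
>>>>>>    }
>>>>>>}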
>>>>>>
>>>>>>Filip
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Steve Nelson [mailto:[EMAIL PROTECTED]
>>>>>>Sent: Wednesday, January 07, 2004 6:51 AM
>>>>>>To: '[EMAIL PROTECTED]'
>>>>>>Subject: tomcat 5.0.16 Replication
>>>>>>
>>>>>>
>>>>>>
>>>>>>I was having random problems with clustering when starting up, mostly to
>>>>>>do with timing out when the manager was starting up. I built the CVS
>>>>>>version and it solved that problem, but it has caused some serious
>>>>>>performance problems.
>>>>>>
>>>>>>First a little background.
>>>>>>
>>>>>>I have 2 servers, dual 300MHz Compaq ProLiants, both running Red Hat 9 and
>>>>>>Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The multicast
>>>>>>packets are restricted to a crossover link between the servers. There are
>>>>>>3 hosts in the server.xml, all with clustering set up, and they all
>>>>>>function just fine.
>>>>>>
>>>>>>But... the CPUs spike up to 100% if I start up both servers. I know this
>>>>>>didn't happen without the new catalina-cluster.jar. If I shut down one
>>>>>>server (it doesn't matter which), everything returns to normal, but when
>>>>>>both are running, both servers are at 100% CPU. I am trying to profile it
>>>>>>now, but I figured that if someone has already experienced this they could
>>>>>>save me some time.
>>>>>>
>>>>>>Oh, and there isn't anything relevant in my logs; it's not throwing
>>>>>>millions of errors or anything.
>>>>>>
>>>>>>-Steve Nelson
>>>>>>
>>>
>>>-- 
>>>Jean-Philippe Bélanger
>>>(514)228-8800 ext 3060
>>>111 Duke
>>>CGI
>>>


-- 
Jean-Philippe Bélanger
(514)228-8800 ext 3060
111 Duke
CGI


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
