Re: WAS: tomcat 5.0.16 Replication

jean-philippe . belanger Mon, 12 Jan 2004 13:00:29 -0800

I understand that. Here is more info...

I make a servlet request that runs this loop:
int i = 0;
while (i < 25) {
   if (request.getParameter("xxx") == null) {
      System.out.println("param is null");
   }
   Thread.sleep(1000); // static method; the caller must handle InterruptedException
   i++;
}

I then send a couple of requests with param xxx=123 at 1-second intervals (those requests are all made to the same server, i.e. web1). Once a couple of them are sent, I shut down web2. As soon as the "INFO: Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://10.128.29.66:4001,10.128.29.66,4001, alive=6576]" message is received, my log gets flooded with "param is null".


Jean-Philippe Bélanger
CGI

Filip Hanik wrote:

The way the login is done is that a request is saved in the session
(in a session note, which is not replicated).
So for a login, you must hit the same server twice in a row; otherwise you
will see NULL in your request parameters. Also, in this release the
principal is not replicated. I am working on that right now.

Filip
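A minimal sketch of the behavior described above, assuming a hypothetical ClusteredSession class (loosely modeled on the attribute/note split of Catalina's Session interface, which has setNote/getNote, but not its real implementation): attributes are copied to the backup node, while notes stay on the node that set them. If the saved login request lives only in a note, the node that takes over has nothing to restore, which is consistent with the NULL parameters reported in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a clustered session; NOT Tomcat's StandardSession.
class ClusteredSession {
    // Attributes are what replication copies to other nodes.
    private final Map<String, Object> attributes = new HashMap<>();
    // Notes are node-local scratch data (e.g. a saved login request) and are never replicated.
    private final Map<String, Object> notes = new HashMap<>();

    void setAttribute(String name, Object value) { attributes.put(name, value); }
    Object getAttribute(String name) { return attributes.get(name); }

    void setNote(String name, Object value) { notes.put(name, value); }
    Object getNote(String name) { return notes.get(name); }

    // Simulates replication to a backup node: only the attributes travel.
    ClusteredSession replicate() {
        ClusteredSession copy = new ClusteredSession();
        copy.attributes.putAll(attributes);
        return copy;
    }
}
```

After a.replicate(), the copy answers getAttribute calls normally, but getNote("savedRequest") on it returns null even though the original node still holds the note.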

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Monday, January 12, 2004 11:29 AM
To: Tomcat Users List
Subject: Re: WAS: tomcat 5.0.16 Replication


I've been testing the new modules and came across something weird.
I'm wondering if you have any idea of the cause while I continue
investigating.

Scenario:
- One web page logs in a user. It receives 3 parameters (user, password and
community).
- To be able to reproduce the problem, I had to put a 25-second sleep in the
code.
- Post one request every second or so, and after a couple of them, shut down
one Tomcat and restart it (stop/start sequence).
- A couple of requests will start pouring out results, but after a while,
when the Tomcat that was shut down is restarting, the request parameters
become NULL.

It is as if the replication code were killing my request objects or resetting
my parameters on those requests. Any thoughts on what it could be?
I even had a session mix-up once: when restarting a Tomcat, a user who was
logging in was assigned a session from another user, who had never
logged on from his station (that session had also been idle for more than
10 hours).

Just trying to pinpoint where the problem could be. Any pointer would help.

Thanks

Jean-Philippe Bélanger
CGI

Filip Hanik wrote:

Steve and Jean-Philippe,
I've been working on some more replication stuff and made a major change
that I think you might want to use.
I have added a third option for the replicationMode parameter:

replicationMode="pooled"

With this setting, replication is still synchronous, but it uses a pool of
sockets to replicate the data.
It improves performance a lot; you will notice the improvement under load.
Try it out, and let me know how it works for you.

Of course, get the latest from CVS first.

Filip
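A rough sketch of what "synchronous, but with a pool of sockets" can mean, assuming a hypothetical Transport interface that stands in for one socket connection to the other node (this is not the catalina-cluster code). Each replication borrows a connection, sends, and blocks until the ACK arrives, so the caller still waits synchronously, but concurrent requests no longer queue behind a single socket.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical abstraction for one socket connection to a peer node.
interface Transport {
    // Sends the payload and blocks until the peer acknowledges it (true) or fails (false).
    boolean sendAndAwaitAck(byte[] payload);
}

class PooledReplicationSender {
    private final BlockingQueue<Transport> pool;

    PooledReplicationSender(int size, Supplier<Transport> factory) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(factory.get()); // open all connections up front
        }
    }

    // Synchronous from the caller's point of view: returns only after the ACK (or a failure).
    boolean replicate(byte[] payload) {
        Transport t;
        try {
            t = pool.take(); // blocks while every connection is busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return t.sendAndAwaitAck(payload);
        } finally {
            pool.offer(t); // capacity equals pool size, so returning the connection always succeeds
        }
    }

    int idleConnections() {
        return pool.size();
    }
}
```

The pool size caps how many replications can be in flight at once; a caller that finds every connection busy waits in take(), which keeps the semantics synchronous while letting independent requests proceed in parallel.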

-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:05 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


Hrmmm, perhaps I should reboot using the non-SMP kernel and try it. I'll
have to do that when I get back to the servers.


-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 2:04 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

uname -a
machine #1) Linux draco 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
machine #2) Linux scorpio 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux


java -version:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

same on both


-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:56 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


[EMAIL PROTECTED] bin]# uname -a
Linux rh9 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux

[EMAIL PROTECTED] bin]# java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)


-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 11:05 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


Sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm... are yours SMP servers? Could be something odd with synchronization
if that is the case.


-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:01 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

Interesting, mine doesn't work at all unless I set LD_ASSUME_KERNEL.

What VM (version and name) are you using?

Filip

-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Now that's really very strange. I am running RH9 and everything seems to go
through just fine.


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never gets back to the sender,
so my web pages never load without that flag.

I think it is only needed under Red Hat 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the LD_ASSUME_KERNEL thing. What are the symptoms when
it is required?


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without the CPU going crazy!
But only if LD_ASSUME_KERNEL is set to 2.4.

One more question for you, Filip: is useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe
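The expectation behind this question is that, with the dirty flag enabled, a session is only sent to the other node when setAttribute or removeAttribute was called during the request, rather than after every request. A minimal sketch of that idea, assuming a hypothetical DirtyTrackingSession class; it does not reproduce how catalina-cluster actually tracks or ships changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical session that records whether it was modified during the current request.
class DirtyTrackingSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private boolean dirty = false;

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty = true;
    }

    void removeAttribute(String name) {
        if (attributes.remove(name) != null) {
            dirty = true;
        }
    }

    Object getAttribute(String name) {
        return attributes.get(name); // reads alone never mark the session dirty
    }

    // Called at the end of a request: should this session be sent to the other node?
    // useDirtyFlag=false: replicate after every request; true: only if modified.
    boolean shouldReplicate(boolean useDirtyFlag) {
        boolean result = !useDirtyFlag || dirty;
        dirty = false; // the next request starts clean
        return result;
    }
}
```

The trade-off of such a flag is that mutating an object obtained through getAttribute, without calling setAttribute again, would not be replicated, which is why replicating after every request is the more conservative choice.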

[EMAIL PROTECTED] wrote:

Hurray for Filip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:

Jean-Philippe and Steve,
I fixed the bug and tried replication on RH9. Right away, it didn't
work.
The problem is that when RH9 tries to write the ACK back to the NIO
socket, it never reaches the other node, and the sender times out after
a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip
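On the sending side, synchronous replication boils down to "write the message, then wait for the ACK". The sketch below uses plain blocking java.net.Socket calls and a hypothetical one-byte ACK protocol (the real cluster wire format is not shown in this thread), and bounds the wait with a read timeout so that a lost ACK surfaces as a quick failure rather than a long hang.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical synchronous sender: write one message, then block until a one-byte ACK arrives.
// The ACK byte value and the lack of message framing are simplifications for this sketch.
class AckingSender {
    static final int ACK = 6; // ASCII ACK, chosen arbitrarily here

    // Returns true if the peer acknowledged within ackTimeoutMs, false otherwise.
    static boolean sendWithAck(Socket socket, byte[] message, int ackTimeoutMs) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write(message);
        out.flush();
        socket.setSoTimeout(ackTimeoutMs); // bound the wait instead of blocking indefinitely
        InputStream in = socket.getInputStream();
        try {
            return in.read() == ACK; // read() returns -1 if the peer closed without acknowledging
        } catch (SocketTimeoutException e) {
            return false; // the ACK never arrived in time
        }
    }
}
```

A short timeout makes the RH9 symptom visible immediately; the trade-off is that a slow but healthy peer can be misreported as failed if the timeout is set too low.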

-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


OK guys,
good news. The 100% CPU is totally my fault; I messed up on that one.
I was registering OP_WRITE as an interest,
and this is not good :)
I'm checking in the working code in 15 min, after some more regression tests.
Filip
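This is general java.nio behavior rather than anything Tomcat-specific: a connected socket channel is writable almost all the time (whenever its send buffer has room), so a key registered with OP_WRITE interest is reported ready on every select() call, and a loop that has nothing to write just spins. The sketch below uses only standard java.nio calls on a loopback connection to show the difference; the class name is made up for this example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class OpWriteSpinDemo {
    // Registers one connected, idle channel with the given interest set and
    // returns how many keys select(100) reports as ready.
    static int readyKeys(int interestOps) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                accepted.register(selector, interestOps);
                // OP_WRITE: ready as soon as the send buffer has room, so this returns at once.
                // OP_READ: nothing was sent, so this waits the full 100 ms and returns 0.
                return selector.select(100);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OP_WRITE ready keys: " + readyKeys(SelectionKey.OP_WRITE));
        System.out.println("OP_READ ready keys:  " + readyKeys(SelectionKey.OP_READ));
    }
}
```

The usual remedy is to keep only OP_READ registered and turn OP_WRITE on just while there is output pending, switching it off again once the buffer has been written.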

-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


Another code change is that I am now accepting keys for OP_READ and
OP_WRITE. Before, it was only OP_READ,
but for synchronous replication I need both.

This is good info. I just got RH9 installed and will be trying it out
this week and next.

Filip
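Since synchronous replication needs both directions (read a message, then write an ACK back), one common java.nio pattern is to leave only OP_READ registered and switch the key's interest to OP_WRITE just while an ACK is waiting to go out, then switch back. This is a sketch of that general idiom, not the code checked into CVS; the handler class, the one-byte ACK, and the "one read equals one message" simplification are all assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Hypothetical per-connection handler, attached to its SelectionKey.
// Simplification: every successful read is treated as one complete replication message.
class AckingConnection {
    static final byte ACK = 6; // assumed one-byte ACK for this sketch
    private final ByteBuffer in = ByteBuffer.allocate(8192);
    private final ByteBuffer ack = ByteBuffer.allocate(1);

    // Call when the selector reports the key as readable.
    void onReadable(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        in.clear();
        int n = channel.read(in);
        if (n < 0) { // peer closed the connection
            key.cancel();
            channel.close();
            return;
        }
        if (n > 0) {
            ack.clear();
            ack.put(ACK).flip();
            // Ask for OP_WRITE only while an ACK is waiting to be sent.
            key.interestOps(SelectionKey.OP_WRITE);
        }
    }

    // Call when the selector reports the key as writable.
    void onWritable(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        channel.write(ack);
        if (!ack.hasRemaining()) {
            // ACK fully sent: drop OP_WRITE so select() can block again instead of spinning.
            key.interestOps(SelectionKey.OP_READ);
        }
    }
}
```

Because OP_WRITE is only requested for the short window between reading a message and flushing its ACK, an idle connection leaves select(timeout) free to actually wait.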

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The only change in the ReplicationListener class is the try/catch that
was added; the code logic is the same.

Weird enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:

I was just about to try this, actually. Through googling I found a lot of
people having problems with select() using 1.4 NIO on Red Hat 9, although
they were actually experiencing crashes.

To verify your results, I just put a Thread.sleep(1); where you suggested,
and I also see the jump in performance.

Something must have changed in ReplicationListener that causes this,
because the 5.0.16 version doesn't seem to have the problem. I'll see if I
can figure it out when I get back to where I can diff the files.

-Steve

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

More content for you, Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java.

Here's what's happening:

selector.select(timeout) -> returns immediately with one SelectionKey
available.

That key is not acceptable and not readable, so the loop immediately skips
those ifs and loops back to the beginning.

I've put traces in, and this is executed once every millisecond, hence the
100% load on the server.
Just to make sure, I put a Thread.sleep(10) at the end of the loop,
and the CPU dropped back to 0%. Replication still worked nicely,
though probably a little slower because of the 10 ms wait.

I don't know much about those NIO packages, but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.

Let me know what you can dig out of those.

Jean-Philippe Bélanger

[EMAIL PROTECTED] wrote:

Hi Filip.

I did some profiling of 40 minutes of Tomcat with and without a 2nd node
up. Here are the results with

-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:

Those numbers are from cpu=times and not samples, since the latter freezes
on my systems.
So that list shows the time spent in each method.

The major difference is the number of calls to the sun.nio.ch.PollArrayWrapper
class. I don't know much about those NIO packages, but 819000 calls in
40 minutes is a lot.
The socket interface was called more than twice as often with 2 hosts as
with a single one, which seems normal.

Maybe this can help.
If you need the complete hprof files, I can send them to you.

1 host in cluster:
CPU TIME (ms) BEGIN (total = 19701) Thu Jan  8 10:00:59 2004
rank   self  accum   count trace method
1 11.48% 11.48%      54    85 java.lang.Object.wait
2 11.46% 22.94%     117    86 java.lang.Object.wait
3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
6  7.37% 63.09%      28   495 java.lang.Object.wait
7  7.24% 70.34%      10   576 java.lang.Object.wait
8  4.57% 74.90%      90   716 java.lang.Thread.sleep
9  4.48% 79.38%       1   909 java.lang.Object.wait
10  4.48% 83.86%       1   908 java.lang.Object.wait
11  4.48% 88.34%      15   810 java.lang.Object.wait
12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
13  0.71% 93.52%       2   623 java.lang.Object.wait
14  0.56% 94.08%       2   706 java.lang.Object.wait
15  0.38% 94.46%       2   914 java.lang.Object.wait
16  0.24% 94.70%     775   913 java.lang.String.toCharArray
17  0.23% 94.93%       3   475 java.lang.Thread.sleep
18  0.16% 95.09%       2   472 java.lang.Object.wait
19  0.15% 95.24%       2   595 java.lang.Thread.sleep
20  0.15% 95.40%       2   586 java.lang.Thread.sleep
21  0.15% 95.55%       2   703 java.lang.Thread.sleep
22  0.15% 95.70%       2   476 java.lang.Thread.sleep
23  0.15% 95.85%       2   692 java.lang.Thread.sleep
24  0.12% 95.97%  218595   385 java.lang.CharacterDataLatin1.toLowerCase
25  0.12% 96.09%  218595   408 java.lang.Character.toLowerCase
26  0.11% 96.20%  218595   433 java.lang.CharacterDataLatin1.getProperties
27  0.10% 96.30%  210925   389 java.lang.String.equalsIgnoreCase
28  0.08% 96.38%  157259   387 java.lang.String.charAt
29  0.08% 96.46%       1   646 java.lang.Thread.sleep
30  0.08% 96.53%       1   634 java.lang.Thread.sleep
31  0.08% 96.61%       1   903 java.lang.Thread.sleep
32  0.08% 96.69%       1   714 java.lang.Thread.sleep
33  0.08% 96.76%       1   811 java.lang.Thread.sleep
34  0.08% 96.84%       1   715 java.lang.Thread.sleep

2 hosts:
CPU TIME (ms) BEGIN (total = 37247) Thu Jan  8 11:01:28 2004
rank   self  accum   count trace method
1  9.56%  9.56%      52    85 java.lang.Object.wait
2  9.56% 19.12%      29    86 java.lang.Object.wait
3  9.30% 28.43%       3   267 java.lang.Object.wait
4  9.25% 37.68%    6644   224 java.lang.Thread.sleep
5  9.23% 46.91%   13116   215 java.net.PlainDatagramSocketImpl.receive
6  7.67% 54.58%       3   266 java.lang.Object.wait
7  5.90% 60.47%      39   847 java.lang.Object.wait
8  5.76% 66.24%      12   503 java.lang.Object.wait
9  3.90% 70.14%     145   975 java.lang.Thread.sleep
10  3.90% 74.04%       1  1174 java.lang.Object.wait
11  3.90% 77.94%       1  1173 java.lang.Object.wait
12  3.90% 81.84%      25   973 java.lang.Object.wait
13  3.90% 85.74%       1  1175 java.net.PlainSocketImpl.socketAccept
14  3.88% 89.62%  819692   214 sun.nio.ch.PollArrayWrapper.poll0
15  0.75% 90.37%       2   958 java.lang.Object.wait
16  0.28% 90.65%       2   457 java.lang.Object.wait
17  0.26% 90.91%       2  1181 java.lang.Object.wait

Filip Hanik wrote:

I'll try to get an instance going today and will let you know how it
goes.
Also, try asynchronous replication: does it still go to 100%?

Filip

-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 12:08 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Okay, did that and got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:5555
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:5555
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:5555
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:5555

*shrug*

BTW, it didn't go to 100% CPU usage before I started using the code from
CVS. Of course, the Manager would almost always time out before it would
receive the message.

Now it gets the message right away, but maxes my machine out.


-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:58 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

100% CPU can mean that you have a multicast problem. Try running

java -cp tomcat-replication.jar MCaster

Download the jar from http://cvs.apache.org/~fhanik/

Filip

-----Original Message-----
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 6:51 AM
To: '[EMAIL PROTECTED]'
Subject: tomcat 5.0.16 Replication

I was having random problems with clustering when starting up; mostly they
had to do with timing out when the manager was starting up. I built the CVS
version and it solved that problem, but it has caused some serious
performance problems.

First, a little background.

I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9 and
Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The multicast
packets are restricted to a crossover link between the servers. There are
3 hosts in the server.xml, all with clustering set up, and they all
function just fine.

But... the CPUs spike up to 100% if I start up both servers. I know this
didn't happen without the new catalina-cluster.jar. If I shut down 1 server
(doesn't matter which), everything returns to normal, but when both are
running, both servers are at 100% CPU. I am trying to profile it now, but I
figured if someone has already experienced this, they could save me some
time.

Oh, and there isn't anything relevant in my logs. It's not throwing
millions of errors or anything.

-Steve Nelson

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Jean-Philippe Bélanger
(514)228-8800 ext 3060
111 Duke
CGI

