Jean-Philippe and Steve,
I fixed the bug and tried replication on RH9. It failed immediately.
The problem is that when RH9 tries to write the ACK back to the NIO socket,
it never reaches the other node and times out after a long time.
I set LD_ASSUME_KERNEL=2.4 and it started to work.
Filip
-----Original Message-----
From: Filip Hanik [mailto:devlists@(protected)]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
OK guys,
Good news: the 100% CPU is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest op, which is not good :)
A connected socket is almost always writable, so select() returns right away.
I'll check in the working code in 15 min, after some more regression tests.
Filip
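A small sketch of why a permanent OP_WRITE interest can pin the CPU: a
connected socket with room in its send buffer is writable nearly all the
time, so select(timeout) returns at once with a key that is neither
acceptable nor readable. This is not the catalina-cluster code. The class
WriteInterestSpinDemo and its method countReadySelects are hypothetical
names, and the sketch assumes a current JDK and a loopback connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of Tomcat.
final class WriteInterestSpinDemo {

    /**
     * Registers an idle loopback connection with the given interest ops and
     * counts how many of `rounds` select(timeoutMs) calls report a ready key.
     */
    static int countReadySelects(int interestOps, int rounds, long timeoutMs) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                client.configureBlocking(false);
                client.register(selector, interestOps);
                int ready = 0;
                for (int i = 0; i < rounds; i++) {
                    if (selector.select(timeoutMs) > 0) {
                        ready++;
                    }
                    // Clear the selected set so the next select() reports afresh.
                    selector.selectedKeys().clear();
                }
                return ready;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int both = SelectionKey.OP_READ | SelectionKey.OP_WRITE;
        System.out.println("OP_READ|OP_WRITE ready selects: "
                + countReadySelects(both, 5, 50));
        System.out.println("OP_READ only ready selects:     "
                + countReadySelects(SelectionKey.OP_READ, 5, 50));
    }
}
```

Nothing is ever sent on the connection, so with OP_READ alone every select()
waits out its timeout, while with OP_WRITE added every call comes back with
a writable key.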
-----Original Message-----
From: Filip Hanik [mailto:devlists@(protected)]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
Another code change was that I am now accepting keys for both OP_READ and
OP_WRITE. Before it was only OP_READ,
but for synchronous replication I need both.
This is good info. I just got RH9 installed and will be trying it out this
week and next.
Filip
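One common way to handle both reads and writes without keeping OP_WRITE
permanently registered is to switch it on only while an ACK is waiting to be
sent, then switch it off again. The sketch below illustrates that pattern on
a loopback connection. It is not the ReplicationListener implementation;
AckWriteSketch, queueAck, flushAck and roundTrip are hypothetical names, and
a current JDK is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical sketch class, not part of Tomcat.
final class AckWriteSketch {

    /** Attach the pending ACK to the key and ask for writability events. */
    static void queueAck(SelectionKey key, byte[] ack) {
        key.attach(ByteBuffer.wrap(ack));
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }

    /** Write what the socket accepts; drop OP_WRITE once nothing is pending. */
    static void flushAck(SelectionKey key) throws IOException {
        ByteBuffer pending = (ByteBuffer) key.attachment();
        if (pending != null) {
            ((SocketChannel) key.channel()).write(pending);
        }
        if (pending == null || !pending.hasRemaining()) {
            key.attach(null);
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }

    /** Sends one ACK over loopback and returns the bytes the peer received. */
    static byte[] roundTrip(byte[] ack) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel peer = SocketChannel.open(server.getLocalAddress());
                 SocketChannel receiver = server.accept()) {
                receiver.configureBlocking(false);
                SelectionKey key = receiver.register(selector, SelectionKey.OP_READ);
                queueAck(key, ack);
                // Loop only while OP_WRITE is still registered on the key.
                while ((key.interestOps() & SelectionKey.OP_WRITE) != 0) {
                    selector.select(1000);
                    if (selector.selectedKeys().remove(key) && key.isWritable()) {
                        flushAck(key);
                    }
                }
                // peer is a blocking channel, so read() waits for the ACK bytes.
                ByteBuffer in = ByteBuffer.allocate(ack.length);
                while (in.hasRemaining() && peer.read(in) >= 0) {
                    // keep reading until the whole ACK has arrived
                }
                return in.array();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After flushAck drains the buffer, the key is back to OP_READ only, so an idle
connection no longer wakes the selector.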
-----Original Message-----
From: jean-philippe.belanger@(protected)
[mailto:jean-philippe.belanger@(protected)]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
The only change in the ReplicationListener class is the try/catch that
was added.
The code logic is the same, oddly enough. So whatever changed the state
of the SelectionKey is probably somewhere else.
Jean-Philippe Bélanger
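To see where the key state changes, it can help to trace each key's interest
and ready sets inside the select loop. A minimal helper, assuming a current
JDK; KeyStateTrace, describeOps and describe are hypothetical names, not
part of Tomcat.

```java
import java.nio.channels.SelectionKey;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing helper, not part of Tomcat.
final class KeyStateTrace {

    /** Render an interest-ops or ready-ops bit set as readable names. */
    static String describeOps(int ops) {
        List<String> names = new ArrayList<>();
        if ((ops & SelectionKey.OP_ACCEPT) != 0) names.add("OP_ACCEPT");
        if ((ops & SelectionKey.OP_CONNECT) != 0) names.add("OP_CONNECT");
        if ((ops & SelectionKey.OP_READ) != 0) names.add("OP_READ");
        if ((ops & SelectionKey.OP_WRITE) != 0) names.add("OP_WRITE");
        return names.isEmpty() ? "none" : String.join("|", names);
    }

    /** One trace line for a key; only call this on keys that are still valid. */
    static String describe(SelectionKey key) {
        return "interest=" + describeOps(key.interestOps())
                + " ready=" + describeOps(key.readyOps());
    }
}
```

Logging KeyStateTrace.describe(key) for every selected key would show, for
example, a key whose ready set contains only OP_WRITE.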
Steve Nelson wrote:
>I was just about to try this, actually. Googling turned up a lot of people
>having problems with select with 1.4 and NIO on Red Hat 9, although they
>were actually experiencing crashes.
>
>To verify your results I just put a Thread.sleep(1) where you suggested,
>and I also see the jump in performance.
>
>Something must have changed in ReplicationListener to cause this, because
>the 5.0.16 version doesn't seem to have the problem. I'll see if I can
>figure it out when I get back to where I can diff the files.
>
>-Steve
>
>-----Original Message-----
>From: jean-philippe.belanger@(protected)
>[mailto:jean-philippe.belanger@(protected)]
>Sent: Thursday, January 08, 2004 12:25 PM
>To: Tomcat Users List
>Subject: Re: tomcat 5.0.16 Replication
>
>
>More content for you, Filip.
>
>I've checked and followed the code of the listen event in
>ReplicationListener.java.
>
>Here's what is happening:
>
>selector.select(timeout) returns immediately with one SelectionKey
>available. That key is neither acceptable nor readable, so it immediately
>skips those ifs and loops back to the beginning.
>
>I've put in traces, and this is executed once every millisecond, hence the
>100% load on the server.
>Just to make sure, I put a Thread.sleep(10) at the end of the loop. The CPU
>dropped back to 0% and replication still worked nicely, though probably a
>little slower because of the 10 ms wait.
>
>I don't know much about the NIO packages, but it seems like the
>select(timeout) method shouldn't return a SelectionKey in that state
>without any waiting.
>
>Let me know what you can dig out of this.
>
>Jean-Philippe Bélanger
>
>jean-philippe.belanger@(protected):
>
>
>
>>Hi Filip.
>>
>>I did some profiling of 40 mins of Tomcat with and without a 2nd node
>>up. Here are the results with
>>-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:
>>
>>Those numbers are cpu=times and not samples, since the latter freezes
>>on my systems.
>>So the list shows the time spent in each method.
>>
>>The major difference is the number of calls to the
>>sun.nio.ch.PollArrayWrapper class. I don't know much about the NIO
>>packages, but 819000 calls in 40 mins is a lot.
>>The socket interface was called more than twice as often with 2 hosts
>>as with a single one, which seems normal.
>>
>>Maybe this can help.
>>If you need the complete hprof files I can send them to you.
>>
>>1 host in cluster:
>>CPU TIME (ms) BEGIN (total = 19701) Thu Jan 8 10:00:59 2004
>>rank   self  accum   count trace method
>>   1 11.48% 11.48%      54    85 java.lang.Object.wait
>>   2 11.46% 22.94%     117    86 java.lang.Object.wait
>>   3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
>>   4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
>>   5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
>>   6  7.37% 63.09%      28   495 java.lang.Object.wait
>>   7  7.24% 70.34%      10   576 java.lang.Object.wait
>>   8  4.57% 74.90%      90   716 java.lang.Thread.sleep
>>   9  4.48% 79.38%       1   909 java.lang.Object.wait
>>  10  4.48% 83.86%       1   908 java.lang.Object.wait
>>  11  4.48% 88.34%      15   810 java.lang.Object.wait
>>  12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
>>  13  0.71% 93.52%       2   623 java.lang.Object.wait
>>  14  0.56% 94.08%       2   706 java.lang.Object.wait
>>  15  0.38% 94.46%       2   914 java.lang.Object.wait
>>  16  0.24% 94.70%     775   913 java.lang.String.toCharArray
>>  17  0.23% 94.93%       3   475 java.lang.Thread.sleep
>>  18  0.16% 95.09%       2   472 java.lang.Object.wait
>>  19  0.15% 95.24%       2   595 java.lang.Thread.sleep
>>  20  0.15% 95.40%       2   586 java.lang.Thread.sleep
>>  21  0.15% 95.55%       2   703 java.lang.Thread.sleep
>>  22  0.15% 95.70%       2   476 java.lang.Thread.sleep
>>  23  0.15% 95.85%       2   692 java.lang.Thread.sleep
>>  24  0.12% 95.97%  218595   385 java.lang.CharacterDataLatin1.toLowerCase
>>  25  0.12% 96.09%  218595   408 java.lang.Character.toLowerCase
>>  26  0.11% 96.20%  218595   433 java.lang.CharacterDataLatin1.getProperties
>>  27  0.10% 96.30%  210925   389 java.lang.String.equalsIgnoreCase
>>  28  0.08% 96.38%  157259   387 java.lang.String.charAt
>>  29  0.08% 96.46%       1   646 java.lang.Thread.sleep
>>  30  0.08% 96.53%       1   634 java.lang.Thread.sleep
>>  31  0.08% 96.61%       1   903 java.lang.Thread.sleep
>>  32  0.08% 96.69%       1   714 java.lang.Thread.sleep
>>  33  0.08% 96.76%       1   811 java.lang.Thread.sleep
>>  34  0.08% 96.84%       1   715 java.lang.Thread.sleep
>>
>>2 hosts:
>>CPU TIME (ms) BEGIN (total = 37247) Thu Jan 8 11:01:28 2004
>>rank   self  accum   count trace method
>>   1  9.56%  9.56%      52    85 java.lang.Object.wait
>>   2  9.56% 19.12%      29    86 java.lang.Object.wait
>>   3  9.30% 28.43%       3   267 java.lang.Object.wait
>>   4  9.25% 37.68%    6644   224 java.lang.Thread.sleep
>>   5  9.23% 46.91%   13116   215 java.net.PlainDatagramSocketImpl.receive
>>   6  7.67% 54.58%       3   266 java.lang.Object.wait
>>   7  5.90% 60.47%      39   847 java.lang.Object.wait
>>   8  5.76% 66.24%      12   503 java.lang.Object.wait
>>   9  3.90% 70.14%     145   975 java.lang.Thread.sleep
>>  10  3.90% 74.04%       1  1174 java.lang.Object.wait
>>  11  3.90% 77.94%       1  1173 java.lang.Object.wait
>>  12  3.90% 81.84%      25   973 java.lang.Object.wait
>>  13  3.90% 85.74%       1  1175 java.net.PlainSocketImpl.socketAccept
>>  14  3.88% 89.62%  819692   214 sun.nio.ch.PollArrayWrapper.poll0
>>  15  0.75% 90.37%       2   958 java.lang.Object.wait
>>  16  0.28% 90.65%       2   457 java.lang.Object.wait
>>  17  0.26% 90.91%       2  1181 java.lang.Object.wait
>>
>>Filip Hanik wrote:
>>
>>
>>
>>>I'll try to get an instance going today. Will let you know how it goes.
>>>Also, try asynchronous replication. Does it still go to 100%?
>>>
>>>Filip
>>>
>>>-----Original Message-----
>>>From: Steve Nelson [mailto:Steve.Nelson@(protected)]
>>>Sent: Wednesday, January 07, 2004 12:08 PM
>>>To: 'Tomcat Users List'
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>
>>>
>>>
>>>Okay, did that and got this:
>>>
>>>BEGIN TO RECEIVE
>>>SENT:Default 1
>>>RECEIVED:Default 1 FROM /10.0.0.110:5555
>>>SENT:Default 2
>>>BEGIN TO RECEIVE
>>>RECEIVED:Default 2 FROM /10.0.0.110:5555
>>>SENT:Default 3
>>>BEGIN TO RECEIVE
>>>RECEIVED:Default 3 FROM /10.0.0.110:5555
>>>SENT:Default 4
>>>BEGIN TO RECEIVE
>>>RECEIVED:Default 4 FROM /10.0.0.110:5555
>>>
>>>*shrug*
>>>
>>>BTW, it didn't go to 100% CPU utilization before I started using the code
>>>from CVS. Of course, the Manager would almost always time out before it
>>>would receive the message.
>>>
>>>Now it gets the message right away, but maxes my machine out.
>>>
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: Filip Hanik [mailto:devlists@(protected)]
>>>Sent: Wednesday, January 07, 2004 1:58 PM
>>>To: Tomcat Users List
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>
>>>100% CPU can mean that you have a multicast problem. Try running
>>>
>>>java -cp tomcat-replication.jar MCaster
>>>
>>>Download the jar from http://cvs.apache.org/~fhanik/
>>>
>>>Filip
>>>
>>>-----Original Message-----
>>>From: Steve Nelson [mailto:Steve.Nelson@(protected)]
>>>Sent: Wednesday, January 07, 2004 6:51 AM
>>>To: 'tomcat-user@(protected)'
>>>Subject: tomcat 5.0.16 Replication
>>>
>>>
>>>
>>>I was having random problems with clustering when starting up, mostly
>>>to do with timing out when the manager was starting up. I built the CVS
>>>version and it solved that problem, but it has caused some serious
>>>performance problems.
>>>
>>>First, a little background.
>>>
>>>I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9
>>>and Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The
>>>multicast packets are restricted to a crossover link between the servers.
>>>There are 3 hosts in the server.xml, all with clustering set up. They all
>>>function just fine.
>>>
>>>But the CPUs spike to 100% if I start up both servers. I know this
>>>didn't happen without the new catalina-cluster.jar. If I shut down 1
>>>server (it doesn't matter which), everything returns to normal. But when
>>>both are running, both servers are at 100% CPU. I am trying to profile it
>>>now, but I figured if someone has already experienced this they could
>>>save me some time.
>>>
>>>Oh, and there isn't anything relevant in my logs. It's not throwing
>>>millions of errors or anything.
>>>
>>>-Steve Nelson
>>>
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: tomcat-user-unsubscribe@(protected)
>>>For additional commands, e-mail: tomcat-user-help@(protected)
>>>
--
Jean-Philippe Bélanger
(514)228-8800 ext 3060
111 Duke
CGI