Java Mailing List Archive

http://www.junlu.com/

FD, replication timeouts, and shunning

2007-07-10 - By quinine

The short version of this question: I have two nodes, A and B, replicating the
Hibernate second-level cache with TreeCache. Several times a day, node B goes
through garbage collections that leave it unresponsive for minutes at a time,
and during these node A throws ReplicationExceptions. I want to configure
JBoss Cache/JGroups so that node A can withstand these GC events on node B,
either by temporarily shunning B, by setting appropriate timeouts, or by some
combination of both.

Long Version:

I have inherited a JBoss cluster. I have done a great deal of reading, including
the JGroups, JBoss Cache, and Hibernate/JBoss Cache wikis here on jboss.org.

We have two nodes, A and B, in a cluster. Both are running:

JBoss 4.0.5
JBossCache 1.4.1.SP3
JGroups 2.4.1-SP1


The jboss-config.xml for the Hibernate TreeCache (identical on both nodes):


 | <?xml version="1.0" encoding="UTF-8"?>
 | <server>
 |     <mbean code="org.jboss.cache.TreeCache"
 |         name="jboss.cache:service=HibernateTreeCache">
 |
 |         <depends>jboss:service=Naming</depends>
 |         <depends>jboss:service=TransactionManager</depends>
 |
 |         <attribute name="ClusterName">Hibernate-${jboss.partition.name:Cluster}</attribute>
 |        
 |         <attribute name="IsolationLevel">READ_COMMITTED</attribute>
 |
 |         <attribute name="CacheMode">REPL_SYNC</attribute>
 |
 |         <attribute name="UseRegionBasedMarshalling">false</attribute>
 |
 |         <attribute name="InactiveOnStartup">false</attribute>
 |          
 |         <attribute name="TransactionManagerLookupClass">org.jboss.cache.BatchModeTransactionManagerLookup</attribute>
 |
 |         <attribute name="ClusterConfig">
 |            <config>
 |               <TCP bind_addr="${partition.tcphost:HIB3-MISCONFIGURED}"
 |                    start_port="${partition.tcpport.hib3:HIB3-MISCONFIGURED}" loopback="false"
 |                    tcp_nodelay="false" up_thread="false" down_thread="false"/>
 |               <TCPPING initial_hosts="${partition.tcphosts.hib3:HIB3-MISCONFIGURED}"
 |                    port_range="3" timeout="3500"
 |                    num_initial_members="3" up_thread="false" down_thread="false"/>
 |               <MERGE2 min_interval="20000" max_interval="100000"
 |                  down_thread="false" up_thread="false"/>
 |               <FD_SOCK down_thread="false" up_thread="false"/>
 |               <FD shun="true" down_thread="false" up_thread="false"
 |                  timeout="20000" max_tries="5"/>
 |               <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
 |               <pbcast.NAKACK up_thread="false" down_thread="false" gc_lag="100"
 |                  retransmit_timeout="60000"/>
 |               <pbcast.STABLE desired_avg_gossip="50000" up_thread="false"
 |                  down_thread="false"/>
 |               <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true"
 |                  print_local_addr="true" down_thread="false" up_thread="false"/>
 |               <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
 |            </config>
 |
 |
 |         </attribute>
 |
 |          <attribute name="FetchInMemoryState">false</attribute>
 |          <attribute name="InitialStateRetrievalTimeout">20000</attribute>
 |
 |         <attribute name="SyncReplTimeout">20000</attribute>
 |
 |         <attribute name="LockAcquisitionTimeout">15000</attribute>
 |
 |         <attribute name="BuddyReplicationConfig">
 |             <config>
 |                 <buddyReplicationEnabled>false</buddyReplicationEnabled>
 |                 <buddyLocatorClass>org.jboss.cache.buddyreplication.NextMemberBuddyLocator</buddyLocatorClass>
 |                 <buddyLocatorProperties>
 |                     numBuddies = 1
 |                     ignoreColocatedBuddies = true
 |                 </buddyLocatorProperties>
 |
 |                 <buddyPoolName>default</buddyPoolName>
 |                 <buddyCommunicationTimeout>2000</buddyCommunicationTimeout>
 |
 |                 <autoDataGravitation>false</autoDataGravitation>
 |                 <dataGravitationRemoveOnFind>true</dataGravitationRemoveOnFind>
 |                 <dataGravitationSearchBackupTrees>true</dataGravitationSearchBackupTrees>
 |
 |             </config>
 |         </attribute>
 |      
 |     </mbean>
 |
 | </server>
 |
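To make explicit which attributes in the TreeCache stack govern how long a silent member goes unsuspected, they can be read out programmatically. The following is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers), assuming an abbreviated copy of the ClusterConfig that keeps only FD_SOCK, FD and VERIFY_SUSPECT; the class name FdSettings and the map keys are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class FdSettings {
    // Abbreviated copy of the HibernateTreeCache ClusterConfig: only the
    // failure-detection protocols are kept; the rest of the stack is omitted.
    static final String STACK =
          "<config>"
        + "<FD_SOCK down_thread=\"false\" up_thread=\"false\"/>"
        + "<FD shun=\"true\" down_thread=\"false\" up_thread=\"false\""
        + " timeout=\"20000\" max_tries=\"5\"/>"
        + "<VERIFY_SUSPECT timeout=\"1500\" down_thread=\"false\" up_thread=\"false\"/>"
        + "</config>";

    // Reads one numeric attribute from the first element with the given tag.
    // getElementsByTagName matches exact names, so "FD" does not pick up FD_SOCK.
    static long attr(Document doc, String tag, String name) {
        Element e = (Element) doc.getElementsByTagName(tag).item(0);
        return Long.parseLong(e.getAttribute(name));
    }

    // Parses a protocol-stack fragment and returns the failure-detection settings.
    static Map<String, Long> failureDetection(String stackXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(stackXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Long> values = new LinkedHashMap<>();
            values.put("FD.timeout", attr(doc, "FD", "timeout"));
            values.put("FD.max_tries", attr(doc, "FD", "max_tries"));
            values.put("VERIFY_SUSPECT.timeout", attr(doc, "VERIFY_SUSPECT", "timeout"));
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse protocol stack", e);
        }
    }

    public static void main(String[] args) {
        // Prints {FD.timeout=20000, FD.max_tries=5, VERIFY_SUSPECT.timeout=1500}
        System.out.println(failureDetection(STACK));
    }
}
```

The same approach would apply to the PartitionConfig stack in cluster-service.xml, which uses different FD and VERIFY_SUSPECT values.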

Our cluster-service.xml:

 | <?xml version="1.0" encoding="UTF-8"?>
 |
 | <server>
 |
 |    <mbean code="org.jboss.ha.framework.server.ClusterPartition"
 |       name="jboss:service=${jboss.partition.name:DefaultPartition}">
 |          
 |       <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
 |
 |       <attribute name="NodeAddress">${jboss.bind.address}</attribute>
 |
 |       <attribute name="DeadlockDetection">False</attribute>
 |      
 |       <attribute name="StateTransferTimeout">30000</attribute>
 |
 |       <attribute name="PartitionConfig">
 |          <Config>
 |             <TCP bind_addr="${partition.tcphost:CLUSTERCONFIG-MISCONFIGURED}"
 |                  start_port="${partition.tcpport.cluster:CLUSTERCONFIG-MISCONFIGURED}" loopback="false"
 |                  recv_buf_size="2000000" send_buf_size="640000"
 |                  tcp_nodelay="true" up_thread="true" down_thread="true"/>
 |             <TCPPING initial_hosts="${partition.tcphosts.cluster:CLUSTERCONFIG-MISCONFIGURED}"
 |                port_range="3" timeout="3500"
 |                num_initial_members="3" up_thread="true" down_thread="true"/>
 |             <MERGE2 min_interval="10000" max_interval="20000" />
 |             <FD_SOCK down_thread="true" up_thread="true"/>
 |             <FD shun="true" up_thread="true" down_thread="true"
 |                timeout="10000" max_tries="5"/>
 |             <VERIFY_SUSPECT timeout="3000" down_thread="true" up_thread="true"/>
 |             <pbcast.NAKACK up_thread="true" down_thread="true" gc_lag="100"
 |                retransmit_timeout="300,600,1200,2400,4800"/>
 |             <pbcast.STABLE desired_avg_gossip="20000" max_bytes="400000"
 |                down_thread="true" up_thread="true" />
 |             <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true"
 |                print_local_addr="true" up_thread="true" down_thread="true"/>
 |             <FC max_credits="2000000" down_thread="true" up_thread="true"
 |                  min_threshold="0.10"/>
 |             <FRAG2 frag_size="60000" down_thread="true" up_thread="true"/>
 |             <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
 |          </Config>
 |       </attribute>
 |       <depends>jboss:service=Naming</depends>
 |    </mbean>
 |
 |
 |    <mbean code="org.jboss.ha.hasessionstate.server.HASessionStateService"
 |       name="jboss:service=HASessionState">
 |       <depends>jboss:service=Naming</depends>
 |       <!-- We now inject the partition into the HAJNDI service instead
 |            of requiring that the partition name be passed -->
 |       <depends optional-attribute-name="ClusterPartition"
 |          proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
 |       <!-- JNDI name under which the service is bound -->
 |       <attribute name="JndiName">/HASessionState/Default</attribute>
 |       <!-- Max delay before cleaning unreclaimed state.
 |            Defaults to 30*60*1000 => 30 minutes -->
 |       <attribute name="BeanCleaningDelay">0</attribute>
 |    </mbean>
 |
 |    <mbean code="org.jboss.ha.jndi.HANamingService"
 |       name="jboss:service=HAJNDI">
 |       <!-- We now inject the partition into the HAJNDI service instead
 |            of requiring that the partition name be passed -->
 |       <depends optional-attribute-name="ClusterPartition"
 |          proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
 |       <!-- Bind address of bootstrap and HA-JNDI RMI endpoints -->
 |       <attribute name="BindAddress">${jboss.bind.address}</attribute>
 |       <!-- Port on which the HA-JNDI stub is made available -->
 |       <attribute name="Port">1100</attribute>
 |       <!-- RmiPort to be used by the HA-JNDI service once bound. 0 => auto. -->
 |       <attribute name="RmiPort">1101</attribute>
 |       <!-- Accept backlog of the bootstrap socket -->
 |       <attribute name="Backlog">50</attribute>
 |       <!-- The thread pool service used to control the bootstrap and
 |       auto discovery lookups -->
 |       <depends optional-attribute-name="LookupPool"
 |          proxy-type="attribute">jboss.system:service=ThreadPool</depends>
 |
 |       <!-- A flag to disable the auto discovery via multicast -->
 |       <attribute name="DiscoveryDisabled">false</attribute>
 |       <!-- Set the auto-discovery bootstrap multicast bind address. If not
 |       specified and a BindAddress is specified, the BindAddress will be used. -->
 |       <attribute name="AutoDiscoveryBindAddress">${jboss.bind.address}</attribute>
 |       <!-- Multicast Address and group port used for auto-discovery -->
 |       <attribute name="AutoDiscoveryAddress">${jboss.partition.udpGroup:230.0.0.4}</attribute>
 |       <attribute name="AutoDiscoveryGroup">1102</attribute>
 |       <!-- The TTL (time-to-live) for autodiscovery IP multicast packets -->
 |       <attribute name="AutoDiscoveryTTL">16</attribute>
 |       <!-- The load balancing policy for HA-JNDI -->
 |       <attribute name="LoadBalancePolicy">org.jboss.ha.framework.interfaces.RoundRobin</attribute>
 |      
 |       <!-- Client socket factory to be used for client-server
 |            RMI invocations during JNDI queries
 |       <attribute name="ClientSocketFactory">custom</attribute>
 |       -->
 |       <!-- Server socket factory to be used for client-server
 |            RMI invocations during JNDI queries
 |       <attribute name="ServerSocketFactory">custom</attribute>
 |       -->
 |    </mbean>
 |
 |    <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
 |       name="jboss:service=invoker,type=jrmpha">
 |       <attribute name="ServerAddress">${jboss.bind.address}</attribute>
 |       <attribute name="RMIObjectPort">4447</attribute>
 |       <!--
 |       <attribute name="RMIClientSocketFactory">custom</attribute>
 |       <attribute name="RMIServerSocketFactory">custom</attribute>
 |       -->
 |       <depends>jboss:service=Naming</depends>
 |    </mbean>
 |
 |    <!-- the JRMPInvokerHA creates a thread per request.  This implementation uses a pool of threads -->
 |    <mbean code="org.jboss.invocation.pooled.server.PooledInvokerHA"
 |       name="jboss:service=invoker,type=pooledha">
 |       <attribute name="NumAcceptThreads">1</attribute>
 |       <attribute name="MaxPoolSize">300</attribute>
 |       <attribute name="ClientMaxPoolSize">300</attribute>
 |       <attribute name="SocketTimeout">60000</attribute>
 |       <attribute name="ServerBindAddress">${jboss.bind.address}</attribute>
 |       <attribute name="ServerBindPort">4446</attribute>
 |       <attribute name="ClientConnectAddress">${jboss.bind.address}</attribute>
 |       <attribute name="ClientConnectPort">0</attribute>
 |       <attribute name="EnableTcpNoDelay">false</attribute>
 |       <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
 |       <depends>jboss:service=Naming</depends>
 |    </mbean>
 |
 |    <mbean code="org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge"
 |       name="jboss.cache:service=InvalidationBridge,type=JavaGroups">
 |       <!-- We now inject the partition into the HAJNDI service instead
 |            of requiring that the partition name be passed -->
 |       <depends optional-attribute-name="ClusterPartition"
 |          proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
 |       <depends>jboss.cache:service=InvalidationManager</depends>
 |       <attribute name="InvalidationManager">jboss.cache:service=InvalidationManager</attribute>
 |       <attribute name="BridgeName">DefaultJGBridge</attribute>
 |    </mbean>
 |
 | </server>
 |

We occasionally get ReplicationExceptions on A, and we have been able to verify
that they occur during long GCs on B (up to 4-5 minutes), during which B's JVM
is unresponsive.

As I read the config snippets above, node A will not shun B until it has gone
without a heartbeat from B for at least (20 s x 5) + 1.5 s = 101.5 seconds
(FD timeout x FD max_tries + VERIFY_SUSPECT timeout), but the replication
timeout (SyncReplTimeout) is only 20 seconds.
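The arithmetic above can be written out as a small sketch. It assumes the simplified model stated in this post (detection time is roughly FD timeout x max_tries + VERIFY_SUSPECT timeout), not an exact account of JGroups 2.4 internals, and it also applies the same formula to the DefaultPartition stack from cluster-service.xml (FD timeout 10000, max_tries 5, VERIFY_SUSPECT 3000). The class and method names are hypothetical.

```java
// Simplified failure-detection arithmetic, following the reading in this post:
// FD suspects a member after max_tries missed heartbeats spaced `timeout` ms
// apart, then VERIFY_SUSPECT waits its own timeout before the member is excluded.
final class DetectionWindow {
    static long detectionMs(long fdTimeoutMs, int fdMaxTries, long verifySuspectMs) {
        return fdTimeoutMs * fdMaxTries + verifySuspectMs;
    }

    public static void main(String[] args) {
        long syncReplTimeoutMs = 20_000;                // SyncReplTimeout (TreeCache)
        long treeCache = detectionMs(20_000, 5, 1_500); // HibernateTreeCache stack
        long partition = detectionMs(10_000, 5, 3_000); // DefaultPartition stack
        long gcPauseMs = 4 * 60_000;                    // low end of the 4-5 minute pauses

        System.out.println("TreeCache detection: " + treeCache + " ms");   // 101500 ms
        System.out.println("Partition detection: " + partition + " ms");   // 53000 ms
        System.out.println("Suspected before SyncReplTimeout? "
            + (treeCache < syncReplTimeoutMs));                             // false
        System.out.println("GC pause outlasts detection? "
            + (gcPauseMs > treeCache));                                     // true
    }
}
```

Under this model, B is not suspected until roughly five times the 20 s SyncReplTimeout has elapsed, while a 4-5 minute GC pause outlasts both windows.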

So my questions are:

1) If I change the config so that A shuns B before SyncReplTimeout expires, will
this prevent replication to B while B is unresponsive (and hence prevent the
ReplicationExceptions)?
2) I then expect B to be shunned somewhat regularly, but I always want B to be
able to rejoin the cluster once it becomes responsive again. From what I'm
reading, this means setting shun=false. Without shunning, how do I prevent
replication to the unresponsive node B?
3) Are there further caveats that I need to consider?  Will I need to make
similar timeout/config changes to the cluster-config.xml?

Thank you very much for your time.

View the original post: http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4062708#4062708

Reply to the post: http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4062708
_______________________________________________
jboss-user mailing list
jboss-user@(protected)
https://lists.jboss.org/mailman/listinfo/jboss-user

©2008 junlu.com - Jax Systems, LLC, U.S.A.