FD, replication timeouts, and shunning (2007-07-10, by quinine)
The short version of this question: I have two nodes, A and B, replicating the Hibernate second-level cache with TreeCache. Several times a day, B goes through garbage-collection pauses that leave it unresponsive for minutes at a time, and during these pauses A throws ReplicationExceptions. I want to configure JBossCache/JGroups so that A can ride out these GC pauses on B, by temporarily shunning B, by using appropriate timeouts, or by some combination of both.
Long Version:
I have inherited a JBoss cluster. I have done a great deal of reading and have gone through the JGroups, JBossCache, and Hibernate/JBossCache wikis here on jboss.org.
We have two nodes, A and B, in a cluster. Both are running JBoss 4.0.5, JBossCache 1.4.1.SP3, and JGroups 2.4.1-SP1.
The jboss-config.xml for the Hibernate TreeCache (identical on both nodes):
    <?xml version="1.0" encoding="UTF-8"?>
    <server>
      <mbean code="org.jboss.cache.TreeCache"
             name="jboss.cache:service=HibernateTreeCache">

        <depends>jboss:service=Naming</depends>
        <depends>jboss:service=TransactionManager</depends>

        <attribute name="ClusterName">Hibernate-${jboss.partition.name:Cluster}</attribute>

        <attribute name="IsolationLevel">READ_COMMITTED</attribute>

        <attribute name="CacheMode">REPL_SYNC</attribute>

        <attribute name="UseRegionBasedMarshalling">false</attribute>

        <attribute name="InactiveOnStartup">false</attribute>

        <attribute name="TransactionManagerLookupClass">org.jboss.cache.BatchModeTransactionManagerLookup</attribute>

        <attribute name="ClusterConfig">
          <config>
            <TCP bind_addr="${partition.tcphost:HIB3-MISCONFIGURED}"
                 start_port="${partition.tcpport.hib3:HIB3-MISCONFIGURED}" loopback="false"
                 tcp_nodelay="false" up_thread="false" down_thread="false"/>
            <TCPPING initial_hosts="${partition.tcphosts.hib3:HIB3-MISCONFIGURED}"
                     port_range="3" timeout="3500"
                     num_initial_members="3" up_thread="false" down_thread="false"/>
            <MERGE2 min_interval="20000" max_interval="100000"
                    down_thread="false" up_thread="false"/>
            <FD_SOCK down_thread="false" up_thread="false"/>
            <FD shun="true" down_thread="false" up_thread="false"
                timeout="20000" max_tries="5"/>
            <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
            <pbcast.NAKACK up_thread="false" down_thread="false" gc_lag="100"
                           retransmit_timeout="60000"/>
            <pbcast.STABLE desired_avg_gossip="50000" up_thread="false" down_thread="false"/>
            <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true"
                        print_local_addr="true" down_thread="false" up_thread="false"/>
            <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
          </config>
        </attribute>

        <attribute name="FetchInMemoryState">false</attribute>
        <attribute name="InitialStateRetrievalTimeout">20000</attribute>

        <attribute name="SyncReplTimeout">20000</attribute>

        <attribute name="LockAcquisitionTimeout">15000</attribute>

        <attribute name="BuddyReplicationConfig">
          <config>
            <buddyReplicationEnabled>false</buddyReplicationEnabled>
            <buddyLocatorClass>org.jboss.cache.buddyreplication.NextMemberBuddyLocator</buddyLocatorClass>
            <buddyLocatorProperties>
              numBuddies = 1
              ignoreColocatedBuddies = true
            </buddyLocatorProperties>

            <buddyPoolName>default</buddyPoolName>
            <buddyCommunicationTimeout>2000</buddyCommunicationTimeout>

            <autoDataGravitation>false</autoDataGravitation>
            <dataGravitationRemoveOnFind>true</dataGravitationRemoveOnFind>
            <dataGravitationSearchBackupTrees>true</dataGravitationSearchBackupTrees>
          </config>
        </attribute>

      </mbean>
    </server>
Our cluster-service.xml:
    <?xml version="1.0" encoding="UTF-8"?>

    <server>

      <mbean code="org.jboss.ha.framework.server.ClusterPartition"
             name="jboss:service=${jboss.partition.name:DefaultPartition}">

        <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>

        <attribute name="NodeAddress">${jboss.bind.address}</attribute>

        <attribute name="DeadlockDetection">False</attribute>

        <attribute name="StateTransferTimeout">30000</attribute>

        <attribute name="PartitionConfig">
          <Config>
            <TCP bind_addr="${partition.tcphost:CLUSTERCONFIG-MISCONFIGURED}"
                 start_port="${partition.tcpport.cluster:CLUSTERCONFIG-MISCONFIGURED}" loopback="false"
                 recv_buf_size="2000000" send_buf_size="640000"
                 tcp_nodelay="true" up_thread="true" down_thread="true"/>
            <TCPPING initial_hosts="${partition.tcphosts.cluster:CLUSTERCONFIG-MISCONFIGURED}"
                     port_range="3" timeout="3500"
                     num_initial_members="3" up_thread="true" down_thread="true"/>
            <MERGE2 min_interval="10000" max_interval="20000"/>
            <FD_SOCK down_thread="true" up_thread="true"/>
            <FD shun="true" up_thread="true" down_thread="true"
                timeout="10000" max_tries="5"/>
            <VERIFY_SUSPECT timeout="3000" down_thread="true" up_thread="true"/>
            <pbcast.NAKACK up_thread="true" down_thread="true" gc_lag="100"
                           retransmit_timeout="300,600,1200,2400,4800"/>
            <pbcast.STABLE desired_avg_gossip="20000" max_bytes="400000"
                           down_thread="true" up_thread="true"/>
            <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true"
                        print_local_addr="true" up_thread="true" down_thread="true"/>
            <FC max_credits="2000000" down_thread="true" up_thread="true"
                min_threshold="0.10"/>
            <FRAG2 frag_size="60000" down_thread="true" up_thread="true"/>
            <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
          </Config>
        </attribute>
        <depends>jboss:service=Naming</depends>
      </mbean>

      <mbean code="org.jboss.ha.hasessionstate.server.HASessionStateService"
             name="jboss:service=HASessionState">
        <depends>jboss:service=Naming</depends>
        <!-- We now inject the partition into the HAJNDI service instead
             of requiring that the partition name be passed -->
        <depends optional-attribute-name="ClusterPartition"
                 proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
        <!-- JNDI name under which the service is bound -->
        <attribute name="JndiName">/HASessionState/Default</attribute>
        <!-- Max delay before cleaning unreclaimed state.
             Defaults to 30*60*1000 => 30 minutes -->
        <attribute name="BeanCleaningDelay">0</attribute>
      </mbean>

      <mbean code="org.jboss.ha.jndi.HANamingService"
             name="jboss:service=HAJNDI">
        <!-- We now inject the partition into the HAJNDI service instead
             of requiring that the partition name be passed -->
        <depends optional-attribute-name="ClusterPartition"
                 proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
        <!-- Bind address of bootstrap and HA-JNDI RMI endpoints -->
        <attribute name="BindAddress">${jboss.bind.address}</attribute>
        <!-- Port on which the HA-JNDI stub is made available -->
        <attribute name="Port">1100</attribute>
        <!-- RmiPort to be used by the HA-JNDI service once bound. 0 => auto. -->
        <attribute name="RmiPort">1101</attribute>
        <!-- Accept backlog of the bootstrap socket -->
        <attribute name="Backlog">50</attribute>
        <!-- The thread pool service used to control the bootstrap and
             auto discovery lookups -->
        <depends optional-attribute-name="LookupPool"
                 proxy-type="attribute">jboss.system:service=ThreadPool</depends>

        <!-- A flag to disable the auto discovery via multicast -->
        <attribute name="DiscoveryDisabled">false</attribute>
        <!-- Set the auto-discovery bootstrap multicast bind address. If not
             specified and a BindAddress is specified, the BindAddress will be used.
        -->
        <attribute name="AutoDiscoveryBindAddress">${jboss.bind.address}</attribute>
        <!-- Multicast Address and group port used for auto-discovery -->
        <attribute name="AutoDiscoveryAddress">${jboss.partition.udpGroup:230.0.0.4}</attribute>
        <attribute name="AutoDiscoveryGroup">1102</attribute>
        <!-- The TTL (time-to-live) for autodiscovery IP multicast packets -->
        <attribute name="AutoDiscoveryTTL">16</attribute>
        <!-- The load balancing policy for HA-JNDI -->
        <attribute name="LoadBalancePolicy">org.jboss.ha.framework.interfaces.RoundRobin</attribute>

        <!-- Client socket factory to be used for client-server
             RMI invocations during JNDI queries
        <attribute name="ClientSocketFactory">custom</attribute>
        -->
        <!-- Server socket factory to be used for client-server
             RMI invocations during JNDI queries
        <attribute name="ServerSocketFactory">custom</attribute>
        -->
      </mbean>

      <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
             name="jboss:service=invoker,type=jrmpha">
        <attribute name="ServerAddress">${jboss.bind.address}</attribute>
        <attribute name="RMIObjectPort">4447</attribute>
        <!--
        <attribute name="RMIClientSocketFactory">custom</attribute>
        <attribute name="RMIServerSocketFactory">custom</attribute>
        -->
        <depends>jboss:service=Naming</depends>
      </mbean>

      <!-- the JRMPInvokerHA creates a thread per request. This implementation uses a pool of threads -->
      <mbean code="org.jboss.invocation.pooled.server.PooledInvokerHA"
             name="jboss:service=invoker,type=pooledha">
        <attribute name="NumAcceptThreads">1</attribute>
        <attribute name="MaxPoolSize">300</attribute>
        <attribute name="ClientMaxPoolSize">300</attribute>
        <attribute name="SocketTimeout">60000</attribute>
        <attribute name="ServerBindAddress">${jboss.bind.address}</attribute>
        <attribute name="ServerBindPort">4446</attribute>
        <attribute name="ClientConnectAddress">${jboss.bind.address}</attribute>
        <attribute name="ClientConnectPort">0</attribute>
        <attribute name="EnableTcpNoDelay">false</attribute>
        <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
        <depends>jboss:service=Naming</depends>
      </mbean>

      <mbean code="org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge"
             name="jboss.cache:service=InvalidationBridge,type=JavaGroups">
        <!-- We now inject the partition into the HAJNDI service instead
             of requiring that the partition name be passed -->
        <depends optional-attribute-name="ClusterPartition"
                 proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
        <depends>jboss.cache:service=InvalidationManager</depends>
        <attribute name="InvalidationManager">jboss.cache:service=InvalidationManager</attribute>
        <attribute name="BridgeName">DefaultJGBridge</attribute>
      </mbean>

    </server>
We occasionally get ReplicationExceptions on A, and we have verified that they occur during long garbage-collection pauses on B (up to 4 to 5 minutes), while B's JVM is unresponsive.
As I read the TreeCache config above, node A will not shun B until it has gone without a heartbeat from B for at least 20 s × 5 + 1.5 s = 101.5 s (FD timeout × FD max_tries, plus the VERIFY_SUSPECT timeout), but SyncReplTimeout is only 20 s.
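As a hypothetical illustration of the change question 1 asks about, the fragment below shortens the failure-detection window in the TreeCache stack so that it closes before SyncReplTimeout. The values are assumptions chosen only to make the arithmetic work, not tested recommendations: by the same reasoning as above, 3 s × 3 + 1.5 s = 10.5 s, which is below the 20 s SyncReplTimeout. Only the FD and VERIFY_SUSPECT elements change; the rest of the ClusterConfig stack stays as shown.

```xml
<!-- Hypothetical values for illustration only (not a tested recommendation):
     FD timeout 3000 ms x max_tries 3 + VERIFY_SUSPECT 1500 ms = 10.5 s,
     which is shorter than SyncReplTimeout (20000 ms). Other protocols unchanged. -->
<FD shun="true" down_thread="false" up_thread="false"
    timeout="3000" max_tries="3"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
```

The trade-off is that a shorter window also makes B easier to suspect during ordinary, shorter pauses (anything longer than roughly 10.5 s under these assumed values), which is why question 2 about B rejoining matters.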
So my questions are:
1) If I change the config so that A shuns B before SyncReplTimeout expires, will that stop A from replicating to B while B is unresponsive (and so prevent the ReplicationExceptions)?
2) I then expect B to be shunned fairly regularly, but I always want B to be able to rejoin the cluster once it becomes responsive again. From what I'm reading, this means setting shun="false". Without shunning, how do I prevent replication to the unresponsive node B?
3) Are there further caveats I need to consider? Will I need to make similar timeout/config changes to cluster-service.xml?
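For question 2, a hypothetical sketch of the shun="false" variant would change only the shun attributes on FD and pbcast.GMS, keeping every other value from the TreeCache stack shown earlier. This assumes, without having verified it against JGroups 2.4.1-SP1, that MERGE2 (already in the stack) is what reunites A and B into a single view once B responds again. It is a fragment, not a full stack.

```xml
<!-- Hypothetical shun="false" variant; all other values are copied from the
     original TreeCache ClusterConfig. This is a fragment: FD_SOCK and the other
     protocols in between are omitted here but stay in the real stack.
     MERGE2 is unchanged and is assumed to merge B back after its GC pause. -->
<MERGE2 min_interval="20000" max_interval="100000"
        down_thread="false" up_thread="false"/>
<FD shun="false" down_thread="false" up_thread="false"
    timeout="20000" max_tries="5"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
            print_local_addr="true" down_thread="false" up_thread="false"/>
```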
Thank you very much for your time.
View the original post: http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4062708#4062708
jboss-user mailing list: https://lists.jboss.org/mailman/listinfo/jboss-user