- Initial State Transfer Failure 2007-11-01 - By alauro
Back Hi - I'm new to JBossCache and we are having an initial state transfer failure. Once the clustered caches are up - they talk fine and new objects or updates replicate perfectly. The problem is that if one box goes down, when it comes back up, the cache coming up doesn't get the old data from the other node. New data replicates or any objects that get updated. We have a pretty strict firewall setup with 48866 open for multicast traffic. We're using JBoss 4.0.4 and JBossCache 2.x.
Here's our config and some relevant logging. I wonder if our Eviction Policy is the problem.
<?xml version="1.0" encoding="UTF-8 (See http://UTF-8.ora-code.com)"?>
<!-- ===================================================================== --> <!-- Service to deploy and configure a replicating cache --> <!-- This is the default file used for installations. --> <!-- ===================================================================== -->
<!-- ==================================================================== - -> <!-- Defines TreeCache configuration - -> <!-- ==================================================================== - ->
jboss:service=Naming jboss:service=TransactionManager jboss.aop:service=AspectDeployer
<!-- Configure the TransactionManager --> org.jboss.cache.BatchModeTransactionManagerLookup
<!-- Isolation level : SERIALIZABLE REPEATABLE_READ (default) READ_COMMITTED READ_UNCOMMITTED NONE --> REPEATABLE_READ
<!-- Valid modes are LOCAL REPL_ASYNC REPL_SYNC INVALIDATION_ASYNC INVALIDATION_SYNC --> REPL_ASYNC
<!-- Just used for async repl: use a replication queue --> false
<!-- Replication interval for replication queue (in ms) --> 0
<!-- Max number of elements which trigger replication --> 0
<!-- Name of cluster. Needs to be the same for all clusters, in order to find each other --> TreeCache-Cluster
<!-- JGroups protocol stack properties. Can also be a URL, e.g. file:/home/bela/default.xml -->
<UDP mcast_addr="228.1.2.3" mcast_port="48866" ip_ttl="64" ip_mcast="true" mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000" ucast_recv_buf_size="80000" loopback="false" bind_addr="192.168.181.190" /> <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" initial_hosts="192.168.181.191[48866]"/> <MERGE2 min_interval="10000" max_interval="20000"/> <FD_SOCK/> <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/> <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192" up_thread="false" down_thread="false"/> <UNICAST timeout="600,1200,2400" window_size="100" min_threshold= "10" down_thread="false"/> <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false"/> <FRAG frag_size="8192" down_thread="false" up_thread="false"/> <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/> <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
<!-- Whether or not to fetch state on joining a cluster NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive. --> true
<!-- The max amount of time (in milliseconds) we wait until the initial state (ie. the contents of the cache) are retrieved from existing members in a clustered environment --> 20000
<!-- Number of milliseconds to wait until all responses for a synchronous call have been received. --> 20000
<!-- Max number of milliseconds to wait for a lock acquisition --> 15000
<!-- Eviction policy defines how long nodes in defined regions stay --> <!-- in cache before being removed. In this case a cache entry can --> <!-- stay in cache w/out any activity for the specified number of --> <!-- seconds. BAM will update timestamp on the record every time --> <!-- an interium accounting record is received. Likewise the record --> <!-- will be removed when a accounting stop record is received. --> <!-- So, this policy is used to remove nodes in cases where an --> <!-- accounting stop record is lost. --> <!-- --> <!-- NOTE: the timeToLiveSeconds must be greater than the time --> <!-- used to send interium accouning records. Also /SessionKeys --> <!-- and Sessions must have a timeToLiveSeconds > /SessionKeyIndex --> <!-- as the indexs have to be removed before the acutal data to --> <!-- avoid a race condition. -->
5
<!-- See notes above --> 0 3900
<!-- See notes above, time must be > than /SessionKeyuIndex --> 0 3901
<!-- See notes above, time must be > than /sessionKeyIndex --> 0 3902
<!-- Default Eviction policy for all other Nodes in cache --> 0 0
<!-- Indicate whether to use marshalling or not. Set this to true if you are running under a scoped class loader, e.g., inside an application server. Default is "false". --> false false
2007-11-01 08:39:39,009 DEBUG [org.jgroups.conf.ClassConfigurator] mapping is: 1: class org.jgroups.stack.IpAddress 2: class org.jgroups.protocols.CAUSAL$CausalHeader 3: class org.jgroups.protocols.FD$FdHeader 4: class org.jgroups.protocols.FD_PID$FdHeader 5: class org.jgroups.protocols.FD_PROB$FdHeader 6: class org.jgroups.protocols.FD_SOCK$FdHeader 7: class org.jgroups.protocols.FragHeader 8: class org.jgroups.protocols.MERGE$MergeHeader 9: class org.jgroups.protocols.NakAckHeader 10: class org.jgroups.protocols.PARTITIONER$PartitionerHeader 11: class org.jgroups.protocols.PerfHeader 12: class org.jgroups.protocols.PIGGYBACK$PiggybackHeader 13: class org.jgroups.protocols.PingHeader 14: class org.jgroups.protocols.TcpHeader 15: class org.jgroups.protocols.TOTAL$Header 16: class org.jgroups.protocols.TOTAL_OLD$TotalHeader 17: class org.jgroups.protocols.TOTAL_TOKEN$TotalTokenHeader 18: class org.jgroups.protocols.TOTAL_TOKEN$RingTokenHeader 19: class org.jgroups.protocols.TunnelHeader 20: class org.jgroups.protocols.UdpHeader 21: class org.jgroups.protocols.UNICAST$UnicastHeader 22: class org.jgroups.protocols.VERIFY_SUSPECT$VerifyHeader 23: class org.jgroups.protocols.WANPIPE$WanPipeHeader 24: class org.jgroups.protocols.pbcast.GMS$GmsHeader 25: class org.jgroups.protocols.pbcast.NakAckHeader 26: class org.jgroups.protocols.pbcast.PbcastHeader 27: class org.jgroups.protocols.pbcast.STABLE$StableHeader 28: class org.jgroups.protocols.pbcast.STATE_TRANSFER$StateHeader 29: class org.jgroups.protocols.SMACK$SmackHeader 30: class org.jgroups.Message 31: class org.jgroups.View 32: class org.jgroups.ViewId 33: class org.jgroups.util.List 34: interface org.jgroups.Address 35: class [Lorg.jgroups.Address; 36: class org.jgroups.protocols.PingRsp 37: class [Ljava.lang.Object; 38: class java.util.Vector 39: class org.jgroups.protocols.pbcast.JoinRsp 40: class org.jgroups.protocols.pbcast.Digest 41: class java.util.Hashtable 50: class org.jgroups.service.lease.LeaseRequestHeader 51: class org.jgroups.service.lease.LeaseResponseHeader 52: class org.jgroups.service.lease.DenyResponseHeader 53: class org.jgroups.protocols.COMPRESS$CompressHeader 54: class org.jgroups.protocols.FC$FcHeader 55: class org.jgroups.protocols.WanPipeAddress
2007-11-01 08:39:39,024 DEBUG [org.jgroups.protocols.AUTOCONF] frag_size=64000 2007-11-01 08:39:39,061 DEBUG [org.jgroups.protocols.pbcast.GMS] changed role to org.jgroups.protocols.pbcast.ClientGmsImpl 2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.marshall.VersionAwareMarshaller] Initialised with version 1.4.0 and versionInt 14 2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.marshall.VersionAwareMarshaller] Using default marshaller class org.jboss.cache.marshall.TreeCacheMarshaller140 2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.aop.TreeCacheAop] Created jboss .cache:service=TomcatClusteringCache 2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.aop.TreeCacheAop] Starting jboss .cache:service=TomcatClusteringCache 2007-11-01 08:39:39,173 DEBUG [org.jgroups.protocols.UDP] creating sockets and starting threads 2007-11-01 08:39:39,173 INFO [org.jgroups.protocols.UDP] sockets will use interface 192.168.181.191 2007-11-01 08:39:39,187 INFO [org.jgroups.protocols.UDP] socket information: local_addr=192.168.181.191:37990, mcast_addr=228.1.2.3:48866, bind_addr=/192 .168.181.191, ttl=64 sock: bound to 192.168.181.191:37990, receive buffer size=80000, send buffer size=131071 mcast_recv_sock: bound to 192.168.181.191:48866, send buffer size=131071, receive buffer size=80000 mcast_send_sock: bound to 192.168.181.191:37991, send buffer size=131071, receive buffer size=80000 2007-11-01 08:39:39,188 DEBUG [org.jgroups.protocols.PING] adding my address (192.168.181.191:37990) to initial_hosts; initial_hosts=[[192.168.181.190:48866 ] [192.168.181.191:37990 ] ] 2007-11-01 08:39:39,190 INFO [STDOUT] -- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- -- GMS: address is 192.168.181.191:37990 -- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- -- 2007-11-01 08:39:39,190 DEBUG [org.jgroups.protocols.UDP] created unicast receiver thread 2007-11-01 08:39:41,193 DEBUG [org.jgroups.protocols.PingWaiter] initial mbrs are [] 2007-11-01 08:39:41,194 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] initial_mbrs are [] 2007-11-01 08:39:41,194 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] no initial members discovered: creating group as first member 2007-11-01 08:39:41,200 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr =192.168.181.191:37990] view is [192.168.181.191:37990|0] [192.168.181.191:37990 ] 2007-11-01 08:39:41,204 INFO [org.jboss.cache.TreeCache] TreeCache local address is 192.168.181.191:37990 2007-11-01 08:39:41,206 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] GET _STATE: first member (no state) 2007-11-01 08:39:41,206 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192 .168.181.191:37990|0] [192.168.181.191:37990] 2007-11-01 08:39:41,212 DEBUG [org.jboss.cache.TreeCache] transferred state is null (may be first member in cluster) 2007-11-01 08:39:41,216 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.181 .191:37990 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl 2007-11-01 08:39:41,216 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.181 .191:37990 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl 2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] created group (first member). My view is [192.168.181.191:37990|0], impl is org .jgroups.protocols.pbcast.CoordGmsImpl 2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [192.168.181.191:37990] 2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.FD_SOCK] first member; cache is empty 2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.MERGE2] merge task started as I'm the coordinator 2007-11-01 08:39:41,217 INFO [org.jboss.cache.TreeCache] State could not be retrieved (we are the first member in group) 2007-11-01 08:39:41,217 DEBUG [org.jboss.cache.eviction.RegionManager] Starting eviction timer 2007-11-01 08:39:41,218 INFO [org.jboss.cache.TreeCache] parseConfig(): PojoCacheConfig is empty 2007-11-01 08:39:41,218 DEBUG [org.jboss.cache.aop.TreeCacheAop] Started jboss .cache:service=TomcatClusteringCache 2007-11-01 08:39:41,235 DEBUG [org.jboss.cache.invalidation.InvalidationManager ] Creating jboss.cache:service=InvalidationManager 2007-11-01 08:39:41,235 DEBUG [org.jboss.cache.invalidation.InvalidationManager ] Created jboss.cache:service=InvalidationManager 2007-11-01 08:39:41,236 DEBUG [org.jboss.cache.invalidation.InvalidationManager ] Starting jboss.cache:service=InvalidationManager 2007-11-01 08:39:41,236 DEBUG [org.jboss.cache.invalidation.InvalidationManager ] Starting Invalidation Manager jboss.cache:service=InvalidationManager 2007-11-01 08:39:41,236 DEBUG [org.jboss.cache.invalidation.InvalidationManager ] Started jboss.cache:service=InvalidationManager 2007-11-01 08:39:48,625 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] sender=192.168.181.190:37863, merge_id=[192.168.181.190:37863|1193927988632] 2007-11-01 08:39:48,626 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] response=GmsHeader[MERGE_RSP]: view=[192.168.181.191:37990|0] [192.168.181.191 :37990], digest=[192.168.181.191:37990: [0 : 0], merge_rejected=false, merge_id= [192.168.181.190:37863|1193927988632] 2007-11-01 08:39:48,630 WARN [org.jgroups.protocols.pbcast.NAKACK] 192.168.181 .191:37990] discarded message from non-member 192.168.181.190:37863 2007-11-01 08:39:48,632 DEBUG [org.jgroups.protocols.FD_SOCK] [SUSPECT] hdr =SUSPECT, mbrs=[192.168.181.191:37990] 2007-11-01 08:39:48,632 DEBUG [org.jgroups.protocols.FD_SOCK] member is 192.168 .181.191:37990 2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.pbcast.GMS] mcasting view {[192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990]} (2 mbrs)
2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] view= [192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990], digest =[192.168.181.190:37863: [433 : 486, 192.168.181.191:37990: [0 : 0] 2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr =192.168.181.191:37990] view is [192.168.181.190:37863|3] [192.168.181.190:37863 , 192.168.181.191:37990] 2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [192.168.181.190:37863, 192.168.181.191:37990] 2007-11-01 08:39:48,634 DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest ()=192.168.181.190:37863, pingable_mbrs=[192.168.181.190:37863, 192.168.181.191 :37990] 2007-11-01 08:39:48,634 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192 .168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990] 2007-11-01 08:39:48,634 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.181 .191:37990 changed role to org.jgroups.protocols.pbcast.ParticipantGmsImpl 2007-11-01 08:39:48,937 DEBUG [org.jgroups.protocols.FD_SOCK] could not create socket to 192.168.181.190:37863; suspecting 192.168.181.190:37863 2007-11-01 08:39:48,937 DEBUG [org.jgroups.protocols.FD_SOCK] suspecting 192 .168.181.190:37863 (own address is 192.168.181.191:37990) 2007-11-01 08:39:48,937 DEBUG [org.jgroups.protocols.FD_SOCK] mbr=192.168.181 .190:37863 (size=1) 2007-11-01 08:39:48,938 DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest ()=null, pingable_mbrs=[192.168.181.191:37990] 2007-11-01 08:39:48,938 DEBUG [org.jgroups.protocols.FD_SOCK] pinger thread terminated 2007-11-01 08:39:48,938 DEBUG [org.jgroups.protocols.FD_SOCK] [SUSPECT] hdr =SUSPECT, mbrs=[192.168.181.190:37863] 2007-11-01 08:39:48,939 DEBUG [org.jgroups.protocols.FD_SOCK] member is 192.168 .181.190:37863 2007-11-01 08:39:58,068 DEBUG [org.jgroups.protocols.pbcast.ParticipantGmsImpl] view=[192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990] 2007-11-01 08:39:58,068 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr =192.168.181.191:37990] view is [192.168.181.190:37863|3] [192.168.181.190:37863 , 192.168.181.191:37990] 2007-11-01 08:39:58,068 DEBUG [org.jgroups.protocols.pbcast.GMS] [192.168.181 .191:37990] received view <= current view; discarding it (current vid: [192.168 .181.190:37863|3], new vid: [192.168.181.190:37863|3]) 2007-11-01 08:39:58,627 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] cancelling merge (merge_id=[192.168.181.190:37863|1193927988632]) 2007-11-01 08:40:01,234 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/IP_ADDR/70.1.2.3 for a node that doesn't exist yet. Process as an add. 2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/DEVICE_IP_LIST/10.10.10.10 for a node that doesn't exist yet. Process as an add. 2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/USER_NAME/testuser2 for a node that doesn't exist yet. Process as an add. 2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/SERVICE_LIST/internet2 for a node that doesn't exist yet. Process as an add. 2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /Sessions/42E2E6FCD2EC4345 for a node that doesn't exist yet. Process as an add. 2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeys/42E2E6FCD2EC4345 for a node that doesn't exist yet. Process as an add.
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic &p=4101073#4101073
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode =reply&p=4101073 __ ____ ____ ____ ____ ____ ____ ____ ____ ____ jboss-user mailing list jboss-user@(protected) https://lists.jboss.org/mailman/listinfo/jboss-user
|
|