Working oscache cluster stoppped working - how to id the cause/resolve

View: New views
1 Messages — Rating Filter:   Alert me  

Working oscache cluster stoppped working - how to id the cause/resolve

by Rajesh SIa :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I had a working oscache implementation working and due to changes which i believe are network related my cluster members cannot join each other and work as a cluster.  What seems to happen is they both broadcas to the multicast (same ip for both, again was working fine) but they do not join each others group.  They seem to miss one another and then treat each other as suspect members.  Meaning we do not see the typical cacheflush and reloading normally expected.  We see the MERGE view operations.  Put a snoop on the unix (recently patched Sun Solaris 9 Patch Cluster) and network guys tell me they can see the packets from 1 server to the other.  I confirm this as it seems like the servers will communicate with each other after the intial members try to join.  Seems like that is normal.  But the big problem is they do not join and don't seem to try again.

Have been trying to get to finest log level available.  We are down to FINE level but can't get to debug which would seem to help us see message level of the initial join process.  Currently reported to me that developer thinks that we don't use Log4j so is why we can't get to the debug level of messaging in oscache or gms.  At ALL level in oscache my logs currently show the following on initiatialization.  

Any pointers or help on this much appreciated.
- Why might they miss each joingin group, yet still be able to broadcast updates to one another?
- How to get to Debug level of logging referenced in many of the oschache and gms wikis?
+++++++++++++++++++++++
Catalina log showing our initialization.  
This is the 1 Server - FINE: initial_mbrs=[[own_addr=10.78.45.87:46943, coord_addr=10.78.45.87:46943, is_server=true]]
Other clustered server member is 10.78.45.88 which is not in the members.

++++++++++++++++++++++++
Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.base.Config <init>
FINE: Config() called

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.base.Config loadProps
FINE: Getting Config

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.base.Config loadProps
INFO: Properties {cache.blocking=true, cache.algorithm=com.opensymphony.oscache.base.algorithm.LRUCache, cache.cluster.multicast.ip=231.12.21.150, cache.capacity=1000, cache.event.listeners=com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener}

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.base.Config <init>
FINE: Config() called

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.base.Config loadProps
FINE: Getting Config

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.base.Config loadProps
INFO: Properties {cache.blocking=true, cache.algorithm=com.opensymphony.oscache.base.algorithm.LRUCache, cache.cluster.multicast.ip=231.12.21.150, cache.capacity=1000, cache.event.listeners=com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener}

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.base.AbstractCacheAdministrator <init>
FINE: Constructed AbstractCacheAdministrator()

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.general.GeneralCacheAdministrator <init>
INFO: Constructed GeneralCacheAdministrator()

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.general.GeneralCacheAdministrator createCache
INFO: Creating new cache

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.plugins.clustersupport.AbstractBroadcastingListener <init>
INFO: AbstractBroadcastingListener registered

Sep 8, 2008 1:40:46 PM com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener initialize
INFO: Starting a new JavaGroups broadcasting listener with properties=UDP(mcast_addr=231.12.21.150;mcast_port=45566;ip_ttl=32;mcast_send_buf_size=150000;mcast_recv_buf_size=80000):PING(timeout=2000;num_initial_members=3):MERGE2(min_interval=5000;max_interval=10000):FD_SOCK:VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192):UNICAST(timeout=300,600,1200,2400):pbcast.STABLE(desired_avg_gossip=20000):FRAG(frag_size=8096;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=true)

Sep 8, 2008 1:40:47 PM org.jgroups.conf.ClassConfigurator init
FINE: mapping is:
1: class org.jgroups.stack.IpAddress
2: class org.jgroups.protocols.CAUSAL$CausalHeader
3: class org.jgroups.protocols.FD$FdHeader
4: class org.jgroups.protocols.FD_PID$FdHeader
5: class org.jgroups.protocols.FD_PROB$FdHeader
6: class org.jgroups.protocols.FD_SOCK$FdHeader
7: class org.jgroups.protocols.FragHeader
8: class org.jgroups.protocols.MERGE$MergeHeader
9: class org.jgroups.protocols.NakAckHeader
10: class org.jgroups.protocols.PARTITIONER$PartitionerHeader
11: class org.jgroups.protocols.PerfHeader
12: class org.jgroups.protocols.PIGGYBACK$PiggybackHeader
13: class org.jgroups.protocols.PingHeader
14: class org.jgroups.protocols.TcpHeader
15: class org.jgroups.protocols.TOTAL$Header
16: class org.jgroups.protocols.TOTAL_OLD$TotalHeader
17: class org.jgroups.protocols.TOTAL_TOKEN$TotalTokenHeader
18: class org.jgroups.protocols.TOTAL_TOKEN$RingTokenHeader
19: class org.jgroups.protocols.TunnelHeader
20: class org.jgroups.protocols.UdpHeader
21: class org.jgroups.protocols.UNICAST$UnicastHeader
22: class org.jgroups.protocols.VERIFY_SUSPECT$VerifyHeader
23: class org.jgroups.protocols.WANPIPE$WanPipeHeader
24: class org.jgroups.protocols.pbcast.GMS$GmsHeader
25: class org.jgroups.protocols.pbcast.NakAckHeader
26: class org.jgroups.protocols.pbcast.PbcastHeader
27: class org.jgroups.protocols.pbcast.STABLE$StableHeader
28: class org.jgroups.protocols.pbcast.STATE_TRANSFER$StateHeader
29: class org.jgroups.protocols.SMACK$SmackHeader
30: class org.jgroups.Message
31: class org.jgroups.View
32: class org.jgroups.ViewId
33: class org.jgroups.util.List
34: interface org.jgroups.Address
35: class [Lorg.jgroups.Address;
36: class org.jgroups.protocols.PingRsp
37: class [Ljava.lang.Object;
38: class java.util.Vector
39: class org.jgroups.protocols.pbcast.JoinRsp
40: class org.jgroups.protocols.pbcast.Digest
41: class java.util.Hashtable
50: class org.jgroups.service.lease.LeaseRequestHeader
51: class org.jgroups.service.lease.LeaseResponseHeader
52: class org.jgroups.service.lease.DenyResponseHeader
53: class org.jgroups.protocols.COMPRESS$CompressHeader
54: class org.jgroups.protocols.FC$FcHeader
55: class org.jgroups.protocols.WanPipeAddress

Sep 8, 2008 1:40:47 PM org.jgroups.protocols.AUTOCONF senseMaxFragSize
FINE: frag_size=64000
Sep 8, 2008 1:40:47 PM org.jgroups.protocols.pbcast.GMS setImpl
FINE: changed role to org.jgroups.protocols.pbcast.ClientGmsImpl
Sep 8, 2008 1:40:47 PM org.jgroups.protocols.UDP start
FINE: creating sockets and starting threads
Sep 8, 2008 1:40:47 PM org.jgroups.protocols.UDP createSockets
INFO: sockets will use interface 10.78.45.87

Sep 8, 2008 1:40:47 PM org.jgroups.protocols.UDP createSockets
INFO: socket information:
local_addr=10.78.45.87:46943, mcast_addr=231.12.21.150:45566, bind_addr=/10.78.45.87, ttl=32
sock: bound to 10.78.45.87:46943, receive buffer size=64000, send buffer size=32000
mcast_recv_sock: bound to 10.78.45.87:45566, send buffer size=150000, receive buffer size=80000
mcast_send_sock: bound to 10.78.45.87:46944, send buffer size=150000, receive buffer size=80000

-------------------------------------------------------
GMS: address is 10.78.45.87:46943
-------------------------------------------------------
Sep 8, 2008 1:40:47 PM org.jgroups.protocols.UDP startThreads
FINE: created unicast receiver thread
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.PingWaiter run
FINE: initial mbrs are []
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.pbcast.ClientGmsImpl join
FINE: initial_mbrs are []
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.pbcast.ClientGmsImpl join
FINE: no initial members discovered: creating group as first member
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.pbcast.GMS installView
FINE: [local_addr=10.78.45.87:46943] view is [10.78.45.87:46943|0] [10.78.45.87:46943]
Sep 8, 2008 1:40:49 PM com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener initialize
INFO: JavaGroups clustering support started successfully
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.pbcast.GMS setImpl
FINE: 10.78.45.87:46943 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.pbcast.GMS setImpl
FINE: 10.78.45.87:46943 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.pbcast.ClientGmsImpl becomeSingletonMember
FINE: created group (first member). My view is [10.78.45.87:46943|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
Sep 8, 2008 1:40:49 PM com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener memberJoined
INFO: A new member at address '10.78.45.87:46943' has joined the cluster
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.FD_SOCK down
FINE: VIEW_CHANGE received: [10.78.45.87:46943]
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.MERGE2$FindSubgroups run
FINE: merge task started as I'm the coordinator
Sep 8, 2008 1:40:49 PM org.jgroups.protocols.FD_SOCK getCacheFromCoordinator
FINE: first member; cache is empty
Sep 8, 2008 1:40:58 PM org.jgroups.protocols.PingWaiter run
FINE: initial mbrs are []
Sep 8, 2008 1:40:58 PM org.jgroups.protocols.MERGE2$FindSubgroups run
FINE: initial_mbrs=[[own_addr=10.78.45.87:46943, coord_addr=10.78.45.87:46943, is_server=true]]
Sep 8, 2008 1:41:06 PM org.jgroups.protocols.PingWaiter run
FINE: initial mbrs are []
Sep 8, 2008 1:41:06 PM org.jgroups.protocols.MERGE2$FindSubgroups run
FINE: initial_mbrs=[[own_addr=10.78.45.87:46943, coord_addr=10.78.45.87:46943, is_server=true]]
Sep 8, 2008 1:41:17 PM org.jgroups.protocols.PingWaiter run
FINE: initial mbrs are []
Sep 8, 2008 1:41:17 PM org.jgroups.protocols.MERGE2$FindSubgroups run
FINE: initial_mbrs=[[own_addr=10.78.45.87:46943, coord_addr=10.78.45.87:46943, is_server=true]]
Sep 8, 2008 1:41:23 PM org.jgroups.protocols.pbcast.STABLE handleStableGossip
FINE: received digest 10.78.45.87:46943#0 (-1) from 10.78.45.87:46943
Sep 8, 2008 1:41:23 PM org.jgroups.protocols.pbcast.STABLE handleStableGossip
FINE: sending stability msg 10.78.45.87:46943#0 (-1)
Sep 8, 2008 1:41:24 PM org.jgroups.protocols.PingWaiter run
FINE: initial mbrs are []
Sep 8, 2008 1:41:24 PM org.jgroups.protocols.MERGE2$FindSubgroups run
FINE: initial_mbrs=[[own_addr=10.78.45.87:46943, coord_addr=10.78.45.87:46943, is_server=true]]
Sep 8, 2008 1:41:28 PM org.jgroups.protocols.pbcast.STABLE handleStabilityMessage
FINE: stability vector is [10.78.45.87:46943#0]
---------------------------------------------------------------------
Posted via Jive Forums
http://forums.opensymphony.com/thread.jspa?threadID=454342&messageID=542911#542911


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...

LightInTheBox - Buy quality products at wholesale price!