Re: Recovery issue - how to debug?

2010-04-19 Thread Dr Hao He
Hi all, Thanks, folks. It turned out that ZooKeeper did send messages to all nodes. The issue was not caused by ZooKeeper. Regards, Dr Hao He XPE - the truly SOA platform h...@softtouchit.com http://softtouchit.com On 20/04/2010, at 7:15 AM, Ted Dunning wrote: > Can you attach the scree

Re: Recovery issue - how to debug?

2010-04-19 Thread Travis Crawford
On Mon, Apr 19, 2010 at 2:15 PM, Ted Dunning wrote: > Can you attach the screen shot to the JIRA issue? The mailing list strips > these things. Oops. Updated JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-744 --travis > > On Mon, Apr 19, 2010 at 1:18 PM, Travis Crawford > wrote: > >>

Re: Recovery issue - how to debug?

2010-04-19 Thread Ted Dunning
Can you attach the screen shot to the JIRA issue? The mailing list strips these things. On Mon, Apr 19, 2010 at 1:18 PM, Travis Crawford wrote: > Filed: > > https://issues.apache.org/jira/browse/ZOOKEEPER-744 > > Attached is a screenshot of some JMX output in Ganglia - it's currently > impleme

Re: Recovery issue - how to debug?

2010-04-19 Thread Travis Crawford
On Mon, Apr 19, 2010 at 12:10 PM, Patrick Hunt wrote: > > On 04/19/2010 11:55 AM, Travis Crawford wrote: >> >> To double-check, is the best way to tell a ZK instance is up-to-date >> by looking at its ``LastZxid`` value? For example: >> >> $ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - lo

Re: Recovery issue - how to debug?

2010-04-19 Thread Ted Dunning
Travis, have you seen the ruok command? It should be pretty easy to add other stats to that. On Mon, Apr 19, 2010 at 11:55 AM, Travis Crawford wrote: > It would be a lot easier from the operations perspective if the leader > explicitly published some health stats: > > (a) Count of instances in t
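
For readers unfamiliar with the ruok command mentioned above: it is one of ZooKeeper's four-letter commands, sent as plain text to the client port, and a healthy server answers imok. The following is a minimal sketch of such a probe, not anything posted in the thread. The function names are made up for illustration, and it assumes the server closes the connection after replying. Newer ZooKeeper releases may also require the command to be explicitly enabled in the server configuration.

```python
import socket


def four_letter_word(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a four-letter command (e.g. "ruok") and return the server's reply.

    Hypothetical helper: assumes the server closes the connection once it
    has written its answer, so we simply read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # EOF: the server has finished replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_ok(host: str, port: int) -> bool:
    """Return True if the server answers "ruok" with "imok"."""
    try:
        return four_letter_word(host, port, "ruok").strip() == "imok"
    except OSError:
        # Connection refused, timeout, and similar errors all count as "not ok".
        return False
```

A monitoring job could call is_ok(host, port) for each ensemble member. Note that imok only says the process is running and responsive; it does not say whether the server has caught up with the leader.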

Re: Recovery issue - how to debug?

2010-04-19 Thread Patrick Hunt
On 04/19/2010 11:55 AM, Travis Crawford wrote: To double-check, is the best way to tell a ZK instance is up-to-date by looking at its ``LastZxid`` value? For example: $ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081 org.apache.ZooKeeperService:name0=ReplicatedServer_id1,na

Re: Recovery issue - how to debug?

2010-04-19 Thread Travis Crawford
To double-check, is the best way to tell a ZK instance is up-to-date by looking at its ``LastZxid`` value? For example: $ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081 org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=InMemoryDataTre
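
As background for comparing LastZxid values between servers: a zxid is a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a counter within that epoch. The sketch below shows how two values could be split and compared. It assumes the value has already been converted to an integer, and the function names are illustrative rather than part of any ZooKeeper API.

```python
def split_zxid(zxid: int) -> tuple:
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits: leader epoch
    counter = zxid & 0xFFFFFFFF   # low 32 bits: transaction counter in that epoch
    return epoch, counter


def is_caught_up(follower_zxid: int, leader_zxid: int) -> bool:
    """True if the follower has seen everything the leader has.

    Comparing the raw integers is enough, because the epoch occupies
    the high bits and therefore dominates the comparison.
    """
    return follower_zxid >= leader_zxid
```

A follower that is briefly behind the leader under write load is normal; a gap that persists or grows is the more interesting signal.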

Re: Recovery issue - how to debug?

2010-04-19 Thread Patrick Hunt
Usually the server logs will shed light on such issues. If we had access to them it might be easier to speculate. Patrick On 04/19/2010 09:22 AM, Mahadev Konar wrote: Hi Hao, As Vishal already asked, how are you determining if the writes are being received? Also, what was the status of C2

Re: Recovery issue - how to debug?

2010-04-19 Thread Mahadev Konar
Hi Hao, As Vishal already asked, how are you determining if the writes are being received? Also, what was the status of C2 when you checked for these writes? Do you have the output of echo "stat" | nc localhost port? How long did you wait when you say that C2 did not receive the writes? What
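
If the stat output is collected from every node, the fields of interest for this kind of question are Mode and Zxid. Below is a rough sketch of pulling them out and flagging servers that are behind. It assumes the output contains lines of the form "Mode: ..." and "Zxid: 0x...", and the helper names are hypothetical.

```python
def parse_stat(text: str) -> dict:
    """Extract the Mode and Zxid fields from the text returned by `stat`.

    Assumes "Key: value" lines; lines that do not match are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    zxid = fields.get("Zxid")
    return {
        "mode": fields.get("Mode"),
        # Zxid is printed in hex (e.g. 0x100000005); int(..., 16) accepts the 0x prefix.
        "zxid": int(zxid, 16) if zxid else None,
    }


def lagging_servers(stat_by_server: dict) -> list:
    """Return the names of servers whose zxid is below the highest one seen."""
    parsed = {name: parse_stat(text) for name, text in stat_by_server.items()}
    known = [p["zxid"] for p in parsed.values() if p["zxid"] is not None]
    if not known:
        return []
    newest = max(known)
    return sorted(
        name for name, p in parsed.items()
        if p["zxid"] is None or p["zxid"] < newest
    )
```

Here stat_by_server maps a server name to the raw text captured from echo "stat" | nc for that server. A server that returned no Zxid line at all is reported as lagging, since its state could not be confirmed.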

Re: Recovery issue - how to debug?

2010-04-19 Thread Vishal K
Hi Hao, How are you determining whether a ZK server has received the writes or not? Regards, -Vishal On Mon, Apr 19, 2010 at 1:54 AM, Dr Hao He wrote: > I have a ZooKeeper cluster E1 with 3 nodes A, B, and C. > > I stopped C and did some writes on E1. Both A and B received the writes. > I then