Re: Should a data node restart cause a region server to go down?

2012-02-07 Thread Jeff Whiting
version of CDH HBase?
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
- Original Message -
From: Ted Yu
To: user@hbase.apache.org
Cc:
Sent: Tuesday, February 7, 2012 3:45 AM
Subject: Re: Should a data node restart caus

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Andrew Purtell
February 7, 2012 3:45 AM
> Subject: Re: Should a data node restart cause a region server to go down?
>
> In your case Error Recovery wasn't successful because of:
> All datanodes 10.49.29.92:50010 are bad. Aborting...
>
> On Mon, Feb 6, 2012 at 10:28 AM, Jeff Whiting wrote

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
So I restart one of the data nodes and everything continues to work just fine, even though the local one is no longer valid. Additionally, I can restart n-1 nodes without any problem and HBase continues to work. However, as soon as I restart the last data node, RSs start dying. hbck and fsck say

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Harsh J
This is the normal behavior of the sync API (when the first DN in the pipeline fails, the whole op fails); correct me if I am wrong. The rule here, I think, was that you do not want RSes to switch over to writing to a remote DN because the first one in the pipeline (always the local one) failed. He

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
Wouldn't "hadoop fsck /" show that type of problem if there really were no nodes with that data? The worst I've seen is: Target Replicas is 4 but found 3 replica(s).
~Jeff
On 2/6/2012 12:45 PM, Ted Yu wrote:
> In your case Error Recovery wasn't successful because of:
> All datanodes 10.49.29.92:500

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
I've been able to reproduce this on multiple clusters. I'm basically doing a rolling restart of data nodes, restarting one every 5-10+ minutes. However, the region servers will just die. "hadoop fsck /" shows it is healthy, the web interface says all the data nodes are up, and region server logs seem q

Re: Should a data node restart cause a region server to go down?

2012-02-06 Thread Ted Yu
In your case Error Recovery wasn't successful because of:
All datanodes 10.49.29.92:50010 are bad. Aborting...
On Mon, Feb 6, 2012 at 10:28 AM, Jeff Whiting wrote:
> I was increasing the storage on some of my data nodes and thus had to do a
> restart of the data node. I use cdh3u2 and ran "/etc

Should a data node restart cause a region server to go down?

2012-02-06 Thread Jeff Whiting
I was increasing the storage on some of my data nodes and thus had to do a restart of the data node. I use cdh3u2 and ran "/etc/init.d/hadoop-0.20-datanode restart" (I don't think this is a cdh problem). Unfortunately doing the restart caused region servers to go offline. Is this expected beha