Recovering corrupt HLog files

2012-06-30 Thread Bryan Beaudreault
Hello all, In an AWS outage we lost about a fifth of our regionservers and about an eighth of our total datanodes. Despite a replication factor of 3, it appears we may have lost some data from corrupt HLogs. Looking at my HMaster I see messages like this: 12/06/30 00:00:48 INFO wal.HLogSplitter:

HBase dies shortly after starting.

2012-06-30 Thread Jay Wilson
I have HBase more or less up and running in distributed mode. It starts fine, and I can use the hbase shell to create, disable, and drop tables; however, after a short period of time the HMaster and the HRegionServers terminate. Decoding the error messages is a bit bewildering and the O'Reilly HBase book

Re: HBase dies shortly after starting.

2012-06-30 Thread Dhaval Shah
Try cleaning up your ZooKeeper data. I have had similar issues before due to corrupt ZooKeeper data/bad ZooKeeper state. -- On Sat 30 Jun, 2012 4:12 AM IST Jay Wilson wrote: I somewhat have HBase up and running in a distributed mode. It starts fine, I can use

Re: HBase configuration using two hadoop servers

2012-06-30 Thread Asaf Mesika
I'm new to HBase myself, and when I first tried to learn how to install it, I couldn't find a decent end-to-end installation guide (HDFS, HBase, Linux-specific stuff, etc.). I wrote some installation notes, which I'll be happy to expand into a full-fledged guide, if it can be added to the

Re: HBase configuration using two hadoop servers

2012-06-30 Thread Stack
On Sat, Jun 30, 2012 at 3:50 PM, Asaf Mesika asaf.mes...@gmail.com wrote: I'm new to HBase my self, and when first trying to learn its installation path, I couldn't find a descent installation guide end-to-end (HDFS, HBase, Linux specific stuff, etc). So, you complain about the hbase doc

Re: HBase dies shortly after starting.

2012-06-30 Thread Stack
On Sat, Jun 30, 2012 at 12:42 AM, Jay Wilson j...@circle-cross-jn.com wrote: java.net.NoRouteToHostException: No route to host I do not see how HBase configuration could provoke the above. There is something up with your base network setup. St.Ack
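One way to confirm that a NoRouteToHostException is a network problem rather than an HBase one is to test raw TCP connectivity between the nodes outside of HBase. The Java sketch below is a hypothetical helper (PortProbe is not part of HBase or Hadoop); the host names are placeholders, and the ports are the ones that appear in this thread (8020 for the NameNode, 60020 for the region server, 50020 for the DataNode IPC port).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port on a peer is reachable,
// independently of any HBase or Hadoop configuration.
class PortProbe {
    /** Returns "OK", "NO_ROUTE", or "FAILED: <exception>" for host:port. */
    static String probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return "OK";
        } catch (NoRouteToHostException e) {
            // The same exception seen in the HBase logs: raised by the network stack, not by HBase.
            return "NO_ROUTE";
        } catch (IOException e) {
            // Connection refused, timeout, unresolvable host name, etc.
            return "FAILED: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Placeholder host names; substitute the real nodes of your cluster.
        String[][] targets = {
            {"namenode.example", "8020"},
            {"regionserver.example", "60020"},
            {"datanode.example", "50020"},
        };
        for (String[] t : targets) {
            System.out.println(t[0] + ":" + t[1] + " -> "
                    + probe(t[0], Integer.parseInt(t[1]), 2000));
        }
    }
}
```

Running it from every node against every other node narrows things down quickly: a NO_ROUTE result typically points at routing or a host firewall rather than at anything in hbase-site.xml.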

Re: Recovering corrupt HLog files

2012-06-30 Thread Stack
On Sat, Jun 30, 2012 at 8:38 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: 12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874. Marking as corrupted What size
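The corrupted log path in the message above encodes which region server wrote it and when. The sketch below assumes the 0.90-style layout visible in that path: a per-server directory named host,port,startcode and a file named URL-encoded host:port followed by a dot and a millisecond timestamp (presumably when the log was created or rolled). HLogPathInfo and its methods are hypothetical names, not HBase API; it needs Java 10+ for URLDecoder.decode(String, Charset).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

// Hypothetical helper: decodes the server name and timestamps embedded in a WAL path of the form
//   .../.logs/<host>,<port>,<startcode>/<url-encoded host:port>.<timestamp>
class HLogPathInfo {
    final String host;
    final int port;
    final long startCode;     // region server start time, ms since epoch
    final long logTimestamp;  // timestamp suffix of the log file name, ms since epoch

    HLogPathInfo(String host, int port, long startCode, long logTimestamp) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
        this.logTimestamp = logTimestamp;
    }

    static HLogPathInfo parse(String path) {
        String[] parts = path.split("/");
        String serverDir = parts[parts.length - 2];  // e.g. host,60020,1338667719591
        String fileName = parts[parts.length - 1];   // e.g. host%3A60020.1340935453874
        String[] server = serverDir.split(",");
        // "%3A" is the URL encoding of ':'.
        String decoded = URLDecoder.decode(fileName, StandardCharsets.UTF_8);
        // The timestamp follows the last '.', so dotted IP addresses are handled too.
        long logTs = Long.parseLong(decoded.substring(decoded.lastIndexOf('.') + 1));
        return new HLogPathInfo(server[0], Integer.parseInt(server[1]),
                Long.parseLong(server[2]), logTs);
    }

    @Override
    public String toString() {
        return host + ":" + port + " started " + Instant.ofEpochMilli(startCode)
                + ", log file stamped " + Instant.ofEpochMilli(logTimestamp);
    }
}
```

For the path in Bryan's message this yields port 60020, start code 1338667719591 (2012-06-02T20:08:39.591Z) and log timestamp 1340935453874 (2012-06-29T02:04:13.874Z), which helps line each corrupt log up with the outage window.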

Re: HMaster not failing over dead RegionServers

2012-06-30 Thread Stack
On Sat, Jun 30, 2012 at 7:04 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: 12/06/30 00:07:22 INFO ipc.Client: Retrying connect to server: /10.125.18.129:50020. Already tried 14 time(s). This was one of the servers that went down? It was not following through the splitting of HLog

Re: HMaster not failing over dead RegionServers

2012-06-30 Thread Jimmy Xiang
Bryan, The master could not detect if the region server is dead. How do you set the zookeeper session timeout? Thanks, Jimmy On Sat, Jun 30, 2012 at 8:09 AM, Stack st...@duboce.net wrote: On Sat, Jun 30, 2012 at 7:04 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: 12/06/30 00:07:22
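Jimmy's question matters because the master learns that a region server is dead through ZooKeeper: the server's ephemeral znode only disappears once its session expires, and only then can the master start splitting that server's logs. The toy model below (SessionLivenessModel is hypothetical, not ZooKeeper code) shows the consequence: a server that stopped heartbeating is not reported until the full session timeout has elapsed. In HBase the timeout is set with zookeeper.session.timeout in hbase-site.xml; the 180000 ms used in the test is only an example value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of ZooKeeper-style liveness: a server counts as dead only after
// nothing has been heard from its session for longer than the session timeout.
class SessionLivenessModel {
    private final long sessionTimeoutMs;
    private final Map<String, Long> lastHeartbeatMs = new TreeMap<>();

    SessionLivenessModel(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Records a heartbeat from a server at time nowMs. */
    void heartbeat(String server, long nowMs) {
        lastHeartbeatMs.put(server, nowMs);
    }

    /** Servers whose sessions would have expired by nowMs, i.e. the point where log splitting could begin. */
    List<String> expired(long nowMs) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() > sessionTimeoutMs) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```

Note that ZooKeeper also bounds the negotiated session timeout (by default between 2 and 20 times the server's tickTime), so a very large value in hbase-site.xml may be capped by the ensemble.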

Re: HBase configuration using two hadoop servers

2012-06-30 Thread Asaf Mesika
I've tried editing, but I don't have permissions. What should be done to obtain them? Yeah, I am. Look, running HBase over Linux and HDFS seems like the basic installation for an HBase newbie from my perspective, so a quick-start guide for this setup could save many people time.

Re: Recovering corrupt HLog files

2012-06-30 Thread Bryan Beaudreault
They are all pretty large, around 40+ MB. Will WALPlayer be smart enough to only write edits that still look relevant (i.e. based on the timestamps of the edits vs. the timestamps of the versions in HBase)? Writes have been coming in since we recovered. On Sat, Jun 30, 2012 at 11:05 AM, Stack

Re: Recovering corrupt HLog files

2012-06-30 Thread Li Pi
WALPlayer will look at the timestamp. Replaying an older edit that has since been overwritten shouldn't change anything. On Sat, Jun 30, 2012 at 9:49 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: They are all pretty large, around 40+mb. Will the walplayer be smart enough to only
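Li Pi's point can be illustrated with a simplified model of a single cell: HBase stores each value under a timestamp and a normal read returns the newest one, so replaying an older edit (assuming it keeps its original timestamp) only adds an older version. CellVersions below is a hypothetical toy, not the HBase client API; it deliberately ignores delete markers, per-family max-versions limits, and replayed edits whose timestamp exactly equals a newer write.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one HBase cell (row/family/qualifier): versions keyed by timestamp.
class CellVersions {
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    /** Stores a value under its timestamp, as a put or a replayed WAL edit would. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** What a default read returns: the value with the highest timestamp, or null if empty. */
    String latest() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    int versionCount() {
        return versions.size();
    }
}
```

For example, if a cell was rewritten at timestamp 200 after the recovery and an edit stamped 100 is then replayed from an old log, latest() still returns the value written at 200; the replayed edit just becomes an older version.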

Re: Recovering corrupt HLog files

2012-06-30 Thread Bryan Beaudreault
I should have mentioned in my initial email that I am operating on HBase 0.90.4. Is WALPlayer available in this version? I am having trouble finding it or anything similar. On Sat, Jun 30, 2012 at 1:14 PM, Li Pi l...@idle.li wrote: WALPlayer will look at the timestamp. Replaying an older edit

Re: Recovering corrupt HLog files

2012-06-30 Thread Li Pi
Nope. It came out in 0.94 otoh. On Sat, Jun 30, 2012 at 12:29 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: I should have mentioned in my initial email that I am operating on HBase 0.90.4. Is WALPlayer available in this version? I am having trouble finding it or anything similar.

Re: HBase dies shortly after starting.

2012-06-30 Thread Christian Schäfer
I had exactly the same behaviour some months ago. Check the heap sizes of all Hadoop services against the available RAM on every machine (machine memory should be higher than the sum of the services' heap sizes). In my case that solved the problem. From: Dhaval
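Christian's rule of thumb is simple arithmetic: add up the heap configured for every JVM on a machine and compare it with physical RAM. The sketch below does exactly that; the service list, the sizes and the extra reserve for the OS and non-heap JVM memory are assumptions for illustration, not recommendations. On clusters of this era the heaps are typically set via HBASE_HEAPSIZE in hbase-env.sh and HADOOP_HEAPSIZE in hadoop-env.sh (values in MB).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: checks that the JVM heaps on one machine fit in its RAM.
class HeapBudget {
    /** Sum of configured heap sizes (MB) for all JVMs on one machine. */
    static long totalHeapMb(Map<String, Long> heapsMb) {
        long sum = 0;
        for (long mb : heapsMb.values()) {
            sum += mb;
        }
        return sum;
    }

    /** True if RAM exceeds the heap total plus an assumed reserve for the OS and non-heap memory. */
    static boolean fits(Map<String, Long> heapsMb, long ramMb, long reserveMb) {
        return totalHeapMb(heapsMb) + reserveMb < ramMb;
    }

    public static void main(String[] args) {
        // Hypothetical 8 GB worker node.
        Map<String, Long> heaps = new LinkedHashMap<>();
        heaps.put("DataNode", 1000L);
        heaps.put("TaskTracker", 2000L);
        heaps.put("RegionServer", 4000L);
        System.out.println("total heap MB: " + totalHeapMb(heaps)
                + ", fits in 8192 MB with 1024 MB reserve: " + fits(heaps, 8192L, 1024L));
    }
}
```

If the check fails, the OS may start swapping or kill a JVM, which can look like HBase daemons dying for no visible reason.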

Re: HBase dies shortly after starting.

2012-06-30 Thread Amandeep Khurana
To run HBase (or, for that matter, any distributed system) you need your networking setup to function properly. "No route to host" is caused by issues with the underlying network. I have seen TORs (top-of-rack switches) losing packets, causing these exceptions. There could be several other issues that could cause

Re: Recovering corrupt HLog files

2012-06-30 Thread Jerry Lam
This is interesting because I have seen this happen in the past. Can WALPlayer be backported to 0.90.x? Best Regards, Jerry On 2012-06-30, at 16:34, Li Pi l...@idle.li wrote: Nope. It came out in 0.94 otoh. On Sat, Jun 30, 2012 at 12:29 PM, Bryan Beaudreault

Leap second bug

2012-06-30 Thread Jean-Daniel Cryans
Hi all, If you are still debugging high CPU usage on your java processes, read this: http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-today Hope this helps, J-D

Re: Recovering corrupt HLog files

2012-06-30 Thread Bryan Beaudreault
Thanks all for the additional input. I do not think the HLogs are actually corrupted any longer; I believe they only appeared corrupt because we had also lost a good portion of our datanodes. We have since recovered all the datanodes, so they are good. We will look into creating an executable jar out of your WALPlayer