Re: Regions loading too fast

2010-09-28 Thread Stack
Jack: You might want to try applying hbase-3038 (there are two patches up there; you'll need both). Thought is that it might be the cause of the EOFE you were running into (even though your files seemed less than the 2G that hbase-3038 is about). St.Ack On Tue, Sep 28, 2010 at 12:12 PM, Stack wrote:

Re: Regions loading too fast

2010-09-28 Thread Stack
I made https://issues.apache.org/jira/browse/HBASE-3046 for looking into this. We thought we'd repro'd it here but it seems like we were running into hbase-3038... which was not your case, at least, not for the two files you made available to me. St.Ack On Fri, Sep 24, 2010 at 4:52 PM, Jack Levi

Re: Regions loading too fast

2010-09-27 Thread Stack
Oh, yeah, could be the OOME... But is the OOME in the regionserver or in the Master? If in the regionserver, I'd think that when splitting we'd skip the incomplete record. Something is going on here. I'd like to figure it out. The flag should get you going in that you'll recover all up to the last edit. St.Ack On Mo

Re: Regions loading too fast

2010-09-27 Thread Jack Levin
Yay, my first bug. I have not tried the flag yet (thanks), I will do that shortly. I was not able to find those files are created. It's likely the last addition was not flushed properly as the server ran out of memory, so we may as well call it lost data (in our case this is fine, we have source data

Re: Regions loading too fast

2010-09-27 Thread Stack
I took a look. When we go to read the last edit in each file, we overshoot. It's as though the last addition is not properly flushed. I looked at the writing of the recovered.edits file contents and we're using straight sequencefile#close so flush should be going on. You pasted again from regionse

Re: Regions loading too fast

2010-09-26 Thread Jack Levin
They were about 64MB, I will put them somewhere... like here: http://img2.imageshack.us/00618601094 http://img2.imageshack.us/00618601136 They are not gzippable, sorry... full of jpeg data I think. Here is an error snippet from master: http://pastebin.com/TdbYbDyy -Jack On Sun, S

Re: Regions loading too fast

2010-09-26 Thread Stack
On Sun, Sep 26, 2010 at 1:53 PM, Jack Levin wrote: > I had the same issue this morning, some of the regions > 'recovered.edits' was corrupt and no single region server was able to > load them.  I saved them if someone is interested to see why they can > not be processed. Are they zero-length? I

Re: Regions loading too fast

2010-09-26 Thread Jack Levin
I had the same issue this morning: some of the regions' 'recovered.edits' files were corrupt and no region server was able to load them. I saved them in case someone is interested in seeing why they cannot be processed. I think here is what happens: 1. I am writing data to hbase, and it hits the regions

Re: Regions loading too fast

2010-09-24 Thread Jack Levin
http://pastebin.com/bD3JJ0sD The logs were 17MB in size max, and variable sizes like that. -Jack On Fri, Sep 24, 2010 at 4:56 PM, Stack wrote: > Please paste the section from regionserver where you were getting the > EOF to pastebin.  I'd like to see exactly where (but yeah, you get the > idea

Re: Regions loading too fast

2010-09-24 Thread Stack
Please paste the section from regionserver where you were getting the EOF to pastebin. I'd like to see exactly where (but yeah, you get the idea moving the files aside). Check the files too. Are they zero-length? If so, please look for them in the master log and paste me the section where we ar

Re: Regions loading too fast

2010-09-24 Thread Jack Levin
It was EOF exception, but now that I deleted edits files: Moved to trash: hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1062260343/recovered.edits/00617305532 Moved to trash: hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1321772129/recovered.edits/00617328530 Moved to trash

Re: Regions loading too fast

2010-09-24 Thread Stack
What is the complaint in regionserver log when region load fails? St.Ack On Fri, Sep 24, 2010 at 4:40 PM, Jack Levin wrote: > so, datanode log shows no errors whatsoever, however I do see same > blocks fetched repeatedly, and the network speed is quite high, but I > am unable to load _some_ regio

Re: Regions loading too fast

2010-09-24 Thread Jack Levin
so, datanode log shows no errors whatsoever, however I do see same blocks fetched repeatedly, and the network speed is quite high, but I am unable to load _some_ regions, what could it be? 2010-09-24 16:38:42,729 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.2:50

Re: Regions loading too fast

2010-09-24 Thread Stack
(Good one Ryan) Master is doing the assigning. It needs to be restarted to see the config change. St.Ack On Fri, Sep 24, 2010 at 4:28 PM, Jack Levin wrote: > Only regionserver, do I need to restart both? > > -jack > > On Fri, Sep 24, 2010 at 4:22 PM, Ryan Rawson wrote: >> Did you restart the

Re: Regions loading too fast

2010-09-24 Thread Stack
The '1' means: give a regionserver 1 region to open each time it checks in. Servers check in every second by default. It looks like you have a good few servers coming in. Change this if you'd have them come in less frequently: hbase.regionserver.msginterval 1000 Interval between m
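
To make the interaction between the two settings discussed in this thread concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which every regionserver receives its full hbase.regions.percheckin allotment at every check-in, so the result is only an upper bound on the assignment rate. The server count and region count in the usage lines are hypothetical and are not taken from Jack's cluster.

```python
def regions_assigned_per_second(num_servers, regions_per_checkin=10, msginterval_ms=1000):
    """Upper bound on regions the master hands out per second.

    regions_per_checkin stands for hbase.regions.percheckin and
    msginterval_ms for hbase.regionserver.msginterval (milliseconds).
    Simplifying assumption: every server gets its full allotment
    at every check-in.
    """
    checkins_per_second = 1000.0 / msginterval_ms
    return num_servers * regions_per_checkin * checkins_per_second


def seconds_to_assign(total_regions, num_servers, regions_per_checkin=10, msginterval_ms=1000):
    """Lower bound on the time needed to hand out all regions on a cold start."""
    rate = regions_assigned_per_second(num_servers, regions_per_checkin, msginterval_ms)
    return total_regions / rate


if __name__ == "__main__":
    # Hypothetical cluster: 20 regionservers, 4000 regions.
    for per_checkin, interval_ms in [(10, 1000), (1, 1000), (1, 3000)]:
        secs = seconds_to_assign(4000, 20, per_checkin, interval_ms)
        print(f"percheckin={per_checkin} msginterval={interval_ms}ms -> at least {secs:.0f}s")
```

Under this model, lowering the per-check-in count or lengthening the check-in interval both stretch the cold-start assignment over a longer window, which is the effect Stack is suggesting to relieve the network links.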

Re: Regions loading too fast

2010-09-24 Thread Jack Levin
Only regionserver, do I need to restart both? -jack On Fri, Sep 24, 2010 at 4:22 PM, Ryan Rawson wrote: > Did you restart the master and the regionserver? Or just one or the other? > > -ryan > > On Fri, Sep 24, 2010 at 4:21 PM, Jack Levin wrote: >> Also, even with '1' value, I see: >> >> 2010-0

Re: Regions loading too fast

2010-09-24 Thread Ryan Rawson
Did you restart the master and the regionserver? Or just one or the other? -ryan On Fri, Sep 24, 2010 at 4:21 PM, Jack Levin wrote: > Also, even with '1' value, I see: > > 2010-09-24 16:20:29,983 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: > img834,1000351n.jpg,12

Re: Regions loading too fast

2010-09-24 Thread Jack Levin
Also, even with '1' value, I see: 2010-09-24 16:20:29,983 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: img834,1000351n.jpg,1285251664421.d09510a16c0cfd0d8a251a36229125e0. 2010-09-24 16:20:29,984 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:

Re: Regions loading too fast

2010-09-24 Thread Jack Levin
Still having a problem: 2010-09-24 16:15:02,572 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening img695,p1908101232.jpg,1285288492084.d451f05024b42f71a115951c62cdcccf. java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at org

Re: Regions loading too fast

2010-09-24 Thread Stack
Try hbase.regions.percheckin 10 Maximum number of regions that can be assigned in a single go to a region server. What do you have now? Whatever it is, go down from there. St.Ack On Fri, Sep 24, 2010 at 3:07 PM, Jack Levin wrote: > My regions are 1gb in size and wh

Regions loading too fast

2010-09-24 Thread Jack Levin
My regions are 1 GB in size, and when I cold start the cluster I oversaturate my network links (1000 Mbps) and get client DFS timeouts. Any way to slow them down? -Jack