Jack: You might want to try applying HBASE-3038 (there are two patches
up there; you'll need both). The thought is that it might be the cause of
the EOFException you were running into (even though your files seemed smaller
than the 2G that HBASE-3038 is about).
St.Ack
On Tue, Sep 28, 2010 at 12:12 PM, Stack wrote:
I made https://issues.apache.org/jira/browse/HBASE-3046 for looking
into this. We thought we'd repro'd it here, but it seems we were
running into HBASE-3038, which was not your case, at least not for
the two files you made available to me.
St.Ack
On Fri, Sep 24, 2010 at 4:52 PM, Jack Levi
Oh, yeah, could be the OOME... But is the OOME in the regionserver or in
the Master? If in the regionserver, I'd think that when splitting we'd skip the
incomplete record. Something is going on here. I'd like to figure
it out. The flag should get you going in that you'll recover everything up to
the last edit.
St.Ack
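To make the "skip the incomplete record" idea concrete, here is a minimal Python sketch. It does not use the real SequenceFile format: it assumes a hypothetical layout where each edit is a 4-byte big-endian length followed by that many payload bytes. The names `read_edits`, `frame` and `skip_errors` are made up for illustration, and `skip_errors` is only an analogy to the flag mentioned above, not how HBase implements it.

```python
import io
import struct


def read_edits(stream, skip_errors=False):
    """Return the payloads of all complete records in `stream`.

    Hypothetical layout (NOT SequenceFile): 4-byte big-endian length,
    followed by `length` bytes of payload.
    """
    edits = []
    while True:
        header = stream.read(4)
        if not header:
            return edits  # clean end of file
        if len(header) < 4:
            truncated = True  # file ends in the middle of a length prefix
        else:
            (length,) = struct.unpack(">I", header)
            payload = stream.read(length)
            truncated = len(payload) < length  # file ends mid-payload
        if truncated:
            if skip_errors:
                return edits  # keep everything up to the last complete edit
            raise EOFError("incomplete record at end of file")
        edits.append(payload)


def frame(payload):
    """Encode one record in the hypothetical layout."""
    return struct.pack(">I", len(payload)) + payload


good = frame(b"edit-1") + frame(b"edit-2")
cut = good + frame(b"edit-3")[:-2]  # simulate a tail that was never fully flushed

print(read_edits(io.BytesIO(cut), skip_errors=True))
```

With `skip_errors=False` the same input raises `EOFError`, which is roughly the situation in this thread: the reader overshoots on the last edit and the whole region open fails.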
On Mo
Yay, my first bug. I have not tried the flag yet (thanks), I will do
that shortly. I was not able to find where those files are created. It's
likely the last addition was not flushed properly as the server ran out
of memory, so we may as well call it lost data (in our case this is fine,
we have source data
I took a look. When we go to read the last edit in each file, we
overshoot. It's as though the last addition is not properly flushed.
I looked at the writing of the recovered.edits file contents and we're
using a straight SequenceFile#close, so a flush should be going on. You
pasted again from regionse
They were about 64MB, I will put them somewhere... like here:
http://img2.imageshack.us/00618601094
http://img2.imageshack.us/00618601136
They are not gzippable, sorry... full of jpeg data I think.
Here is an error snippet from master: http://pastebin.com/TdbYbDyy
-Jack
On Sun, S
On Sun, Sep 26, 2010 at 1:53 PM, Jack Levin wrote:
> I had the same issue this morning: some of the regions'
> 'recovered.edits' files were corrupt and not a single region server was
> able to load them. I saved them in case someone is interested in seeing
> why they cannot be processed.
Are they zero-length?
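For copies that have already been pulled out of HDFS onto a local disk, one quick way to answer that is sketched below. It assumes the saved files are ordinary local files; `split_by_size` and the demo file names are made up for illustration.

```python
import os
import tempfile


def split_by_size(paths):
    """Partition paths into (zero_length, non_empty) by on-disk size."""
    zero_length, non_empty = [], []
    for path in paths:
        if os.path.getsize(path) == 0:
            zero_length.append(path)
        else:
            non_empty.append(path)
    return zero_length, non_empty


# Demo with two throwaway files standing in for saved recovered.edits copies.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "edits-empty")      # hypothetical name
    filled = os.path.join(tmp, "edits-with-data")  # hypothetical name
    open(empty, "wb").close()
    with open(filled, "wb") as fh:
        fh.write(b"\x00" * 16)
    zero, rest = split_by_size([empty, filled])
    print("zero-length:", [os.path.basename(p) for p in zero])
    print("non-empty:", [os.path.basename(p) for p in rest])
```

A zero-length file points at nothing having been written at all, while a non-empty file that still fails to load points at a truncated or corrupt tail.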
I
I had the same issue this morning: some of the regions'
'recovered.edits' files were corrupt and not a single region server was
able to load them. I saved them in case someone is interested in seeing
why they cannot be processed. Here is what I think happens:
1. I am writing data to hbase, and it hits the regions
http://pastebin.com/bD3JJ0sD
The logs were 17MB at most, with sizes varying below that.
-Jack
On Fri, Sep 24, 2010 at 4:56 PM, Stack wrote:
> Please paste the section from regionserver where you were getting the
> EOF to pastebin. I'd like to see exactly where (but yeah, you get the
> idea
Please paste the section from regionserver where you were getting the
EOF to pastebin. I'd like to see exactly where (but yeah, you get the
idea moving the files aside). Check the files too. Are they
zero-length? If so, please look for them in the master log and paste
me the section where we ar
It was an EOF exception, but now that I have deleted the edits files:
Moved to trash:
hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1062260343/recovered.edits/00617305532
Moved to trash:
hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1321772129/recovered.edits/00617328530
Moved to trash
What is the complaint in regionserver log when region load fails?
St.Ack
On Fri, Sep 24, 2010 at 4:40 PM, Jack Levin wrote:
> So, the datanode log shows no errors whatsoever; however, I do see the same
> blocks fetched repeatedly, and the network speed is quite high, but I
> am unable to load _some_ regio
So, the datanode log shows no errors whatsoever; however, I do see the same
blocks fetched repeatedly, and the network speed is quite high, but I
am unable to load _some_ regions. What could it be?
2010-09-24 16:38:42,729 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/10.101.6.2:50
(Good one Ryan)
Master is doing the assigning. It needs to be restarted to see the
config change.
St.Ack
On Fri, Sep 24, 2010 at 4:28 PM, Jack Levin wrote:
> Only the regionserver. Do I need to restart both?
>
> -jack
>
> On Fri, Sep 24, 2010 at 4:22 PM, Ryan Rawson wrote:
>> Did you restart the
The '1' means give a regionserver one region to open each time it checks
in. Servers check in every second by default. It looks like you have
a good few servers coming in.
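As a rough back-of-the-envelope sketch of what that means for the cluster as a whole: if every server is handed at most N regions per check-in, the master can hand out at most servers x N / interval regions per second. The function name and the server count of 20 below are hypothetical; only the "one region per check-in" and "check in every second" figures come from this thread.

```python
def max_region_opens_per_second(num_servers, regions_per_checkin=1,
                                checkin_interval_s=1.0):
    """Upper bound on regions handed out per second across the cluster."""
    if checkin_interval_s <= 0:
        raise ValueError("checkin_interval_s must be positive")
    return num_servers * regions_per_checkin / checkin_interval_s


# 20 servers (made-up number), 1 region per check-in, checking in every second.
print(max_region_opens_per_second(20))
# Same servers, but checking in every 3 seconds instead.
print(max_region_opens_per_second(20, 1, 3.0))
```

So even at one region per check-in, a good few servers together can still open many regions per second; stretching the check-in interval is the other knob.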
Change this if you'd rather have them come in less frequently:
hbase.regionserver.msginterval
1000
Interval between m
Only the regionserver. Do I need to restart both?
-jack
On Fri, Sep 24, 2010 at 4:22 PM, Ryan Rawson wrote:
> Did you restart the master and the regionserver? Or just one or the other?
>
> -ryan
>
> On Fri, Sep 24, 2010 at 4:21 PM, Jack Levin wrote:
>> Also, even with a value of '1', I see:
>>
>> 2010-0
Did you restart the master and the regionserver? Or just one or the other?
-ryan
On Fri, Sep 24, 2010 at 4:21 PM, Jack Levin wrote:
> Also, even with a value of '1', I see:
>
> 2010-09-24 16:20:29,983 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> img834,1000351n.jpg,12
Also, even with a value of '1', I see:
2010-09-24 16:20:29,983 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
img834,1000351n.jpg,1285251664421.d09510a16c0cfd0d8a251a36229125e0.
2010-09-24 16:20:29,984 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
Still having a problem:
2010-09-24 16:15:02,572 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
img695,p1908101232.jpg,1285288492084.d451f05024b42f71a115951c62cdcccf.
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at
org
Try
<property>
  <name>hbase.regions.percheckin</name>
  <value>10</value>
  <description>Maximum number of regions that can be assigned in a single go
  to a region server.
  </description>
</property>
What do you have now? Whatever it is, go down from there.
St.Ack
On Fri, Sep 24, 2010 at 3:07 PM, Jack Levin wrote:
> My regions are 1 GB in size and wh
My regions are 1 GB in size, and when I cold start the cluster I oversaturate my
network links (1000 Mbps) and get client DFS timeouts. Any way to slow them
down?
-Jack
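For a sense of scale, here is a small worked calculation in Python. It is a worst-case sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s), that a whole region's data crosses a single link, and that the link runs at full line rate with no other traffic. Opening a region may not actually read all of its data, so treat the result as an upper bound. The function name is made up for illustration.

```python
def seconds_to_transfer(size_bytes, link_mbps):
    """Time to push size_bytes over a link of link_mbps at full line rate."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    bits = size_bytes * 8
    return bits / (link_mbps * 1e6)


one_region = 10**9  # 1 GB region, decimal units
print(seconds_to_transfer(one_region, 1000))  # one region on a 1000 Mbps link
```

Under those assumptions a single 1 GB region keeps a 1000 Mbps link busy for several seconds, so a handful of regions opening at once on the same server is enough to saturate it.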