Tracked it down:
http://pastebin.com/QaFktFKg
To my novice eyes it looks like it was played back cleanly and then deleted.
Thanks again!
-chris
On Mar 17, 2011, at 7:21 PM, Stack wrote:
But did you see the log of its replay of recovered.edits and then the
subsequent delete of this file just before open? (The file is only
deleted if we successfully opened a region.)
St.Ack
On Thu, Mar 17, 2011 at 6:38 PM, Chris Tarnas wrote:
I looked in the master log and the regionserver log that is hosting a formerly
damaged region now, but the only reference to it was during the 0.89 timeframe,
no EOFE after restart with 0.90.1.
thanks,
-chris
On Mar 17, 2011, at 6:30 PM, Stack wrote:
I don't know. See the name of the file that failed w/ 0.89. Look
for it being replayed in your 0.90.1. Did it succeed or did we hit
EOFE toward the end of recovered.edits but in 0.90.1 keep going?
St.Ack
On Thu, Mar 17, 2011 at 6:26 PM, Chris Tarnas wrote:
> Good news, so I restarted with 0.90.1, a
On Thu, Mar 17, 2011 at 6:16 PM, Chris Tarnas wrote:
> So we lose this data, no recovery options?
>
Right. A tool to read these files is a few lines of jruby but it's not
been done yet. If you want to work on it together, I'm game.
St.Ack
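For anyone who wants to take a crack at it, here is a rough sketch of such a reader in Java, assuming the 0.90.x HLog.Reader API (untested; treat the method names as approximate rather than definitive):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.regionserver.wal.HLog;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

    // Rough sketch: dump the entries of a single recovered.edits file so you can
    // see what would be lost by moving it aside. Assumes the 0.90.x HLog.Reader API.
    public class DumpRecoveredEdits {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path edits = new Path(args[0]);          // path to one recovered.edits file
        HLog.Reader reader = HLog.getReader(fs, edits, conf);
        try {
          // A trashed tail should surface here as an EOFException from next().
          HLog.Entry entry;
          while ((entry = reader.next()) != null) {
            System.out.println(entry.getKey());  // region, table, sequence id, etc.
            WALEdit edit = entry.getEdit();
            for (KeyValue kv : edit.getKeyValues()) {
              System.out.println("  " + kv);
            }
          }
        } finally {
          reader.close();
        }
      }
    }
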
Good news, so I restarted with 0.90.1, and now have all 288 regions online
including the three problematic ones. Could it be those were already updated to
0.90.1 from my earlier attempt and 0.89 could not cope?
Thank you all!
-chris
On Mar 17, 2011, at 6:16 PM, Chris Tarnas wrote:
So we lose this data, no recovery options?
-chris
On Mar 17, 2011, at 6:13 PM, Stack wrote:
Those files look like they were trashed on their tail. There is an
issue on this, where recovered.edits files EOFE. For now, the only 'soln'
is to move them aside. Doesn't look related to your other troubles.
May be from 0.89 since I have not seen this in a good while.
St.Ack
On Thu, Mar 17, 2011
Could these have been regions that were updated to 0.90.1 during the first
attempted startup? Should I now go back to that?
thank you,
-chris
On Mar 17, 2011, at 5:16 PM, Chris Tarnas wrote:
> I restarted it with 0.89 (CDH3b3, patched in the new hadoop jar), it has come
> up but is having trou
So you are configuring your Scan to scan only the specified start/stop
rows? I guess you could store that anywhere right? Like HBase?
J-D
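A minimal sketch of that idea, with the run's row range kept in a made-up bookkeeping table named mr_progress (family f); the table, family and row names below are illustrative only, not an established convention:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: bound each MR run by start/stop row and remember where it ended,
    // so the next run can pick up from there. All names here are made up.
    public class ScanCheckpoint {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable progress = new HTable(conf, "mr_progress");

        // Read the stop row recorded by the previous run, if any.
        Result r = progress.get(new Get(Bytes.toBytes("sequence-import")));
        byte[] startRow = r.isEmpty()
            ? HConstants.EMPTY_START_ROW
            : r.getValue(Bytes.toBytes("f"), Bytes.toBytes("lastStop"));
        byte[] stopRow = Bytes.toBytes("rowM");   // wherever this run should stop

        Scan scan = new Scan();
        scan.setStartRow(startRow);               // inclusive
        scan.setStopRow(stopRow);                 // exclusive
        // ... hand the Scan to TableMapReduceUtil.initTableMapperJob(...) and run ...

        // After the job succeeds, record the new checkpoint for the next run.
        Put p = new Put(Bytes.toBytes("sequence-import"));
        p.add(Bytes.toBytes("f"), Bytes.toBytes("lastStop"), stopRow);
        progress.put(p);
        progress.close();
      }
    }
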
On Thu, Mar 17, 2011 at 8:30 AM, Vishal Kapoor wrote:
> If I have a bunch of MRs and I want to keep a tab on what they should
> process in terms of scope of r
I restarted it with 0.89 (CDH3b3, patched in the new hadoop jar), and it has come
up but is having trouble opening three regions (of 285), from hbck:
ERROR: Region
sequence,8eUWjPYt2fBStS32zCJFzQ\x09A2740005-e5d6f259a1b7617eecd56aadd2867a24-1\x09,1299147700483.6b72bbe5fe43ae429215c1217cf8d6c6.
is n
Aw, thanks JD :)
Geoff, we'd be happy to help you over on the cdh-user list. Don't
hesitate to email - there's a good reason it's fixing permissions like
this.
Also, putting your tmp dir inside your data dir is not a good idea.
Would definitely discourage that.
-Todd
On Thu, Mar 17, 2011 at 4:5
Yes at the moment it's easier to use the Hadoop version shipped with
CDH3 in order to have durability (the other option being compiling the
0.20-append hadoop branch, which isn't so hard but then again I'm a
committer so my opinion is biased).
At StumbleUpon we use CDH3b2 that we patched a bit, an
When we looked at it here at SU the log was REALLY old. Is yours? If
really old, you have been living w/o the edits for a while anyways so
just remove and press on. Regarding going back, we say no -- but it sounds
like you didn't get off the ground so perhaps you can go back to
0.20.x to replay the o
Yep. I appreciate the info. Thanks. I wish I didn't have to use CDH, but
I was basically advised on this list that upgrading to CDH3 was a good
option (if not my best bet) to prevent the "data loss" that the HBase
admin page warns about.
-geoff
-Original Message-
From: jdcry...@gmail.com
I know I didn't have a clean shutdown, I thought I had hit HBASE-3038, but
looking further I first had an OOME on a region server. Can I revert to the older
HBase to reconstruct the log, or has that ship sailed?
thanks,
-chris
On Mar 17, 2011, at 4:22 PM, Ryan Rawson wrote:
> If you know you had a
I think this is a general hbase issue, but I sent this to the Cloudera list
after reading recent messages :)
On Mar 17, 2011, at 4:20 PM, Chris Tarnas wrote:
>
>
> I just had to upgrade our second cluster CDH3B4 (the 2GB log file problem,
> same as the reason for upgrading another cluster)
If you know you had a clean shutdown just nuke all directories in /hbase/.logs
we hit this @ SU as well, it's older logfile formats messing us up.
remember, only if you had a CLEAN shutdown, or else you lose data
On Thu, Mar 17, 2011 at 4:20 PM, Chris Tarnas wrote:
I just had to upgrade our second cluster CDH3B4 (the 2GB log file problem, same
as the reason for upgrading another cluster) and now the master is not coming
up, it dies with this error:
2011-03-17 18:15:24,209 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled
exception. Starting shutd
We tolerate some level of discussion regarding CDH when the issues
look more generic to HBase, like "I use CDH3b4 and the master has this
issue". The HBase version they ship isn't patched a lot, so an HBase
issue in CDH is most likely a real issue.
In your case the question is at the HDFS level
Fair. I've seen discussion of CDH3 on this list which is why I pinged.
Is it bad form to discuss CDH3 here?
-geoff
-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Thursday, March 17, 2011 4:05 PM
To: user@hbase.apache.org
Subj
Good question, you might want to ask that to cloudera.
J-D
On Thu, Mar 17, 2011 at 4:00 PM, Geoff Hendrey wrote:
Hi -
I just upgraded to CDH3B4. I noticed when I ran 'hadoop dfsadmin
-upgrade' that the logs on the datanodes showed that hadoop was "fixing"
the permissions of my dfs storage disks to "rwx--". I am just
wondering why it does this? I had been using a subdirectory of one of
the disks for ha
Hi,
> Patrick raised an issue that might be of concern... region splits.
Right. And if I understand correctly, if I want to have multiple CFs that grow
unevenly, these region splits are something I have to then be willing to accept.
> But barring that... what makes the most sense on retentio
Grep the namenode logs for one of the files throwing
FileNotFoundException. See if you can tell a story going by the grep
emissions. It looks like someone is moving the file on you. The NN
logs might give you a clue.
St.Ack
On Thu, Mar 17, 2011 at 11:58 AM, Nichole Treadway wrote:
> Hi all,
>
>
On Wed, Mar 16, 2011 at 11:30 PM, Otis Gospodnetic wrote:
> If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs
> correctly,
> the advice is not to have more than 2-3 CFs per table?
> And what happens if I have say 6 CFs per table?
>
> Again if I read the above page correctly, t
Hi all,
I am attempting to bulk load data into HBase using the importtsv program. I
have a very wide table (about 200 columns, 2 column families), and right now
I'm trying to load in data from a single data file with 1 million rows.
Importtsv works fine for this data when I am writing directly to
Ok - now I understand - doing pre-splits using the full binary space does not
make sense when using a limited range. I do all my splits in the base-64
character space or let hbase do them organically.
thanks for the explanation.
-chris
On Mar 17, 2011, at 11:32 AM, Ted Dunning wrote:
> Just th
Final tally on the import of a full days worth of search logs. The process
started out at 12 seconds per log and ended at 15 seconds per log. Previously,
the process started out at 24 seconds per log and ended at 154 seconds per log.
I think I'll stay with my current Hash Code Generation Algorit
Otis,
Patrick raised an issue that might be of concern... region splits.
But barring that... what makes the most sense on retention policies?
The point is that it's a business issue that will be driving the logic.
Depending on a clarification from Patrick or JGray or JDCryans... you may want
t
Just that base-64 is not uniformly distributed relative to a binary
representation. This is simply because it is all printable characters. If
you do a 256 way pre-split based on a binary interpretation of the key, 64
regions will get traffic and 192 will get none. Among other things, this
can s
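To make that concrete, here is one way split points that stay inside the base-64 character space could be generated. This is only a sketch, the 16-region count is arbitrary, and the class name is made up:

    import java.util.Arrays;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: build split keys that fall inside the base-64 character space,
    // so every region of a pre-split table actually receives traffic. Splitting
    // the full 0x00-0xFF byte space would leave most regions empty when every
    // key starts with a base-64 character.
    public class Base64Splits {
      public static byte[][] makeSplits(int numRegions) {
        // The standard base-64 alphabet, sorted by byte value so the split
        // points are in the same order HBase sorts row keys.
        char[] alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
                .toCharArray();
        Arrays.sort(alphabet);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
          int idx = i * alphabet.length / numRegions;
          splits[i - 1] = new byte[] { (byte) alphabet[idx] };  // one-byte boundary
        }
        return splits;
      }

      public static void main(String[] args) {
        for (byte[] split : makeSplits(16)) {
          System.out.println(Bytes.toStringBinary(split));
        }
        // The byte[][] could then be handed to HBaseAdmin.createTable(desc, splits).
      }
    }
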
Ryan, Vishal,
Yep, right after I sent the email we figured out that the problem was on the
Hadoop side. We are tracking it down; thanks for the very quick responses.
Ron
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy
I'm not sure I am clear, are you saying 64 bit chunks of MD5 keys are not
uniformly distributed? Or that a base-64 encoding is not evenly distributed?
thanks,
-chris
On Mar 17, 2011, at 10:23 AM, Ted Dunning wrote:
>
> There can be some odd effects with this because the keys are not uniforml
If you are in safe mode it's because not all datanodes have reported
in. So actually NO your hadoop did NOT come up properly.
Check your nn pages, look for any missing nodes. It won't help you
any more than telling you what is online or not.
Good luck!
-ryan
On Thu, Mar 17, 2011 at 11:12 AM, V
Can you add logging to your tasks?
Is it that hbase goes unavailable, or is it that you are not pulling on
the Progressable within the ten minute timeout?
You are not trying to post 4G of data to a single cell (I know you are
not but asking just in case).
St.Ack
On Thu, Mar 17, 2011 at 8:32 AM, jonh
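If it does turn out to be the timeout rather than HBase, here is a sketch of the usual fix: report progress from inside the long-running map so the framework knows the task is alive (new-API Mapper; the per-record work is just a placeholder):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch: call context.progress() during long stretches of work so the task
    // is not killed for being silent longer than mapred.task.timeout (10 min).
    public class LongRunningMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        for (int i = 0; i < 1000; i++) {
          doExpensiveWork(value);      // placeholder for the real per-record work
          if (i % 100 == 0) {
            context.progress();        // tell the framework we are still alive
          }
        }
      }

      private void doExpensiveWork(Text value) {
        // ... whatever takes a long time per record ...
      }
    }
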
You should have more info in the logs on why dfs is in safe mode.
You can always leave safe mode:
hadoop dfsadmin -safemode leave
but again, that's a symptom, not the problem.
Vishal
On Thu, Mar 17, 2011 at 1:55 PM, Taylor, Ronald C wrote:
Folks,
We had a power outage here, and we are trying to bring our Hadoop/HBase cluster
back up. Hadoop has been just fine - came up smoothly. HBase has not. Our HBase
master log file is filled with just one msg:
2011-03-17 10:50:08,712 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for
dfs
Hi,
> Patrick,
>
> Perhaps I misunderstood Otis' design.
>
> I thought he'd create the CF based on duration.
> So you could have a CF for (daily, weekly, monthly, annual, indefinite).
> So that you set up the table once with all CFs.
> Then you'd write the data to one and only one of those
On Thu, Mar 17, 2011 at 8:21 AM, Michael Segel wrote:
>
> Why not keep it simple?
>
> Use a SHA-1 hash of your key. See:
> http://codelog.blogial.com/2008/09/13/password-encryption-using-sha1-md5-java/
> (This was just the first one I found and there are others...)
>
SHA-1 is kind of slow.
>
>
Thanks for the explanation. Makes perfect sense now that you've explained it. That would incur a
huge write overhead so I see why we don't keep the counts.
~Jeff
On 3/16/2011 2:59 PM, Matt Corgan wrote:
Jeff,
The problem is that when hbase receives a put or delete, it doesn't know if
the p
There can be some odd effects with this because the keys are not uniformly
distributed. Beware if you are using pre-split tables because the region
traffic can be pretty unbalanced if you do a naive split.
On Thu, Mar 17, 2011 at 9:20 AM, Chris Tarnas wrote:
> I've been using base-64 encoding w
hashCode() in Object is limited to an int, and a quick look at HashMap and
Trove's HashMap looks like they are only using 31 bits of that. I am now trying
a modified version of what Ted pointed at and it seems to be working very well.
I modified the original since only the last few bytes in the ke
Thanks Cryans, I'll try them!
On Fri, Mar 18, 2011 at 12:20 AM, Jean-Daniel Cryans wrote:
You can limit the number of WALs and their size on the region server by tuning:
hbase.regionserver.maxlogs the default is 32
hbase.regionserver.hlog.blocksize the default is whatever your HDFS
blocksize times 0.95
You can limit the number of parallel threads in the master by tuning:
hbase.region
With 24 million elements you'd probably want a 64bit hash to minimize the risk
of collision; the rule of thumb is that with a 64bit hash key you expect a collision
when you reach about 2^32 elements in your set. I use half of a 128bit MD5 sum (a
cryptographic hash, so you can use just parts of it if you want) a
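A sketch of that approach, taking the first 8 bytes of the 128-bit MD5 as a 64-bit key (plain JDK MessageDigest; the class and method names here are made up for illustration):

    import java.security.MessageDigest;
    import java.util.Arrays;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: derive a 64-bit hash by keeping the first 8 bytes of the 128-bit
    // MD5 of the original key. MD5 output is uniformly distributed, so any
    // 8-byte slice of it serves as a shorter hash.
    public class HalfMd5Key {
      public static byte[] hash64(byte[] key) throws Exception {
        byte[] md5 = MessageDigest.getInstance("MD5").digest(key);  // 16 bytes
        return Arrays.copyOf(md5, 8);                               // keep first 8
      }

      public static void main(String[] args) throws Exception {
        byte[] original = Bytes.toBytes("some original key");
        System.out.println(Bytes.toStringBinary(hash64(original)));
      }
    }
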
Pardon my ignorance about jaql, but where in your job is HBase used?
J-D
On Thu, Mar 17, 2011 at 8:32 AM, jonh111 wrote:
>
> Hi,
>
> I'm running jaql over a cluster of 6 machines.
>
> When i run my jobs on small data it runs smoothly.
>
> However, when i use larger data (~4G) the following occur
Patrick,
Perhaps I misunderstood Otis' design.
I thought he'd create the CF based on duration.
So you could have a CF for (daily, weekly, monthly, annual, indefinite).
So that you set up the table once with all CFs.
Then you'd write the data to one and only one of those buckets.
The only time
Hi,
I'm running jaql over a cluster of 6 machines.
When I run my jobs on small data it runs smoothly.
However, when I use larger data (~4G) the following occurs:
I can see that a lot of tasks which have been completed go back to "pending"
state.
When this happens I get exceptions that look li
If I have a bunch of MRs and I want to keep a tab on what they should
process in terms of scope of row ids
eg,
first run :
startRow1
stopRowN
second run
startRowN+1
stopRowM
and similar for others,
is there any lightweight way to accomplish this?
thanks,
vk
Otis,
Perhaps your biggest issue will be the need to disable the table to add a
new CF. So effectively you need to bring down the application to move in a
new tenant.
Another thing with multiple CFs is that if one CF tends to get
disproportionally more data, you will get a lot of region splitting
Why not keep it simple?
Use a SHA-1 hash of your key. See:
http://codelog.blogial.com/2008/09/13/password-encryption-using-sha1-md5-java/
(This was just the first one I found and there are others...)
So as long as your key is unique, the sha-1 hash should also be unique.
The reason I suggest s
Otis, you sure are busy blogging. ;-)
Ok but to answer your question... you want as few column families as possible.
When we first started looking at HBase, we tried to view the column families as
if they were relational tables and the key was a foreign key joining the two
tables.
(Its actuall
Hi Lars,
Many tks for your reply.
For now, I just rely on random or hashed keys and don't need any range
queries.
I will have to choose a nice solution one day for ordered keys upon
which I will range-query.
I will post the results of the different data models I will try (looking
for other t
Updating the lzo libraries resolved the problem. Thanks for pointing it
out and thanks to Todd Lipcon for his hadoop-lzo-packager.
On 03/16/2011 06:35 PM, Stack wrote:
Poking in our mail archives, does it help? For example:
http://search-hadoop.com/m/QMDV41Sh1GI/lzo+compression&subj=LZO+Compre
Thanks, I'll give that a try.
-Pete
On Thu, 17 Mar 2011 00:23:00 -0700, Ted Dunning wrote:
Double hashing is a fine thing. To actually answer the question, though, I
would recommend Murmurhash or JOAAT (
http://en.wikipedia.org/wiki/Jenkins_hash_function)
On Wed, Mar 16, 2011 at 3:48 PM, Andrey Stepachev wrote:
> Try hash table with double hashing.
> Something like this
>
> http://ww
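For reference, Jenkins' one-at-a-time (JOAAT) function is small enough to paste inline; a Java transcription of the algorithm described on the Wikipedia page above might look like this:

    // Sketch: Jenkins' one-at-a-time (JOAAT) hash transcribed to Java ints.
    // The unsigned C right shifts become >>> here; the result is a fast 32-bit hash.
    public class Joaat {
      public static int hash(byte[] key) {
        int hash = 0;
        for (byte b : key) {
          hash += (b & 0xff);
          hash += (hash << 10);
          hash ^= (hash >>> 6);
        }
        hash += (hash << 3);
        hash ^= (hash >>> 11);
        hash += (hash << 15);
        return hash;
      }

      public static void main(String[] args) {
        System.out.println(Integer.toHexString(hash("row-12345".getBytes())));
      }
    }
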
Hi,
In our tests, we've accumulated lots of WAL logs in .logs, which leads to
quite a long pause or even an
OOME when restarting either the master or a region server. We're doing a sort of
bulk import and have not been using
bulk import tricks, like turning off the WAL feature. We think it's unknown how
our appli