How to limit the number of logs produced by DailyRollingFileAppender

2011-02-15 Thread 陈加俊
How can I limit the number of log files produced by DailyRollingFileAppender? I find the logs are exceeding the disk space limit.
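
Since log4j 1.x's DailyRollingFileAppender has no setting to cap how many rolled files it keeps, a common workaround is to switch to RollingFileAppender, which bounds disk usage with MaxFileSize and MaxBackupIndex. A minimal log4j.properties sketch, with illustrative path and sizes:

    # Keep at most 10 x 100MB of logs; older backups are deleted automatically.
    log4j.rootLogger=INFO, R
    log4j.appender.R=org.apache.log4j.RollingFileAppender
    log4j.appender.R.File=/var/log/myapp/app.log
    log4j.appender.R.MaxFileSize=100MB
    log4j.appender.R.MaxBackupIndex=10
    log4j.appender.R.layout=org.apache.log4j.PatternLayout
    log4j.appender.R.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n

The trade-off is losing the one-file-per-day naming; if daily files are required, pruning old files with an external cron job is the other common answer.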

RE: Row Key Question

2011-02-15 Thread Gary Gilbert - SQLstream
Hi, I've been considering a slightly different scenario. In this scenario I'd hash the column qualifier, mod by some constant, and append the result to the rowkey. The idea is to spread the writes for a specific rowkey among the various regions. Mod by the constant gives control over how many ra
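
A minimal Java sketch of this scheme, with illustrative names (BUCKETS stands in for the "constant"): the salt comes from the qualifier's hash mod a constant and is appended to the rowkey, so writes for one logical row fan out across regions while the salt stays recomputable at read time.

    public class SaltedKeys {
        private static final int BUCKETS = 8; // the "constant"; controls fan-out

        // Append a qualifier-derived salt to the rowkey.
        static String saltedRowKey(String rowKey, String qualifier) {
            int bucket = (qualifier.hashCode() & Integer.MAX_VALUE) % BUCKETS;
            return rowKey + "-" + bucket;
        }

        public static void main(String[] args) {
            // The same qualifier always maps to the same bucket, so a reader
            // can recompute the salt; different qualifiers spread out.
            System.out.println(saltedRowKey("user123", "colA"));
            System.out.println(saltedRowKey("user123", "colB"));
        }
    }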

Hbase inserts very slow

2011-02-15 Thread Vishal Kapoor
All was working fine and suddenly I see a lot of logs like the ones below: 2011-02-15 22:19:04,023 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.88 MB of total=168.64 MB 2011-02-15 22:19:04,025 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCac

Re: Row Key Question

2011-02-15 Thread Chris Tarnas
I've been playing with salting my keys as well. My current experiments are around hashing the rowkey and using digits of that to create the prefix. That would make your salts and your puts idempotent, but you do lose control of data-locality. -chris On Feb 15, 2011, at 4:38 PM, Peter Hai

Row Key Question

2011-02-15 Thread Peter Haidinyak
Hi All, A couple of weeks ago I asked about how to distribute my rows across the servers if the key always starts with the date in the format YYYY-MM-DD. I believe Stack, although I could be wrong, suggested pre-pending an 'X-' where 'X' is a number from 1 to the number of servers I have. Thi
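
A sketch of the prefixing idea this thread converges on, with illustrative names: deriving the prefix from a hash of the date key (as Chris suggests above) keeps the salt deterministic, so re-running the same put yields the same rowkey.

    public class DatePrefix {
        // Prepend "X-" where X = hash(key) mod the number of buckets/servers.
        static String prefixed(String dateKey, int numBuckets) {
            int bucket = (dateKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
            return bucket + "-" + dateKey; // e.g. "3-2011-02-15"
        }

        public static void main(String[] args) {
            System.out.println(prefixed("2011-02-15", 4));
        }
    }

The cost, as noted above, is data-locality: a date-range scan now needs one scan per prefix.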

Re: Put errors via thrift

2011-02-15 Thread Chris Tarnas
Thanks for the help. It definitely looks like the move to 0.90 would resolve many of these issues. -chris On Feb 15, 2011, at 2:33 PM, Jean-Daniel Cryans wrote: > That would make sense... although I've done testing and the more files > you have to split, the longer it takes to create the refere

Re: Put errors via thrift

2011-02-15 Thread Jean-Daniel Cryans
That would make sense... although I've done testing and the more files you have to split, the longer it takes to create the reference files so the longer the split. Now that I think of it, with your high blocking store files setting, you may be running into an extreme case of https://issues.apache.

Re: Put errors via thrift

2011-02-15 Thread Chris Tarnas
No swapping, about 30% of the total CPU is idle; looking through ganglia I do see a spike in cpu_wio at that time - but only to 2%. My suspicion, though, is that GZ compression is just taking a while. On Feb 15, 2011, at 2:10 PM, Jean-Daniel Cryans wrote: > Yeah if it's the same key space that splits

Re: Put errors via thrift

2011-02-15 Thread Jean-Daniel Cryans
Yeah if it's the same key space that splits, it could explain the issue... 65 seconds is a long time! Is there any swapping going on? CPU or IO starvation? In that context I don't see any problem setting the pausing time higher. J-D On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas wrote: > Hi JD,

Re: Put errors via thrift

2011-02-15 Thread Chris Tarnas
Hi JD, Two splits happened within 90 seconds of each other on one server - one took 65 seconds, the next took 43 seconds. With only a 10-second timeout (10 tries, 1 second between) I think that was the issue. Are there any hidden issues to raising those retry parameters so I can withstand a 120
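
The parameters under discussion appear to be hbase.client.retries.number and hbase.client.pause; to ride out a ~120-second split, their product has to exceed 120s. A sketch, assuming the 0.90-era property names and client API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientRetryConfig {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            conf.setInt("hbase.client.retries.number", 15); // default was 10
            conf.setLong("hbase.client.pause", 10000);      // ms between tries
            // 15 retries x 10s pause = roughly 150s of patience per put.
        }
    }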

Re: Put errors via thrift

2011-02-15 Thread Chris Tarnas
On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote: > On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas wrote: >> We are definitely considering writing a bulk loader, but as it is this fits >> into an existing processing pipeline that is not Java and does not fit into >> the importtsv tool (w

Re: Put errors via thrift

2011-02-15 Thread Ryan Rawson
0.90.0 has been out since Jan 19th (nearly a month). The 0.89 variant you are running is substantially different in key areas than what is current and published. There are no fees for upgrading btw, it's completely free! -ryan On Tue, Feb 15, 2011 at 11:26 AM, Chris Tarnas wrote: > We are runni

Re: Put errors via thrift

2011-02-15 Thread Jean-Daniel Cryans
On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas wrote: > We are definitely considering writing a bulk loader, but as it is this fits > into an existing processing pipeline that is not Java and does not fit into > the importtsv tool (we use column names as data as well) we have not done it > yet.

Re: Put errors via thrift

2011-02-15 Thread Chris Tarnas
We are running cdh3b3 - so next week when they go to b4 we'll be up to 0.90 - I'm looking forward to it. -chris On Feb 15, 2011, at 11:05 AM, Ryan Rawson wrote: > If you were using 0.90, that unhelpful error message would be much more > helpful! > > On Tue, Feb 15, 2011 at 9:56 AM, Jean-Danie

Re: Put errors via thrift

2011-02-15 Thread Chris Tarnas
We are definitely considering writing a bulk loader, but as this fits into an existing processing pipeline that is not Java, and it does not fit into the importtsv tool (we use column names as data as well), we have not done it yet. I do foresee a Java bulk loader in our future though. Does th

Re: Put errors via thrift

2011-02-15 Thread Ryan Rawson
If you were using 0.90, that unhelpful error message would be much more helpful! On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel Cryans wrote: > Compactions are done in the background, they won't block writes. > > Regarding splitting time, it could be that it had to retry a bunch of > times in such

Re: Need unique RowID in the Hbase table

2011-02-15 Thread Jason
Or based on int/long: ID[i] = ID[i-1] + N, ID[0] = n, where N is the number of mappers or reducers in which ids are generated and n is the task id. Sent from my iPhone 4 On Feb 15, 2011, at 10:26 AM, Ryan Rawson wrote: > Or the natural business key? > On Feb 15, 2011 10:00 AM, "Jean-Daniel Cryans" wrote: >> Try UUIDs. >>
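
A small worked sketch of this scheme (class and method names are illustrative): task n of N generates the arithmetic sequence n, n+N, n+2N, ..., so the N tasks' id streams are disjoint by construction.

    public class InterleavedIds {
        private final long stride; // N: number of mappers or reducers
        private long next;         // starts at n: this task's id

        InterleavedIds(int numTasks, int taskId) {
            this.stride = numTasks;
            this.next = taskId;
        }

        long nextId() {
            long id = next;
            next += stride;
            return id;
        }

        public static void main(String[] args) {
            InterleavedIds task2of4 = new InterleavedIds(4, 2);
            // Prints 2, 6, 10 - never colliding with tasks 0, 1 or 3.
            for (int i = 0; i < 3; i++) System.out.println(task2of4.nextId());
        }
    }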

Re: createTable with specified region splits: works great

2011-02-15 Thread Jean-Daniel Cryans
That's a great report Matt, thanks for sharing! J-D On Tue, Feb 15, 2011 at 10:52 AM, Matt Wheeler wrote: > Pre-creating regions using the byte[][] overload of createTable more or less > doubled the performance of our main index table generation.  Our keys start > with hashes of the original r

createTable with specified region splits: works great

2011-02-15 Thread Matt Wheeler
Pre-creating regions using the byte[][] overload of createTable more or less doubled the performance of our main index table generation. Our keys start with hashes of the original record IDs, so the data can be evenly distributed between all regions. The keys are ASCII strings starting with th
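
For reference, a sketch of what this looks like against the 0.90-era admin API, with illustrative table and family names; for keys that begin with a hex hash character, fifteen evenly spaced one-character split points yield sixteen evenly loaded regions.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            HTableDescriptor desc = new HTableDescriptor("index_table");
            desc.addFamily(new HColumnDescriptor("f"));
            // Split points "1".."f" -> 16 regions over hex-prefixed keys.
            String hex = "123456789abcdef";
            byte[][] splits = new byte[hex.length()][];
            for (int i = 0; i < splits.length; i++) {
                splits[i] = Bytes.toBytes(hex.substring(i, i + 1));
            }
            admin.createTable(desc, splits);
        }
    }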

Re: Need unique RowID in the Hbase table

2011-02-15 Thread Ryan Rawson
Or the natural business key? On Feb 15, 2011 10:00 AM, "Jean-Daniel Cryans" wrote: > Try UUIDs. > > J-D > > On Tue, Feb 15, 2011 at 8:57 AM, praba karan wrote: >> Hi, >> >> I am having the Map Reduce program for the uploading the Bulk data into the >> Hbase-0.89 from HDFS file system. I need uniq

Re: Need unique RowID in the Hbase table

2011-02-15 Thread Jean-Daniel Cryans
Try UUIDs. J-D On Tue, Feb 15, 2011 at 8:57 AM, praba karan wrote: > Hi, > > I am having the Map Reduce program for the uploading the Bulk data into the > Hbase-0.89 from HDFS file system. I need unique row ID for every row > (millions of rows). So that overwriting in the hbase table is to be av
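
A minimal sketch of that suggestion (illustrative names; Put.add is the 0.89/0.90-era call): a random UUID rowkey makes every inserted row unique, so earlier rows are never overwritten.

    import java.util.UUID;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UuidRowKey {
        static Put newRow(byte[] family, byte[] qualifier, byte[] value) {
            byte[] rowKey = Bytes.toBytes(UUID.randomUUID().toString());
            Put put = new Put(rowKey);
            put.add(family, qualifier, value);
            return put;
        }
    }

The trade-off is that UUID keys carry no ordering, so related rows won't cluster together for range scans.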

Re: Put errors via thrift

2011-02-15 Thread Jean-Daniel Cryans
Compactions are done in the background, they won't block writes. Regarding splitting time, it could be that it had to retry a bunch of times in such a way that the write timed out, but I can't say for sure without the logs. Have you considered using the bulk loader? I personally would never try t

Re: Hbase Hardware needs

2011-02-15 Thread Stack
Hey William: Have you checked the mailing list archives? This topic has come up in various guises in our past. Here's one such thread: http://search-hadoop.com/m/4DQfl2TGBb22/hardware&subj=Hadoop+HBase+hardware+requirement Hopefully this helps some. St.Ack On Tue, Feb 15, 2011 at 9:34 AM, Wi

Re: Hbase Hardware needs

2011-02-15 Thread Jean-Daniel Cryans
Start with this: http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/ Then regarding the number of servers... it's really hard to tell, you'd have to test with a handful of machines first and see how they perform under your type of load. Scaling is then as easy as adding the new mac

Hbase Hardware needs

2011-02-15 Thread William Theisinger
Hi, thinking of implementing HBase on top of our data processing pipeline (Hadoop) and was curious if there are some guidelines on memory needs, number of region servers recommended based on the size of the grid/volume of data, etc. Any thoughts here would be appreciated. I would be interested in

RE: Truncate tables

2011-02-15 Thread Peter Haidinyak
Thanks, I added the new parameters and my client now runs as fast as the shell. This makes it easier to debug import routines. -Pete -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Tuesday, February 15, 2011 7:10 AM To: user@hbase.apache

Put errors via thrift

2011-02-15 Thread Chris Tarnas
I have a long running Hadoop streaming job that also puts about a billion sub-1KB rows into HBase via thrift, and last night I got quite a few errors like this one: Still had 34 puts left after retrying 10 times. Could that be caused by one or more long running compactions and a split? I'm usi

Re: Need unique RowID in the Hbase table

2011-02-15 Thread praba karan
Hi, I have a Map Reduce program for uploading bulk data into Hbase-0.89 from the HDFS file system. I need a unique row ID for every row (millions of rows), so that overwriting in the hbase table is avoided. Any solution to overcome the Row ID problem without overwriting in the H

Re: Truncate tables

2011-02-15 Thread Stack
What Andrey describes is how the shell makes itself more responsive to changes. See src/main/ruby/hbase/hbase.rb: # Turn off retries in hbase and ipc. Human doesn't want to wait on N retries. configuration.setInt("hbase.client.retries.number", 7) configuration.set

Re: Truncate tables

2011-02-15 Thread Andrey Stepachev
It is a strange thing in hbase. Operations like create or drop are asynchronous, so immediately after the first 'disable' rpc the hbase client tries to check for successful execution. Often it is not really complete yet, so the hbase client pauses an amount of time configured in 'hbase.client.pause'. If you change it in
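
A sketch of the truncate flow this describes, against the 0.90-era admin API with an illustrative table name; shrinking hbase.client.pause shortens the waits between the client's completion checks.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class Truncate {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.setLong("hbase.client.pause", 200); // shorter waits (ms)
            HBaseAdmin admin = new HBaseAdmin(conf);
            // Save the schema, then disable, drop and recreate. The client
            // polls after each async step, pausing hbase.client.pause between
            // checks, which is why a large pause makes truncate feel slow.
            HTableDescriptor desc =
                admin.getTableDescriptor(Bytes.toBytes("mytable"));
            admin.disableTable("mytable");
            admin.deleteTable("mytable");
            admin.createTable(desc);
        }
    }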

Re: Lily 0.3 is released

2011-02-15 Thread Lars George
Oh, so you have that much time? Easy then... ;) Congrats Steven and the whole team, you are awesome! On Tue, Feb 15, 2011 at 11:37 AM, Steven Noels wrote: > On Mon, Feb 14, 2011 at 6:28 PM, Stack wrote: > > Congrats lads.  Keep the releases coming. >> > > Just one more and we hit 1.0. Let's mak

Re: Lily 0.3 is released

2011-02-15 Thread Steven Noels
On Mon, Feb 14, 2011 at 6:28 PM, Stack wrote: Congrats lads. Keep the releases coming. > Just one more and we hit 1.0. Let's make this coincide with HBase 1.0. ;-) Thanks! Steven. -- Steven Noels http://outerthought.org/ Scalable Smart Data Makers of Kauri, Daisy CMS and Lily