How do I limit the number of logs produced by DailyRollingFileAppender?
I find the logs are exceeding the disk space limit.
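For what it's worth, log4j 1.x's DailyRollingFileAppender has no setting that
caps how many rolled files it keeps. A common workaround, sketched below with
illustrative paths and sizes, is to switch to RollingFileAppender, which rolls
by size and keeps at most MaxBackupIndex old files:

    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=/var/log/hbase/hbase.log
    log4j.appender.RFA.MaxFileSize=100MB
    log4j.appender.RFA.MaxBackupIndex=10
    log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFA.layout.ConversionPattern=%d %-5p [%t] %c: %m%n

That bounds disk usage at roughly MaxFileSize * (MaxBackupIndex + 1).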
Hi
I've been considering a slightly different scenario.
In this scenario I'd hash the column qualifier and mod by some constant and
append the result to the rowkey. The idea is to spread the writes for a
specific rowkey among the various regions. Mod by the constant gives
control over how many ra
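A minimal sketch of that scheme (class name and bucket count are illustrative,
not from the thread):

    import java.util.Arrays;

    public final class QualifierSalt {
        // The constant bounds how many regions share the writes for one rowkey.
        static final int BUCKETS = 16;

        // Append (hash(qualifier) mod BUCKETS) to the rowkey so that writes for
        // a single logical row are spread across up to BUCKETS regions.
        static byte[] saltedRow(byte[] row, byte[] qualifier) {
            int bucket = (Arrays.hashCode(qualifier) & Integer.MAX_VALUE) % BUCKETS;
            byte[] suffix = String.format("-%02d", bucket).getBytes();
            byte[] out = new byte[row.length + suffix.length];
            System.arraycopy(row, 0, out, 0, row.length);
            System.arraycopy(suffix, 0, out, row.length, suffix.length);
            return out;
        }
    }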
All was working fine, and suddenly I see a lot of logs like the ones below:
2011-02-15 22:19:04,023 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
started; Attempting to free 19.88 MB of total=168.64 MB
2011-02-15 22:19:04,025 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCac
I've been playing with salting my keys as well. My current experiments are
around hashing the rowkey and using digits of that hash to create the prefix.
That would make your salts and your puts idempotent, but you do lose control of
data-locality.
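A minimal sketch of that idempotent variant (class name and the two-digit
prefix width are my choices, not from the thread): because the prefix is a pure
function of the rowkey, replaying the same put always yields the same salted
key.

    import java.util.Arrays;

    public final class HashPrefix {
        // Prefix the rowkey with two decimal digits derived from its own hash,
        // so identical rows always map to the same bucket (idempotent puts).
        static byte[] prefixedRow(byte[] row) {
            int digits = (Arrays.hashCode(row) & Integer.MAX_VALUE) % 100;
            byte[] prefix = String.format("%02d-", digits).getBytes();
            byte[] out = new byte[prefix.length + row.length];
            System.arraycopy(prefix, 0, out, 0, prefix.length);
            System.arraycopy(row, 0, out, prefix.length, row.length);
            return out;
        }
    }

The trade-off mentioned above is that rows which would sort together under the
natural key end up in different regions, so data-locality is lost.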
-chris
On Feb 15, 2011, at 4:38 PM, Peter Hai
Hi All,
A couple of weeks ago I asked about how to distribute my rows across the
servers if the key always starts with the date in the format YYYY-MM-DD.
I believe Stack, although I could be wrong, suggested prepending an 'X-' where
'X' is a number from 1 to the number of servers I have. Thi
Thanks for the help. It definitely looks like the move to 0.90 would resolve
many of these issues.
-chris
On Feb 15, 2011, at 2:33 PM, Jean-Daniel Cryans wrote:
> That would make sense... although I've done testing and the more files
> you have to split, the longer it takes to create the refere
That would make sense... although I've done testing and the more files
you have to split, the longer it takes to create the reference files
so the longer the split. Now that I think of it, with your high
blocking store files setting, you may be running into an extreme case
of https://issues.apache.
No swapping, and about 30% of the total CPU is idle. Looking through Ganglia I
do see a spike in cpu_wio at that time, but only to 2%. My suspicion though is
that GZ compression is just taking a while.
On Feb 15, 2011, at 2:10 PM, Jean-Daniel Cryans wrote:
> Yeah if it's the same key space that splits
Yeah if it's the same key space that splits, it could explain the
issue... 65 seconds is a long time! Is there any swapping going on?
CPU or IO starvation?
In that context I don't see any problem setting the pausing time higher.
J-D
On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas wrote:
> Hi JD,
Hi JD,
Two splits happened within 90 seconds of each other on one server - one took 65
seconds, the next took 43 seconds. With only a 10-second timeout (10 tries, 1
second between) I think that was the issue. Are there any hidden issues to
raising those retry parameters so I can withstand a 120
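For reference, a sketch of raising those parameters on the client side (the
values are illustrative): with a 2-second pause and 60 retries the client keeps
trying for roughly two minutes, though the client may also apply backoff
between attempts, stretching the window further.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientRetryConfig {
        public static Configuration create() {
            Configuration conf = HBaseConfiguration.create();
            // Defaults in this thread were 10 tries with a 1-second pause.
            conf.setInt("hbase.client.retries.number", 60);
            conf.setLong("hbase.client.pause", 2000); // milliseconds
            return conf;
        }
    }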
On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas wrote:
>> We are definitely considering writing a bulk loader, but as it is this fits
>> into an existing processing pipeline that is not Java and does not fit into
>> the importtsv tool (w
0.90.0 has been out since Jan 19th (nearly a month). The 0.89 variant
you are running differs substantially in key areas from what is
current and published.
There are no fees for upgrading btw, it's completely free!
-ryan
On Tue, Feb 15, 2011 at 11:26 AM, Chris Tarnas wrote:
> We are runni
On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas wrote:
> We are definitely considering writing a bulk loader, but as it is this fits
> into an existing processing pipeline that is not Java and does not fit into
> the importtsv tool (we use column names as data as well) we have not done it
> yet.
We are running cdh3b3 - so next week when they go to b4 we'll be up to 0.90 -
I'm looking forward to it.
-chris
On Feb 15, 2011, at 11:05 AM, Ryan Rawson wrote:
> If you were using 0.90, that unhelpful error message would be much more
> helpful!
>
> On Tue, Feb 15, 2011 at 9:56 AM, Jean-Danie
We are definitely considering writing a bulk loader, but as it is, this fits
into an existing processing pipeline that is not Java and does not fit the
importtsv tool (we use column names as data as well), so we have not done it
yet. I do foresee a Java bulk loader in our future though.
Does th
If you were using 0.90, that unhelpful error message would be much more helpful!
On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel Cryans wrote:
> Compactions are done in the background, they won't block writes.
>
> Regarding splitting time, it could be that it had to retry a bunch of
> times in such
Or base it on an int/long sequence:
ID[i] = ID[i-1] + N
ID[0] = n
where N is the number of mappers or reducers in which the IDs are generated,
and n is the task ID.
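A tiny sketch of that generator (the class name is mine): task n of N emits
n, n+N, n+2N, ..., so no two tasks can ever produce the same ID.

    final class StridedIdGenerator {
        private long next;
        private final int stride;

        StridedIdGenerator(int taskId, int totalTasks) {
            this.next = taskId;        // ID[0] = n
            this.stride = totalTasks;  // ID[i] = ID[i-1] + N
        }

        long nextId() {
            long id = next;
            next += stride;
            return id;
        }
    }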
Sent from my iPhone 4
On Feb 15, 2011, at 10:26 AM, Ryan Rawson wrote:
> Or the natural business key?
> On Feb 15, 2011 10:00 AM, "Jean-Daniel Cryans" wrote:
>> Try UUIDs.
>>
That's a great report Matt, thanks for sharing!
J-D
On Tue, Feb 15, 2011 at 10:52 AM, Matt Wheeler wrote:
> Pre-creating regions using the byte[][] overload of createTable more or less
> doubled the performance of our main index table generation. Our keys start
> with hashes of the original r
Pre-creating regions using the byte[][] overload of createTable more or less
doubled the performance of our main index table generation. Our keys start
with hashes of the original record IDs, so the data can be evenly distributed
between all regions. The keys are ASCII strings starting with th
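For anyone trying the same thing, a sketch of a pre-split create (table and
family names are mine; the split points assume keys leading with an evenly
distributed hex digit, per the hashed keys described above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitIndexTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("index_table");
            desc.addFamily(new HColumnDescriptor("f"));

            // 15 split points give 16 regions, one per leading hex digit.
            byte[][] splits = new byte[15][];
            for (int i = 1; i <= 15; i++) {
                splits[i - 1] = Bytes.toBytes(Integer.toHexString(i));
            }
            admin.createTable(desc, splits);
        }
    }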
Or the natural business key?
On Feb 15, 2011 10:00 AM, "Jean-Daniel Cryans" wrote:
> Try UUIDs.
>
> J-D
>
> On Tue, Feb 15, 2011 at 8:57 AM, praba karan wrote:
>> Hi,
>>
>> I have a MapReduce program for uploading bulk data into HBase 0.89 from
>> the HDFS file system. I need uniq
Try UUIDs.
J-D
On Tue, Feb 15, 2011 at 8:57 AM, praba karan wrote:
> Hi,
>
> I have a MapReduce program for uploading bulk data into HBase 0.89 from the
> HDFS file system. I need a unique row ID for every row (millions of rows),
> so that overwriting in the HBase table is to be av
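A one-line sketch of that suggestion (family, qualifier, and value below are
placeholders): a random type-4 UUID as the row key makes collisions, and
therefore silent overwrites, effectively impossible.

    import java.util.UUID;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UuidRowKeys {
        static Put newRow() {
            Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            return put;
        }
    }

The trade-off is that the key carries no meaning, which is why "the natural
business key" also comes up in this thread.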
Compactions are done in the background, they won't block writes.
Regarding splitting time, it could be that it had to retry a bunch of
times in such a way that the write timed out, but I can't say for sure
without the logs.
Have you considered using the bulk loader? I personally would never
try t
Hey William:
Have you checked the mailing list archives? This topic has come up in
various guises in our past. Here's one such thread:
http://search-hadoop.com/m/4DQfl2TGBb22/hardware&subj=Hadoop+HBase+hardware+requirement
Hopefully this helps some.
St.Ack
On Tue, Feb 15, 2011 at 9:34 AM, Wi
Start with this:
http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
Then regarding the number of servers... it's really hard to tell,
you'd have to test with a handful of machines first and see how they
perform under your type of load. Scaling is then as easy as adding the
new mac
Hi
Thinking of implementing HBase on top of our data processing pipeline (Hadoop);
I was curious whether there are guidelines on memory needs and on the number of
region servers recommended based on the size of the grid/volume of data, etc.
Any thoughts here would be appreciated; I would be interested in
Thanks, I added the new parameters and my client now runs as fast as the shell.
This makes it easier to debug import routines.
-Pete
-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Tuesday, February 15, 2011 7:10 AM
To: user@hbase.apache
I have a long-running Hadoop streaming job that also puts about a billion
sub-1 KB rows into HBase via Thrift, and last night I got quite a few errors
like this one:
Still had 34 puts left after retrying 10 times.
Could that be caused by one or more long-running compactions and a split? I'm
usi
Hi,
I have a MapReduce program for uploading bulk data into HBase 0.89 from the
HDFS file system. I need a unique row ID for every row (millions of rows), so
that overwriting in the HBase table is avoided.
Any solution to overcome the row ID problem without overwriting in the H
What Andrey describes is how the shell makes itself more prompt
responding to changes.
See src/main/ruby/hbase/hbase.rb
# Turn off retries in hbase and ipc. Human doesn't want to wait on N retries.
configuration.setInt("hbase.client.retries.number", 7)
configuration.set
It is a strange thing in HBase. Operations like create or drop are
asynchronous, so immediately after the first 'disable' RPC the HBase client
tries to check for successful execution. Often it is not really complete yet,
so the HBase client pauses an amount of time configured in
'hbase.client.pause'. If you change it in
Oh, so you have that much time? Easy then... ;)
Congrats Steven and the whole team, you are awesome!
On Tue, Feb 15, 2011 at 11:37 AM, Steven Noels wrote:
> On Mon, Feb 14, 2011 at 6:28 PM, Stack wrote:
>
> Congrats lads. Keep the releases coming.
>>
>
> Just one more and we hit 1.0. Let's mak
On Mon, Feb 14, 2011 at 6:28 PM, Stack wrote:
Congrats lads. Keep the releases coming.
>
Just one more and we hit 1.0. Let's make this coincide with HBase 1.0. ;-)
Thanks!
Steven.
--
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily