>Subject: Re: bulk deletes
>
>Does it work? :)
>
>How did you do the deletes before?I assume you used the
>HTable.delete(List) API?
>
>(Doesn't really help you, but) In 0.92+ you could hook up a coprocessor
>into the compactions and simply filter out any KVs you w
We need to do deletes pretty regularly and sometimes we could have hundreds of
millions of cells to delete. TTLs won't work for us because we have a fair
amount of business logic around the deletes.
Given their current implementation (we are on 0.90.4), this delete process can
take a really long time.
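For anyone following along, a minimal sketch of what the HTable.delete(List) approach looks like on 0.90. The table name, the 10k batch size and the loadRowKeysToDelete() helper are placeholders for illustration, not our actual code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkDeleteSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // "mytable", the 10k batch size and loadRowKeysToDelete() are placeholders.
    HTable table = new HTable(conf, "mytable");
    List<Delete> batch = new ArrayList<Delete>();
    for (String rowKey : loadRowKeysToDelete()) {
      batch.add(new Delete(Bytes.toBytes(rowKey)));
      if (batch.size() >= 10000) {
        // One batched call per chunk; successfully processed Deletes are
        // removed from the list by the client.
        table.delete(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      table.delete(batch);
    }
    table.close();
  }

  // Stand-in for whatever business logic decides which rows go away.
  private static List<String> loadRowKeysToDelete() {
    return new ArrayList<String>();
  }
}

Even batched like this, every Delete is still a write (a tombstone) that has to flow through the regionservers, which is part of why it drags at hundreds of millions of cells.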
Keys in hbase are a combination of rowkey/column/timestamp.
Two records with the same rowkey but different columns will result in two
different cells with the same rowkey, which is probably what you expect.
For two records with the same rowkey and same column, the timestamp will
normally differentiate them.
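A small illustration of the above (table and family names are made up):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyLayoutSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");          // table/family names are made up
    byte[] row = Bytes.toBytes("row1");
    byte[] fam = Bytes.toBytes("f");

    // Same rowkey, different columns: two distinct cells under one row.
    Put p1 = new Put(row);
    p1.add(fam, Bytes.toBytes("colA"), Bytes.toBytes("v1"));
    p1.add(fam, Bytes.toBytes("colB"), Bytes.toBytes("v2"));
    table.put(p1);

    // Same rowkey and same column written again: a second version of the
    // same cell, distinguished only by its timestamp.
    Put p2 = new Put(row);
    p2.add(fam, Bytes.toBytes("colA"), Bytes.toBytes("v3"));
    table.put(p2);

    // A plain Get returns only the newest version; older ones have to be
    // asked for explicitly.
    Get g = new Get(row);
    g.setMaxVersions(3);
    Result r = table.get(g);
    System.out.println(r.list().size() + " cells returned");
    table.close();
  }
}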
Though I haven't personally tried it yet, I have been told that
enabling the shortcut for local-client reads is very effective at speeding
up random reads in hbase. More here:
https://issues.apache.org/jira/browse/HDFS-2246
We are using the Cloudera package, which includes this patch.
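For the record, the property names usually quoted alongside that JIRA are the two below. I'm taking them from the ticket rather than from a cluster I've tuned myself, so please double check them against your CDH docs; they also really belong in hdfs-site.xml / hbase-site.xml on the nodes, the snippet just keeps the names in one place:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShortCircuitReadSketch {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Allow the DFS client to read local block files directly.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // The user the regionserver runs as has to be whitelisted for direct access.
    conf.set("dfs.block.local-path-access.user", "hbase");
    return conf;
  }
}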
Did you adjust the writebuffer to a larger size and/or turn off autoFlush
for the Htable? I've found that both of those settings can have a profound
impact on write performance. You might also look at adjusting the handler
count for the regionservers which by default is pretty low. You should
also
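For the first two, it's roughly this on 0.90 (the 12 MB buffer is just an example value, not a recommendation):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class WriteTuningSketch {
  public static HTable openForBulkWrites(String tableName) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, tableName);
    // Buffer puts on the client and ship them in batches instead of one RPC per put.
    table.setAutoFlush(false);
    // 12 MB is just an example value; tune it to your row sizes.
    table.setWriteBufferSize(12 * 1024 * 1024);
    // The handler count is server side: hbase.regionserver.handler.count in
    // hbase-site.xml on the regionservers, so it can't be set from here.
    return table;
  }
}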
Hi Rita
By default, the export that ships with hbase writes KeyValue objects to a
sequence file. It is a very simple app and it wouldn't be hard to roll
your own export program to write to whatever format you wanted.
You can use the current export program as a basis and just swap in the
output format you need.
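As a rough illustration of what "roll your own" could look like: a scan-only job that dumps cells as tab-separated text instead of the stock SequenceFile of KeyValues. The table name, output path and field layout are made up:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TextExportSketch {

  static class CellMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      // Emit one line per cell: rowkey, family:qualifier, timestamp, value.
      for (KeyValue kv : result.raw()) {
        Text key = new Text(Bytes.toStringBinary(kv.getRow()));
        Text value = new Text(Bytes.toStringBinary(kv.getFamily()) + ":"
            + Bytes.toStringBinary(kv.getQualifier()) + "\t"
            + kv.getTimestamp() + "\t"
            + Bytes.toStringBinary(kv.getValue()));
        context.write(key, value);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "text-export");
    job.setJarByClass(TextExportSketch.class);
    // Full-table scan feeding the mapper; map-only, so no shuffle/sort.
    TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
        CellMapper.class, Text.class, Text.class, job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/text-export"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}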
Probably because M/R requires a key and because you want M/R to sort on
that key, which is required for writing hfiles.
On 8/4/12 8:22 AM, "Ioakim Perros" wrote:
>Hi,
>
>Does anyone knows why at HFileOutputFormat the API (
>http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/HFileOu
In case anyone is interested in hbase and disaster recovery, here is a
writeup I just posted:
http://bruteforcedata.blogspot.com/2012/08/hbase-disaster-recovery-and-whisky.html
Feedback appreciated.
Thanks,
Paul
I've seen this when exporting to s3 and assumed it was
related to write performance. We set hbase.regionserver.lease.period to
the same value as the task timeout and it helped reduce the # of failures,
though we still get occasional task timeouts. I haven't seen this when
writing to local HDFS.
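For reference, the two settings we aligned were roughly these. The 20-minute value is illustrative, and hbase.regionserver.lease.period really lives in hbase-site.xml on the regionservers; the snippet just keeps both names in one place:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TimeoutAlignmentSketch {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    long twentyMinutesMs = 20L * 60 * 1000;
    // Scanner lease on the regionserver side; must outlive slow map tasks.
    conf.setLong("hbase.regionserver.lease.period", twentyMinutesMs);
    // M/R task timeout, kept in step with the lease.
    conf.setLong("mapred.task.timeout", twentyMinutesMs);
    return conf;
  }
}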
One thing we observed with a similar setup was that if we added a reducer
and then used something like HRegionPartitioner to partition the data, our
GET performance improved dramatically. While you take a hit for adding the
reducer, it was worth it in our case. We never quite figured out why that
happened.
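Roughly, the wiring looks like this (not our exact job; the table name is a placeholder and the mapper, not shown, is assumed to emit ImmutableBytesWritable keys and Puts):

import java.io.IOException;

import org.apache.hadoop.hbase.mapreduce.HRegionPartitioner;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class RegionPartitionedWriteSketch {
  public static void configure(Job job) throws IOException {
    // Routes each row key to the reducer responsible for its region, so every
    // reducer writes to one set of regions instead of spraying puts across
    // the whole table.
    TableMapReduceUtil.initTableReducerJob(
        "mytable", IdentityTableReducer.class, job, HRegionPartitioner.class);
  }
}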
For our setup we went with 2 clusters. We call one our "hbase cluster" and
the other our "analytics cluster". For M/R jobs where hbase is the source
and/or sink, we usually run the jobs on the "hbase cluster" and so far it's
been fine (and you definitely want the data locality for these jobs). We
also
Thanks for the tip Doug. Does that boost come largely from the HDFS
improvements?
On 5/2/12 7:52 PM, "Doug Meil" wrote:
>
>re: "with lackluster performance for random reads"
>
>You want to be on CDH3u3 for sure if you want to boost random read
>performance
I think the answer to this is "no", but I am hoping someone with more
experience can confirm this… we are on hbase 0.90.4 (from cdh3u2). Some of our
storefiles have grown into the 3-4GB range (we have 100GB max region size).
Ignoring compactions, do large storefiles like this have a negative impact
-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Monday, February 20, 2012 4:29 PM
To: user@hbase.apache.org
Subject: Re: export/import for backup
On Mon, Feb 20, 2012 at 1:20 PM, Paul Mackles wrote:
> We are on hbase 0.90.4 (cd3u2). We are using the sta
and minor). Should we
disable compactions while the import is running and then do it all at the end?
We have our region-size set to 100GB right now so we can manage splitting.
Thanks in advance for any recommendations.
--
Paul Mackles, Senior Manager, Adobe
If you can chmod a+w the directory /user/dorner/bulkload/output/Tsp, hbase
should be able to do what it needs to do (I am assuming the error is coming
from completebulkload). It is trying to rename the files.
-----Original Message-----
From: Christopher Dorner [mailto:christopher.dor...@gmail.co