Re: bulk deletes

2012-10-08 Thread Paul Mackles
> Does it work? :) How did you do the deletes before? I assume you used the HTable.delete(List) API? (Doesn't really help you, but) In 0.92+ you could hook up a coprocessor into the compactions and simply filter out any KVs you want…
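For anyone searching the archives later, a minimal sketch of the batch-delete call referenced above (table and row names are made up; 0.90-era client API):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BulkDeleteExample {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // hypothetical table
        List<Delete> deletes = new ArrayList<Delete>();
        // Build one Delete per row and submit them as a single batch.
        for (String row : new String[] {"row1", "row2", "row3"}) {
          deletes.add(new Delete(Bytes.toBytes(row)));
        }
        table.delete(deletes);
        table.close();
      }
    }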

bulk deletes

2012-10-05 Thread Paul Mackles
We need to do deletes pretty regularly and sometimes we could have hundreds of millions of cells to delete. TTLs won't work for us because we have a fair amount of business logic around the deletes. Given their current implementation (we are on 0.90.4), this delete process can take a really long time…

Re: Bulk Loads and Updates

2012-10-03 Thread Paul Mackles
Keys in hbase are a combination of rowkey/column/timestamp. Two records with the same rowkey but different columns will result in two different cells with the same rowkey, which is probably what you expect. For two records with the same rowkey and same column, the timestamp will normally differentiate…
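To make that concrete, a fragment using the 0.90-era Put API (family/qualifier names are made up):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Same rowkey, different columns: two distinct cells.
    Put p1 = new Put(Bytes.toBytes("row1"));
    p1.add(Bytes.toBytes("cf"), Bytes.toBytes("colA"), Bytes.toBytes("v1"));
    p1.add(Bytes.toBytes("cf"), Bytes.toBytes("colB"), Bytes.toBytes("v2"));

    // Same rowkey and column: explicit timestamps keep the versions apart.
    Put p2 = new Put(Bytes.toBytes("row1"));
    p2.add(Bytes.toBytes("cf"), Bytes.toBytes("colA"), 1000L, Bytes.toBytes("old"));
    p2.add(Bytes.toBytes("cf"), Bytes.toBytes("colA"), 2000L, Bytes.toBytes("new"));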

Re: Tuning HBase for random reads

2012-09-26 Thread Paul Mackles
Though I haven't personally tried it yet, I have been told that enabling the shortcut for local-client reads is very effective at speeding up random reads in hbase. More here: https://issues.apache.org/jira/browse/HDFS-2246 We are using the cloudera package which includes this patch in version…
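If anyone wants to try it, my understanding from the JIRA is that the hdfs-site.xml settings look roughly like this (the user value should be whatever account the region servers run as; "hbase" here is just an assumption):

    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.block.local-path-access.user</name>
      <value>hbase</value>
    </property>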

Re: Mass dumping of data has issues

2012-09-24 Thread Paul Mackles
Did you adjust the write buffer to a larger size and/or turn off autoFlush for the HTable? I've found that both of those settings can have a profound impact on write performance. You might also look at adjusting the handler count for the regionservers, which by default is pretty low. You should also…
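For reference, the client-side knobs look like this (buffer size is just an example; 0.90-era HTable API, assuming conf is an HBaseConfiguration):

    // Buffer Puts client-side instead of doing one RPC per Put.
    HTable table = new HTable(conf, "mytable"); // hypothetical table
    table.setAutoFlush(false);
    table.setWriteBufferSize(12 * 1024 * 1024); // e.g. 12MB vs. the 2MB default

The server-side counterpart is hbase.regionserver.handler.count in hbase-site.xml.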

Re: backup strategies

2012-08-16 Thread Paul Mackles
Hi Rita. By default, the export that ships with hbase writes KeyValue objects to a sequence file. It is a very simple app and it wouldn't be hard to roll your own export program to write whatever format you wanted. You can use the current export program as a basis and just…
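As a sketch of what rolling your own could look like, here's a hypothetical mapper that emits rows as text instead of the stock SequenceFile of Results (all names made up):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.Text;

    // Emit each row as text so the job can use TextOutputFormat
    // instead of Export's SequenceFileOutputFormat.
    public class TextExportMapper extends TableMapper<Text, Text> {
      @Override
      protected void map(ImmutableBytesWritable row, Result value, Context context)
          throws IOException, InterruptedException {
        context.write(new Text(row.get()), new Text(value.toString()));
      }
    }

You'd wire it up with TableMapReduceUtil.initTableMapperJob, same as the stock Export does.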

Re: Bulk import - key, value ambiguity

2012-08-04 Thread Paul Mackles
Probably because M/R requires a key and because you want M/R to sort on that key, which is required for writing hfiles. On 8/4/12 8:22 AM, "Ioakim Perros" wrote: > Hi, does anyone know why at HFileOutputFormat the API (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/HFileOu…
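For context, the usual wiring is a one-liner, and it's what installs the total-order partitioner and sort classes that give you the required key order (conf and job variables assumed to exist):

    // Sets the partitioner, reducer and sort order so the HFiles
    // come out totally ordered by key, ready for bulk load.
    HTable table = new HTable(conf, "mytable"); // hypothetical table
    HFileOutputFormat.configureIncrementalLoad(job, table);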

hbase and disaster recovery

2012-08-01 Thread Paul Mackles
In case anyone is interested in hbase and disaster recovery, here is a writeup I just posted: http://bruteforcedata.blogspot.com/2012/08/hbase-disaster-recovery-and-whisky.html Feedback appreciated. Thanks, Paul

Re: MR hbase export is failing

2012-07-24 Thread Paul Mackles
I've seen this when exporting to S3 and assumed it was related to write performance. We set hbase.regionserver.lease.period to the same value as the task timeout and it helped reduce the # of failures, though we still get occasional task timeouts. I haven't seen this when writing to local…
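For anyone hitting the same thing, the two settings we aligned look roughly like this (600000 is just an example value; the lease period goes in hbase-site.xml, the timeout in the job or mapred config):

    <!-- hbase-site.xml: how long a scanner lease survives between next() calls -->
    <property>
      <name>hbase.regionserver.lease.period</name>
      <value>600000</value>
    </property>
    <!-- mapred-site.xml or job conf: set the task timeout to match -->
    <property>
      <name>mapred.task.timeout</name>
      <value>600000</value>
    </property>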

Re: performance of Get from MR Job

2012-06-19 Thread Paul Mackles
One thing we observed with a similar setup was that if we added a reducer and then used something like HRegionPartitioner to partition the data, our Get performance improved dramatically. While you take a hit for adding the reducer, it was worth it in our case. We never quite figured out why that helped…
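A sketch of the wiring, assuming a job that writes back to hbase and a numRegions value you compute yourself (hypothetical variable):

    // Route each key to the reducer covering the region that owns it,
    // so reducer-side Gets/Puts cluster on one region at a time.
    job.setPartitionerClass(HRegionPartitioner.class);
    job.setNumReduceTasks(numRegions); // e.g. one reducer per region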

Re: Shared Cluster between HBase and MapReduce

2012-06-05 Thread Paul Mackles
For our setup we went with 2 clusters. We call one our "hbase cluster" and the other our "analytics cluster". For M/R jobs where hbase is the source and/or sink, we usually run the jobs on the "hbase cluster" and so far it's been fine (and you definitely want the data locality for these jobs). We also…

Re: region size

2012-05-02 Thread Paul Mackles
Thanks for the tip, Doug. Does that boost come largely from the HDFS improvements? On 5/2/12 7:52 PM, "Doug Meil" wrote: > re: "with lackluster performance for random reads" > You want to be on CDH3u3 for sure if you want to boost random read performance…

region size

2012-05-02 Thread Paul Mackles
I think the answer to this is "no", but I am hoping someone with more experience can confirm this… we are on hbase 0.90.4 (from cdh3u2). Some of our storefiles have grown into the 3-4GB range (we have a 100GB max region size). Ignoring compactions, do large storefiles like this have a negative impact…
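For reference, the 100GB setting mentioned above corresponds to this hbase-site.xml property:

    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>107374182400</value> <!-- 100GB, so splits stay manual -->
    </property>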

RE: export/import for backup

2012-02-20 Thread Paul Mackles
From: Stack, Sent: Monday, February 20, 2012 4:29 PM, To: user@hbase.apache.org, Subject: Re: export/import for backup. On Mon, Feb 20, 2012 at 1:20 PM, Paul Mackles wrote: > We are on hbase 0.90.4 (cdh3u2). We are using the standard…

export/import for backup

2012-02-20 Thread Paul Mackles
…and minor). Should we disable compactions while the import is running and then do it all at the end? We have our region-size set to 100GB right now so we can manage splitting. Thanks in advance for any recommendations. -- Paul Mackles, Senior Manager, Adobe
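If it helps anyone, one way to stop the time-based major compactions (and then trigger them manually at the end) is:

    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value> <!-- 0 disables the periodic major compaction -->
    </property>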

RE: bulkload on fully distributed mode - permissions

2011-12-13 Thread Paul Mackles
If you can chmod a+w the directory /user/dorner/bulkload/output/Tsp, hbase should be able to do what it needs to do (I am assuming the error is coming from completebulkload). It is trying to rename the files.
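Something like this should do it (paths from the thread; the exact invocation may differ a bit by version):

    # open up the bulkload output so hbase can rename the HFiles into place
    hadoop fs -chmod -R a+w /user/dorner/bulkload/output/Tsp

    # then run completebulkload against the directory and the target table
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
        /user/dorner/bulkload/output/Tsp Tsp

"Tsp" as the table name is my assumption based on the path.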