Re: Determining regions with low HDFS locality index

2014-12-27 Thread Rahul Ravindran
investigate. -- Lars From: Rahul Ravindran To: "user@hbase.apache.org" Sent: Thursday, December 25, 2014 11:37 PM Subject: Determining regions with low HDFS locality index Hi, When an HBase RS goes down (possibly because of hardware issues, etc.), the regions get moved off that
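The question in this thread, finding which regions have a poor locality index, ends in filtering per-region values against a cutoff. A minimal Python sketch, assuming the per-region locality values (0.0 to 1.0) have already been collected somehow; the region names and the 0.8 cutoff are hypothetical:

```python
from typing import Dict, List, Tuple


def regions_below_locality(locality_by_region: Dict[str, float],
                           threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (region, locality) pairs below `threshold`, least local first.

    `locality_by_region` is a hypothetical dict of region name -> locality
    index in [0.0, 1.0]; how it is collected is outside this sketch.
    """
    low = [(region, loc) for region, loc in locality_by_region.items()
           if loc < threshold]
    # Sort by locality ascending (ties broken by name) so the worst regions come first.
    low.sort(key=lambda item: (item[1], item[0]))
    return low


if __name__ == "__main__":
    sample = {"region-a": 0.95, "region-b": 0.10, "region-c": 0.55}
    for region, loc in regions_below_locality(sample):
        print(f"{region}: {loc:.2f}")
```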

Determining regions with low HDFS locality index

2014-12-25 Thread Rahul Ravindran
Hi, When an HBase RS goes down (possibly because of hardware issues, etc.), the regions get moved off that machine to other region servers. However, since the new region servers do not have local copies of the backing HFiles, data locality for the newly transitioned regions is not great and hence some of our job
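One way to make "data locality is not great" concrete is to weight each HFile block by its size and count the bytes that have a replica on the region server now hosting the region. A minimal Python sketch under that assumption; the Block type and host names are hypothetical, and HBase's own bookkeeping may differ in detail:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Block:
    size_bytes: int
    replica_hosts: Tuple[str, ...]  # DataNodes holding a replica of this block


def locality_index(blocks: Iterable[Block], host: str) -> float:
    """Fraction of block bytes with a replica on `host`, from 0.0 to 1.0."""
    blocks = list(blocks)
    total = sum(b.size_bytes for b in blocks)
    if total == 0:
        return 0.0  # no data: report zero rather than divide by zero
    local = sum(b.size_bytes for b in blocks if host in b.replica_hosts)
    return local / total


if __name__ == "__main__":
    region_blocks = [
        Block(128, ("rs1", "rs2", "rs3")),
        Block(64, ("rs2", "rs3", "rs4")),
    ]
    # The region was on rs1; after a failover it might land on rs5.
    for host in ("rs1", "rs2", "rs5"):
        print(host, round(locality_index(region_blocks, host), 3))
```

A server that holds no replicas of the region's blocks (rs5 here) scores 0.0, which is the situation described above right after a region moves.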

Experience with HBASE-8283 and lots of small hfile

2014-04-03 Thread Rahul Ravindran
Hi, We are currently on 0.94.2 (CDH 4.2.1) and would likely upgrade to 0.94.15 (CDH 4.6), primarily to use the above fix. We have turned off automatic major compactions. We load data into an HBase table every 2 minutes. Currently, we are not using bulk load since it created compaction issues. W
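With automatic major compactions off and a load every 2 minutes, the file count per store grows quickly unless minor compactions keep up. A back-of-the-envelope Python sketch, assuming each load flushes exactly one new HFile per store (files_per_load is a hypothetical knob, not an HBase setting):

```python
MINUTES_PER_DAY = 24 * 60


def hfiles_added(days: int, load_interval_minutes: int = 2,
                 files_per_load: int = 1) -> int:
    """New HFiles per store over `days`, ignoring any compaction."""
    loads = (days * MINUTES_PER_DAY) // load_interval_minutes
    return loads * files_per_load


if __name__ == "__main__":
    print(hfiles_added(1))  # 720 loads in one day -> 720 files
    print(hfiles_added(7))  # 5040 files in a week
```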

Downside of too many HFiles

2013-06-12 Thread Rahul Ravindran
Hello, I am trying to understand the downsides of having a large number of HFiles as a result of setting a large hbase.hstore.compactionThreshold. This delays major compaction. However, the amount of data that needs to be read and re-written as a single HFile during major compaction will remain the same un
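The trade-off asked about here can be illustrated with a deliberately simplified model: every flush adds one file, and once the file count reaches the threshold, all files are merged into one. Real HBase minor compactions select files with a size-ratio policy rather than merging everything, so treat this Python sketch as an illustration only:

```python
from typing import List, Tuple


def simulate(flushes: int, threshold: int, flush_size: int = 1) -> Tuple[int, int, int]:
    """Toy model: merge ALL files into one once the count reaches `threshold`.

    Returns (compactions, units_rewritten, max_files_seen). `flush_size` is
    an arbitrary unit of data per flush; this is not HBase's selection policy.
    """
    files: List[int] = []
    compactions = rewritten = max_files = 0
    for _ in range(flushes):
        files.append(flush_size)
        # Reads must consult every file present, so track the peak count.
        max_files = max(max_files, len(files))
        if len(files) >= threshold:
            merged = sum(files)
            rewritten += merged  # every unit in the merged files is rewritten
            files = [merged]
            compactions += 1
    return compactions, rewritten, max_files


if __name__ == "__main__":
    for threshold in (3, 5):
        print(threshold, simulate(flushes=5, threshold=threshold))
```

With 5 flushes, a threshold of 3 compacts twice and rewrites 8 units but never leaves more than 3 files to read; a threshold of 5 compacts once and rewrites 5 units, at the cost of reads touching up to 5 files.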

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
hook for an earlier version of the row?  Thanks, ~Rahul. From: Asaf Mesika To: "user@hbase.apache.org" ; Rahul Ravindran Sent: Tuesday, June 4, 2013 10:51 PM Subject: Re: Scan + Gets are disk bound On Tuesday, June 4, 2013, Rahul Ravindran

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
. From: Anoop John To: user@hbase.apache.org; Rahul Ravindran Cc: anil gupta Sent: Tuesday, June 4, 2013 10:44 PM Subject: Re: Scan + Gets are disk bound When you set time range on Scan, some files can get skipped based on the max min ts values in that file. Said this, when u
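The file skipping described in this reply can be sketched as an overlap test between each file's timestamp range and the scan's range. A minimal Python sketch, assuming the scan range is half-open [min, max) as with HBase's Scan.setTimeRange, and that each file records its inclusive min and max cell timestamps; the StoreFileInfo type is hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoreFileInfo:
    name: str
    min_ts: int  # smallest cell timestamp in the file (inclusive)
    max_ts: int  # largest cell timestamp in the file (inclusive)


def files_to_read(files: List[StoreFileInfo], scan_min: int, scan_max: int) -> List[str]:
    """Names of files that may hold cells in the scan range [scan_min, scan_max)."""
    return [f.name for f in files
            if f.max_ts >= scan_min and f.min_ts < scan_max]


if __name__ == "__main__":
    files = [StoreFileInfo("a", 0, 99),
             StoreFileInfo("b", 100, 199),
             StoreFileInfo("c", 200, 299)]
    print(files_to_read(files, 150, 200))  # ['b']: a and c are skipped
    print(files_to_read(files, 100, 250))  # ['b', 'c']
```

Pruning only pays off when files cover mostly disjoint time spans, for example when data is written roughly in timestamp order.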

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
hotspotting. ~Rahul. From: anil gupta To: user@hbase.apache.org; Rahul Ravindran Sent: Tuesday, June 4, 2013 9:31 PM Subject: Re: Scan + Gets are disk bound On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran wrote: Hi, > >We are relatively new to Hbase,

Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Hi, We are relatively new to Hbase, and we are hitting a roadblock on our scan performance. I searched through the email archives and applied a bunch of the recommendations there, but they did not improve much. So, I am hoping I am missing something which you could guide me towards. Thanks in a

Re: Using HBase for Deduping

2013-02-19 Thread Rahul Ravindran
From: Michael Segel To: user@hbase.apache.org; Rahul Ravindran Sent: Friday, February 15, 2013 9:24 AM Subject: Re: Using HBase for Deduping Interesting. Surround with a Try Catch? But it sounds like you're on the right path. Happy Coding! On Feb 15, 2013, at 11:12 AM,

Re: Using HBase for Deduping

2013-02-15 Thread Rahul Ravindran
@hbase.apache.org Cc: Rahul Ravindran Sent: Friday, February 15, 2013 4:36 AM Subject: Re: Using HBase for Deduping On Feb 15, 2013, at 3:07 AM, Asaf Mesika wrote: > Michael, this means read for every write? > Yes and no. At the macro level, a read for every write would mean that your
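One way to soften "a read for every write" is to dedupe within a batch locally and then look up the remaining UUIDs with a single batched call. A minimal Python sketch; lookup_existing is a hypothetical callable standing in for a batched get (the HBase client does offer a multi-get that takes a list of Get objects):

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Event = Tuple[str, str]  # (uuid, payload)


def dedupe_batch(events: Iterable[Event],
                 lookup_existing: Callable[[Set[str]], Set[str]]) -> List[Event]:
    """Return events whose UUID is new, doing one lookup per batch."""
    first_in_batch: Dict[str, str] = {}
    for uuid, payload in events:
        # Keep only the first occurrence of each UUID inside this batch.
        first_in_batch.setdefault(uuid, payload)
    # One round trip for the whole batch instead of one per event.
    already_stored = lookup_existing(set(first_in_batch))
    return [(u, p) for u, p in first_in_batch.items() if u not in already_stored]


if __name__ == "__main__":
    stored = {"u1"}
    calls = []

    def fake_lookup(uuids: Set[str]) -> Set[str]:
        calls.append(len(uuids))
        return uuids & stored

    batch = [("u1", "a"), ("u2", "b"), ("u2", "c"), ("u3", "d")]
    print(dedupe_batch(batch, fake_lookup))  # [('u2', 'b'), ('u3', 'd')]
    print(calls)  # [3]: a single lookup covering three distinct UUIDs
```

This only reduces round trips; it does not record the new UUIDs, so a separate write (or an atomic check-and-write) is still needed.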

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
#'s ? >> >> It will be more helpful if you describe your final use case of the computed >> data too. Given the amount of back and forth, we can take it off list too >> and summarize the conversation for the list. >> >> On Thu, Feb 14, 2013 at 1:07 PM, Rahul R

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
We can't rely on the assumption that event dupes will not fall outside an hour boundary. So, your take is that doing a lookup per event within the MR job is going to be bad? From: Viral Bajaria To: Rahul Ravindran Cc: "user@hbase.apache.o

Using Hbase for Dedupping

2013-02-14 Thread Rahul Ravindran
Hi, We have events which are delivered into our HDFS cluster which may be duplicated. Each event has a UUID and we were hoping to leverage HBase to dedupe them. We run a MapReduce job which would perform a lookup for each UUID on HBase and then emit the event only if the UUID was absent and w
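A lookup followed by a separate write leaves a window in which two tasks can both see a UUID as absent and both emit the event. Folding the check and the write into one atomic step closes that window; in HBase, checkAndPut with a null expected value checks that the column does not exist. The Python sketch below models only the idea with a hypothetical in-memory table, not the HBase client API:

```python
import threading
from typing import Dict, Iterable, List, Tuple


class ToyDedupTable:
    """In-memory stand-in for a table keyed by event UUID (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, uuid: str, payload: str) -> bool:
        """Atomically store the row; True only for the first writer of `uuid`."""
        with self._lock:
            if uuid in self._rows:
                return False
            self._rows[uuid] = payload
            return True


def emit_unique(events: Iterable[Tuple[str, str]],
                table: ToyDedupTable) -> List[Tuple[str, str]]:
    """Emit each event only if this call was the one that recorded its UUID."""
    return [(u, p) for u, p in events if table.put_if_absent(u, p)]


if __name__ == "__main__":
    table = ToyDedupTable()
    print(emit_unique([("u1", "a"), ("u2", "b"), ("u1", "a")], table))
    # A later batch (e.g. from a different task) re-delivers u2:
    print(emit_unique([("u2", "b"), ("u3", "c")], table))
```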

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
Most will be in the same hour. Some will be across 3-6 hours. Sent from my phone. Excuse the terseness. On Feb 14, 2013, at 12:19 PM, Viral Bajaria wrote: > Are all these dupe events expected to be within the same hour or they > can happen over multiple hours ? > > Viral &
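Since most duplicates arrive within the hour but some trail by 3 to 6 hours, whatever remembers UUIDs has to keep them at least that long. A minimal Python sketch of a retention window, with hypothetical names; in HBase a similar effect could come from a column-family TTL, which expires cells automatically:

```python
from typing import Dict

HOUR = 3600


class WindowedDeduper:
    """Remembers when each UUID was first seen, for `retention_seconds`."""

    def __init__(self, retention_seconds: int) -> None:
        self.retention = retention_seconds
        self.first_seen: Dict[str, int] = {}  # grows without eviction in this sketch

    def is_new(self, uuid: str, now: int) -> bool:
        seen = self.first_seen.get(uuid)
        if seen is not None and now - seen < self.retention:
            return False  # duplicate inside the retention window
        # Unseen, or its entry has expired: record it and treat as new.
        self.first_seen[uuid] = now
        return True


if __name__ == "__main__":
    d = WindowedDeduper(retention_seconds=6 * HOUR)
    print(d.is_new("e1", 0))         # True: first sighting
    print(d.is_new("e1", 5 * HOUR))  # False: duplicate within 6 hours
    print(d.is_new("e1", 7 * HOUR))  # True: window expired, dupe slips through
```

The last call shows why the window should sit comfortably above the largest expected gap: a 6-hour window lets a 7-hour-late duplicate through.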

Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
Hi, We have events which are delivered into our HDFS cluster which may be duplicated. Each event has a UUID and we were hoping to leverage HBase to dedupe them. We run a MapReduce job which would perform a lookup for each UUID on HBase and then emit the event only if the UUID was absent and w