Re: Row get very slow

2011-11-14 Thread Arvind Jayaprakash
On Nov 13, Stack wrote:
>On Sun, Nov 13, 2011 at 7:13 AM, Arvind Jayaprakash
>wrote:
>> A common confusion is b/w MAX_FILESIZE and BLOCKSIZE. Given that
>> MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume
>> BLOCKSIZE represents that value.
>We s

Re: Row get very slow

2011-11-13 Thread Arvind Jayaprakash
A common confusion is between MAX_FILESIZE and BLOCKSIZE. Given that MAX_FILESIZE is not listed on :60010/master.jsp, one tends to assume BLOCKSIZE represents that value.

On Nov 10, lars hofhansl wrote:
>"BLOCKSIZE => '536870912'"
>
>
>You set your blocksize to 512mb? The default is 64k (65536), try t
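To make the mix-up above concrete, here is a minimal Python sketch (not HBase code) of a sanity check that flags a block size large enough to suggest the author meant a file size limit. The 64k default comes from the thread; the function name, the dict layout, and the 1 MB warning threshold are assumptions made purely for illustration.

```python
DEFAULT_BLOCKSIZE = 65536  # 64k default block size, as quoted in the thread

# Assumed threshold, for illustration only: warn on block sizes above 1 MB.
SUSPICIOUS_BLOCKSIZE = 1024 * 1024


def check_family_attrs(attrs):
    """Return warnings for a hypothetical dict of column family attributes."""
    warnings = []
    blocksize = int(attrs.get("BLOCKSIZE", DEFAULT_BLOCKSIZE))
    if blocksize > SUSPICIOUS_BLOCKSIZE:
        size_mb = blocksize // (1024 * 1024)
        warnings.append(
            f"BLOCKSIZE is {size_mb} MB; did you mean MAX_FILESIZE?"
        )
    return warnings


# The value from the quoted table description triggers the warning.
print(check_family_attrs({"BLOCKSIZE": "536870912"}))
```

A check like this only guesses at intent; a large block size may be deliberate in some setups.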

Re: Counter columns in hbase

2011-09-09 Thread Arvind Jayaprakash
On Sep 06, sagar naik wrote:
>I can dedup based on timestamp of the event.
>Can I increment the counter value and assign the version as the timestamp of
>this event ?

Is it because you have an infinitesimally fine-grained timestamp that you assume two events won't happen at the "same time" (as defined
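The concern raised in this reply can be shown with a small Python sketch. It is not HBase code, and since the reply is truncated it only illustrates the question being asked. The event tuples, the key names, and the idea of a per-event id are hypothetical and do not come from the thread.

```python
from collections import defaultdict


def count_dedup_by_timestamp(events):
    """Count events per key, treating equal (key, timestamp) pairs as duplicates."""
    seen = set()
    counts = defaultdict(int)
    for _event_id, key, ts in events:
        if (key, ts) not in seen:
            seen.add((key, ts))
            counts[key] += 1
    return dict(counts)


def count_dedup_by_event_id(events):
    """Count events per key, treating equal event ids as duplicates."""
    seen = set()
    counts = defaultdict(int)
    for event_id, key, _ts in events:
        if event_id not in seen:
            seen.add(event_id)
            counts[key] += 1
    return dict(counts)


# Hypothetical events: (event_id, counter_key, timestamp_ms)
events = [
    ("e1", "page_a", 1000),
    ("e1", "page_a", 1000),  # redelivery of e1: a true duplicate
    ("e2", "page_a", 1000),  # distinct event that shares the same millisecond
]

print(count_dedup_by_timestamp(events))  # {'page_a': 1}, e2 is lost
print(count_dedup_by_event_id(events))   # {'page_a': 2}
```

Deduplicating on the timestamp alone undercounts whenever two distinct events share a timestamp, which is the case the reply asks about.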

Re: HBase Vs CitrusLeaf?

2011-09-07 Thread Arvind Jayaprakash
On Sep 07, lars hofhansl wrote:
>Hi Arvind,
>
>This is interesting:
>
>> * Multiple machines can concurrently/actively handle requests for the
>> same key, so the loss of one server does not mean that a range of keys
>> is temporarily unavailable. A hbase cluster does have a partial,
>> temporary o

Re: HBase Vs CitrusLeaf?

2011-09-07 Thread Arvind Jayaprakash
On Sep 06, Something Something wrote:
>Anyway, before I spent a lot of time on it, I thought I should check if
>anyone has compared HBase against CitrusLeaf. If you've, I would greatly
>appreciate it if you would share your experiences.

Disclaimer: I was an early evaluator/tester of citrusleaf ab

Re: Counter columns in hbase

2011-09-04 Thread Arvind Jayaprakash
On Sep 02, sagar naik wrote:
>We are counting events for our application.
>Sometimes, the same event arrives multiple times.
>This leads to counting of same event multiple times.
>Is there a way I can avoid this ?
>(Say timestamp on value or filters ?)

A basic data model question is do you want a

per table region size

2011-08-08 Thread Arvind Jayaprakash
Is it possible to control the region size (hstore size) on a per-table basis? I have certain applications where the overall keyspace is small but I'd like the data to spread nicely over many region servers that use a certain table and another one that has potentially 2 orders of magnitude of data a

Re: data structure

2011-07-17 Thread Arvind Jayaprakash
On Jul 14, Andre Reiter wrote:
>now we are running mapreduce jobs, to generate a report: for example we
>want to know how many impressions were done by all users in last x
>days. therefore the scan of the MR job is running over all data in our
>hbase table for the particular family. this takes at t

Re: Hbase performance with HDFS

2011-07-11 Thread Arvind Jayaprakash
On Jul 07, Andrew Purtell wrote:
>> Since HDFS is mostly write once how are updates/deletes handled?
>
>Not mostly, only write once.
>
>Deletes are just another write, but one that writes tombstones
>"covering" data with older timestamps.
>
>When serving queries, HBase searches store files back in
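The tombstone idea quoted above can be sketched in a few lines of Python. This is a toy model, not HBase's implementation: it ignores store files, compactions, and multiple versions. The class and method names are hypothetical, and treating a tombstone as covering puts with an equal timestamp as well as older ones is an assumption of this sketch.

```python
PUT, DELETE = "put", "delete"


class AppendOnlyStore:
    """Toy write-once store: entries are appended, never modified or removed."""

    def __init__(self):
        self._log = []

    def put(self, key, ts, value):
        self._log.append((key, ts, PUT, value))

    def delete(self, key, ts):
        # A delete is just another write: a tombstone stamped with ts.
        self._log.append((key, ts, DELETE, None))

    def get(self, key):
        entries = [e for e in self._log if e[0] == key]
        # Newest tombstone for this key, if any.
        tombstone_ts = max(
            (ts for _, ts, kind, _ in entries if kind == DELETE),
            default=None,
        )
        # Puts survive only if they are newer than that tombstone.
        live = [
            (ts, value)
            for _, ts, kind, value in entries
            if kind == PUT and (tombstone_ts is None or ts > tombstone_ts)
        ]
        if not live:
            return None
        return max(live, key=lambda tv: tv[0])[1]


store = AppendOnlyStore()
store.put("row1", 1, "a")
store.put("row1", 2, "b")
store.delete("row1", 2)    # covers both earlier puts
print(store.get("row1"))   # None
store.put("row1", 3, "c")  # a newer write is visible again
print(store.get("row1"))   # c
```

Nothing is ever rewritten in place here; the read path alone decides which entries are visible.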

Re: follow up question on row key schema design

2011-06-06 Thread Arvind Jayaprakash
On Jun 02, Sam Seigal wrote:
> -
>
>My eventId can be one of 12 distinct values (let us say from A-L), and I
>have a 4 node cluster running HBase right now.
>
>After doing some research in our OLTP database, I found that the majority
>(about 45% of the data) from the last 6 months written in the

Re: Harvesting empty regions

2011-05-31 Thread Arvind Jayaprakash
On May 31, Ferdy Galema wrote:
>You can use the merge tool to combine adjacent regions. It requires a
>bit of manual work because you need to specify the regions by hand. The
>cluster also needs to be offline (I recommend to keep zookeeper running
>though). Check if merging succeeded with the hb

Harvesting empty regions

2011-05-30 Thread Arvind Jayaprakash
My setup seems to have a lot of regions with no data that just keep accumulating over time. Here are some details: I have time-series data (created by opentsdb) being inserted into hbase every minute. Since the data has little value after say 15 days, I go ahead and delete all old data. When I lo