Re: hbase insertion optimisation:

2011-03-20 Thread Oleg Ruchovets
On Sun, Mar 20, 2011 at 5:58 PM, Ted Yu wrote:
> For 1), if you apply hashing to the whole <date>_<...> key, the date prefix
> wouldn't be useful. You should evaluate the distribution of <...> as the row
> key. Assuming the distribution is uneven, you can apply a hash function to
> the row key. Using MurmurHash is as simple as:
> MurmurHash.getInstance().hash(rowkey, 0, rowkey.length, seed) …

Re: hbase insertion optimisation:

2011-03-20 Thread Ted Yu
For 1), if you apply hashing to the whole <date>_<...> key, the date prefix wouldn't be useful. You should evaluate the distribution of <...> as the row key. Assuming the distribution is uneven, you can apply a hash function to the row key. Using MurmurHash is as simple as:

MurmurHash.getInstance().hash(rowkey, 0, rowkey.length, seed)

For 2) …
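To make the call above concrete, here is a minimal, self-contained sketch. The key literal is made up (following the <date>_<...> format from this thread), and the seed is an arbitrary constant that only has to stay the same everywhere the hash is computed:

    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.MurmurHash;

    public class HashKey {
        public static void main(String[] args) {
            // Hypothetical row key in the <date>_<...> format discussed above
            byte[] rowkey = Bytes.toBytes("20110320_someKey");
            int seed = -1; // arbitrary fixed seed; must match on every write and read
            // The exact call quoted in the message above
            int hash = MurmurHash.getInstance().hash(rowkey, 0, rowkey.length, seed);
            System.out.println("murmur hash = " + hash);
        }
    }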

Re: hbase insertion optimisation:

2011-03-20 Thread Oleg Ruchovets
I took the org.apache.hadoop.hbase.util.MurmurHash class and want to use it for my hashing. Until now I had key/value pairs (key format <date>_<...>). Using MurmurHash I get a hash for my key. My question is: 1) what is the right way to use the hashing, i.e. how should the code be written so that …
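One possible answer, offered here as a sketch rather than the thread's confirmed approach: keep the readable <date>_<...> key but prepend a small salt byte derived from the MurmurHash, so rows for one date no longer pile onto a single region. The table name "events", family "d", and bucket count are invented for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.MurmurHash;

    public class SaltedPut {
        static final int BUCKETS = 16; // hypothetical number of salt buckets
        static final int SEED = -1;    // must match wherever the hash is computed

        // Prepend a one-byte salt, derived from the hash, to the original key
        static byte[] saltedKey(byte[] original) {
            int h = MurmurHash.getInstance().hash(original, 0, original.length, SEED);
            byte salt = (byte) Math.abs(h % BUCKETS);
            return Bytes.add(new byte[] { salt }, original);
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "events"); // hypothetical table
            byte[] key = Bytes.toBytes("20110320_someKey");
            Put put = new Put(saltedKey(key));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("value"));
            table.put(put);
            table.close();
        }
    }

Writes then spread over at most 16 distinct key prefixes instead of one hot date prefix; the trade-off is that a per-date read becomes 16 small scans (see the sketch further down the thread).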

Re: hbase insertion optimisation:

2011-03-19 Thread Ted Yu
Timestamp is in every key/value pair. Take a look at this method in Scan:

public Scan setTimeRange(long minStamp, long maxStamp)

Cheers

On Sat, Mar 19, 2011 at 3:43 PM, Oleg Ruchovets wrote:
> Good point, let me explain the process. We chose the keys <date>_<...>
> because after insertion we run scans and want to analyse data related to
> the specific date. …
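A short sketch of how setTimeRange is used; the table name and the day boundaries are illustrative, and maxStamp is exclusive:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class TimeRangeScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "events"); // hypothetical table
            long minStamp = 1300492800000L; // 2011-03-19 00:00 UTC, illustrative
            long maxStamp = 1300579200000L; // 2011-03-20 00:00 UTC, exclusive
            Scan scan = new Scan();
            scan.setTimeRange(minStamp, maxStamp); // the method quoted above
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // process one day's worth of results here
                }
            } finally {
                scanner.close();
            }
            table.close();
        }
    }

This works because, as noted above, every key/value pair carries a timestamp, so the filter needs no date in the row key at all.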

Re: hbase insertion optimisation:

2011-03-19 Thread Oleg Ruchovets
Good point, let me explain the process. We chose the keys <date>_<...> because after insertion we run scans and want to analyse data related to the specific date. Can you provide more details on using hashing, and how can I scan hbase data for a specific date when using it? Oleg. On Sun, Mar 20, …
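One common way to reconcile hashing with per-date scans (an assumption here, not something spelled out in the thread) is to salt with a fixed number of buckets at write time and then run one bounded scan per bucket for the wanted date:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanByDate {
        static final int BUCKETS = 16; // must match the value used at write time

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "events"); // hypothetical table
            byte[] day  = Bytes.toBytes("20110319_");  // wanted date prefix
            byte[] next = Bytes.toBytes("20110320_");  // exclusive upper bound
            for (int b = 0; b < BUCKETS; b++) {
                // Each scan covers one salt bucket, bounded to the wanted date
                byte[] start = Bytes.add(new byte[] { (byte) b }, day);
                byte[] stop  = Bytes.add(new byte[] { (byte) b }, next);
                ResultScanner scanner = table.getScanner(new Scan(start, stop));
                try {
                    for (Result r : scanner) {
                        // collect rows for 2011-03-19 from this bucket
                    }
                } finally {
                    scanner.close();
                }
            }
            table.close();
        }
    }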

Re: hbase insertion optimisation:

2011-03-19 Thread Ted Yu
I guess you chose the date prefix for query considerations. You should introduce hashing so that the row keys are not clustered together. On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets wrote:
> We want to insert into hbase on a daily basis (hbase 0.90.1, hadoop append).
> Currently we have ~10 million records per day. …

hbase insertion optimisation:

2011-03-19 Thread Oleg Ruchovets
We want to insert into hbase on a daily basis (hbase 0.90.1, hadoop append). Currently we have ~10 million records per day. We use map/reduce to prepare the data and write it to hbase in chunks (5000 puts per chunk). The whole process takes 1h 20 minutes. Some tests verified that …
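A sketch of the write path described here: the 5000 puts per chunk comes from the post, while the client-side write buffer (setAutoFlush/setWriteBufferSize), the table and family names, and the record loop are stand-ins added for illustration:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ChunkedInsert {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "events"); // hypothetical table
            table.setAutoFlush(false);                  // buffer puts client-side
            table.setWriteBufferSize(12 * 1024 * 1024); // 12 MB, illustrative
            List<Put> chunk = new ArrayList<Put>(5000);
            for (int i = 0; i < 10000; i++) {           // stand-in record stream
                Put put = new Put(Bytes.toBytes("20110319_" + i));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                        Bytes.toBytes("v" + i));
                chunk.add(put);
                if (chunk.size() == 5000) {             // chunk size from the post
                    table.put(chunk);
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                table.put(chunk);
            }
            table.flushCommits(); // push whatever is left in the client buffer
            table.close();
        }
    }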