On Sun, Mar 20, 2011 at 5:58 PM, Ted Yu wrote:
For 1), if you apply hashing to _, the date prefix wouldn't be
useful.
You should evaluate the distribution of as row key. Assuming the
distribution is uneven, you can apply a hashing function to the row key.
Using MurmurHash is as simple as:
MurmurHash.getInstance().hash(rowkey, 0, rowkey.length, seed)
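The idea behind the hash call above is usually "salting": hash the key, take a bucket number, and prepend it so consecutive date-prefixed keys land on different regions. A minimal self-contained sketch of that pattern follows; it uses `String.hashCode()` as a stand-in for HBase's `MurmurHash` (which needs the HBase jars on the classpath), and the bucket count and zero-padded prefix format are assumptions for illustration.

```java
// Sketch: deriving a salt bucket from a row key so sequential writes
// spread across region servers. String.hashCode() stands in for
// MurmurHash; the principle is the same: hash, bucket, prefix.
public class SaltedKey {
    static final int BUCKETS = 16; // assumption: table pre-split into 16 regions

    static String salt(String rowKey) {
        int bucket = Math.abs(rowKey.hashCode() % BUCKETS);
        // zero-padded so salted keys still sort consistently per bucket
        return String.format("%02d_%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        // hypothetical key in the thread's date_id format
        System.out.println(salt("20110320_site42"));
    }
}
```

Note that with a salted key, a scan for one date has to issue one scan per bucket (16 here), since the date is no longer a common prefix.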
For 2)
I took the org.apache.hadoop.hbase.util.MurmurHash class and want to use it for
my hashing.
Till now I had key, value pairs (key format _),
Using MurmurHash I get the hash for my key.
My question is:
1) What is the way to use hashing? Meaning, how should the code be written
so that inst
A timestamp is in every key-value pair.
Take a look at this method in Scan:
public Scan setTimeRange(long minStamp, long maxStamp)
Cheers
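The point of `setTimeRange` is that HBase stores a timestamp on every cell, so a scan can be restricted to one day's writes without encoding the date in the row key at all. The core of that is computing the day's `[minStamp, maxStamp)` in epoch milliseconds; a self-contained sketch (the UTC zone choice is an assumption, and the commented `Scan` call shows where the range would be used):

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Sketch: millisecond bounds for a single day, suitable for
// Scan.setTimeRange(minStamp, maxStamp) (maxStamp is exclusive).
public class DayRange {
    static long[] dayRangeMillis(LocalDate day) {
        long min = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long max = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long[] r = dayRangeMillis(LocalDate.of(2011, 3, 19));
        // Scan scan = new Scan();
        // scan.setTimeRange(r[0], r[1]);  // restrict scan to that day's cells
        System.out.println(r[0] + " .. " + r[1]);
    }
}
```

This only works as a date filter if cells are written with the default (insertion-time) timestamps, not application-supplied ones.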
On Sat, Mar 19, 2011 at 3:43 PM, Oleg Ruchovets wrote:
Good point,
let me explain the process. We chose the keys _
because after insertion we run scans and want to analyse data
related to a specific date.
Can you provide more details on using hashing, and how can I scan HBase data
for a specific date using it?
Oleg.
On Sun, Mar 20,
I guess you chose the date prefix for query considerations.
You should introduce hashing so that the row keys are not clustered
together.
On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets wrote:
We want to insert into HBase on a daily basis (HBase 0.90.1, hadoop append).
Currently we have ~10 million records per day. We use map/reduce to prepare
the data and write it to HBase in chunks (5000 puts per chunk).
The whole process takes 1h 20 minutes. Some tests verified that wri
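The chunked-write pattern described here (buffer puts, flush every 5000) can be sketched generically; this is a self-contained illustration of the buffering logic only, with `flush()` standing in for the real `table.put(puts)` call against HBase, and the chunk size taken from the message above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the chunked-write pattern: buffer records and flush
// every CHUNK_SIZE items, with a final flush for the remainder.
public class ChunkedWriter {
    static final int CHUNK_SIZE = 5000; // from the thread: 5000 puts per chunk
    private final List<String> buffer = new ArrayList<>();
    int flushes = 0; // counts chunks written

    void write(String record) {
        buffer.add(record);
        if (buffer.size() >= CHUNK_SIZE) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // in the real job this would be table.put(puts) against HBase
        flushes++;
        buffer.clear();
    }
}
```

With sequential date-prefixed keys, every chunk in a day hits the same region, which is exactly the hotspot the hashing suggestion above is meant to avoid.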