I guess you chose date prefix for query consideration.
You should introduce hashing so that the row keys are not clustered
together.

On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets <oruchov...@gmail.com>wrote:

>   We want to insert to hbase on daily basis (hbase 0.90.1 , hadoop append).
> currently we have ~ 10 million records per day.We use map/reduce to prepare
> data , and write it to hbase using chunks of data (5000 puts  every chunk)
>   All process takes 1h 20 minutes. Making some tests verified that writing
> to hbase takes ~ 1 hour.
>
> I have couple of questions:
>  1) Reducers is writing  data which has a key like : <date>_<some_text> ,
> the strange is that   all records were written to a one node.
>
>    Is it correct behaviour? What is the way to get better distributions
> accross the cluster? Simply during insertion process  I saw that most load
> get that specific node where all data were inserted and all other nodes
> almost has no any resources utilisations (cpu , I/O ...).
>
> Oleg.
>

Reply via email to