I guess you chose date prefix for query consideration. You should introduce hashing so that the row keys are not clustered together.
On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets <oruchov...@gmail.com>wrote: > We want to insert to hbase on daily basis (hbase 0.90.1 , hadoop append). > currently we have ~ 10 million records per day.We use map/reduce to prepare > data , and write it to hbase using chunks of data (5000 puts every chunk) > All process takes 1h 20 minutes. Making some tests verified that writing > to hbase takes ~ 1 hour. > > I have couple of questions: > 1) Reducers is writing data which has a key like : <date>_<some_text> , > the strange is that all records were written to a one node. > > Is it correct behaviour? What is the way to get better distributions > accross the cluster? Simply during insertion process I saw that most load > get that specific node where all data were inserted and all other nodes > almost has no any resources utilisations (cpu , I/O ...). > > Oleg. >