1. It is a good idea to manage the region splits manually. For best practices, read
http://hbase.apache.org/book.html - 2.8.2.7. Managed Splitting

2. The default HBase MapReduce splitter creates one map task for each region;
read more details at http://hbase.apache.org/book.html#splitter
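
To make point 1 concrete, here is a minimal sketch of how split points for a
pre-split table could be computed. It assumes (this is not from your mail) that
the row keys are fixed-width hex strings spread evenly over the key space; the
class name PreSplitKeys and the helper hexSplitPoints are hypothetical names
used only for illustration. If your keys are distributed differently, you would
need to pick the split points from the real key distribution instead.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes evenly spaced split points for row keys
// that are assumed to be fixed-width, uniformly distributed hex strings.
final class PreSplitKeys {

    // Returns numRegions - 1 split points dividing the hex key space
    // [00..0, ff..f] of the given width into numRegions equal parts.
    static List<String> hexSplitPoints(int numRegions, int width) {
        if (numRegions < 2 || width < 1) {
            throw new IllegalArgumentException("need numRegions >= 2 and width >= 1");
        }
        BigInteger range = BigInteger.ONE.shiftLeft(4 * width); // 16^width
        BigInteger parts = BigInteger.valueOf(numRegions);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = range.multiply(BigInteger.valueOf(i)).divide(parts);
            String hex = point.toString(16);
            // Left-pad with zeros so every key has the same width.
            StringBuilder sb = new StringBuilder();
            for (int p = hex.length(); p < width; p++) {
                sb.append('0');
            }
            splits.add(sb.append(hex).toString());
        }
        return splits;
    }

    // Converts the split points to the byte[][] form that table creation
    // with explicit split keys expects.
    static byte[][] toBytes(List<String> keys) {
        byte[][] out = new byte[keys.size()][];
        for (int i = 0; i < keys.size(); i++) {
            out[i] = keys.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        // Example: 4 regions over 8-character hex keys.
        for (String s : hexSplitPoints(4, 8)) {
            System.out.println(s);
        }
    }
}
```

The resulting byte[][] can then be passed to
HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys), so the table
starts with several regions instead of one. Since the default splitter creates
one map task per region (point 2), a pre-split table should also give your job
more than one map task to run in parallel.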

Saurabh.

-----Original Message-----
From: roal...@gmail.com [mailto:roal...@gmail.com] On Behalf Of Roberto Alonso 
CIPF
Sent: Tuesday, March 27, 2012 5:07 AM
To: user@hbase.apache.org
Subject: questions about splits in regions

Hello All,

I have some questions about HBase that I hope you can help me with.
My architecture is as follows:
I have 4 servers (server_{1,2,3,4}) with 6 GB RAM and 2 cores. I installed
Hadoop on all of them with this configuration:
- server_1 is namenode, datanode and secondarynamenode, jobtracker
- server_2, server_3, server_4: datanodes, tasktracker
- server_2: zookeeper
The storage is around 500 GB

I have a file with around 22,000,000 records (it will grow) and I want to
put it in a table.
So,
1. I create the table from code. Should I split the regions myself? In that
case, should I follow any particular strategy? Or should I let HBase split
the regions by itself? Which is better?
2. After I put this table in HBase, I have MapReduce code that reads all
the rows, picks some rows of interest, and writes a file to disk
(FileOutputFormat.setOutputPath(job, new Path( tmpPath )); it doesn't do
the reduce part). From what I see in htop on my servers, HBase is reading the
table sequentially even though the table is split across the servers, so should
I configure my MapReduce job to take the regions and process them in parallel
somehow?
3. Also, I was wondering if I could use traditional threads to launch more
than one MapReduce job, or is that weird?

Thanks a lot, I am stuck...




--
Roberto Alonso
Bioinformatics and Genomics Department
Centro de Investigacion Principe Felipe (CIPF)
C/E.P. Avda. Autopista del Saler, 16-3 (junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es
