1. It is a good idea to manage region splits manually. For best practice, see http://hbase.apache.org/book.html, section 2.8.2.7, "Managed Splitting".
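As an illustration of pre-splitting, a table can be created with its split points already defined. This is a minimal sketch; the table name 'records', family 'cf', and the row-key boundaries are placeholders, and the SPLITS option may not be available in older shell versions (from Java, HBaseAdmin.createTable(desc, splitKeys) does the same thing):

```
hbase(main):001:0> create 'records', 'cf', {SPLITS => ['row05000000', 'row10000000', 'row15000000']}
```

With evenly spaced split points like these, the initial load is spread across several regions (and hence region servers) instead of hammering a single region until HBase decides to split it.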
2. The default HBase MapReduce splitter creates one map task per region; read more details at http://hbase.apache.org/book.html#splitter

Saurabh.

-----Original Message-----
From: roal...@gmail.com [mailto:roal...@gmail.com] On Behalf Of Roberto Alonso CIPF
Sent: Tuesday, March 27, 2012 5:07 AM
To: user@hbase.apache.org
Subject: questions about splits in regions

Hello all,

I have some doubts about HBase that I hope you can help me with. My architecture is as follows: I have four servers (server_{1,2,3,4}), each with 6 GB of RAM and 2 cores. I installed Hadoop on all of them with this configuration:

- server_1: namenode, secondarynamenode, datanode, jobtracker
- server_2, server_3, server_4: datanode, tasktracker
- server_2: zookeeper

The total storage is around 500 GB. I have a file with around 22,000,000 records (it will grow) that I want to load into a table. So:

1. I create the table from code. Should I split the regions myself, and if so, should I follow any particular strategy? Or is it better to let HBase split the regions by itself?

2. After loading this table into HBase, I run a MapReduce job that reads all the rows, selects the rows of interest, and writes a file to disk (FileOutputFormat.setOutputPath(job, new Path(tmpPath)); it has no reduce phase). Watching htop on my servers, HBase seems to be reading the table sequentially even though the table is split across the servers. Should I configure my MapReduce job to pick up the regions and process them in parallel somehow?

3. I was also wondering whether I could use ordinary threads to launch more than one MapReduce job, or is that a bad idea?

Thanks a lot, I am stuck...

--
Roberto Alonso
Bioinformatics and Genomics Department
Centro de Investigacion Principe Felipe (CIPF)
C/E.P. Avda. Autopista del Saler, 16-3 (junto Oceanografico)
46012 Valencia, Spain
Tel: +34 963289680 Ext. 1021
Fax: +34 963289574
E-Mail: ralo...@cipf.es
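To make point 2 concrete, here is a sketch of a map-only scan job wired up through TableMapReduceUtil, which uses TableInputFormat and therefore gets one map task per region. It assumes the HBase and Hadoop client jars of that era are on the classpath; the table name "records", the mapper's pass-through filter, and the output path argument are placeholders, not the poster's actual code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RegionParallelScan {

  // One instance of this mapper runs per region split, on the
  // tasktrackers, so the table is scanned in parallel.
  static class RowFilterMapper extends TableMapper<NullWritable, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws java.io.IOException, InterruptedException {
      // Placeholder filter: emit every row key; real logic would
      // inspect 'value' and keep only the rows of interest.
      context.write(NullWritable.get(), new Text(row.get()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "region-parallel-scan");
    job.setJarByClass(RegionParallelScan.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch rows in batches to cut RPC round-trips
    scan.setCacheBlocks(false);  // a full scan shouldn't churn the block cache

    // TableInputFormat, configured here, generates one input split
    // (and so one map task) per region of the table.
    TableMapReduceUtil.initTableMapperJob(
        "records",               // placeholder table name
        scan,
        RowFilterMapper.class,
        NullWritable.class,
        Text.class,
        job);

    job.setNumReduceTasks(0);    // map-only, as in the original job
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

If htop shows only one node busy with a job shaped like this, a common cause is that the table still lives in a single region (see point 1): one region means one input split, hence one map task.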