TableInputFormat and number of mappers == number of regions

2014-08-27 Thread Fahri Surucu
Hi, I would like to find out how I can reduce the number of mappers to fewer than the number of regions in the HBase table. Could someone please let me know how to do that in Pig when using the LOAD command, as in: LOAD 'hbase://$HBASE_TABLE' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage

Re: TableInputFormat and number of mappers == number of regions

2014-08-27 Thread Jean-Marc Spaggiari
Hi Fahri, It will be one mapper per region, but if you want fewer mappers running at the same time, you can reduce the size of your queue. That way you will still have X mappers in total, but only Y mappers will run in parallel. You cannot configure X, but you can configure Y. JM
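
To make the X versus Y distinction in the reply above concrete, here is a minimal arithmetic sketch. The function name and the numbers are hypothetical and are not part of Hadoop or Pig. It assumes all map tasks take roughly the same time, so a job with X mappers and Y parallel slots finishes in about ceil(X / Y) waves.

```python
def mapper_waves(total_mappers: int, concurrent_slots: int) -> int:
    """Return how many sequential waves are needed to run all map tasks.

    total_mappers    -- X, fixed by the number of regions (hypothetical input)
    concurrent_slots -- Y, the number of mappers the queue lets run at once
    """
    if concurrent_slots <= 0:
        raise ValueError("concurrent_slots must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return (total_mappers + concurrent_slots - 1) // concurrent_slots


# Example: 1000 regions (so 1000 mappers) with room for 50 at a time.
print(mapper_waves(1000, 50))  # 20
```

Shrinking Y does not change how many tasks are created, only how many waves they are spread over.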

Re: TableInputFormat and number of mappers == number of regions

2011-04-11 Thread Avery Ching
I found that the code still exists in this code base for the old mapred interfaces: src/main/java/org/apache/hadoop/hbase/mapred/TableInputFormatBase.java. I'll adapt it for my needs. Thanks! Avery On Apr 9, 2011, at 9:55 AM, Jean-Daniel Cryans wrote: It's weird, I thought we already did something

TableInputFormat and number of mappers == number of regions

2011-04-09 Thread Avery Ching
Hi, First off, I'd like to say thanks to the developers for HBase, it's been fun to work with. I've been using TableInputFormat to run a Map-Reduce job and ran into an issue. Exception in thread main org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.IOException: The number

Re: TableInputFormat and number of mappers == number of regions

2011-04-09 Thread Jean-Daniel Cryans
You cannot have more mappers than you have regions, but you can have fewer. Try going that way. Also, 149,624 regions is insane, is that really the case? I don't think I've ever seen such a large deploy and it's probably bound to hit some issues... J-D On Sat, Apr 9, 2011 at 9:15 AM, Avery Ching

Re: TableInputFormat and number of mappers == number of regions

2011-04-09 Thread Avery Ching
The number of regions is pretty insane, but not under my control unfortunately. The workaround I suggested is to write another InputFormat and InputSplit such that each InputSplit is responsible for a configurable number of regions. For example, if I have 100k regions and I configure each
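
The grouping idea in the post above can be sketched independently of the HBase API. This is a minimal illustration only: `group_regions`, `regions_per_split`, and the region names are hypothetical, and a real implementation would have to live in a custom InputFormat and InputSplit as the post describes.

```python
from typing import List, Sequence


def group_regions(regions: Sequence[str], regions_per_split: int) -> List[List[str]]:
    """Chunk an ordered list of regions so each split covers several regions."""
    if regions_per_split <= 0:
        raise ValueError("regions_per_split must be positive")
    # Consecutive regions stay together; the last split may be smaller.
    return [
        list(regions[i:i + regions_per_split])
        for i in range(0, len(regions), regions_per_split)
    ]


# Example with made-up region names: 10 regions, 4 regions per split.
regions = [f"region-{i}" for i in range(10)]
splits = group_regions(regions, 4)
print(len(splits))   # 3
print(splits[-1])    # ['region-8', 'region-9']
```

With one mapper per split, the mapper count becomes the number of chunks rather than the number of regions.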

Re: TableInputFormat and number of mappers == number of regions

2011-04-09 Thread Stack
Yes, you could make a different Splitter. Would be nice in the splitter if you could keep the locality where we have the Map task running on the TaskTracker that is adjacent to the hosting RegionServer. That shouldn't be hard. Study the current splitter and see how it juggles locations. Can
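
One way to keep the locality mentioned above is to group only regions that are hosted on the same server, so that every split still has a single preferred location. The sketch below is an assumption about how such a splitter could be organised, not the logic of the existing HBase splitter; `group_by_host` and the host and region names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_host(
    region_hosts: Dict[str, str], regions_per_split: int
) -> List[Tuple[str, List[str]]]:
    """Build (preferred_host, regions) splits without mixing hosts in one split."""
    if regions_per_split <= 0:
        raise ValueError("regions_per_split must be positive")
    by_host: Dict[str, List[str]] = defaultdict(list)
    for region, host in region_hosts.items():
        by_host[host].append(region)

    splits: List[Tuple[str, List[str]]] = []
    for host in sorted(by_host):  # sorted only to make the output deterministic
        hosted = by_host[host]
        for i in range(0, len(hosted), regions_per_split):
            splits.append((host, hosted[i:i + regions_per_split]))
    return splits


# Example: made-up mapping of region name to hosting server.
region_hosts = {"r1": "rs-a", "r2": "rs-b", "r3": "rs-a", "r4": "rs-a"}
for host, grouped in group_by_host(region_hosts, 2):
    print(host, grouped)
```

The trade-off is that servers hosting few regions produce small splits, so the split count can be somewhat higher than with plain chunking.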

Re: TableInputFormat and number of mappers == number of regions

2011-04-09 Thread Jean-Daniel Cryans
It's weird, I thought we already did something like that, and it seems that the old TableInputFormatBase does it but not the new one. From its javadoc: Splits are created in number equal to the smallest between numSplits and the number of {@link HRegion}s in the table. If the number of
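
The rule quoted from the javadoc, that the split count is the smaller of numSplits and the region count, can be sketched as follows. The quote is cut off before it says how regions are assigned to splits, so the even distribution below is an assumption for illustration, and `plan_splits` is a hypothetical name rather than an HBase method.

```python
from typing import List


def plan_splits(num_regions: int, num_splits_hint: int) -> List[int]:
    """Return how many regions each split would cover.

    The number of splits is min(num_splits_hint, num_regions), as in the
    quoted javadoc. Spreading regions as evenly as possible is an assumption.
    """
    split_count = min(num_splits_hint, num_regions)
    if split_count <= 0:
        return []
    base, extra = divmod(num_regions, split_count)
    # The first `extra` splits take one additional region each.
    return [base + 1 if i < extra else base for i in range(split_count)]


print(plan_splits(10, 4))  # [3, 3, 2, 2]
print(plan_splits(3, 10))  # [1, 1, 1]
```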