Hi,
I would like to find out how I can reduce the number of mappers to fewer
than the number of regions in the HBase table.
Could someone please let me know how to do that in Pig when using the LOAD
command, as in:
LOAD 'hbase://$HBASE_TABLE' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage
Hi Fahri,
It will be one mapper per region, but if you want fewer mappers running
at the same time, you can reduce the size of your queue. That way you
will still have X mappers in total, but only Y mappers will run in
parallel. You cannot configure X, but you can configure Y.
JM
I found that the code still exists in this code base for the old mapred interfaces:
src/main/java/org/apache/hadoop/hbase/mapred/TableInputFormatBase.java
I'll adapt it for my needs. Thanks!
Avery
On Apr 9, 2011, at 9:55 AM, Jean-Daniel Cryans wrote:
It's weird, I thought we already did something
Hi,
First off, I'd like to say thanks to the developers for HBase; it's been fun to
work with.
I've been using TableInputFormat to run a Map-Reduce job and ran into an issue.
Exception in thread main org.apache.hadoop.ipc.RemoteException:
java.io.IOException: java.io.IOException: The number
You cannot have more mappers than you have regions, but you can have
fewer. Try going that way.
Also, 149,624 regions is insane. Is that really the case? I don't think
I've ever seen such a large deployment, and it's probably bound to hit some
issues...
J-D
On Sat, Apr 9, 2011 at 9:15 AM, Avery Ching
The number of regions is pretty insane, but not under my control unfortunately.
The workaround I suggested is to write another InputFormat and InputSplit such
that each InputSplit is responsible for a configurable number of regions. For
example, if I have 100k regions and I configure each
Yes, you could make a different Splitter. It would be nice if the
splitter preserved locality, so that each Map task runs on the
TaskTracker adjacent to the hosting RegionServer. That shouldn't be
hard. Study the current splitter and see how it juggles locations.
Can
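For illustration, the grouping idea discussed above (each split covering a configurable number of regions, without losing locality) could be sketched roughly as follows. This is plain Java with no HBase dependency; RegionGrouper, Region, RegionGroup and regionsPerSplit are hypothetical names made up for this sketch, not HBase API, and the real splitter may juggle locations differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch in plain Java; none of these classes come from HBase.
final class RegionGrouper {

    /** A region name plus the host of the RegionServer currently serving it. */
    static final class Region {
        final String name;
        final String host;

        Region(String name, String host) {
            this.name = name;
            this.host = host;
        }
    }

    /** The regions one map task would scan, all served from the same host. */
    static final class RegionGroup {
        final String host;
        final List<String> regionNames;

        RegionGroup(String host, List<String> regionNames) {
            this.host = host;
            this.regionNames = regionNames;
        }
    }

    /** Packs regions into groups of at most regionsPerSplit, never mixing hosts. */
    static List<RegionGroup> group(List<Region> regions, int regionsPerSplit) {
        if (regionsPerSplit < 1) {
            throw new IllegalArgumentException("regionsPerSplit must be >= 1");
        }
        // Bucket by host first so each group keeps a single preferred location.
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (Region r : regions) {
            byHost.computeIfAbsent(r.host, h -> new ArrayList<>()).add(r.name);
        }
        List<RegionGroup> groups = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byHost.entrySet()) {
            List<String> names = e.getValue();
            // Cut each host's region list into chunks of regionsPerSplit.
            for (int i = 0; i < names.size(); i += regionsPerSplit) {
                int end = Math.min(i + regionsPerSplit, names.size());
                groups.add(new RegionGroup(e.getKey(), new ArrayList<>(names.subList(i, end))));
            }
        }
        return groups;
    }
}
```

Each RegionGroup would correspond to one InputSplit whose preferred location is its host. A real implementation would still need an InputSplit and a RecordReader that iterate over every region in the group; that part is not shown here.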
It's weird, I thought we already did something like that and it seems
that the old TableInputFormatBase does it but not the new one. From
its javadoc:
* Splits are created in number equal to the smallest between numSplits and
* the number of {@link HRegion}s in the table. If the number of
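As a rough illustration of the rule quoted in that javadoc (the split count is the smaller of numSplits and the number of regions), the arithmetic could look like the sketch below. SplitCounter and regionsPerSplit are hypothetical names, and spreading leftover regions over the first splits is an assumption made for this example, since the quoted javadoc is cut off; this is not the HBase source.

```java
// Hypothetical sketch; not taken from TableInputFormatBase.
final class SplitCounter {

    /** Returns, for each split, how many regions it would cover. */
    static int[] regionsPerSplit(int numRegions, int numSplits) {
        if (numRegions < 1 || numSplits < 1) {
            throw new IllegalArgumentException("numRegions and numSplits must be >= 1");
        }
        // Never create more splits than there are regions.
        int realNumSplits = Math.min(numSplits, numRegions);
        int base = numRegions / realNumSplits;
        int remainder = numRegions % realNumSplits;
        int[] sizes = new int[realNumSplits];
        for (int i = 0; i < realNumSplits; i++) {
            // Assumption: the first `remainder` splits take one extra region each.
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }
}
```

Asking for more splits than there are regions simply falls back to one region per split, which matches the "you cannot have more mappers than regions" point made earlier in the thread.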