If you have strong data locality demands, then try
http://peregrine_mapreduce.bitbucket.org/. It's 2x faster than Hadoop for
multipass job types, and it also has very fast node recovery. I plan to do
this for HDFS; the concept is similar to "virtual nodes".
It's not Hadoop or HDFS compatible and it has n
Tejay,
The way the scheduler works, you are not guaranteed to get one reducer per
node. Reducers are not scheduled based on locality of any kind, and even if
they were, the scheduler typically treats rack-local the same as node-local.
The Partitioner interface only allows you to say what number
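To make that limitation concrete, here is a minimal sketch of the Partitioner
contract in the org.apache.hadoop.mapreduce API: getPartition() can only return
a partition index in 0..numPartitions-1, it has no way to name a node. The
class name and key/value types below are placeholders, not anything from your
actual job:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class SourceFilePartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // You choose which partition the record goes to, but the scheduler
    // still decides which node runs the reducer for that partition.
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}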
Hi Tejay,
Building a consolidated index file for all your source files (for terms
within the source files) may not be doable this way. On the other hand,
building one index file per node is doable if you run a Reducer per node and
use a Partitioner (see the driver sketch after this list):
- Run one Reducer per node
- Let Mapper output ca
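Roughly, the driver side could look like the sketch below. It is only an
illustration under assumptions: the class names, the word-count-style
TermMapper/TermReducer, and NUM_NODES are made up, it assumes the newer
org.apache.hadoop.mapreduce API (Job.getInstance), and it reuses the
SourceFilePartitioner sketched earlier in this thread:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PerNodeIndexDriver {

  // Emits (term, 1) per token; a real index mapper would also carry
  // the source file name and offset.
  public static class TermMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text term = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tok = new StringTokenizer(value.toString());
      while (tok.hasMoreTokens()) {
        term.set(tok.nextToken());
        context.write(term, ONE);
      }
    }
  }

  // Sums counts per term; each reduce task writes one part-r-* file,
  // which is the "one index file per node" piece.
  public static class TermReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  private static final int NUM_NODES = 10;  // assumed cluster size

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "per-node index");
    job.setJarByClass(PerNodeIndexDriver.class);

    job.setMapperClass(TermMapper.class);
    job.setReducerClass(TermReducer.class);
    job.setPartitionerClass(SourceFilePartitioner.class);

    // One reduce task per node, so there are at most NUM_NODES output files.
    job.setNumReduceTasks(NUM_NODES);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that, as pointed out elsewhere in this thread, setting NUM_NODES reduce
tasks does not guarantee one task per node; it only caps the number of output
files.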
You need a custom OutputCommitter.
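If it helps, one possible way to wire one in (a rough, untested sketch; all
class names here are hypothetical) is to extend the stock FileOutputCommitter
and return it from a custom OutputFormat:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class IndexOutputFormat<K, V> extends TextOutputFormat<K, V> {

  public static class IndexOutputCommitter extends FileOutputCommitter {
    public IndexOutputCommitter(Path outputPath, TaskAttemptContext context)
        throws IOException {
      super(outputPath, context);
    }

    @Override
    public void commitTask(TaskAttemptContext context) throws IOException {
      // Let the default committer promote the task's output into place first,
      // then do any per-node post-processing of the committed index file here.
      super.commitTask(context);
    }
  }

  @Override
  public synchronized OutputCommitter getOutputCommitter(TaskAttemptContext context)
      throws IOException {
    return new IndexOutputCommitter(getOutputPath(context), context);
  }
}

Then set it on the job with job.setOutputFormatClass(IndexOutputFormat.class).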
First, I hope I'm posting this to the right list. I wasn't sure if developer
questions belonged here or on the user list.
Second, thanks for your thoughts.
So I have a situation in which I'm building an index across many files. I
don't want to send ALL the data across the wire by using reducer