Re: One output file per node

2012-12-13 Thread Radim Kolar
If you have strong data locality demands, then try http://peregrine_mapreduce.bitbucket.org/. It's 2x faster than Hadoop for multipass job types. It also has very fast node recovery. I plan to do this for HDFS; the concept is similar to "virtual nodes". It's not Hadoop or HDFS compatible and it has n

Re: One output file per node

2012-12-13 Thread Robert Evans
Tejay, the way the scheduler works, you are not guaranteed to get one reducer per node. Reducers are not scheduled based on locality of any kind, and even if they were, the scheduler typically treats rack-local the same as node-local. The partitioner interface only allows you to say what numer

Re: One output file per node

2012-12-12 Thread Aloke Ghoshal
Hi Tejay, Building a consolidated index file for all your source files (for terms within the source files) may not be doable this way. On the other hand, building one index file per node is doable if you run a Reducer per Node & use a Partitioner. - Run one Reducer per node - Let Mapper output ca
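
The partitioning idea in this reply can be sketched roughly as below. This is only an illustration, not code from the thread: the class name `NodePartitionSketch`, the helper `partitionFor`, and the host names are all hypothetical, and the sketch deliberately avoids Hadoop imports so it runs on its own. It mirrors the shape of Hadoop's `Partitioner.getPartition(key, value, numPartitions)`, which returns only a partition number. As noted elsewhere in the thread, choosing a partition number does not control which node the reducer for that partition actually runs on.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: route each record to a partition chosen by the
// host it was produced on, so that all records from one node end up
// in the same reducer input (and therefore the same output file).
class NodePartitionSketch {

    // Assumption: 'hosts' is a fixed, known list of cluster host names.
    // Returns a partition number in the range [0, numPartitions).
    static int partitionFor(String host, List<String> hosts, int numPartitions) {
        int idx = hosts.indexOf(host);
        if (idx < 0) {
            // Unknown host: fall back to a non-negative hash of the name.
            idx = host.hashCode() & Integer.MAX_VALUE;
        }
        return idx % numPartitions;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("node-a", "node-b", "node-c");
        for (String h : hosts) {
            System.out.println(h + " -> partition " + partitionFor(h, hosts, hosts.size()));
        }
    }
}
```

In a real job the mapper would have to put the host name into its output key or value so that a custom `Partitioner` could read it; how to do that is left open here.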

Re: One output file per node

2012-12-12 Thread Radim Kolar
You need a custom OutputCommitter.

One output file per node

2012-12-12 Thread Cardon, Tejay E
First, I hope I'm posting this to the right list. I wasn't sure if developer questions belonged here or on the user list. Second, thanks for your thoughts. So I have a situation in which I'm building an index across many files. I don't want to send ALL the data across the wire by using reducer