Evening all,

I ended up doing a map using the hashCode of the host's IP address, giving me reductions by machine. However, I am now experiencing memory problems processing sequence files of large TwoDArrayWritables; specifically, the job seems to process normally until it is about to write the result, then crashes. Is there anything I can do when processing large sequence files other than increasing the available heap?
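In case it helps anyone searching later, the host-keyed partitioner was along these lines; a minimal sketch, assuming the map output key is a Text holding the host's IP address (the class name and value type here are illustrative, not my exact code):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes all records for a given host IP to the same reducer,
// so each machine's data is reduced together.
public class HostPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text hostIp, Text value, int numPartitions) {
        // Mask the sign bit so the modulo result is always non-negative.
        return (hostIp.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

Wired in with job.setPartitionerClass(HostPartitioner.class) on the Job.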
Arni

On Wed, Sep 2, 2015 at 5:08 PM, Arni Sumarlidason <sumarlida...@gmail.com> wrote:
> I'm having problems getting my data reduced evenly across nodes.
>
> -> map a 200,000-line single text file and output <0L, line>
> -> custom partitioner returning a static member i++ % numPartitions in an
>    attempt to distribute each line to as many reducers as possible
> -> reduce; I end up with only 13 or 18 busy nodes out of 100.
>
> My hope is to have 300 containers on 100 nodes, each with ~666 lines.
> How can I achieve this?
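(For reference, the round-robin partitioner described in the quoted message looked roughly like this; a sketch, assuming a <LongWritable, Text> map output. Note the static counter is per-mapper-JVM, so each mapper cycles through partitions independently rather than globally:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Cycles records across all partitions in arrival order,
// ignoring the key entirely.
public class RoundRobinPartitioner extends Partitioner<LongWritable, Text> {
    private static int i = 0;

    @Override
    public int getPartition(LongWritable key, Text line, int numPartitions) {
        return i++ % numPartitions;
    }
})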