Re: How to order all the output file if I use more than one reduce node?

2008-08-06 Thread Taeho Kang
You may want to write a partitioner that partitions the output from the mappers in a way that fits your definition of sorted data (e.g. all keys in part-1 are greater than those in part-0). Each reducer's output is already sorted by key, so once you've done that, simply concatenating the reduce outputs in order from part 0 to part N will give you a sorted result file.
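To make the idea concrete, here is a rough sketch in plain Python that only simulates the partitioning logic; it is not Hadoop code. In a real job this logic would live in the getPartition method of a Java Partitioner. The function names, the sample keys, and the split points "g" and "n" are all made up for illustration, and picking split points that balance the reducers is a separate problem not covered here.

```python
import bisect

def range_partition(key, split_points):
    # Hypothetical helper: the reducer index is the number of
    # split points that are <= key, so lower keys go to lower parts.
    return bisect.bisect_right(split_points, key)

def simulate(keys, split_points):
    # One bucket per reducer; N split points give N + 1 reducers.
    parts = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        parts[range_partition(key, split_points)].append(key)
    # Stand-in for the framework sorting each reducer's input by key.
    return [sorted(part) for part in parts]

# Made-up mapper output keys and split points (assumptions).
keys = ["pear", "apple", "zebra", "mango", "kiwi", "fig", "banana"]
split_points = ["g", "n"]

parts = simulate(keys, split_points)

# Concatenating part 0, part 1, ... in order gives a globally sorted list.
merged = [key for part in parts for key in part]
print(parts)
print(merged)
```

With these split points every key below "g" lands in part 0, keys from "g" up to (but not including) "n" land in part 1, and the rest land in part 2, so no merge step beyond plain concatenation is needed.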

Re: How to order all the output file if I use more than one reduce node?

2008-08-06 Thread Kevin
I suppose you meant to sort the result globally across files. AFAIK, this is not currently supported unless you have only one reducer. It is said that version 0.19 will introduce such a capability. -Kevin On Wed, Aug 6, 2008 at 6:01 PM, Xing <[EMAIL PROTECTED]> wrote: > If I use one node for redu

How to order all the output file if I use more than one reduce node?

2008-08-06 Thread Xing
If I use one node for reduce, Hadoop can sort the result. If I use 30 nodes for reduce, the result is part-00000 ~ part-00029. How can I make all 30 parts sorted globally, so that all the keys in part-1 are greater than those in part-0? Thanks a lot, Xing