On Fri, May 28, 2010 at 10:58 PM, juber patel wrote:
> Hello,
>
> Can Hadoop take advantage of the fact that the output of each map task
> is almost sorted?
The map output sort is already efficient for mostly sorted output. The
dominant cost is the transfer cost of the shuffle, and that won't be
helped b
Check out https://issues.apache.org/jira/browse/HADOOP-3442
and https://issues.apache.org/jira/browse/HADOOP-3308
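To make the worst-case concern concrete, here is a toy sketch (not
Hadoop's code) that counts comparisons for a textbook quicksort on nearly
sorted input, once with a first-element pivot and once with a
median-of-three pivot. The class name PivotDemo and all of its methods are
made up for illustration, and the Lomuto-style partition is an assumption
chosen for brevity. The point is only that pivot choice, rather than
quicksort as such, decides whether nearly sorted data is a bad case.

```java
import java.util.Arrays;

// Hypothetical demo class; nothing here is taken from Hadoop's sources.
public class PivotDemo {
    static long comparisons;

    static boolean less(int x, int y) {
        comparisons++;
        return x < y;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Lomuto-style partition using a[lo] as the pivot; returns its final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo;
        for (int j = lo + 1; j <= hi; j++) {
            if (less(a[j], pivot)) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, lo, i);
        return i;
    }

    // Sorts a[lo..hi] inclusive. Recurses into the smaller side and loops on
    // the larger one, so stack depth stays logarithmic even for bad pivots.
    static void quicksort(int[] a, int lo, int hi, boolean medianOfThree) {
        while (lo < hi) {
            if (medianOfThree) {
                int mid = lo + (hi - lo) / 2;
                if (less(a[mid], a[lo])) swap(a, mid, lo);
                if (less(a[hi], a[lo])) swap(a, hi, lo);
                if (less(a[hi], a[mid])) swap(a, hi, mid);
                swap(a, lo, mid); // move the median into the pivot slot
            }
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                quicksort(a, lo, p - 1, medianOfThree);
                lo = p + 1;
            } else {
                quicksort(a, p + 1, hi, medianOfThree);
                hi = p - 1;
            }
        }
    }

    // Sorted sequence 0..n-1 with every 100th element swapped with its neighbour.
    static int[] nearlySorted(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        for (int i = 100; i < n; i += 100) swap(a, i - 1, i);
        return a;
    }

    // Sorts a copy of the input and returns the number of comparisons made.
    static long countComparisons(int[] input, boolean medianOfThree) {
        int[] a = Arrays.copyOf(input, input.length);
        comparisons = 0;
        quicksort(a, 0, a.length - 1, medianOfThree);
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) throw new IllegalStateException("not sorted");
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = nearlySorted(2000);
        System.out.println("first-element pivot:   " + countComparisons(data, false));
        System.out.println("median-of-three pivot: " + countComparisons(data, true));
    }
}
```

On this input the first-element pivot does roughly quadratic work, while
the median-of-three pivot stays close to n log n. If you still want to try
Heapsort, the map-side sorter in this era of Hadoop is, as far as I know,
selected through the map.sort.class job property; please check that
against your version before relying on it.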
Hello,
Can Hadoop take advantage of the fact that the output of each map task
is almost sorted?
On a related note, does Hadoop's Quicksort implementation give
worst-case performance on almost sorted data? Should I use Heapsort in
its place?
Thanks,
Juber