Re: almost sorted map output

2010-05-31 Thread Owen O'Malley
On Fri, May 28, 2010 at 10:58 PM, juber patel wrote:
> Hello,
>
> Can Hadoop take advantage of the fact that the output of each map task
> is almost sorted?

The map output sort is efficient for mostly sorted output. The dominant cost is the transfer cost of the shuffle and that won't be helped b

Re: almost sorted map output

2010-05-30 Thread Ted Yu
Check out https://issues.apache.org/jira/browse/HADOOP-3442 and https://issues.apache.org/jira/browse/HADOOP-3308

On Fri, May 28, 2010 at 10:58 PM, juber patel wrote:
> Hello,
>
> Can Hadoop take advantage of the fact that the output of each map task
> is almost sorted?
>
> On a related note, D

almost sorted map output

2010-05-28 Thread juber patel
Hello,

Can Hadoop take advantage of the fact that the output of each map task is almost sorted?

On a related note, does Hadoop's Quicksort implementation give worst-case performance on almost sorted data? Should I use Heapsort in its place?

Thanks,
Juber
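
Whether a quicksort degrades on almost sorted input depends largely on how it picks its pivot. The following is a minimal Python sketch of that general point, not Hadoop's code: it counts partition comparisons for a textbook quicksort under two pivot rules. The names `count_comparisons`, `first_element`, `median_of_three` and `nearly_sorted` are hypothetical and chosen only for this illustration.

```python
import random

def count_comparisons(data, pick_pivot):
    """Quicksort a copy of data; return (sorted_list, partition_comparisons)."""
    a = list(data)
    comparisons = 0
    stack = [(0, len(a) - 1)]          # explicit stack avoids deep recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = pick_pivot(a, lo, hi)
        a[p], a[hi] = a[hi], a[p]      # park the pivot at the end
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):        # Lomuto partition
            comparisons += 1           # only partition comparisons are counted
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        stack.append((lo, store - 1))
        stack.append((store + 1, hi))
    return a, comparisons

def first_element(a, lo, hi):
    """Naive rule: always use the first element of the range as pivot."""
    return lo

def median_of_three(a, lo, hi):
    """Use the index holding the median of the first, middle and last values."""
    mid = (lo + hi) // 2
    return sorted([lo, mid, hi], key=lambda i: a[i])[1]

# Assumed test data: a sorted run of 1000 keys with a few random swaps.
rng = random.Random(0)
nearly_sorted = list(range(1000))
for _ in range(10):
    i, j = rng.randrange(1000), rng.randrange(1000)
    nearly_sorted[i], nearly_sorted[j] = nearly_sorted[j], nearly_sorted[i]

for rule in (first_element, median_of_three):
    _, n_cmp = count_comparisons(nearly_sorted, rule)
    print(rule.__name__, n_cmp)
```

On exactly sorted input of length n, the first-element rule performs n(n-1)/2 partition comparisons, which is the quadratic worst case, while the median-of-three rule keeps the partitions balanced. This thread does not say which pivot rule Hadoop's Quicksort uses, so the sketch only shows why the answer depends on the implementation.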