Thank you!


[email protected]
 
From: Rajesh Balamohan
Date: 2015-06-03 10:43
To: user
Subject: Re: What is the difference between PipelinedSorter and DefaultSorter?
DefaultSorter is the same sorter implementation that is used in the MapReduce
world, and it is single threaded. PipelinedSorter, on the other hand, takes a
divide-and-conquer approach: it works on multiple sort-spans, which can be
sorted by different threads. More details can be found in
http://people.apache.org/~gopalv/PipelinedSorter.pdf
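
To make the sort-span idea concrete, here is a minimal sketch in plain Java. It is not Tez code and does not reflect how PipelinedSorter is actually implemented; the class name SpanSortSketch, the method names, and the use of int keys are all assumptions made for illustration. It only shows the general shape: split the input into spans, sort each span on a worker thread, then merge the sorted spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: a hypothetical class, not part of Tez.
public class SpanSortSketch {

    // Splits `data` into about `spanCount` spans (spanCount must be positive),
    // sorts each span on a pool of `threads` workers, then merges the results.
    static int[] sortInSpans(int[] data, int spanCount, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int spanSize = Math.max(1, (data.length + spanCount - 1) / spanCount);
            List<Future<int[]>> pending = new ArrayList<>();
            for (int start = 0; start < data.length; start += spanSize) {
                int end = Math.min(data.length, start + spanSize);
                final int[] span = Arrays.copyOfRange(data, start, end);
                // Each span is sorted independently, so spans can run in parallel.
                pending.add(pool.submit(() -> {
                    Arrays.sort(span);
                    return span;
                }));
            }
            List<int[]> sortedSpans = new ArrayList<>();
            for (Future<int[]> f : pending) {
                sortedSpans.add(f.get());
            }
            return merge(sortedSpans, data.length);
        } finally {
            pool.shutdown();
        }
    }

    // K-way merge of already sorted spans using a min-heap.
    // Heap entries are {value, spanIndex, offsetInSpan}.
    static int[] merge(List<int[]> spans, int total) {
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < spans.size(); i++) {
            if (spans.get(i).length > 0) {
                heap.add(new int[] {spans.get(i)[0], i, 0});
            }
        }
        int[] out = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[n++] = top[0];
            int[] span = spans.get(top[1]);
            int next = top[2] + 1;
            if (next < span.length) {
                heap.add(new int[] {span[next], top[1], next});
            }
        }
        return out;
    }
}
```

Calling SpanSortSketch.sortInSpans(values, 4, 2) would sort four spans using two worker threads. A single-threaded sorter corresponds roughly to one span and one thread.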

It is not possible to increase sort.mb beyond 2 GB with the DefaultSorter
implementation. With PipelinedSorter, it is possible to allocate more than 2 GB
as the sort buffer. This can be useful when you have large containers and can
give the sort buffer more than 2 GB to avoid potential disk spills.

The number of threads used for sorting in PipelinedSorter is controlled by
"tez.runtime.pipelined.sorter.sort.threads" (defaults to 2). Setting this to a
much higher value might not help, since the benefit depends on the number of
processors available in the system and the number of containers running on it.
Depending on the workload, 2-4 could be a sweet spot.

Starting with Tez 0.7, PipelinedSorter is the default sorter, though users can
switch back to DefaultSorter (the MapReduce-world implementation) by setting
"tez.runtime.sorter.class=LEGACY".
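
As a rough sketch of how the two settings above fit together, the plain-Java helper below builds the key/value pairs for either choice. The two property names and the LEGACY value are taken from this message; the class SorterSettingsSketch and the method sorterSettings are hypothetical names used only for illustration, and a plain Map stands in for the real job configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of Tez.
public class SorterSettingsSketch {

    // Property names as quoted in this thread.
    static final String SORTER_CLASS = "tez.runtime.sorter.class";
    static final String PIPELINED_SORT_THREADS =
            "tez.runtime.pipelined.sorter.sort.threads";

    // Returns the settings for the chosen sorter.
    // useLegacySorter = true  -> switch back to DefaultSorter (LEGACY).
    // useLegacySorter = false -> keep PipelinedSorter and set its thread count.
    static Map<String, String> sorterSettings(boolean useLegacySorter, int sortThreads) {
        Map<String, String> settings = new LinkedHashMap<>();
        if (useLegacySorter) {
            // DefaultSorter is single threaded, so no thread count is set.
            settings.put(SORTER_CLASS, "LEGACY");
        } else {
            if (sortThreads < 1) {
                throw new IllegalArgumentException("sortThreads must be >= 1");
            }
            settings.put(PIPELINED_SORT_THREADS, Integer.toString(sortThreads));
        }
        return settings;
    }
}
```

Each entry could then be copied into the job's Hadoop Configuration with conf.set(key, value), or written as a property in the site configuration file.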

~Rajesh.B

On Wed, Jun 3, 2015 at 7:18 AM, [email protected] <[email protected]> wrote:
In OrderedPartitionedKVOutput, I see:

if (this.conf.getInt(TezRuntimeConfiguration.TEZ_RUNTIME_SORT_THREADS,
    TezRuntimeConfiguration.TEZ_RUNTIME_SORT_THREADS_DEFAULT) > 1) {
  sorter = new PipelinedSorter(getContext(), conf, getNumPhysicalOutputs(),
      memoryUpdateCallbackHandler.getMemoryAssigned());
} else {
  sorter = new DefaultSorter(getContext(), conf, getNumPhysicalOutputs(),
      memoryUpdateCallbackHandler.getMemoryAssigned());
}

So when tez.runtime.sort.threads is set to a value greater than 1,
PipelinedSorter is chosen.


[email protected]



