subject:"Performance tuning of sort"

Re: Performance tuning of sort

2010-06-17 Thread 李钰

Hi Jeff, Thanks a lot for your explanation. It really helps me understand the details of the job workflow. Hi all, Thanks a lot for your help. One more question: from the monitoring data I see that iowait% is quite high. Do you think this is normal, given that there's a lot of data read and written, as well

Re: Performance tuning of sort

2010-06-17 Thread Jeff Zhang

The scale of each reducer depends on the Partitioner. You can think of the Partitioner as a hash function and each reducer as a bucket, so you cannot expect every bucket to hold the same number of items. A skewed data distribution will make a few reducers take much more time. 2010/6/18 李钰 : > Hi Jeff and
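To make the bucket analogy above concrete, here is a minimal Python sketch (not Hadoop code). The function names and the skewed sample input are hypothetical, and `zlib.crc32` stands in for the key's hash only because it is deterministic across runs. It shows how hashing keys modulo the number of reducers can leave one reducer with most of the records when a single key dominates.

```python
from collections import Counter
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index, in the spirit of a hash partitioner.

    Assumption: crc32 is an illustrative stand-in for the key's hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def records_per_reducer(keys, num_reducers):
    """Count how many records each reducer (bucket) would receive."""
    counts = Counter(hash_partition(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical skewed input: one "hot" key accounts for 900 of 1000 records.
keys = ["hot"] * 900 + [f"key-{i}" for i in range(100)]
loads = records_per_reducer(keys, 4)
print(loads)  # one bucket holds at least 900 records
```

All records with the same key land in the same bucket, so adding reducers does not help when a single key is hot.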

Re: Performance tuning of sort

2010-06-17 Thread 李钰

Hi Jeff and Amogh, Thanks for your comments! In my understanding, in the partitioning phase before spilling to disk, the threads divide the data into partitions corresponding to the number of reducers, as described in the Definitive Guide. So I think the scale of input data should be the sam

Re: Performance tuning of sort

2010-06-17 Thread Amogh Vasekar

>>Since the scale of input data and operations of each reduce task is the same, >>what may cause the execution time of reduce tasks different? You should consider looking at the copy, shuffle and reduce times separately in the JT UI to get better info. Many (dynamic) considerations like network

Re: Performance tuning of sort

2010-06-17 Thread Jeff Zhang

The input of each reducer is not the same; it depends on the input data distribution and the Partitioner. The running time of each reducer consists of three phases: copy, sort and reduce. 2010/6/18 李钰 : > Hi Todd and Jeff, > > Thanks a lot for your discussion, it's really helpful to me. I'd like to >

Re: Performance tuning of sort

2010-06-17 Thread 李钰

Hi Todd and Jeff, Thanks a lot for your discussion; it's really helpful to me. I'd like to express my special appreciation for Todd's patient explanation, which helped me see the working mechanism of SORT more clearly. And Jeff, thank you very much for reminding me that sort uses TotalOrderPartiti

Re: Performance tuning of sort

2010-06-17 Thread Todd Lipcon

On Thu, Jun 17, 2010 at 9:37 AM, Jeff Zhang wrote: > Todd, > > Why's there a sorting in map task, the sorting here seems useless in my > opinion. > > For map-only jobs there isn't. For jobs with reduce, typically the number of reduce tasks is smaller than the number of map tasks, so parallelizing
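The reply above is cut off in this listing, so the following is only a general illustration, not a reconstruction of Todd's message. It is a minimal Python sketch of why sorting inside many map tasks can pay off when there are fewer reducers: each map task sorts its own split in parallel, and a reducer then only has to merge runs that are already sorted. The function names and the tiny input are hypothetical.

```python
import heapq


def map_side_sort(split):
    """Each map task sorts only its own input split."""
    return sorted(split)


def reduce_side_merge(sorted_runs):
    """The reducer merges pre-sorted runs instead of sorting from scratch."""
    return list(heapq.merge(*sorted_runs))


# Three hypothetical map input splits feeding a single reducer.
input_splits = [[9, 3, 7], [4, 8, 1], [6, 2, 5]]
runs = [map_side_sort(s) for s in input_splits]
merged = reduce_side_merge(runs)
print(merged)
```

The merge is a single streaming pass over the runs, which is cheaper than a full sort of all records on the reducer.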

Re: Performance tuning of sort

2010-06-17 Thread Jeff Zhang

Todd, Why's there a sorting in map task, the sorting here seems useless in my opinion. On Thu, Jun 17, 2010 at 9:26 AM, Todd Lipcon wrote: > On Thu, Jun 17, 2010 at 12:43 AM, Jeff Zhang wrote: > >> Your understanding of Sort is not right. The key concept of Sort is >> the TotalOrderPartitione

Re: Performance tuning of sort

2010-06-17 Thread Todd Lipcon

On Thu, Jun 17, 2010 at 12:43 AM, Jeff Zhang wrote: > Your understanding of Sort is not right. The key concept of Sort is > the TotalOrderPartitioner. Actually before the map-reduce job, client > side will do sampling of input data to estimate the distribution of > input data. And the mapper do n

Re: Performance tuning of sort

2010-06-17 Thread 李钰

Hi Jeff, Thank you very much for your reply. It really helps! I'll take a careful look at TotalOrderPartitioner. BTW, where do you think the bottleneck lies in SORT, and which parameters affect its performance most? Looking forward to your reply, thanks. Dear all, Any other comm

Re: Performance tuning of sort

2010-06-17 Thread Jeff Zhang

Your understanding of Sort is not right. The key concept of Sort is the TotalOrderPartitioner. Before the map-reduce job starts, the client side samples the input data to estimate its distribution. The mapper does nothing; each reducer fetches its data according to the TotalOr
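The sample-then-partition-by-range idea described above can be sketched in a few lines of Python. This is an illustration of the general technique, not Hadoop's TotalOrderPartitioner implementation. The function names, the sample size, and the random input are assumptions made for the example.

```python
import bisect
import random


def choose_split_points(sample, num_reducers):
    """Pick num_reducers - 1 cut points at evenly spaced quantiles of the sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_reducers]
            for i in range(1, num_reducers)]


def range_partition(key, split_points):
    """Send a key to the reducer whose key range contains it."""
    return bisect.bisect_right(split_points, key)


rng = random.Random(0)                       # fixed seed for repeatability
data = [rng.randint(0, 10**6) for _ in range(10_000)]
sample = rng.sample(data, 500)               # hypothetical client-side sample
split_points = choose_split_points(sample, 4)

buckets = [[] for _ in range(4)]
for key in data:
    buckets[range_partition(key, split_points)].append(key)

# Each reducer sorts its own range; concatenating the outputs in reducer
# order gives a globally sorted result.
globally_sorted = [k for bucket in buckets for k in sorted(bucket)]
```

Because the cut points come from a sample, bucket sizes are only approximately equal. A sample that misrepresents the input produces uneven reducers.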

Performance tuning of sort

2010-06-17 Thread 李钰

Hi all, I'm doing some tuning of the Hadoop sort benchmark. To be more specific, I'm running tests against the org.apache.hadoop.examples.Sort class. Looking through the source code, I think the map tasks take responsibility for sorting the input data, and the reduce tasks just merge the map outp

12 matches
