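import scala.util.Sorting
val pairs = Array(("a", 5, 2), ("c", 3, 1), ("b", 1, 3))  // example data

// sort in place by the 2nd element
Sorting.quickSort(pairs)(Ordering.by[(String, Int, Int), Int](_._2))
// sort in place by the 3rd element, then the 1st
Sorting.quickSort(pairs)(Ordering[(Int, String)].on(x => (x._3, x._1)))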
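
For the top-10 question quoted below, here is a rough local sketch (plain Scala collections, no Spark) of the idea Sean describes: each partition keeps at most n elements under the given Ordering, and only those candidates are merged. The names topN, partitions, and byCount and the sample data are invented for illustration, and this is not how Spark actually implements top; it only shows why a full sort is not needed.

```scala
// Hypothetical stand-in for an RDD's partitions: each inner Seq is one
// partition of (endpoint, count) pairs. The data is made up.
val partitions: Seq[Seq[(String, Int)]] = Seq(
  Seq(("/a", 7), ("/b", 2), ("/c", 9)),
  Seq(("/d", 4), ("/e", 11)),
  Seq(("/f", 1), ("/g", 8), ("/h", 3))
)

// Largest n elements of xs under ord, in descending order.
// Sorting is fine here because each input is small (one partition, or
// the merged candidates).
def topN[T](xs: Seq[T], n: Int)(ord: Ordering[T]): Seq[T] =
  xs.sorted(ord.reverse).take(n)

// Order pairs by their count (the 2nd element), as in the thread.
val byCount: Ordering[(String, Int)] = Ordering.by[(String, Int), Int](_._2)

val n = 3
// Step 1: every partition contributes at most n candidates.
val candidates: Seq[(String, Int)] = partitions.flatMap(p => topN(p, n)(byCount))
// Step 2: merge the candidates and keep the overall top n.
val top3: Seq[(String, Int)] = topN(candidates, n)(byCount)
```

This can be pasted into the Scala REPL or spark-shell. The merge step only ever sees at most n elements per partition, which is the reason top(n) with an Ordering should be cheaper than sortByKey followed by take(n).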



On Tue, Oct 20, 2015 at 11:33 AM, Carol McDonald <cmcdon...@maprtech.com>
wrote:

> this works
>
> val top10 = logs.filter(log => log.responseCode != 200).map(log =>
> (log.endpoint, 1)).reduceByKey(_ + _).top(10)(Ordering.by(_._2))
>
> On Tue, Oct 20, 2015 at 11:07 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> I believe it will be most efficient to let top(n) do the work, rather
>> than sort the whole RDD and then take the first n. The reason is that top
>> and takeOrdered know they need at most n elements from each partition, and
>> then just need to merge those. It's never required to sort the whole thing.
>>
>> I also believe it will be marginally faster to provide an Ordering rather
>> than swap pairs just to use the natural Ordering, but, I don't know if it's
>> significant.
>>
>> Note that I think you can write "Ordering.by(_._2)" to be more concise
>> (not 100% sure about the syntax off the top of my head).
>>
>>
>>
>> On Tue, Oct 20, 2015 at 3:56 PM, Carol McDonald <cmcdon...@maprtech.com>
>> wrote:
>>
>>> To find the top 10 counts, which is better: using top(10) with an Ordering
>>> on the value, or swapping the key and value and ordering on the key? For
>>> example, which of the following is better? Or does it not matter?
>>>
>>>  val top10 = logs.filter(log => log.responseCode != 200).map(log =>
>>> (log.endpoint, 1)).reduceByKey(_ + _).top(10)(Ordering[Long].on(x=>x._2))
>>>
>>>
>>>  val top10 = logs.filter(log => log.responseCode != 200).map(log =>
>>> (log.endpoint,
>>> 1)).reduceByKey((x,y)=>x+y).map(x=>(x._2,x._1)).sortByKey(false).take(10)
>>>
>>>
>>>  val top10 = logs.filter(log => log.responseCode != 200).map(log =>
>>> (log.endpoint, 1)).reduceByKey((x,y)=>x+y).map(pair => pair.swap).top(10)
>>>
>>>
>>
>
