// sort by 2nd element
Sorting.quickSort(pairs)(Ordering.by[(String, Int, Int), Int](_._2))

// sort by the 3rd element, then 1st
Sorting.quickSort(pairs)(Ordering[(Int, String)].on(x => (x._3, x._1)))
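For anyone who wants to try the two calls above in isolation, here is a minimal, self-contained sketch. The object name `SortTriples` and the sample data are made up for illustration (the thread does not show what `pairs` contains), and the `on` call carries an explicit type parameter so it does not rely on inference.

```scala
import scala.util.Sorting

object SortTriples {
  // Hypothetical sample data; the contents of `pairs` are not shown in the thread.
  def sample: Array[(String, Int, Int)] =
    Array(("b", 1, 2), ("a", 3, 2), ("c", 2, 1))

  // Sort a copy by the 2nd element (quickSort sorts the array in place).
  def bySecond(in: Array[(String, Int, Int)]): Array[(String, Int, Int)] = {
    val out = in.clone()
    Sorting.quickSort(out)(Ordering.by[(String, Int, Int), Int](_._2))
    out
  }

  // Sort a copy by the 3rd element, breaking ties with the 1st.
  def byThirdThenFirst(in: Array[(String, Int, Int)]): Array[(String, Int, Int)] = {
    val out = in.clone()
    Sorting.quickSort(out)(
      Ordering[(Int, String)].on[(String, Int, Int)](x => (x._3, x._1)))
    out
  }
}
```

With the sample data, `bySecond` puts ("b", 1, 2) first, while `byThirdThenFirst` puts ("c", 2, 1) first and then orders the two entries with 3rd element 2 by their 1st element.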
On Tue, Oct 20, 2015 at 11:33 AM, Carol McDonald <cmcdon...@maprtech.com> wrote:

> this works
>
> val top10 = logs.filter(log => log.responseCode != 200).map(log =>
> (log.endpoint, 1)).reduceByKey(_ + _).top(10)(Ordering.by(_._2))
>
> On Tue, Oct 20, 2015 at 11:07 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> I believe it will be most efficient to let top(n) do the work, rather
>> than sort the whole RDD and then take the first n. The reason is that top
>> and takeOrdered know they need at most n elements from each partition, and
>> then just need to merge those. It's never required to sort the whole thing.
>>
>> I also believe it will be marginally faster to provide an Ordering rather
>> than swap pairs just to use the natural Ordering, but I don't know if it's
>> significant.
>>
>> Note that I think you can write "Ordering.by(_._2)" to be more concise
>> (not 100% sure about the syntax off the top of my head).
>>
>> On Tue, Oct 20, 2015 at 3:56 PM, Carol McDonald <cmcdon...@maprtech.com>
>> wrote:
>>
>>> To find the top 10 counts, which is better: using top(10) with an
>>> Ordering on the value, or swapping the key and value and ordering on
>>> the key? For example, which of the following is better? Or does it
>>> matter?
>>>
>>> val top10 = logs.filter(log => log.responseCode != 200).map(log =>
>>> (log.endpoint, 1)).reduceByKey(_ + _).top(10)(Ordering[Long].on(x=>x._2))
>>>
>>> val top10 = logs.filter(log => log.responseCode != 200).map(log =>
>>> (log.endpoint,
>>> 1)).reduceByKey((x,y)=>x+y).map(x=>(x._2,x._1)).sortByKey(false).take(10)
>>>
>>> val top10 = logs.filter(log => log.responseCode != 200).map(log =>
>>> (log.endpoint, 1)).reduceByKey((x,y)=>x+y).map(pair => pair.swap).top(10)
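The point made in the thread about top(n), that each partition only has to contribute at most n elements which are then merged, can be illustrated without Spark. The following is a plain-Scala sketch of that idea using a bounded min-heap; it is not Spark's implementation, and the names `TopNSketch`, `topOfPartition`, and `top`, as well as the use of nested `Seq`s to stand in for partitions, are assumptions made for illustration.

```scala
import scala.collection.mutable

object TopNSketch {
  // Keep only the n largest elements (by ord) seen in one "partition".
  // The heap is ordered by ord.reverse, so its head is the smallest kept element.
  def topOfPartition[T](part: Iterator[T], n: Int)(implicit ord: Ordering[T]): List[T] = {
    if (n <= 0) return Nil
    val heap = mutable.PriorityQueue.empty[T](ord.reverse)
    part.foreach { x =>
      if (heap.size < n) heap.enqueue(x)
      else if (ord.gt(x, heap.head)) {
        heap.dequeue()   // drop the smallest kept element
        heap.enqueue(x)  // and keep the larger newcomer instead
      }
    }
    heap.toList.sorted(ord.reverse) // largest first; at most n elements are sorted
  }

  // Merge step: take the top n of the per-partition candidates.
  def top[T](partitions: Seq[Seq[T]], n: Int)(implicit ord: Ordering[T]): List[T] =
    topOfPartition(partitions.iterator.flatMap(p => topOfPartition(p.iterator, n)), n)
}
```

A call such as `TopNSketch.top(parts, 10)(Ordering.by[(String, Int), Int](_._2))` mirrors the shape of `top(10)(Ordering.by(_._2))` above: the ordering looks only at the count, so no key/value swap is needed, and no step sorts more than n elements per partition.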