Re: Implementing top() using treeReduce()

DB Tsai Wed, 17 Jun 2015 17:14:25 -0700

You need to build the Spark assembly with your modification and deploy
it to the cluster.


Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D


On Wed, Jun 17, 2015 at 5:11 PM, Raghav Shankar <[email protected]> wrote:
> I’ve implemented this in the suggested manner. When I build Spark and attach
> the new spark-core jar to my Eclipse project, I am able to use the new
> method. In order to conduct the experiments I need to launch my app on a
> cluster. I am using EC2. When I set up my master and slaves using the EC2
> setup scripts, they set up Spark, but I think my custom-built spark-core jar
> is not being used. How do I set it up on EC2 so that my custom version of
> spark-core is used?
>
> Thanks,
> Raghav
>
> On Jun 9, 2015, at 7:41 PM, DB Tsai <[email protected]> wrote:
>
> Having the following code in RDD.scala works for me. PS: in the following
> code, I merge the smaller queue into the larger one. I wonder if this will
> help performance. Let me know when you do the benchmark.
>
> def treeTakeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>   if (num == 0) {
>     Array.empty
>   } else {
>     val mapRDDs = mapPartitions { items =>
>       // Priority keeps the largest elements, so let's reverse the ordering.
>       val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>       queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>       Iterator.single(queue)
>     }
>     if (mapRDDs.partitions.length == 0) {
>       Array.empty
>     } else {
>       mapRDDs.treeReduce { (queue1, queue2) =>
>         if (queue1.size > queue2.size) {
>           queue1 ++= queue2
>           queue1
>         } else {
>           queue2 ++= queue1
>           queue2
>         }
>       }.toArray.sorted(ord)
>     }
>   }
> }
>
> def treeTop(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>   treeTakeOrdered(num)(ord.reverse)
> }
>
>
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Blog: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
> On Tue, Jun 9, 2015 at 10:09 AM, raggy <[email protected]> wrote:
>>
>> I am trying to implement top-k in Scala within Apache Spark. I am aware
>> that Spark has a top() action, but top() uses reduce(). Instead, I would
>> like to use treeReduce(), so that I can compare the performance of
>> reduce() and treeReduce().
>>
>> The main issue I have is that I cannot use these 2 lines of code which are
>> used in the top() action within my Spark application.
>>
>> val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>> queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>>
>> How can I go about implementing top() using treeReduce()?
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-top-using-treeReduce-tp23227.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
>
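
The treeTakeOrdered() approach quoted above can also be sketched in
application code, which avoids rebuilding and redeploying Spark. This is
only a rough sketch under some assumptions: BoundedPriorityQueue and
util.collection.Utils appear to be private to Spark (presumably why the two
lines from top() could not be used in an application), so a plain
scala.collection.mutable.PriorityQueue and sorted arrays stand in for them.
The object TreeTopK and the helpers smallest and merge are hypothetical
names, not part of Spark, and the merge step differs from the queue merge
in the quoted code.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper object: none of these names are part of Spark.
object TreeTopK {

  // Keep the `num` smallest elements of `items` under `ord`. The max-heap holds
  // at most `num` elements, so a partition is never fully materialized.
  def smallest[T](items: Iterator[T], num: Int)
                 (implicit ord: Ordering[T], ct: ClassTag[T]): Array[T] = {
    if (num <= 0) {
      Array.empty[T]
    } else {
      val heap = new mutable.PriorityQueue[T]()(ord) // head is the largest kept element
      items.foreach { x =>
        if (heap.size < num) {
          heap.enqueue(x)
        } else if (ord.lt(x, heap.head)) {
          heap.dequeue()
          heap.enqueue(x)
        }
      }
      heap.toArray.sorted(ord)
    }
  }

  // Combine two partial results and keep only the `num` smallest elements.
  def merge[T](a: Array[T], b: Array[T], num: Int)
              (implicit ord: Ordering[T], ct: ClassTag[T]): Array[T] =
    (a ++ b).sorted(ord).take(num)

  // Application-side analogue of the quoted treeTakeOrdered(), using only
  // the public mapPartitions() and treeReduce() methods of RDD.
  def treeTakeOrdered[T](rdd: RDD[T], num: Int)
                        (implicit ord: Ordering[T], ct: ClassTag[T]): Array[T] = {
    if (num <= 0 || rdd.partitions.length == 0) {
      Array.empty[T]
    } else {
      rdd
        .mapPartitions(items => Iterator.single(smallest(items, num)))
        .treeReduce((a, b) => merge(a, b, num))
    }
  }

  // The largest `num` elements are the smallest ones under the reversed ordering.
  def treeTop[T](rdd: RDD[T], num: Int)
                (implicit ord: Ordering[T], ct: ClassTag[T]): Array[T] =
    treeTakeOrdered(rdd, num)(ord.reverse, ct)
}
```

With a SparkContext named sc, a call such as
TreeTopK.treeTop(sc.parallelize(1 to 1000, 16), 5) should return the five
largest values in descending order. Because each merge sorts at most 2 * num
elements, this is likely adequate for small num, but it has not been
benchmarked against the BoundedPriorityQueue version.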

