all of them.

Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D


On Wed, Jun 17, 2015 at 5:15 PM, Raghav Shankar <raghav0110...@gmail.com> wrote:
> So, would I add the assembly jar to just the master, or would I have to
> add it to all the slaves/workers too?
>
> Thanks,
> Raghav
>
>> On Jun 17, 2015, at 5:13 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>
>> You need to build the Spark assembly with your modification and deploy
>> it to the cluster.
>>
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Blog: https://www.dbtsai.com
>> PGP Key ID: 0xAF08DF8D
>>
>>
>> On Wed, Jun 17, 2015 at 5:11 PM, Raghav Shankar <raghav0110...@gmail.com> wrote:
>>> I’ve implemented this in the suggested manner. When I build Spark and
>>> attach the new spark-core jar to my Eclipse project, I am able to use the
>>> new method. In order to conduct the experiments, I need to launch my app
>>> on a cluster. I am using EC2. When I set up my master and slaves using
>>> the EC2 setup scripts, Spark gets installed, but I don't think my
>>> custom-built spark-core jar is being used. How do I set things up on EC2
>>> so that my custom version of spark-core is used?
>>>
>>> Thanks,
>>> Raghav
>>>
>>> On Jun 9, 2015, at 7:41 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>>
>>> Having the following code in RDD.scala works for me. PS: in the following
>>> code, I merge the smaller queue into the larger one. I wonder if this
>>> will help performance. Let me know when you do the benchmark.
>>>
>>> def treeTakeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>>>   if (num == 0) {
>>>     Array.empty
>>>   } else {
>>>     val mapRDDs = mapPartitions { items =>
>>>       // Priority keeps the largest elements, so let's reverse the ordering.
>>>       val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>>>       queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>>>       Iterator.single(queue)
>>>     }
>>>     if (mapRDDs.partitions.length == 0) {
>>>       Array.empty
>>>     } else {
>>>       mapRDDs.treeReduce { (queue1, queue2) =>
>>>         if (queue1.size > queue2.size) {
>>>           queue1 ++= queue2
>>>           queue1
>>>         } else {
>>>           queue2 ++= queue1
>>>           queue2
>>>         }
>>>       }.toArray.sorted(ord)
>>>     }
>>>   }
>>> }
>>>
>>> def treeTop(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>>>   treeTakeOrdered(num)(ord.reverse)
>>> }
>>>
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> ----------------------------------------------------------
>>> Blog: https://www.dbtsai.com
>>> PGP Key ID: 0xAF08DF8D
>>>
>>> On Tue, Jun 9, 2015 at 10:09 AM, raggy <raghav0110...@gmail.com> wrote:
>>>>
>>>> I am trying to implement top-k in Scala within Apache Spark. I am aware
>>>> that Spark has a top action. But top() uses reduce(). Instead, I would
>>>> like to use treeReduce(). I am trying to compare the performance of
>>>> reduce() and treeReduce().
>>>>
>>>> The main issue I have is that I cannot use these two lines of code,
>>>> which are used in the top() action, within my Spark application:
>>>>
>>>> val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>>>> queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>>>>
>>>> How can I go about implementing top() using treeReduce()?
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-top-using-treeReduce-tp23227.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>
>>>
>
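The map-side/reduce-side pattern in the treeTakeOrdered code above can be sketched outside Spark in plain Scala. BoundedPQ below is a hypothetical stand-in for Spark's private BoundedPriorityQueue (the names BoundedPQ, elems, and toSortedList are my own, not Spark's API), and merge mirrors the smaller-into-larger queue merge from the thread; this is a minimal sketch, not the actual Spark implementation:

```scala
import scala.collection.mutable.PriorityQueue

// Illustrative stand-in for Spark's private BoundedPriorityQueue:
// keeps at most `maxSize` elements, evicting the smallest under `ord`.
class BoundedPQ[T](maxSize: Int)(implicit ord: Ordering[T]) {
  // Reverse the ordering so the queue's head is the *smallest* retained
  // element, i.e. the one to evict when the queue is full.
  private val pq = PriorityQueue.empty[T](ord.reverse)

  def +=(elem: T): this.type = {
    if (pq.size < maxSize) {
      pq.enqueue(elem)
    } else if (maxSize > 0 && ord.gt(elem, pq.head)) {
      pq.dequeue()   // drop the current smallest
      pq.enqueue(elem)
    }
    this
  }

  def ++=(xs: Iterator[T]): this.type = { xs.foreach(this += _); this }
  def size: Int = pq.size
  def elems: Iterator[T] = pq.iterator
  def toSortedList: List[T] = pq.toList.sorted(ord)
}

// Mirror of the reduce step above: merge the smaller queue into the larger.
def merge[T](q1: BoundedPQ[T], q2: BoundedPQ[T]): BoundedPQ[T] =
  if (q1.size > q2.size) { q1 ++= q2.elems; q1 }
  else { q2 ++= q1.elems; q2 }

// "Map side": build one bounded queue per (simulated) partition,
// then "reduce side": pairwise-merge the queues and sort the survivors.
val partitions = Seq(Seq(3, 41, 7, 18), Seq(99, 2, 64), Seq(55, 12, 8, 73, 1))
val num = 3
val queues = partitions.map { items =>
  val q = new BoundedPQ[Int](num)
  q ++= items.iterator
  q
}
val top3 = queues.reduce(merge[Int]).toSortedList.reverse
println(top3) // List(99, 73, 64)
```

In Spark, `queues.reduce` would be `mapRDDs.treeReduce`, which performs the same pairwise merge but arranges the merges in a tree across executors instead of funnelling every partition's queue through the driver.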