So, would I add the assembly jar to just the master, or would I have to add it to all the slaves/workers too?
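A minimal sketch of what I have in mind, assuming the standard spark-ec2 layout (the exact paths and the copy-dir helper are assumptions on my part, not something confirmed in this thread):

```shell
# On the EC2 master: rebuild the assembly so it picks up the modified spark-core.
cd /root/spark
sbt/sbt assembly

# Every worker needs the modified build too -- executors run from each
# node's local Spark install, not from the master's. spark-ec2 clusters
# ship a copy-dir helper that rsyncs a directory out to all slaves:
/root/spark-ec2/copy-dir /root/spark
```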
Thanks,
Raghav

> On Jun 17, 2015, at 5:13 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>
> You need to build the Spark assembly with your modification and deploy
> it to the cluster.
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Blog: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
>
> On Wed, Jun 17, 2015 at 5:11 PM, Raghav Shankar <raghav0110...@gmail.com> wrote:
>> I’ve implemented this in the suggested manner. When I build Spark and attach
>> the new spark-core jar to my Eclipse project, I am able to use the new
>> method. To conduct the experiments, I need to launch my app on a cluster.
>> I am using EC2. When I set up my master and slaves using the EC2 setup
>> scripts, Spark gets installed, but I think my custom-built spark-core jar
>> is not being used. How do I set things up on EC2 so that my custom version
>> of spark-core is used?
>>
>> Thanks,
>> Raghav
>>
>> On Jun 9, 2015, at 7:41 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>
>> Having the following code in RDD.scala works for me. PS: in the following
>> code, I merge the smaller queue into the larger one. I wonder if this will
>> help performance. Let me know when you do the benchmark.
>>
>> def treeTakeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>>   if (num == 0) {
>>     Array.empty
>>   } else {
>>     val mapRDDs = mapPartitions { items =>
>>       // Priority keeps the largest elements, so let's reverse the ordering.
>>       val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>>       queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>>       Iterator.single(queue)
>>     }
>>     if (mapRDDs.partitions.length == 0) {
>>       Array.empty
>>     } else {
>>       mapRDDs.treeReduce { (queue1, queue2) =>
>>         if (queue1.size > queue2.size) {
>>           queue1 ++= queue2
>>           queue1
>>         } else {
>>           queue2 ++= queue1
>>           queue2
>>         }
>>       }.toArray.sorted(ord)
>>     }
>>   }
>> }
>>
>> def treeTop(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>>   treeTakeOrdered(num)(ord.reverse)
>> }
>>
>>
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Blog: https://www.dbtsai.com
>> PGP Key ID: 0xAF08DF8D
>>
>> On Tue, Jun 9, 2015 at 10:09 AM, raggy <raghav0110...@gmail.com> wrote:
>>>
>>> I am trying to implement top-k in Scala within Apache Spark. I am aware
>>> that Spark has a top action. But top() uses reduce(). Instead, I would
>>> like to use treeReduce(). I am trying to compare the performance of
>>> reduce() and treeReduce().
>>>
>>> The main issue I have is that I cannot use these two lines of code, which
>>> are used in the top() action, within my Spark application:
>>>
>>> val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>>> queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>>>
>>> How can I go about implementing top() using treeReduce()?
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-top-using-treeReduce-tp23227.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>
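For anyone finding this thread from the archive: the reason the two quoted lines cannot be used directly is that BoundedPriorityQueue lives in the private org.apache.spark.util package. The same per-partition-queue / pairwise-merge idea can be sketched outside Spark with a plain scala.collection.mutable.PriorityQueue. This is a local stand-in for mapPartitions + treeReduce, not DB Tsai's actual patch; the object and method names (TreeTopKSketch, insertBounded, merge, topK) are illustrative.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

object TreeTopKSketch {
  // Keep only the k largest elements: with ord.reverse the queue's head is
  // the *smallest* retained element under the original ordering, so we can
  // evict it whenever the queue grows past k.
  private def insertBounded[T](q: mutable.PriorityQueue[T], item: T, k: Int): Unit = {
    q.enqueue(item)
    if (q.size > k) q.dequeue()
  }

  // Merge the smaller queue into the larger one, as suggested in the thread,
  // re-trimming to k as we go.
  private def merge[T](a: mutable.PriorityQueue[T], b: mutable.PriorityQueue[T],
                       k: Int): mutable.PriorityQueue[T] = {
    val (big, small) = if (a.size >= b.size) (a, b) else (b, a)
    small.foreach(insertBounded(big, _, k))
    big
  }

  // Local stand-in for mapPartitions + treeReduce: build one bounded queue
  // per "partition", merge the queues pairwise, then sort the survivors.
  def topK[T](partitions: Seq[Seq[T]], k: Int)
             (implicit ord: Ordering[T], ct: ClassTag[T]): Array[T] = {
    val queues = partitions.map { part =>
      val q = mutable.PriorityQueue.empty[T](ord.reverse)
      part.foreach(insertBounded(q, _, k))
      q
    }
    if (queues.isEmpty) Array.empty[T]
    else queues.reduce(merge(_, _, k)).toArray.sorted(ord.reverse) // largest first
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(5, 1, 9), Seq(7, 3), Seq(2, 8, 6, 4))
    println(topK(parts, 3).mkString(", ")) // prints: 9, 8, 7
  }
}
```

On a real RDD, the per-partition loop becomes mapPartitions and the reduce becomes treeReduce, exactly as in the treeTakeOrdered code quoted above.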