all of them.

Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D


On Wed, Jun 17, 2015 at 5:15 PM, Raghav Shankar <raghav0110...@gmail.com> wrote:
> So, would I add the assembly jar to just the master, or would I have to
> add it to all the slaves/workers too?
>
> Thanks,
> Raghav
>
>> On Jun 17, 2015, at 5:13 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>
>> You need to build the Spark assembly with your modification and deploy
>> it to the cluster.
>>
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Blog: https://www.dbtsai.com
>> PGP Key ID: 0xAF08DF8D
>>
>>
>> On Wed, Jun 17, 2015 at 5:11 PM, Raghav Shankar <raghav0110...@gmail.com> wrote:
>>> I’ve implemented this in the suggested manner. When I build Spark and
>>> attach the new spark-core jar to my Eclipse project, I am able to use the
>>> new method. In order to conduct the experiments, I need to launch my app
>>> on a cluster. I am using EC2. When I set up my master and slaves using
>>> the EC2 setup scripts, Spark gets installed, but I don't think my
>>> custom-built spark-core jar is being used. How do I set things up on EC2
>>> so that my custom version of spark-core is used?
>>>
>>> Thanks,
>>> Raghav
>>>
>>> On Jun 9, 2015, at 7:41 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>>
>>> Having the following code in RDD.scala works for me. PS: in the following
>>> code, I merge the smaller queue into the larger one. I wonder if this
>>> will help performance. Let me know when you do the benchmark.
>>>
>>> def treeTakeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>>>   if (num == 0) {
>>>     Array.empty
>>>   } else {
>>>     val mapRDDs = mapPartitions { items =>
>>>       // Priority keeps the largest elements, so let's reverse the ordering.
>>>       val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>>>       queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>>>       Iterator.single(queue)
>>>     }
>>>     if (mapRDDs.partitions.length == 0) {
>>>       Array.empty
>>>     } else {
>>>       mapRDDs.treeReduce { (queue1, queue2) =>
>>>         if (queue1.size > queue2.size) {
>>>           queue1 ++= queue2
>>>           queue1
>>>         } else {
>>>           queue2 ++= queue1
>>>           queue2
>>>         }
>>>       }.toArray.sorted(ord)
>>>     }
>>>   }
>>> }
>>>
>>> def treeTop(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>>>   treeTakeOrdered(num)(ord.reverse)
>>> }
>>>
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> ----------------------------------------------------------
>>> Blog: https://www.dbtsai.com
>>> PGP Key ID: 0xAF08DF8D
>>>
>>> On Tue, Jun 9, 2015 at 10:09 AM, raggy <raghav0110...@gmail.com> wrote:
>>>>
>>>> I am trying to implement top-k in Scala within Apache Spark. I am aware
>>>> that Spark has a top action. But top() uses reduce(). Instead, I would
>>>> like to use treeReduce(). I am trying to compare the performance of
>>>> reduce() and treeReduce().
>>>>
>>>> The main issue I have is that I cannot use these two lines of code,
>>>> which are used in the top() action, within my Spark application:
>>>>
>>>> val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>>>> queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>>>>
>>>> How can I go about implementing top() using treeReduce()?
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-top-using-treeReduce-tp23227.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>
>>>
>
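The map-side/reduce-side pattern in the treeTakeOrdered code above can be sketched outside Spark in plain Scala. BoundedPQ below is a hypothetical stand-in for Spark's private BoundedPriorityQueue (the names BoundedPQ, elems, and toSortedList are my own, not Spark's API), and merge mirrors the smaller-into-larger queue merge from the thread; this is a minimal sketch, not the actual Spark implementation:

```scala
import scala.collection.mutable.PriorityQueue

// Illustrative stand-in for Spark's private BoundedPriorityQueue:
// keeps at most `maxSize` elements, evicting the smallest under `ord`.
class BoundedPQ[T](maxSize: Int)(implicit ord: Ordering[T]) {
  // Reverse the ordering so the queue's head is the *smallest* retained
  // element, i.e. the one to evict when the queue is full.
  private val pq = PriorityQueue.empty[T](ord.reverse)

  def +=(elem: T): this.type = {
    if (pq.size < maxSize) {
      pq.enqueue(elem)
    } else if (maxSize > 0 && ord.gt(elem, pq.head)) {
      pq.dequeue()   // drop the current smallest
      pq.enqueue(elem)
    }
    this
  }

  def ++=(xs: Iterator[T]): this.type = { xs.foreach(this += _); this }
  def size: Int = pq.size
  def elems: Iterator[T] = pq.iterator
  def toSortedList: List[T] = pq.toList.sorted(ord)
}

// Mirror of the reduce step above: merge the smaller queue into the larger.
def merge[T](q1: BoundedPQ[T], q2: BoundedPQ[T]): BoundedPQ[T] =
  if (q1.size > q2.size) { q1 ++= q2.elems; q1 }
  else { q2 ++= q1.elems; q2 }

// "Map side": build one bounded queue per (simulated) partition,
// then "reduce side": pairwise-merge the queues and sort the survivors.
val partitions = Seq(Seq(3, 41, 7, 18), Seq(99, 2, 64), Seq(55, 12, 8, 73, 1))
val num = 3
val queues = partitions.map { items =>
  val q = new BoundedPQ[Int](num)
  q ++= items.iterator
  q
}
val top3 = queues.reduce(merge[Int]).toSortedList.reverse
println(top3) // List(99, 73, 64)
```

In Spark, `queues.reduce` would be `mapRDDs.treeReduce`, which performs the same pairwise merge but arranges the merges in a tree across executors instead of funnelling every partition's queue through the driver.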