I haven't tried Spark 0.8, but I had similar problems bringing up the
master node on earlier versions of Spark (0.7.x). I use this command to
start the master, and it works for me:
./run spark.deploy.master.Master
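For a quick check that the master actually came up, a minimal sketch
(assuming Spark 0.7.x package names and the default master port 7077 on
localhost; the object name is just for illustration):

import spark.SparkContext

object MasterSmokeTest {
  def main(args: Array[String]) {
    // Connect to the standalone master started by the command above.
    val sc = new SparkContext("spark://localhost:7077", "MasterSmokeTest")
    // Run a trivial job to confirm the cluster accepts work.
    println(sc.parallelize(1 to 10).count()) // should print 10
    sc.stop()
  }
}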
Thanks,
Meisam
On Thu, Oct 10, 2013 at 5:14 AM, vinayak navale wrote:
Hi,
I'm trying to use Spark to aggregate data.
I am doing something similar to this right now.
val groupByRdd = rdd.groupBy(x => x._1)
val aggregateRdd = groupByRdd.map(x => x._2.map(_._2).sum)
This works fine for smaller datasets but runs OOM for larger datasets
(the groupBy operation runs out of memory).
> You can call reduceByKey(...) without having to manually wrap your RDD
> into a PairRDDFunctions; just add import org.apache.spark.SparkContext._
> to your imports.
>
>
>
> On Mon, Nov 11, 2013 at 1:35 PM, Meisam Fathi
> wrote:
>>
>> Hi,
>>
>> I'm trying to use Spark to aggregate data.
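A sketch of the reduceByKey approach suggested above, assuming rdd is an
RDD[(String, Int)] as in the snippet (the variable names are just
illustrative):

import org.apache.spark.SparkContext._ // brings reduceByKey into scope

// reduceByKey combines the values for each key map-side before the
// shuffle, so no group is ever materialized in memory the way it is
// with groupBy followed by sum.
val aggregateRdd = rdd.reduceByKey(_ + _) // RDD[(String, Int)]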
Hi Community,
When an RDD in the application becomes unreachable and gets garbage
collected, how does Spark remove RDD's data from BlockManagers on the
worker nodes?
Thanks,
Meisam
> The BlockManager removes data from the cache in a least-recently-used
> fashion as space fills up. If you’d like to remove an RDD manually before
> that, you can call rdd.unpersist().
>
> Matei
>
> On Nov 13, 2013, at 8:15 PM, Meisam Fathi wrote:
>
>> Hi Community,
>>
>> When an RDD in the application becomes unreachable and gets garbage
>> collected, how does Spark remove RDD's data from BlockManagers on the
>> worker nodes?
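A small sketch of the manual route Matei describes, assuming sc is an
existing SparkContext:

// Cache an RDD, use it, then drop its blocks right away instead of
// waiting for the least-recently-used eviction to reclaim the space.
val cached = sc.parallelize(1 to 1000000).cache()
cached.count()     // materializes the blocks in the workers' caches
cached.unpersist() // tells the BlockManagers to drop the blocks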
Hi Valentin,
data.filter() and rdd.map() do not actually do the computation. When
you call count() or collect(), your RDD first does the filter(), then
the map(), and then the count() or collect().
See this for more info:
https://github.com/mesos/spark/wiki/Spark-Programming-Guide#transformations
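To make the laziness concrete, a minimal sketch (assuming data is an
RDD[Int]; the names are only for illustration):

val filtered = data.filter(_ > 0)  // no computation yet, only a plan
val mapped   = filtered.map(_ * 2) // still no computation
val n        = mapped.count()      // action: filter and map run here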
Hi Jiacheng,
Each RDD can have a partitioner. You can define your own partitioner if
the default partitioner does not suit your purpose.
You can take a look at
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
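For illustration, a hypothetical partitioner, assuming pairs is an
RDD[(Int, String)] (none of these names come from the slides):

import org.apache.spark.Partitioner
import org.apache.spark.SparkContext._ // for partitionBy on pair RDDs

// Route even keys to partition 0 and odd keys to partition 1.
class EvenOddPartitioner extends Partitioner {
  def numPartitions: Int = 2
  def getPartition(key: Any): Int = math.abs(key.asInstanceOf[Int] % 2)
}

val repartitioned = pairs.partitionBy(new EvenOddPartitioner)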
Thanks,
Meisam
On Fri, Nov 15, 20
> ... less than a second!
> The problem comes when working with 1400k elements -
> .take(Int.MaxValue).size is not so quick.
> Best regards,
> Valentin
>
> 2013/11/14 Meisam Fathi:
>> Hi Valentin,
>>
>> data.filter() and rdd.map() do not actually do the computation. When
>> you call count() or collect(), your RDD first does the filter(), then
>> the map(), and then the count() or collect().
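For what it's worth, the slow and fast ways to count the elements look
like this (a sketch, assuming data is the RDD in question):

// take(Int.MaxValue) ships every element back to the driver and builds
// an Array there before .size is computed; count() instead sums the
// per-partition counts on the workers and returns a single Long.
val slow = data.take(Int.MaxValue).size
val fast = data.count()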
> ...).flatMapValues(x => x). But I'm a bit worried whether this will
> create an additional temp object collection, as the result is first made
> into a Seq and then a collection of tuples.
> Any suggestion?
>
> Best Regards,
> Jiahcheng Guo
>
>
> On Sat, Nov 16, 2013 at 12:24 AM, Meisam Fathi wrote:
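My guess at the pattern being discussed, with a hypothetical expand step
(assuming pairs is an RDD[(Int, String)]; none of these names are from
the original mail):

import org.apache.spark.SparkContext._ // pair RDD operations

// Hypothetical expansion: split each value into words.
def expand(v: String): Seq[String] = v.split(" ").toSeq

// Two-step version: build a Seq per record, then flatten it.
val twoStep = pairs.mapValues(expand).flatMapValues(x => x)

// Fused version: one flatMapValues. Both run in a single pipelined
// stage, and the Seq from expand is a short-lived per-record object
// either way, so the extra temp collection is unlikely to matter much.
val oneStep = pairs.flatMapValues(expand)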
I asked the same question of the Spark community a while ago
(http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCAByMnGtm2s2tyqLzw%2BMdGqgNBLbfhE6-kkZ4OPY4ANfZaDSu7Q%40mail.gmail.com%3E).
This is my understanding of how Spark works, but I'd like one of the
Spark maintainers to confirm it.