Hi all,
As far as I know, a SparkContext instance takes charge of the cluster resources that the master assigns to it, and those resources can hardly be shared between different SparkContexts. Meanwhile, scheduling between applications is not easy either. To address this without introducing an extra resource scheduler
Hi all,
I got a strange problem: I submit a reduce job (with only one split), and it finishes
normally on the Executor; the log is:
14/07/15 21:08:56 INFO Executor: Serialized size of result for 0 is 10476031
14/07/15 21:08:56 INFO Executor: Sending result for 0 directly to driver
14/07/15 21:08:56 INFO Executor:
Hi hequn, a related question: does that mean the memory usage will be doubled? And
furthermore, if the compute function of an RDD is not idempotent, the RDD will
change while the job is running, is that right?
----- Original Message -----
From: hequn cheng chenghe...@gmail.com
Sent: 2014/3/25 9:35
To:
and the memory will be freed soon.
Only cache() will persist your RDD in memory for a long time.
Second question:
Once an RDD is created, it cannot be changed because of its immutability. You can
only create a new RDD from an existing RDD or from the file system.
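A minimal sketch of both points, assuming a SparkContext named sc as in spark-shell (the input path is only a placeholder):

    val lines = sc.textFile("hdfs:///tmp/input.txt")  // placeholder path
    lines.cache()                      // keep the computed partitions in memory after the first action
    val lengths = lines.map(_.length)  // a new RDD; `lines` itself is never modified
    println(lengths.reduce(_ + _))     // first action: computes and caches lines
    println(lines.count())             // second action: reuses the cached data instead of re-reading the file

Without the cache() call, each action would recompute lines from disk.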
2014-03-25 9:45 GMT+08:00 林武康
A lot of memory is needed to build Spark; I think you should make Xmx larger, 2g for
example.
----- Original Message -----
From: Bharath Bhushan manku.ti...@outlook.com
Sent: 2014/3/22 12:50
To: user@spark.apache.org user@spark.apache.org
Subject: unable to build spark - sbt/sbt: line 50: killed
I am getting the
Hi all, I changed spark.closure.serializer to kryo. When I try a count action in
the spark shell, the Task object deserialized in the Executor comes back null; the source line is:
override def run() {
  ...
  task = ser.deserialize[Task[Any]](...)
  ...
}
where task is null.
Can anyone help me? Thank you!
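For reference, a minimal standalone sketch of the setup (the master URL, app name, and data here are only placeholders; spark.closure.serializer is the property I changed):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[2]")                 // placeholder master
      .setAppName("closure-serializer-test") // placeholder app name
      .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).count())  // the count action whose tasks are deserialized on the executor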
Hi, I am a newbie to Spark; the question below may seem foolish, but I really
want some advice:
As loading data from disk to generate an RDD is very costly in my applications, I
hope I can generate it once and cache it in memory, so that any other Spark
application can refer to this RDD. Can this