Can SparkContext be shared across nodes/drivers

2014-09-21 Thread
Hi all, As far as I know, a SparkContext instance takes charge of the cluster resources that the master assigns to it, and it can hardly be shared between different SparkContexts; meanwhile, scheduling between applications is also not easy. To address this without introducing an extra resource schedule
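
For context, a minimal Scala sketch of the one-SparkContext-per-application model the post describes (app name, master URL, and input path are all illustrative, not taken from the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    object AppA {
      def main(args: Array[String]): Unit = {
        // Each Spark application creates and owns exactly one SparkContext;
        // the master assigns that application its share of cluster resources.
        val conf = new SparkConf()
          .setAppName("app-a")
          .setMaster("spark://master:7077") // illustrative standalone master
        val sc = new SparkContext(conf)
        // RDDs created here are scoped to this application; another driver's
        // SparkContext cannot reference them.
        println(sc.textFile("hdfs:///data/input").count())
        sc.stop()
      }
    }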

Driver cannot receive StatusUpdate message for FINISHED

2014-07-15 Thread
Hi all, I hit a strange problem: I submit a reduce job (with a single split) and it finishes normally on the Executor; the log is: 14/07/15 21:08:56 INFO Executor: Serialized size of result for 0 is 10476031 14/07/15 21:08:56 INFO Executor: Sending result for 0 directly to driver 14/07/15 21:08:56 INFO Executor: F
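
A speculative note, not a confirmed diagnosis: the serialized result above is about 10 MB, which is right around the default Akka frame size in Spark of this era, so one plausible cause is the result exceeding spark.akka.frameSize. A hedged Scala sketch of raising that limit (the value is in MB; app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.akka.frameSize caps messages between executors and the driver;
    // results near or above the ~10 MB default could be lost in transit.
    val conf = new SparkConf()
      .setAppName("big-result-job")
      .set("spark.akka.frameSize", "64") // raise from the ~10 MB default
    val sc = new SparkContext(conf)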

issue of driver's HA

2014-04-08 Thread
Hi all, We have run into trouble with driver HA. We run a long-lived driver in Spark standalone mode that serves as a server, submitting jobs as requests arrive; we therefore face the HA problem for the driver process, e.g. how to resume jobs after the driver process fails.

Re: Re: RDD usage

2014-03-24 Thread
s stage and the memory will be freed soon. Only cache() will persist your RDD in memory for a long time. Second question: once an RDD is created it cannot be changed, because RDDs are immutable; you can only create a new RDD from an existing RDD or from the file system. 2014-03-25 9:45 GMT+08:00
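
A small Scala sketch of both points (path illustrative, sc an existing SparkContext): cache() keeps an RDD's blocks in memory across jobs, and transformations never modify the source RDD, they return a new one.

    val base = sc.textFile("hdfs:///data/input") // illustrative path
    base.cache()                        // keep blocks in memory across jobs

    val upper = base.map(_.toUpperCase) // a NEW RDD; base itself is unchanged
    println(base.count())               // first job: reads the file, fills the cache
    println(upper.count())              // second job: reuses the cached base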

Re: RDD usage

2014-03-24 Thread
Hi hequn, a related question: does that mean the memory usage will be doubled? And furthermore, if the compute function of an RDD is not idempotent, will the RDD change while the job is running, is that right? -Original Message- From: "hequn cheng" Sent: 2014/3/25 9:35 To: "user@spark.apache.org" Subject: R

Re: unable to build spark - sbt/sbt: line 50: killed

2014-03-22 Thread
A large amount of memory is needed to build Spark; I think you should make -Xmx larger, 2g for example. -Original Message- From: "Bharath Bhushan" Sent: 2014/3/22 12:50 To: "user@spark.apache.org" Subject: unable to build spark - sbt/sbt: line 50: killed I am getting the following error when trying to build spark.

Re: What's the lifecycle of an rdd? Can I control it?

2014-03-19 Thread
unpersist: Mark the RDD as non-persistent, and remove all blocks for it from memory and disk. 2014-03-19 16:40 GMT+08:00 林武康: Hi, can anyone tell me about the lifecycle of an RDD? I searched through the official website and still can't figure it out. Can I use an RDD in some stages and destroy it
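
In code, that lifecycle control looks roughly like this (a sketch; sc is an existing SparkContext and the path is illustrative):

    val rdd = sc.textFile("hdfs:///data/events").cache()

    val errors = rdd.filter(_.contains("ERROR")).count() // uses cached blocks
    val bytes  = rdd.map(_.length).sum()                 // reuses cached blocks

    // No later stage needs rdd any more, so release its blocks
    // from memory and disk explicitly:
    rdd.unpersist()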

What's the lifecycle of an rdd? Can I control it?

2014-03-19 Thread
Hi, can anyone tell me about the lifecycle of an RDD? I searched through the official website and still can't figure it out. Can I use an RDD in some stages and then destroy it to release memory, given that no later stage will use this RDD any more? Is that possible? Thanks! Sincerely, Lin

KryoSerializer returns null when deserializing the Task object in the Executor

2014-03-18 Thread
Hi all, I changed spark.closure.serializer to Kryo; when I try a count action in the spark shell, the Task object deserialized in the Executor is null. The source line is: override def run() { .. task = ser.deserialize[Task[Any]](...) .. } where task is null. Can anyone help me? Thank you!
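
For reference, a hedged sketch of the configuration the post describes; spark.closure.serializer existed in Spark of this era and defaulted to the Java serializer, and switching it to Kryo is exactly the change the post reports as producing a null Task (app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("kryo-closure-test")
      // The setting from the post; the default is the Java serializer.
      .set("spark.closure.serializer",
           "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 100).count() // the count action that triggered the failure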

Can two spark applications share an RDD?

2014-03-14 Thread
Hi, I am a Spark newbie, and the question below may seem foolish, but I would really like some advice: since loading data from disk to generate an RDD is very costly in my applications, I hope I can generate it once and cache it in memory, so that any other Spark application can refer to this RDD. Can this possib
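
An RDD is scoped to the SparkContext that created it, so here is a sketch of the closest in-process alternative (names and path illustrative): one long-lived application caches the RDD once and serves many jobs from it; a separate application, with its own SparkContext, cannot see those cached blocks.

    val sc = new SparkContext(new SparkConf().setAppName("shared-cache-server"))

    // Pay the disk load once, then keep the result in executor memory.
    val shared = sc.textFile("hdfs:///data/big-input").cache()
    shared.count() // materialize the cache

    // Later jobs inside this same application reuse the cached blocks.
    val q1 = shared.filter(_.startsWith("a")).count()
    val q2 = shared.map(_.length).max()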

Re: Can spark-streaming work with spark-on-yarn mode?

2014-02-24 Thread
on YARN. Here is the documentation. TD On Thu, Feb 20, 2014 at 11:16 PM, 林武康 wrote: Hi all, I am very new to Apache Spark; recently I tried Spark on YARN and it works for batch processing. Now we want to try stream processing using spark-streaming, and still use YARN as the resource scheduler, as we
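
A minimal Spark Streaming sketch for completeness; the program itself is the same regardless of cluster manager (YARN versus standalone only changes how the application is submitted), and the socket host/port source here is illustrative:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("streaming-on-yarn")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second batches

    // Illustrative source: lines from a socket; count words per batch.
    val lines = ssc.socketTextStream("host", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()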