If it regularly fails after 8 hours, could you get me the log4j logs?
To limit their size, set the default log level to WARN and the log level for
all classes in the package o.a.s.streaming to DEBUG. Then I can take a look.
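For reference, a minimal sketch of that logging setup done programmatically
in the driver (assuming log4j 1.x, which Spark shipped with at the time;
o.a.s.streaming expands to org.apache.spark.streaming, and the same can be
expressed in conf/log4j.properties):

    import org.apache.log4j.{Level, Logger}

    // Keep everything at WARN, but turn the streaming package up to DEBUG.
    Logger.getRootLogger.setLevel(Level.WARN)
    Logger.getLogger("org.apache.spark.streaming").setLevel(Level.DEBUG)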
On Nov 27, 2014 11:01 AM, "Bill Jay" wrote:
Gerard,
That is a good observation. However, the strange thing I see is that if I use
MEMORY_AND_DISK_SER, the job fails even earlier. In my case, it takes 10
seconds to process each batch, and the batch interval is one minute. It still
fails after 10 hours with the "cannot compute split" error.
Bill
Hi TD,
We also struggled with this error for a long while. The recurring scenario
is when the job takes longer to compute than the job interval and a backlog
starts to pile up.
Hint: Check
If the DStream storage level is set to "MEMORY_ONLY_SER" and memory runs
out, then you will get a 'Cannot compute split' error.
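For context, a minimal sketch of how a DStream's storage level is usually
switched to the spill-to-disk variant; the input source, host and port are
placeholders, not taken from the original job:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("storage-level-demo"), Seconds(60))

    // Placeholder input; any DStream can be persisted the same way.
    val stream = ssc.socketTextStream("localhost", 9999)

    // Spill serialized blocks to disk instead of dropping them when
    // executor memory fills up.
    stream.persist(StorageLevel.MEMORY_AND_DISK_SER)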
Hi TD,
I am using Spark Streaming to consume data from Kafka and do some
aggregation and ingest the results into RDS. I do use foreachRDD in the
program. I am planning to use Spark streaming in our production pipeline
and it performs well in generating the results. Unfortunately, we plan to
have a
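A rough sketch of the pattern described above (Kafka in, per-batch
aggregation, foreachRDD out), assuming the old receiver-based Kafka API;
hosts, group, topic and the aggregation itself are placeholders, and the
database write is only indicated in comments:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-aggregation"), Seconds(60))

    // Placeholder ZooKeeper quorum, consumer group and topic.
    val lines = KafkaUtils
      .createStream(ssc, "zk-host:2181", "my-group", Map("events" -> 1))
      .map(_._2)

    // Per-batch aggregation.
    val counts = lines.map(word => (word, 1L)).reduceByKey(_ + _)

    // Write each batch out; one connection per partition is the usual pattern.
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open a JDBC connection to the target database (e.g. RDS) here
        // and insert the records; connection handling omitted.
        records.foreach { case (key, count) => () }
      }
    }

    ssc.start()
    ssc.awaitTermination()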
Can you elaborate on the usage pattern that leads to "cannot compute
split"? Are you using the RDDs generated by DStream, outside the
DStream logic? Something like running interactive Spark jobs
(independent of the Spark Streaming ones) on RDDs generated by
DStreams? If that is the case, what is ha
Just to add one more point. If Spark Streaming knows when an RDD will not be
used any more, I believe Spark will not try to retrieve data it no longer
needs. However, in practice, I often encounter the error of "cannot
compute split". Based on my understanding, this is because Spark cleared
out the RDD data before it was used.
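One knob that is relevant to that cleanup, sketched under the assumption of
a plain StreamingContext; the 10-minute value is a placeholder, not a
recommendation:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("remember-demo"), Seconds(60))

    // Keep the RDDs generated for each batch around for at least 10 minutes
    // before they become eligible for cleanup.
    // Alternatively, spark.streaming.unpersist=false disables automatic
    // unpersisting of generated RDDs.
    ssc.remember(Minutes(10))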
Let me further clarify Lalit's point on when RDDs generated by
DStreams are destroyed, and hopefully that will answer your original
questions.
1. How spark (streaming) guarantees that all the actions are taken on
each input rdd/batch.
This isn't hard! By the time you call streamingContext.start, the output
operations you want to apply to the DStream have already been defined, and
they are executed on every batch.
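A minimal sketch of that point, with a placeholder input source: every
output operation registered on a DStream before start() is run once per
batch, which is what guarantees that each input rdd/batch is acted on:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("output-ops-demo"), Seconds(60))
    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder input

    // Both of these are output operations; each runs once for every batch.
    lines.count().print()
    lines.foreachRDD { rdd =>
      // runs for every batch's RDD
    }

    ssc.start()
    ssc.awaitTermination()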
I have found this paper, which seems to answer most of the questions about
lifetime: https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf
Tian
On Tuesday, November 25, 2014 4:02 AM, Mukesh Jha wrote:
Hey Experts,
I wanted to understand in detail about the lifecycle
Any pointers guys?
On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha wrote:
Hey Experts,
I wanted to understand in detail about the lifecycle of rdd(s) in a
streaming app.
From my current understanding:
- rdd gets created out of the realtime input stream.
- Transformation functions are applied in a lazy fashion on the RDD to
transform it into other rdd(s).
- Actions are taken on the resulting rdd(s), which triggers the actual
computation.
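A minimal sketch putting those three steps together, with a placeholder
input source:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("rdd-lifecycle-demo"), Seconds(10))

    // 1. An RDD is created for each batch of the realtime input stream.
    val input = ssc.socketTextStream("localhost", 9999)

    // 2. Transformations are lazy: only the lineage is recorded here,
    //    nothing is computed yet.
    val counts = input.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    // 3. Actions (output operations) trigger the actual computation on
    //    each batch's RDD.
    counts.print()

    ssc.start()
    ssc.awaitTermination()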