If it regularly fails after 8 hours, could you get me the log4j logs?
To limit their size, set the default log level to WARN and the log level for
all classes in the package o.a.s.streaming to DEBUG. Then I can take a look.
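For reference, a minimal sketch of that logging setup done programmatically
in the driver (assuming log4j 1.x, which Spark shipped with at the time;
o.a.s.streaming expands to org.apache.spark.streaming, and the same can be
expressed in conf/log4j.properties):

    import org.apache.log4j.{Level, Logger}

    // Keep everything at WARN, but turn the streaming package up to DEBUG.
    Logger.getRootLogger.setLevel(Level.WARN)
    Logger.getLogger("org.apache.spark.streaming").setLevel(Level.DEBUG)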
On Nov 27, 2014 11:01 AM, "Bill Jay" wrote:
Gerard,
That is a good observation. However, the strange thing I see is that if I use
MEMORY_AND_DISK_SER, the job fails even earlier. In my case, it takes 10
seconds to process each batch, and the batch interval is one minute. It still
fails after 10 hours with the "cannot compute split" error.
Bill
Hi TD,
We also struggled with this error for a long while. The recurring scenario
is when the job takes longer to compute than the job interval and a backlog
starts to pile up.
Hint: Check
If the DStream storage level is set to "MEMORY_ONLY_SER" and memory runs
out, then you will get a 'Cannot compute split' error.
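For context, a minimal sketch of how a DStream's storage level is usually
switched to the spill-to-disk variant; the input source, host and port are
placeholders, not taken from the original job:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("storage-level-demo"), Seconds(60))

    // Placeholder input; any DStream can be persisted the same way.
    val stream = ssc.socketTextStream("localhost", 9999)

    // Spill serialized blocks to disk instead of dropping them when
    // executor memory fills up.
    stream.persist(StorageLevel.MEMORY_AND_DISK_SER)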
Hi TD,
I am using Spark Streaming to consume data from Kafka and do some
aggregation and ingest the results into RDS. I do use foreachRDD in the
program. I am planning to use Spark streaming in our production pipeline
and it performs well in generating the results. Unfortunately, we plan to
have a
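A rough sketch of the pattern described above (Kafka in, per-batch
aggregation, foreachRDD out), assuming the old receiver-based Kafka API;
hosts, group, topic and the aggregation itself are placeholders, and the
database write is only indicated in comments:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-aggregation"), Seconds(60))

    // Placeholder ZooKeeper quorum, consumer group and topic.
    val lines = KafkaUtils
      .createStream(ssc, "zk-host:2181", "my-group", Map("events" -> 1))
      .map(_._2)

    // Per-batch aggregation.
    val counts = lines.map(word => (word, 1L)).reduceByKey(_ + _)

    // Write each batch out; one connection per partition is the usual pattern.
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open a JDBC connection to the target database (e.g. RDS) here
        // and insert the records; connection handling omitted.
        records.foreach { case (key, count) => () }
      }
    }

    ssc.start()
    ssc.awaitTermination()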
Can you elaborate on the usage pattern that leads to "cannot compute
split"? Are you using the RDDs generated by DStream, outside the
DStream logic? Something like running interactive Spark jobs
(independent of the Spark Streaming ones) on RDDs generated by
DStreams? If that is the case, what is ha
Just to add one more point. If Spark Streaming knows when an RDD will not be
used any more, I believe Spark will not try to retrieve data it no longer
needs. However, in practice, I often encounter the error of "cannot
compute split". Based on my understanding, this is because Spark cleared
out the RDD data before it was used.
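One knob that is relevant to that cleanup, sketched under the assumption of
a plain StreamingContext; the 10-minute value is a placeholder, not a
recommendation:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("remember-demo"), Seconds(60))

    // Keep the RDDs generated for each batch around for at least 10 minutes
    // before they become eligible for cleanup.
    // Alternatively, spark.streaming.unpersist=false disables automatic
    // unpersisting of generated RDDs.
    ssc.remember(Minutes(10))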
Let me further clarify Lalit's point on when RDDs generated by
DStreams are destroyed, and hopefully that will answer your original
questions.
1. How spark (streaming) guarantees that all the actions are taken on
each input rdd/batch.
This isn't hard! By the time you call streamingContext.start, the output
operations you want to apply to the DStream have already been defined, and
they are executed on every batch.
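A minimal sketch of that point, with a placeholder input source: every
output operation registered on a DStream before start() is run once per
batch, which is what guarantees that each input rdd/batch is acted on:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("output-ops-demo"), Seconds(60))
    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder input

    // Both of these are output operations; each runs once for every batch.
    lines.count().print()
    lines.foreachRDD { rdd =>
      // runs for every batch's RDD
    }

    ssc.start()
    ssc.awaitTermination()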
I have found this paper, which seems to answer most of the questions about
lifetime: https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf
Tian
On Tuesday, November 25, 2014 4:02 AM, Mukesh Jha wrote:
Hey Experts,
I wanted to understand in detail about the lifecycle
Any pointers guys?
On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha wrote:
Hey Experts,
I wanted to understand in detail about the lifecycle of rdd(s) in a
streaming app.
From my current understanding:
- rdd gets created out of the realtime input stream.
- Transformation functions are applied in a lazy fashion on the RDD to
transform it into other rdd(s).
- Actions are taken on the resulting rdd(s), which triggers the actual
computation.
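A minimal sketch putting those three steps together, with a placeholder
input source:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("rdd-lifecycle-demo"), Seconds(10))

    // 1. An RDD is created for each batch of the realtime input stream.
    val input = ssc.socketTextStream("localhost", 9999)

    // 2. Transformations are lazy: only the lineage is recorded here,
    //    nothing is computed yet.
    val counts = input.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    // 3. Actions (output operations) trigger the actual computation on
    //    each batch's RDD.
    counts.print()

    ssc.start()
    ssc.awaitTermination()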