Re: Spark closures behavior in local mode in IDEs

2021-02-26 Thread Sheel Pancholi
code to explain why the > spark-sql version operates differently, as that also appears to be local. > To be clear this 'shouldn't' work, just happens to not fail in local > execution. > > On Fri, Feb 26, 2021 at 10:40 AM Sheel Pancholi > wrote: > >> >> I

Re: Spark closures behavior in local mode in IDEs

2021-02-26 Thread Sheel Pancholi
s possible > to write code that works in local mode, but doesn’t when you run > distributed. > > > > *From: *Sheel Pancholi > *Date: *Friday, February 26, 2021 at 4:24 AM > *To: *user > *Subject: *[EXTERNAL] Spark closures behavior in local mode in IDEs > > > >

Spark closures behavior in local mode in IDEs

2021-02-26 Thread Sheel Pancholi
Hi, I am observing weird behavior of Spark and closures in local mode on my machine vs. a 3-node cluster (Spark 2.4.5). The following is the piece of code: object Example { val num = 5; def myfunc = { sc.parallelize(1 to 4).map(_ + num).foreach(println) } } I expected this to fail regardless since

Re: Appropriate checkpoint interval in a spark streaming application

2020-08-15 Thread Sheel Pancholi
Guys, any inputs explaining the rationale behind the question below would really help. Requesting some expert opinion. Regards, Sheel On Sat, 15 Aug, 2020, 1:47 PM Sheel Pancholi, wrote: > Hello, > > I am trying to figure an appropriate checkpoint interval for my spark > streaming appl

Appropriate checkpoint interval in a spark streaming application

2020-08-15 Thread Sheel Pancholi
Hello, I am trying to figure out an appropriate checkpoint interval for my Spark Streaming application. It's a Spark Kafka integration based on Direct Streams. If my *micro batch interval is 2 mins*, and let's say *each microbatch takes only 15 secs to process*, then shouldn't my checkpoint interval also

unsubscribe

2019-06-18 Thread Sheel Pancholi

[Spark Streaming]: Spark Checkpointing: Content, Recovery and Idempotency

2019-05-29 Thread Sheel Pancholi
Hello, I am trying to understand the *content* of a checkpoint and corresponding recovery; understanding the process of checkpointing is obviously the natural way of going about it and so I went over the following list: - medium post

Spark Streaming: Checkpointing Recovery and Idempotency

2019-05-29 Thread Sheel Pancholi
Hello, I am trying to understand the *content* of a checkpoint and corresponding recovery; understanding the process of checkpointing is obviously the natural way of going about it and so I went over the following list: - medium post

[Structured Streaming]: Are Spark Dataframes mutable in Structured Streaming?

2019-05-16 Thread Sheel Pancholi
Tagging mail to hopefully get a quicker response On Thu 16 May, 2019, 3:08 PM Sheel Pancholi, wrote: > Hello, > > Along with what I sent before, I want to add that I went over the > documentation at > https://github.com/apache/spark/blob/master/docs/structured-streaming-progr

Re: Are Spark Dataframes mutable in Structured Streaming?

2019-05-16 Thread Sheel Pancholi
is kind of sounding contradictory in my head. Could you please clarify what it is ultimately supposed to be? Regards Sheel On Thu, May 16, 2019 at 2:44 PM Sheel Pancholi wrote: > Hello Russell, > > Thanks for clarifying. I went over the Catalyst Optimizer Deep Dive video > at https:

Re: Are Spark Dataframes mutable in Structured Streaming?

2019-05-16 Thread Sheel Pancholi
e plan refer to static pieces of data, others > refer to data which is pulled in on each iteration. None of this changes > the DataFrame objects themselves. > > > > > On Wed, May 15, 2019 at 1:34 PM Sheel Pancholi > wrote: > >> Hi >> Structured Stream

Are Spark Dataframes mutable in Structured Streaming?

2019-05-15 Thread Sheel Pancholi
Hi, Structured Streaming treats a stream as an unbounded table in the form of a DataFrame. Continuously flowing data from the stream keeps getting added to this DataFrame (which is the unbounded table), which warrants a change to the DataFrame, which violates the very basic nature of a DataFrame since

Re: Kafka Spark Streaming integration : Relationship between DStreams and Tasks

2019-05-12 Thread Sheel Pancholi
, 2019, 12:28 AM Sheel Pancholi, wrote: > Hello Everyone > I am trying to understand the internals of Spark Streaming (not Structured > Streaming), specifically the way tasks see the DStream. I am going over the > source code of Spark in scala, here <https://github.com/apache/spark>.

Kafka Spark Streaming integration : Relationship between DStreams and Tasks

2019-05-12 Thread Sheel Pancholi
Hello everyone, I am trying to understand the internals of Spark Streaming (not Structured Streaming), specifically the way tasks see the DStream. I am going over the source code of Spark in Scala, here. I understand the call stack: ExecutorCoarseGrainedBackend (ma