Re: Need some Clarification on checkpointing w.r.t Spark Structured Streaming

Michael Armbrust Mon, 11 Sep 2017 14:26:52 -0700

Checkpoints record what has been processed for a specific query, and as
such only need to be defined when writing (which is how you "start" a
query).

You can use the DataFrame created with readStream to start multiple
queries, so it wouldn't really make sense to attach a single checkpoint to
the read side.
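
To make the point above concrete, here is a minimal sketch in PySpark (the
question uses the Java API, where the calls have the same shape). The helper
names read_kafka and start_queries, the parquet sink, and all paths are
assumptions made for illustration; they are not taken from this thread. The
read side carries no checkpoint option, and each query started from the
shared DataFrame gets its own checkpoint directory.

```python
def read_kafka(spark, servers, topic):
    """Build the streaming DataFrame from Kafka.

    No checkpointLocation is set here: defining a source does not start
    a query, so there is no progress to record yet.
    """
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .load()
    )


def start_queries(stream_df, query_names, output_root, checkpoint_root):
    """Start one query per name from the same streaming DataFrame.

    Each query gets its own checkpoint directory (hypothetical layout:
    <checkpoint_root>/<query name>), because a checkpoint tracks the
    progress of exactly one query.
    """
    queries = {}
    for name in query_names:
        queries[name] = (
            stream_df.writeStream
            .format("parquet")  # assumed sink, chosen only for the sketch
            .queryName(name)
            .option("path", f"{output_root}/{name}")
            .option("checkpointLocation", f"{checkpoint_root}/{name}")
            .start()
        )
    return queries
```

With a real SparkSession named spark, this could be called as
start_queries(read_kafka(spark, "host:9092", "events"), ["raw", "audit"],
"hdfs:///out", "hdfs:///chk"), where the broker address, topic, query names,
and HDFS paths are placeholders.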

On Mon, Sep 11, 2017 at 2:36 AM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> I was wondering if we need to checkpoint both read and write streams when
> reading from Kafka and inserting into a target store?
>
> For example:
>
> sparkSession.readStream().option("checkpointLocation", "hdfsPath").load()
>
> vs
>
> dataSet.writeStream().option("checkpointLocation", "hdfsPath")
>
> Thanks!
>
