Hello all. I have opened the ticket https://issues.apache.org/jira/browse/SPARK-20325.
My case was the following: from one source stream *streamToProcess* I start two streaming queries, each like this:

    streamToProcess
      .groupBy(metric)
      .agg(count(metric))
      .writeStream
      .outputMode("complete")
      .option("checkpointLocation", checkpointDir)
      .foreach(kafkaWriter)
      .start()

After that I get the exception:

    Cannot start query with id bf6a1003-6252-4c62-8249-c6a189701255 as another
    query with same id is already active. Perhaps you are attempting to restart
    a query from checkpoint that is already active.

The cause is that *StreamingQueryManager.scala* takes the checkpoint dir from the stream's configuration, and because both of my streams use the same checkpointDir, the second stream tries to recover from the existing checkpoint instead of starting as a new query. For more details see the ticket: SPARK-20325.

Two questions:

1) Could we update the Structured Streaming documentation to describe that the checkpoint location can be set via spark.sql.streaming.checkpointLocation at the SparkSession level, so that a separate checkpoint dir is created automatically for each foreach query?

2) Do we really need to specify the checkpoint dir per query? What is the reason for this? Otherwise we end up having to write some checkpointDir name generator, for example one that associates the dir with a particular named query, and so on.

--
*Yours faithfully,*
*Kate Eri.*
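P.S. In case it is useful, below is a minimal sketch of the kind of per-query checkpoint dir naming I had in mind for question 2. Everything in it that is not in my mail above is an assumption made for illustration: the rate source, the checkpointDirFor helper, the query names, and the println-based ForeachWriter are stand-ins for the real source, metric column, and kafkaWriter. The commented-out spark.sql.streaming.checkpointLocation line shows the session-level alternative from question 1.

    import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}
    import org.apache.spark.sql.functions.count

    object TwoQueriesOneSource {
      // Hypothetical helper: derive a per-query checkpoint dir from the query name.
      def checkpointDirFor(baseDir: String, queryName: String): String =
        s"$baseDir/$queryName"

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[2]")
          .appName("two-queries-one-source")
          // Alternative to per-query dirs: set a session-level base location so
          // a separate sub-directory is derived for each query.
          // .config("spark.sql.streaming.checkpointLocation", "/tmp/checkpoints")
          .getOrCreate()
        import spark.implicits._

        // Stand-in for the streamToProcess / metric from the mail above.
        val streamToProcess = spark.readStream
          .format("rate")
          .load()
          .withColumn("metric", $"value" % 10)

        // Stand-in for kafkaWriter: a trivial ForeachWriter that just prints rows.
        val sink = new ForeachWriter[Row] {
          def open(partitionId: Long, epochId: Long): Boolean = true
          def process(row: Row): Unit = println(row)
          def close(errorOrNull: Throwable): Unit = ()
        }

        // Two aggregations over the same source: each query gets its own
        // checkpoint dir, so the second one no longer tries to recover
        // from the first one's checkpoint.
        Seq("metric-counts-1", "metric-counts-2").foreach { name =>
          streamToProcess
            .groupBy("metric")
            .agg(count("metric"))
            .writeStream
            .queryName(name)
            .outputMode("complete")
            .option("checkpointLocation", checkpointDirFor("/tmp/checkpoints", name))
            .foreach(sink)
            .start()
        }

        spark.streams.awaitAnyTermination()
      }
    }

This is only a sketch; the point is that deriving the checkpoint dir from the query name (or setting the session-level option) keeps two queries from colliding on the same checkpoint.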