Hello all,

I'm trying to build a pipeline that reads data from a streaming source and
writes it out as ORC files. However, I don't see any files written to the
file system, nor any exceptions.

Here is an example:

import java.util.concurrent.TimeUnit

import org.apache.spark.sql.streaming.Trigger

val df = spark.readStream
  .format("...")                 // streaming source
  .option("Topic", "Some topic")
  .load()

// Write 5-second micro-batches as ORC files to GCS.
val q = df.writeStream
  .format("orc")
  .option("path", "gs://testdata/raw")
  .option("checkpointLocation", "gs://testdata/raw_chk")
  .trigger(Trigger.ProcessingTime(5, TimeUnit.SECONDS))
  .start()

q.awaitTermination(1200000)      // block for up to 20 minutes
q.stop()


I couldn't find any files until the 1200 seconds were over. Does that mean
all the data is cached in memory? Even if I keep the pipeline running, no
files get flushed to the file system.
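
For reference, here is a minimal sketch of how one could poll the query for
progress instead of blocking silently (it uses the standard
StreamingQuery.lastProgress and status APIs; the polling interval and the
printed fields are just illustrative):

// Replace q.awaitTermination(...) with a polling loop to see
// whether micro-batches are completing at all.
while (q.isActive) {
  val p = q.lastProgress         // null until the first batch finishes
  if (p != null)
    println(s"batch=${p.batchId} inputRows=${p.numInputRows}")
  else
    println(s"no batch yet; status: ${q.status.message}")
  Thread.sleep(5000)
}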

How do I control how often Spark Structured Streaming writes to disk?

Thanks!
