Hello all, I'm trying to build a pipeline that reads data from a streaming source and writes it to ORC files. However, I don't see any files written to the file system, nor any exceptions.
Here is an example:

import java.util.concurrent.TimeUnit
import org.apache.spark.sql.streaming.Trigger

val df = spark.readStream
  .format("...")
  .option("Topic", "Some topic")
  .load()

val q = df.writeStream
  .format("orc")
  .option("path", "gs://testdata/raw")
  .option("checkpointLocation", "gs://testdata/raw_chk")
  .trigger(Trigger.ProcessingTime(5, TimeUnit.SECONDS)) // micro-batch every 5 seconds
  .start()

q.awaitTermination(1200000)
q.stop()

I couldn't find any files until the 1200 seconds were over. Does that mean all the data is cached in memory? Even if I keep the pipeline running, no files are flushed to the file system. How do I control how often Structured Streaming writes to disk? Thanks!
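In case it helps with debugging, here is a minimal sketch (assuming q is the StreamingQuery from the snippet above) that polls the query's status and last progress instead of blocking silently, so you can see whether micro-batches are actually firing and how many input rows each batch sees:

// Poll the running query instead of blocking silently, so we can see
// whether micro-batches fire and how many rows each one processes.
while (q.isActive) {
  println(q.status)      // e.g. "Waiting for data to arrive" vs. "Processing new data"
  val p = q.lastProgress // null until the first batch completes
  if (p != null) {
    println(s"batch ${p.batchId}: ${p.numInputRows} input rows")
  }
  Thread.sleep(5000)
}
// If the stream died with an error, awaitTermination rethrows it here
// instead of it being swallowed.
q.awaitTermination()

If numInputRows stays at 0, the source isn't delivering data; if batches report rows but nothing appears under gs://testdata/raw, the problem is more likely on the sink/checkpoint side.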