Hi,
I want to save an aggregate to a file without using any window, watermark
or groupBy, i.e. the aggregation is over the entire column:
df = spark.sql("select avg(col1) as aver from ds")
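Because this query has no window and no grouping key, its result table is a single row that has to be updated every time new data arrives. Below is a minimal plain-Python sketch of that behaviour. It is not Spark code, and the batch data and the `running_averages` helper are hypothetical. It keeps a running (sum, count) state across micro-batches:

```python
# Hypothetical micro-batches of col1 values arriving on the stream
# (illustrative data only, not from a real source).
batches = [[10, 20], [30], [40, 50, 60]]


def running_averages(batches):
    """Simulate an unwindowed, ungrouped avg(col1).

    The state is (sum, count), and the single result row is recomputed
    after every micro-batch, so its value keeps changing.
    """
    total, count = 0, 0
    results = []
    for batch in batches:
        total += sum(batch)
        count += len(batch)
        results.append(total / count)  # same row, new value each trigger
    return results


print(running_averages(batches))  # → [15.0, 20.0, 35.0]
```

The same row changes value on every trigger, so it is never "final". As far as I understand, that is why Append mode is rejected for this query without a watermark.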
The problem is that neither output mode works:
1) If I use outputMode = Append (the default), I get "*Append output mode
not supported when there are streaming aggregations on streaming
DataFrames/DataSets without watermark*":
query2 = (
    df.writeStream
    .outputMode("append")  # the default; rejected for an aggregation without a watermark
    .format("parquet")
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/")
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/")
    .trigger(processingTime="3 seconds")
    .start()
)
2) If I use outputMode = Complete, I get "*Data source parquet does not
support Complete output mode;*":
query2 = (
    df.writeStream
    .outputMode("complete")  # re-emits the whole result table on every trigger
    .format("parquet")  # rejected: the parquet file sink does not accept Complete mode
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/")
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/")
    .trigger(processingTime="3 seconds")
    .start()
)
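The second error seems to come from a mismatch between what Complete mode emits and what a file sink can do. Complete mode outputs the entire result table (here, one row) on every trigger, while a file sink like parquet can only add new files. Here is a plain-Python model of the two behaviours. The helper names are hypothetical, and this is not Spark's sink implementation:

```python
def complete_mode_outputs(batches):
    """Model Complete mode: every trigger emits the *entire* result table,
    which for avg(col1) is a single row [current_average]."""
    total, count = 0, 0
    outputs = []
    for batch in batches:
        total += sum(batch)
        count += len(batch)
        outputs.append([total / count])
    return outputs


def append_only_sink(outputs):
    """Model a file-style sink that can only add new files, never replace old ones."""
    files = []
    for table in outputs:
        files.append(list(table))  # each trigger becomes one more file
    return files


def overwrite_sink(outputs):
    """Model the behaviour Complete mode would need: each trigger replaces the output."""
    current = []
    for table in outputs:
        current = list(table)
    return current


# Hypothetical micro-batches of col1 values (illustrative data only).
outputs = complete_mode_outputs([[10, 20], [30], [40, 50, 60]])
print(append_only_sink(outputs))  # → [[15.0], [20.0], [35.0]] (stale averages pile up)
print(overwrite_sink(outputs))    # → [35.0] (only the latest average)
```

Appending every complete result would leave one stale average per trigger. What this query really needs is for each trigger to replace the previous output.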
What should I do? How can I get this whole-column aggregate into a file?
Thanks,
Aakash.