Re: Bug in How to Monitor Streaming Queries in PySpark

2024-03-11 Thread 刘唯
Oh I see why the confusion. microbatch_data = event.progress means that microbatch_data is a StreamingQueryProgress instance, it's not a dictionary, so you should use ` microbatch_data.processedRowsPerSecond`, instead of the `get` method which is used for dictionaries. But weirdly, for query.las

Data ingestion into elastic failing using pyspark

2024-03-11 Thread Karthick Nk
Hi @all, I am using pyspark program to write the data into elastic index by using upsert operation (sample code snippet below). def writeDataToES(final_df): write_options = { "es.nodes": elastic_host, "es.net.ssl": "false", "es.nodes.wan.only": "true", "es.net.http.auth.user"

Re: Bugs with joins and SQL in Structured Streaming

2024-03-11 Thread Andrzej Zera
Hi, Do you think there is any chance for this issue to get resolved? Should I create another bug report? As mentioned in my message, there is one open already: https://issues.apache.org/jira/browse/SPARK-45637 but it covers only one of the problems. Andrzej wt., 27 lut 2024 o 09:58 Andrzej Zera

Re: Bug in How to Monitor Streaming Queries in PySpark

2024-03-11 Thread Mich Talebzadeh
Hi, Thank you for your advice This is the amended code def onQueryProgress(self, event): print("onQueryProgress") # Access micro-batch data microbatch_data = event.progress #print("microbatch_data received") # Check if data is received #print(microbatc