Re: StructuredStreaming - processing data based on new events in Kafka topic

2022-03-13 Thread karan alang
Hi Mich, The code I sent for the function 'convertToDictForEachBatch' is not the complete code; it does use the DF for a number of transformations/operations. Specific to the problem I emailed about: one piece of the code reloads the prediction data from BigQuery based on the 'event' in …
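
A minimal sketch of what such a foreachBatch handler could look like, assuming a hypothetical 'event' column, reload marker, and BigQuery table name (none of these appear in the original post), with reads done through the spark-bigquery connector:

    from pyspark.sql import SparkSession, DataFrame

    spark = SparkSession.builder.appName("reload-demo").getOrCreate()
    prediction_df = None  # cached reference data, reloaded on demand

    def convertToDictForEachBatch(df: DataFrame, batch_id: int):
        global prediction_df
        # If any record in this micro-batch carries the (hypothetical)
        # reload event, re-read the prediction data from BigQuery.
        if df.filter(df.event == "reload_predictions").count() > 0:
            prediction_df = (spark.read.format("bigquery")
                             .option("table", "my_project.my_dataset.predictions")
                             .load())
        # ... the rest of the transformations/operations on df ...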

Re: StructuredStreaming - processing data based on new events in Kafka topic

2022-03-12 Thread Mich Talebzadeh
There are a number of flaws here. You have defined your trigger based on processing time within Spark Structured Streaming (SSS) as below: trigger(processingTime='4 minutes'). SSS will trigger every 4 minutes; in other words, each micro-batch covers 4 minutes. This is what is known as micro-batch …
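
For reference, a sketch of how that trigger is typically wired up in PySpark; the checkpoint path is a placeholder, and the handler name is carried over from the earlier message rather than taken from the poster's actual job:

    query = (df.writeStream
             .trigger(processingTime='4 minutes')      # fire a micro-batch every 4 minutes
             .foreachBatch(convertToDictForEachBatch)  # handler runs once per micro-batch
             .option("checkpointLocation", "/tmp/checkpoints/demo")
             .start())
    query.awaitTermination()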

Re: StructuredStreaming - processing data based on new events in Kafka topic

2022-03-12 Thread Mich Talebzadeh
How do you check whether new data is in the topic, and what happens if there is none? On Sat, 12 Mar 2022 at 00:40, karan alang wrote:
> Hello All,
>
> I have a Structured Streaming program which reads data from a Kafka topic,
> does some processing, and finally puts data into a target Kafka topic.
>
> Note : …
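
One common way to answer Mich's question inside a foreachBatch handler is sketched below, assuming a hypothetical handler name: an empty micro-batch means no new records arrived in the topic during the trigger interval (DataFrame.isEmpty() exists in Spark 3.3+; df.rdd.isEmpty() works on older versions):

    def process_batch(df, batch_id):
        # No new data arrived in the topic during this trigger interval.
        if df.rdd.isEmpty():
            print(f"batch {batch_id}: no new data in the topic, skipping")
            return
        # ... process the new records ...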