Re: [Spark Structured Streaming] retry/replay failed messages

2021-07-09 Thread Bruno Oliveira
On Fri, 9 Jul 2021 at 23:53, Bruno Oliveira wrote: That is exactly the case,

Re: [Spark Structured Streaming] retry/replay failed messages

2021-07-09 Thread Bruno Oliveira
Right now you are discarding those transactions that didn't match, so you would instead need to persist them somewhere and reinject them into the job that does the lookup (say after x minutes). Is this what you are looking for?
On Fri, 9 Jul 2021, 9:44 pm Bruno
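A minimal sketch of that persist-and-reinject idea inside a foreachBatch function; the lookup source, join key "transaction_id", lookup column "lookup_value", broker address, paths and the retry topic name are all illustrative assumptions, not details from the thread:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("retry-unmatched-sketch").getOrCreate()

    # Reference data the incoming transactions are matched against
    # (source, path and column names here are illustrative).
    lookup_df = spark.read.parquet("/data/lookup")

    def send_to_sink(batch_df, batch_id):
        batch_df.persist()

        # Split the micro-batch into rows that matched the lookup and rows that did not.
        joined = batch_df.join(lookup_df, on="transaction_id", how="left")
        matched = joined.filter(col("lookup_value").isNotNull())
        unmatched = joined.filter(col("lookup_value").isNull())

        # Matched rows go to the normal sink (illustrative path).
        matched.write.mode("append").parquet("/data/processed")

        # Non-matching rows are persisted to a retry topic so a separate job can
        # reinject them into the lookup job later (say after x minutes).
        (unmatched.selectExpr("to_json(struct(*)) AS value")
            .write
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")  # illustrative address
            .option("topic", "transactions-retry")
            .save())

        batch_df.unpersist()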

Re: [Spark Structured Streaming] retry/replay failed messages

2021-07-09 Thread Bruno Oliveira

Re: [Spark Structured Streaming] retry/replay failed messages

2021-07-09 Thread Bruno Oliveira
...in that function, say *sendToSink*, you can get the df and batchId:

def sendToSink(df, batchId):
    if len(df.take(1)) > 0:
        print(f"""md batchId is {batchId}""")
        df.show(100, False)
        df.persist()
        # write to BigQuery
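For completeness, a minimal sketch of how a function with that (df, batchId) signature is typically attached to the stream via foreachBatch; the streaming DataFrame name and the checkpoint path are illustrative assumptions, not from the thread:

    # Illustrative wiring; streamingDF and the checkpoint path are assumptions.
    # foreachBatch invokes sendToSink(df, batchId) once per micro-batch.
    query = (streamingDF.writeStream
             .foreachBatch(sendToSink)
             .option("checkpointLocation", "/tmp/checkpoints/transactions")
             .start())
    query.awaitTermination()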

Re: [Spark Structured Streaming] retry/replay failed messages

2021-07-09 Thread Bruno Oliveira
On Fri, 9 Jul 2021 at 13:44, Bruno Oliveira wrote: Hello guys,

[Spark Structured Streaming] retry/replay failed messages

2021-07-09 Thread Bruno Oliveira
Hello guys, I've been struggling with this for some days now, without success, so I would highly appreciate any enlightenment. The simplified scenario is the following:
- I've got 2 topics in Kafka (it's already like that in production, can't change it) - transactions-created,
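Since the scenario starts from Kafka topics consumed with Structured Streaming, here is a minimal sketch of how such a topic would typically be read; the broker address and the payload schema are illustrative assumptions, as only the topic name transactions-created appears in the thread:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("transactions-stream").getOrCreate()

    # Illustrative payload schema; the real message layout is not shown in the thread.
    schema = StructType([
        StructField("transaction_id", StringType()),
        StructField("status", StringType()),
    ])

    # Read the transactions-created topic as a streaming DataFrame.
    created = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker:9092")  # illustrative address
               .option("subscribe", "transactions-created")
               .option("startingOffsets", "earliest")
               .load()
               .select(from_json(col("value").cast("string"), schema).alias("t"))
               .select("t.*"))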