Just to be crystal clear: DStreams will be deprecated sooner or later, and
there will be no support, so you are highly advised to migrate...
G
On Sun, 4 Apr 2021, 19:23, Ali Gouta wrote:
Thanks Mich!
Ali Gouta.
On Sun, Apr 4, 2021 at 6:44 PM Mich Talebzadeh wrote:
Hi Ali,
The old saying that one experiment is worth a hundred hypotheses still
stands.
As per the test-driven approach, have a go at it and see what comes out. Forum
members, including myself, have reported on SSS in the Spark user group, so you
are at home on this.
HTH,
Great, so SSS also provides an API that allows handling RDDs through
DataFrames using foreachBatch. Still, I am not sure this is a good practice
in general, right? Well, it depends on the use case anyway.
Thank you so much for the hints!
Best regards,
Ali Gouta.
On Sun, Apr 4, 2021, Mich Talebzadeh wrote:
Hi Ali,
On the practical side, I have used both the old DStreams and the newer Spark
structured streaming (SSS).
SSS does a good job at the micro-batch level in the form of
foreachBatch(SendToSink).
"foreach" performs custom write logic on each row, and "foreachBatch" performs
custom write logic on each micro-batch.
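For illustration, here is a minimal sketch of that pattern (the broker
address, the topic name, and the body of sendToSink are made-up placeholders,
not a definitive implementation):

import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ForeachBatchSketch")
      .getOrCreate()

    // Kafka source; broker address and topic name are placeholders
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Each micro-batch arrives as an ordinary DataFrame, so the usual
    // batch APIs (including access to the underlying RDD) work here.
    def sendToSink(batchDf: DataFrame, batchId: Long): Unit = {
      println(s"Micro-batch $batchId has ${batchDf.count()} rows")
      // e.g. write batchDf to a JDBC sink, which has no streaming writer
    }

    df.writeStream
      .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
        sendToSink(batchDf, batchId)
      }
      .option("checkpointLocation", "/tmp/checkpoints/events")
      .start()
      .awaitTermination()
  }
}

This also speaks to the question above about RDDs: inside sendToSink you are
back in plain batch land.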
Thank you guys for your answers. I will dig more into this new way of doing
things and consider leaving the old DStreams behind and using
structured streaming instead. Hopefully structured streaming + Spark on
Kubernetes works well and the combination is production ready.
Best regards,
Ali Gouta.
Hi,
Just to add to Gabor's excellent answer: checkpointing and offsets are
infrastructure-related and should not really be in the hands of Spark
devs, who should instead focus on the business purpose of the code (not
offsets, which are very low-level and not really important).
BTW That's what
There is no way to store offsets in Kafka and restart from the stored
offset. Structured Streaming stores offsets in the checkpoint and restarts
from there without any user code.
Offsets can be stored with a listener, but that can only be used for lag
calculation.
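A minimal sketch of such a listener follows (the class name and the
log-only handling are my own placeholders):

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

// Exposes the offset range of every micro-batch; useful for lag
// monitoring, but not for restarting a query from those offsets.
class OffsetLoggingListener extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    event.progress.sources.foreach { source =>
      // startOffset and endOffset are JSON strings, e.g. {"events":{"0":1234}}
      println(s"source=${source.description} start=${source.startOffset} end=${source.endOffset}")
    }
  }
}

// Register it on the session: spark.streams.addListener(new OffsetLoggingListener)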
BR,
G
On Sat, 3 Apr 2021, 21:09, Ali Gouta wrote:
Hello,
I was reading the Spark docs about Spark structured streaming, since we are
thinking about updating our code base that today uses DStreams, hence Spark
Streaming. Also, one main reason for this change that we want to realize is
that reading headers in Kafka messages is only supported in Spark structured
streaming.
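For reference, a minimal sketch of reading Kafka headers with structured
streaming (the broker address and topic name are placeholders; the
includeHeaders option requires Spark 3.0+):

import org.apache.spark.sql.SparkSession

object KafkaHeadersSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("KafkaHeadersSketch")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("includeHeaders", "true") // adds a "headers" column
      .load()

    // headers is an array<struct<key: string, value: binary>>
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "headers")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}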