Just to be crystal clear: DStreams will be deprecated sooner or later, and
there will be no support, so you are highly advised to migrate...
G
On Sun, 4 Apr 2021, 19:23, Ali Gouta wrote:
Thanks Mich!
Ali Gouta.
On Sun, Apr 4, 2021 at 6:44 PM Mich Talebzadeh wrote:
Hi Ali,
The old saying that one experiment is worth a hundred hypotheses still
stands.
As per the test-driven approach, have a go at it and see what comes out. Forum
members, including myself, have reported on SSS in the Spark user group, so you
are at home on this.
HTH,
Great, so SSS also provides an API that allows handling RDDs through
DataFrames using foreachBatch. Still, I am not sure this is a good practice
in general, right? Well, it depends on the use case anyway.
Thank you so much for the hints!
Best regards,
Ali Gouta.
On Sun, Apr 4, 2021, Mich Talebzadeh wrote:
Hi Ali,
On the practical side, I have used both the old DStreams and the newer Spark
structured streaming (SSS).
SSS does a good job at the micro-batch level in the form of
foreachBatch(SendToSink).
"foreach" performs custom write logic on each row, and "foreachBatch" performs
custom write logic on each micro-batch.
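For illustration, here is a minimal sketch of that pattern (the broker
address, the topic name, and the body of sendToSink are made-up placeholders,
not a definitive implementation):

import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ForeachBatchSketch")
      .getOrCreate()

    // Kafka source; broker address and topic name are placeholders
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Each micro-batch arrives as an ordinary DataFrame, so the usual
    // batch APIs (including access to the underlying RDD) work here.
    def sendToSink(batchDf: DataFrame, batchId: Long): Unit = {
      println(s"Micro-batch $batchId has ${batchDf.count()} rows")
      // e.g. write batchDf to a JDBC sink, which has no streaming writer
    }

    df.writeStream
      .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
        sendToSink(batchDf, batchId)
      }
      .option("checkpointLocation", "/tmp/checkpoints/events")
      .start()
      .awaitTermination()
  }
}

This also speaks to the question above about RDDs: inside sendToSink you are
back in plain batch land.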
Thank you guys for your answers. I will dig more into this new way of doing
things and consider leaving the old DStreams behind and using
structured streaming instead. Hopefully structured streaming + Spark on
Kubernetes works well and the combination is production ready.
Best regards,
Ali Gouta.
Hi,
Just to add to Gabor's excellent answer: checkpointing and offsets are
infrastructure-related and should not really be in the hands of Spark
devs, who should instead focus on the business purpose of the code (not
offsets, which are very low-level and not really important).
BTW That's what
There is no way to store offsets in Kafka and restart from the stored
offset. Structured Streaming stores offsets in the checkpoint and restarts
from there without any user code.
Offsets can be stored with a listener, but that can only be used for lag
calculation.
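A minimal sketch of such a listener follows (the class name and the
log-only handling are my own placeholders):

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

// Exposes the offset range of every micro-batch; useful for lag
// monitoring, but not for restarting a query from those offsets.
class OffsetLoggingListener extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    event.progress.sources.foreach { source =>
      // startOffset and endOffset are JSON strings, e.g. {"events":{"0":1234}}
      println(s"source=${source.description} start=${source.startOffset} end=${source.endOffset}")
    }
  }
}

// Register it on the session: spark.streams.addListener(new OffsetLoggingListener)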
BR,
G
On Sat, 3 Apr 2021, 21:09, Ali Gouta wrote:
Hello,
I was reading the Spark docs about Spark structured streaming, since we are
thinking about updating our code base that today uses DStreams, hence Spark
Streaming. Also, one main reason for this change that we want to realize is
that reading headers in Kafka messages is only supported in Spark structured
streaming.
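For reference, a minimal sketch of reading Kafka headers with structured
streaming (the broker address and topic name are placeholders; the
includeHeaders option requires Spark 3.0+):

import org.apache.spark.sql.SparkSession

object KafkaHeadersSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("KafkaHeadersSketch")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("includeHeaders", "true") // adds a "headers" column
      .load()

    // headers is an array<struct<key: string, value: binary>>
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "headers")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}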