Sorry, I have no idea about Delta Lake. You may get a better answer from the Delta Lake mailing list.
One thing is clear: stateful processing is simply an essential feature in almost every streaming framework. If you're struggling with something around the state feature and trying to find a workaround
Jungtaek,
How would you contrast stateful streaming with checkpointing vs. the idea of writing updates to a Delta Lake table, and then using the Delta Lake table as a streaming source for our state stream?
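For concreteness, here is a rough sketch of the pattern I have in mind; the paths, the Kafka topic, and the downstream join against the state table are placeholders, not a working job:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("delta-state-sketch").getOrCreate()

// Write the incoming per-key updates to a Delta table instead of keeping them
// only in the built-in state store (paths and topic are placeholders).
val updates = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "updates")
  .load()

updates.writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "/checkpoints/state_updates")
  .start("/tables/state_updates")

// Read the same Delta table back as a streaming source, so downstream logic
// can treat it as externalized, replayable state.
val stateStream = spark.readStream
  .format("delta")
  .load("/tables/state_updates")

The idea would be that the Delta table, being a replayable streaming source, takes over the role the built-in state store plays today.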
Thank you,
Bryan
On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh wrote:
Thank you, Jungtaek and Amit! This is very helpful indeed!
Cheers,
Debu
On Mon, Sep 28, 2020 at 5:33 AM Jungtaek Lim wrote:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala

You would need to implement CheckpointFileManager by yourself, which is tightly integrated with HDFS (parameters and return types of its methods are mostly HDFS types).
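A bare skeleton of what that would look like, following the trait as of the Spark 3.0 source linked above; every method is a stub and the class and constructor here are only illustrative:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FileStatus, Path, PathFilter}
import org.apache.spark.sql.execution.streaming.CheckpointFileManager
import org.apache.spark.sql.execution.streaming.CheckpointFileManager.CancellableFSDataOutputStream

// Hypothetical skeleton of a non-HDFS backend; note that Hadoop FS types
// (Path, FileStatus, FSDataInputStream) appear in nearly every signature,
// which is exactly the coupling described above.
class NoSqlCheckpointFileManager(root: Path, hadoopConf: Configuration)
    extends CheckpointFileManager {

  // A real backend would have to provide atomic "write then rename" semantics.
  override def createAtomic(
      path: Path,
      overwriteIfPossible: Boolean): CancellableFSDataOutputStream = ???

  override def open(path: Path): FSDataInputStream = ???
  override def list(path: Path, filter: PathFilter): Array[FileStatus] = ???
  override def mkdirs(path: Path): Unit = ???
  override def exists(path: Path): Boolean = ???
  override def delete(path: Path): Unit = ???
  override def isLocal: Boolean = false
}

If I remember correctly, Spark picks a custom implementation up via the spark.sql.streaming.checkpointFileManagerClass configuration and instantiates it reflectively with a (Path, Configuration) constructor, but please double-check that against the file linked above for your Spark version.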
Hi,
As far as I know, it depends on whether you are using Spark Streaming or Structured Streaming. In Spark Streaming you can write your own code to checkpoint, but in the case of Structured Streaming it has to be a file location.
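For example, something along these lines (the source, sink, and path are just placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-location-sketch").getOrCreate()

// The checkpoint location is simply a directory on a fault-tolerant file
// system (HDFS, S3, ...); there is no built-in hook for pointing it at a database.
val query = spark.readStream
  .format("rate")                  // placeholder source
  .load()
  .writeStream
  .format("console")               // placeholder sink
  .option("checkpointLocation", "hdfs:///checkpoints/my-query")
  .start()

query.awaitTermination()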
But the main question is why you want to checkpoint in NoSQL, as it's even