Re: Query around Spark Checkpoints

2020-09-29 Thread Jungtaek Lim
Sorry, I have no idea about Delta Lake. You may get a better answer from the Delta Lake mailing list. One thing is clear: stateful processing is simply an essential feature of almost every streaming framework. If you're struggling with something around the state feature and trying to find a workaround

Re: Query around Spark Checkpoints

2020-09-29 Thread Bryan Jeffrey
Jungtaek, how would you contrast stateful streaming with checkpointing vs. the idea of writing updates to a Delta Lake table, and then using the Delta Lake table as a streaming source for our state stream? Thank you, Bryan On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh wrote: > Thank You Jungta

Re: Query around Spark Checkpoints

2020-09-28 Thread Debabrata Ghosh
Thank you, Jungtaek and Amit! This is very helpful indeed! Cheers, Debu On Mon, Sep 28, 2020 at 5:33 AM Jungtaek Lim wrote: > > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala > > You would need to implem

Re: Query around Spark Checkpoints

2020-09-27 Thread Jungtaek Lim
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala You would need to implement CheckpointFileManager yourself, and it is tightly integrated with HDFS (parameters and return types of methods are mostly from HDFS

Re: Query around Spark Checkpoints

2020-09-27 Thread Amit Joshi
Hi, As far as I know, it depends on whether you are using Spark Streaming or Structured Streaming. In Spark Streaming you can write your own code to checkpoint. But in the case of Structured Streaming it should be a file location. But the main question is why you want to checkpoint in NoSQL, as it's even