Hello,
I have a use case where I need to read events (not yet correlated) from a Kafka
topic, then correlate them and push them forward to another topic.
I use Spark Structured Streaming with FlatMapGroupsWithStateFunction along
with GroupStateTimeout.ProcessingTimeTimeout(). After each timeout, I do
some co
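For readers following along, a minimal sketch of this pattern in Scala might look like the following. The topic names, broker address, key field, `Event`/`Correlated` case classes, and the 30-second timeout are all made up for illustration and do not come from the original message:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical event and output types for illustration.
case class Event(key: String, payload: String, ts: java.sql.Timestamp)
case class Correlated(key: String, events: Seq[String])

object CorrelateEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("correlate").getOrCreate()
    import spark.implicits._

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "input-topic")               // placeholder
      .load()
      .selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS payload",
        "timestamp AS ts")
      .as[Event]

    val correlated = events
      .groupByKey(_.key)
      .flatMapGroupsWithState[Seq[String], Correlated](
        OutputMode.Append, GroupStateTimeout.ProcessingTimeTimeout) {
        (key, rows, state: GroupState[Seq[String]]) =>
          if (state.hasTimedOut) {
            // Timeout fired: emit whatever was buffered, then clear state.
            val out = Correlated(key, state.get)
            state.remove()
            Iterator(out)
          } else {
            // Buffer new events and (re)arm a processing-time timeout.
            val buffered = state.getOption.getOrElse(Seq.empty) ++ rows.map(_.payload)
            state.update(buffered)
            state.setTimeoutDuration("30 seconds")
            Iterator.empty
          }
      }

    correlated
      .selectExpr("key", "to_json(struct(*)) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "output-topic")                           // placeholder
      .option("checkpointLocation", "/tmp/checkpoints/correlate") // placeholder
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

Each group buffers events in state until the processing-time timeout expires, at which point the correlated batch is emitted downstream.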
Hi All,
I had a question about modeling a user-session style of analytics use case
in Spark Structured Streaming. Is there a way to model something like this
using arbitrary stateful Spark streaming?
User session -> reads a few FAQs on a website and then decides whether or not
to create a ticket
FAQ Deflec
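One way this is commonly modeled is with an arbitrary stateful function keyed by session id, where an idle timeout closes the session and the final state tells you whether the user created a ticket. A hedged sketch of the state-update function only (the `PageView`/`SessionSummary` types, page paths, and 30-minute session gap are all assumptions, not from the original question):

```scala
import org.apache.spark.sql.streaming.GroupState

// Hypothetical input and output types for illustration.
case class PageView(sessionId: String, page: String, ts: java.sql.Timestamp)
case class SessionSummary(sessionId: String, faqsRead: Int, ticketCreated: Boolean)

// To be passed to flatMapGroupsWithState after groupByKey(_.sessionId),
// with GroupStateTimeout.ProcessingTimeTimeout.
def updateSession(
    sessionId: String,
    views: Iterator[PageView],
    state: GroupState[SessionSummary]): Iterator[SessionSummary] = {
  if (state.hasTimedOut) {
    // Session went idle past the gap: emit the final summary.
    // ticketCreated == false here would count as an FAQ deflection.
    val summary = state.get
    state.remove()
    Iterator(summary)
  } else {
    val current = state.getOption
      .getOrElse(SessionSummary(sessionId, 0, ticketCreated = false))
    val updated = views.foldLeft(current) { (s, v) =>
      if (v.page == "/create-ticket") s.copy(ticketCreated = true)       // assumed path
      else if (v.page.startsWith("/faq")) s.copy(faqsRead = s.faqsRead + 1) // assumed path
      else s
    }
    state.update(updated)
    state.setTimeoutDuration("30 minutes") // assumed session gap
    Iterator.empty
  }
}
```

With event-time semantics you could use `GroupStateTimeout.EventTimeTimeout` and a watermark instead of processing time; the structure of the function stays the same.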
No, it won't be reused.
You should reuse the dataframe to reuse the shuffle blocks (and cached
data).
I know this because the two actions will lead to building two separate
DAGs, but I will show you a way to check this on your own (with a small,
simple Spark application).
For this
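A small, self-contained way to check this yourself (the numbers and column names below are arbitrary; the point is comparing the query plans and the Spark UI with and without persist):

```scala
import org.apache.spark.sql.SparkSession

object ReuseCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("reuse-check")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = spark.range(0, 1000000).toDF("id")
      .groupBy(($"id" % 100).as("bucket"))
      .count()

    // Without persist: each action builds and runs its own DAG.
    df.count()
    df.collect()

    // With persist: the second action reads the cached partitions.
    val cached = df.persist()
    cached.explain() // plan now contains InMemoryRelation / InMemoryTableScan
    cached.count()   // first action materializes the cache
    cached.collect() // served from the cache
    // Compare the two pairs of jobs in the Spark UI (http://localhost:4040):
    // the Storage tab shows the cached partitions, and the jobs on `cached`
    // read from them instead of recomputing the aggregation.
    cached.unpersist()
    spark.stop()
  }
}
```

Running this with the Spark UI open makes the difference between the two pairs of actions visible directly in the stages and the Storage tab.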
I am currently building a Spark Structured Streaming application where I am
doing a batch-stream join, and the source for the batch data gets updated
periodically.
So I am planning to persist/unpersist that batch data periodically.
Below is a sample of the code which I am using to persist and u
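One hedged sketch of this pattern, for readers hitting the same problem: join inside foreachBatch so each micro-batch picks up the latest cached copy, and swap the cache on a background schedule. The parquet paths, broker address, join key, and 30-minute refresh interval are all placeholders, not from the original message:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.sql.{DataFrame, SparkSession}

object BatchRefreshJoin {
  @volatile private var batchDf: DataFrame = _

  // Reload the batch source, cache the fresh copy, then drop the old one.
  def refresh(spark: SparkSession, path: String): Unit = {
    val fresh = spark.read.parquet(path).persist()
    fresh.count() // force materialization before swapping it in
    val old = batchDf
    batchDf = fresh
    if (old != null) old.unpersist()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("batch-stream-join").getOrCreate()
    val path = "/data/reference" // placeholder batch source
    refresh(spark, path)

    // Refresh the cached batch data every 30 minutes (assumed interval).
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
      () => refresh(spark, path), 30, 30, TimeUnit.MINUTES)

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Joining inside foreachBatch means each micro-batch re-resolves
    // batchDf, so the swap above takes effect on the next micro-batch.
    stream.writeStream
      .foreachBatch { (micro: DataFrame, _: Long) =>
        micro.join(batchDf, Seq("key")).write
          .mode("append").parquet("/data/out") // placeholder sink
      }
      .option("checkpointLocation", "/tmp/ckpt/join") // placeholder
      .start()
      .awaitTermination()
  }
}
```

Note that if the static DataFrame is instead referenced directly in the streaming query plan, swapping the variable does not affect the already-started query; the foreachBatch indirection is what makes the periodic refresh visible.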