Hi,

That isn't explained in the Structured Streaming guide, but it is explained in the Scala API doc:
http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html

The following statement, quoted from the Scala API doc, answers your question:

> The timeout is reset every time the function is called on a group, that is,
> when the group has new data, or the group has timed out. So the user has to
> set the timeout duration every time the function is called, otherwise there
> will not be any timeout set.


Simply put, you'd want to always set the timeout unless you remove the state for
the group (key).
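
To make that concrete, here is a minimal sketch in Scala of a state function that re-arms the timeout on every call. The Event and RunningTotal types, the updateTotals name, and the "10 minutes" duration are hypothetical and only for illustration; it assumes the query is configured with GroupStateTimeout.ProcessingTimeTimeout.

```scala
import org.apache.spark.sql.streaming.GroupState

// Hypothetical event and state types, used only for illustration.
case class Event(key: String, value: Long)
case class RunningTotal(total: Long)

// Meant to be passed to groupByKey(...).flatMapGroupsWithState(...) with
// GroupStateTimeout.ProcessingTimeTimeout (an assumption of this sketch).
def updateTotals(
    key: String,
    events: Iterator[Event],
    state: GroupState[RunningTotal]): Iterator[(String, Long)] = {
  if (state.hasTimedOut) {
    // Called without new data because the timeout fired:
    // emit the final total and drop the state, so no timeout is needed.
    val last = state.get.total
    state.remove()
    Iterator((key, last))
  } else {
    val previous = state.getOption.map(_.total).getOrElse(0L)
    state.update(RunningTotal(previous + events.map(_.value).sum))
    // Re-arm the timeout on every call. Per the quoted API doc, skipping
    // this line would leave the group without any timeout set.
    state.setTimeoutDuration("10 minutes")
    Iterator.empty
  }
}
```

Applied to your example, the second batch calls state.update(myObj) without setting a timeout, so per the quoted doc the group would have no timeout after that batch.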

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Mon, Oct 5, 2020 at 6:16 PM Yuri Oleynikov (יורי אולייניקוב) <
yur...@gmail.com> wrote:

> Hi all, I have the following question:
>
> What happens to the state (in terms of expiration) if I’m updating the
> state without setting timeout?
>
>
> E.g. in FlatMapGroupsWithStateFunction
>
>    1. first batch:
>
> state.update(myObj)
>
> state.setTimeoutDuration(timeout)
>
>    2. second batch:
>
> state.update(myObj)
>
>    3. third batch (no data for a long time):
>       Does the state time out once the initial timeout has expired, or
>       does it not time out?
>
>