It is okay to collect the iterator. That will not break Spark. However, collecting it buffers all of the group's new data in executor memory, so you may cause OOMs if a group has a LOT of new data.
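
To make that concrete, here is a minimal sketch in plain Scala (no Spark on the classpath) of the pattern: materialize the iterator once with toList, then run multi-pass logic against the buffered copy. The Event type, the aboveGroupMean function, and the "keep values above the group mean" logic are all hypothetical, made up for illustration. The GroupState[S] argument of the real callback is left out so the snippet runs standalone.

```scala
object CollectSketch {
  // Hypothetical event type; not from the original thread.
  final case class Event(user: String, amount: Double)

  // Shaped like the function passed to flatMapGroupsWithState, minus the
  // GroupState[S] parameter (omitted so this runs without Spark).
  def aboveGroupMean(key: String, values: Iterator[Event]): Iterator[Event] = {
    // Materialize once. An Iterator is single-pass, so every later step
    // reads from `buffered` and never touches `values` again.
    // This holds the whole group's new data in memory.
    val buffered = values.toList
    if (buffered.isEmpty) Iterator.empty
    else {
      // First pass: compute the mean. Second pass: filter against it.
      val mean = buffered.map(_.amount).sum / buffered.size
      buffered.iterator.filter(_.amount > mean)
    }
  }

  def main(args: Array[String]): Unit = {
    val in = Iterator(Event("a", 1.0), Event("a", 2.0), Event("a", 6.0))
    // The mean is 3.0, so only the 6.0 event is kept.
    println(aboveGroupMean("a", in).toList)
  }
}
```

In a real job the same body would sit inside the function you pass to flatMapGroupsWithState, with the GroupState[S] parameter added back. The memory caveat applies exactly at the toList line.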

On Wed, Oct 31, 2018 at 3:44 AM Antonio Murgia - antonio.murg...@studio.unibo.it <antonio.murg...@studio.unibo.it> wrote:

> Hi all,
>
> I'm currently developing a Spark Structured Streaming job and I'm
> performing flatMapGroupsWithState.
>
> I'm concerned about the laziness of the Iterator[V] that is passed to my
> custom function (func: (K, Iterator[V], GroupState[S]) => Iterator[U]).
>
> Is it ok to collect that iterator (with a toList, for example)? I have a
> logic that is practically impossible to perform on a Iterator, but I do not
> want to break Spark lazy chain, obviously.
>
>
> Thank you in advance.
>
>
> #A.M.