+1 for raising all this.
+1 for Queryable State (SPARK-16738 [3]).

On Thu, Oct 18, 2018 at 9:59 PM Jungtaek Lim <kabh...@gmail.com> wrote:
> Small correction: "timeout" in map/flatMapGroupsWithState does not behave
> like State TTL when event time and a watermark are set. Timeout in
> map/flatMapGroupsWithState guarantees removal of state once the state will
> no longer be used, similar to what we do with streaming aggregation,
> whereas State TTL works just as its name suggests (self-explanatory).
> Hence State TTL looks valid for all the cases.
>
> On Fri, Oct 19, 2018 at 12:20 PM, Jungtaek Lim <kabh...@gmail.com> wrote:
>
>> Hi devs,
>>
>> While Spark 2.4.0 is still going through release votes, I see that some
>> pull requests unrelated to SS are being reviewed and merged into the
>> master branch, so I guess a discussion about the next release is OK.
>>
>> It looks like there is one major TODO left for structured streaming:
>> allowing stateful operations in continuous mode (watermark, stateful
>> exactly-once), and no other major milestone has been shared. (Please let
>> me know if I'm missing something here!) From a structured streaming
>> contributor's point of view, there are other features we could discuss to
>> see which are good to have, and prioritize if possible (NOTE: this is just
>> brainstorming, and some items might not be valid for structured
>> streaming):
>>
>> * Native support for session windows (SPARK-10816 [1])
>> ** patch available
>> * Support for delegation tokens on Kafka (SPARK-25501 [2])
>> ** patch available
>> * Queryable State (SPARK-16738 [3])
>> ** some discussion took place, but no action has been taken yet
>> * End-to-end exactly-once with the Kafka sink
>> ** given that Kafka is a first-class streaming source/sink nowadays
>> * Custom window / custom watermark
>> * Physically scale (up/down) streaming state
>> * State TTL (especially for non-watermark state)
>> ** "timeout" in map/flatMapGroupsWithState fits it, but just to check
>> whether we want to have it for normal streaming aggregation
>> * Provide events discarded for being late via a side output or a similar
>> feature
>> ** to me this looks tricky, since Spark's RDD as well as SQL semantics
>> provide a single output
>> * more?
>>
>> I would like to hear others' opinions about this. Please also share if
>> there are ongoing efforts on other items for structured streaming. Happy
>> to help out if another hand is needed.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-10816
>> 2. https://issues.apache.org/jira/browse/SPARK-25501
>> 3. https://issues.apache.org/jira/browse/SPARK-16738
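
To make the distinction in the correction above concrete, here is a toy sketch in plain Python. It is only an illustration under assumptions: ToyStateStore, evict_by_watermark and evict_by_ttl are hypothetical names, not Spark APIs, and the model is simplified (an event-time timeout fires only once the watermark passes a timestamp set on the state, while a TTL expires any entry a fixed duration after its last update, watermark or not).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    value: int
    last_update: int                   # time of the last write, in ms
    timeout_ts: Optional[int] = None   # event-time timeout timestamp, if any


@dataclass
class ToyStateStore:
    """Hypothetical in-memory store for illustration; not a Spark class."""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def put(self, key: str, value: int, now: int,
            timeout_ts: Optional[int] = None) -> None:
        self.entries[key] = Entry(value, now, timeout_ts)

    def evict_by_watermark(self, watermark: int) -> List[str]:
        # Timeout-style eviction: only keys that carry a timeout timestamp
        # are candidates, and only once the watermark has passed it.
        expired = sorted(k for k, e in self.entries.items()
                         if e.timeout_ts is not None
                         and watermark > e.timeout_ts)
        for k in expired:
            del self.entries[k]
        return expired

    def evict_by_ttl(self, now: int, ttl: int) -> List[str]:
        # TTL-style eviction: any key not updated for `ttl` ms is dropped,
        # regardless of watermarks or timeout timestamps.
        expired = sorted(k for k, e in self.entries.items()
                         if now - e.last_update >= ttl)
        for k in expired:
            del self.entries[k]
        return expired
```

In this sketch a key stored without a timeout timestamp is never removed by evict_by_watermark, which is the "non-watermark state" case the State TTL item is aimed at; evict_by_ttl covers it because it depends only on the time of the last update.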