[ https://issues.apache.org/jira/browse/BEAM-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266262#comment-15266262 ]
Aljoscha Krettek edited comment on BEAM-241 at 5/2/16 5:33 PM: --------------------------------------------------------------- I think the short-term (and maybe also long-term) solution is to do everything via {{DoFnRunners}}. Then we automatically get the correct behavior. What do you think? Should I open an issue for the Flink runner? I also want to unify how we treat {{DoFn}} s and windowing operations, or do you think that we shouldn't use a {{GroupAlsoByWindowViaWindowSetDoFn}} once the new runner API is in? was (Author: aljoscha): I think the short-term (and maybe also long-term) solution is to do everything via {{DoFnRunners}}. Then we automatically get the correct behavior. What do you think? Should I open an issue for the Flink runner? I also want to unify how we treat {{DoFn}}s and windowing operations, or do you think that we shouldn't use a {{GroupAlsoByWindowViaWindowSetDoFn}} once the new runner API is in? > Not easy for runners to get late-data dropping > ---------------------------------------------- > > Key: BEAM-241 > URL: https://issues.apache.org/jira/browse/BEAM-241 > Project: Beam > Issue Type: Bug > Components: runner-core > Reporter: Mark Shields > Assignee: Frances Perry > > Quite by accident realized the Flink runner delegates to > GroupAlsoByWindowViaWindowSetDoFn for GBK, which in turn delegates to > ReduceFnRunner. The latter now assumes no messages will arrive beyond the > 'garbage collection' time of their target window(s). > The Dataflow runner interposes a LateDataDroppingDoFnRunner into the path so > as to drop those too-late messages. That's done (I think) using > DoFnRunners.createDefault. > I don't think the Flink runner does that. > We need a nice runner-friendly way of dealing with the too-late data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)