Eugene Kirpichov created BEAM-3030:
--------------------------------------

             Summary: watchForNewFiles() can emit a file multiple times if it's growing
                 Key: BEAM-3030
                 URL: https://issues.apache.org/jira/browse/BEAM-3030
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov
             Fix For: 2.3.0
TextIO and AvroIO watchForNewFiles(), as well as FileIO.match().continuously(), use the Watch transform under the hood to watch the set of Metadata objects matching a filepattern. Two Metadata objects with the same filename but different sizes are not considered equal, so if these transforms observe the same file several times at different sizes (for example, while it is still being appended to), they emit it, and therefore read it, once per observed size.

This is likely not yet a problem for production users: these features require Splittable DoFn (SDF), which is supported only in the Dataflow runner, and users of the Dataflow runner are likely to use only files on GCS, which doesn't support appends. However, it still needs to be fixed.
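
To make the failure mode concrete, here is a minimal sketch in plain Java with no Beam dependency. FileMeta is a hypothetical stand-in for Metadata, not the real class, and WatchDedupSketch and its methods are made up for illustration. It assumes a watcher that emits every observation it has not seen before. Deduplicating by filename is shown only as one possible direction, not necessarily the fix that will be adopted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Metadata: just a filename and a size.
final class FileMeta {
  final String name;
  final long sizeBytes;

  FileMeta(String name, long sizeBytes) {
    this.name = name;
    this.sizeBytes = sizeBytes;
  }

  // Equality covers both name and size, as described in this issue.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof FileMeta)) {
      return false;
    }
    FileMeta other = (FileMeta) o;
    return name.equals(other.name) && sizeBytes == other.sizeBytes;
  }

  @Override
  public int hashCode() {
    return Objects.hash(name, sizeBytes);
  }
}

final class WatchDedupSketch {
  // Emits each observation whose (name, size) pair has not been seen yet.
  static List<FileMeta> emitByFullEquality(List<FileMeta> observations) {
    Set<FileMeta> seen = new HashSet<>();
    List<FileMeta> emitted = new ArrayList<>();
    for (FileMeta m : observations) {
      if (seen.add(m)) {
        emitted.add(m);
      }
    }
    return emitted;
  }

  // Emits each observation whose filename has not been seen yet (size ignored).
  static List<FileMeta> emitByName(List<FileMeta> observations) {
    Set<String> seenNames = new HashSet<>();
    List<FileMeta> emitted = new ArrayList<>();
    for (FileMeta m : observations) {
      if (seenNames.add(m.name)) {
        emitted.add(m);
      }
    }
    return emitted;
  }

  // Three polling results; a.txt is seen twice because it grew between polls.
  static List<FileMeta> sampleObservations() {
    List<FileMeta> observations = new ArrayList<>();
    observations.add(new FileMeta("logs/a.txt", 100));
    observations.add(new FileMeta("logs/b.txt", 50));
    observations.add(new FileMeta("logs/a.txt", 250));
    return observations;
  }

  public static void main(String[] args) {
    List<FileMeta> observations = sampleObservations();
    System.out.println("emitted by (name, size): " + emitByFullEquality(observations).size());
    System.out.println("emitted by name only:    " + emitByName(observations).size());
  }
}
```

With the sample observations, comparing whole FileMeta values emits a.txt twice, while keying on the filename emits each file once. Keying on the name alone has its own trade-off: the file is emitted at whatever size it had when first observed, so data appended later would not be picked up.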