Eugene Kirpichov created BEAM-3030:
--------------------------------------

             Summary: watchForNewFiles() can emit a file multiple times if it's growing
                 Key: BEAM-3030
                 URL: https://issues.apache.org/jira/browse/BEAM-3030
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov
             Fix For: 2.3.0


TextIO and AvroIO watchForNewFiles(), as well as FileIO.match().continuously(),
use the Watch transform under the hood to watch the set of Metadata matching a
filepattern.

Two Metadata objects with the same filename but different sizes are not
considered equal, so if these transforms observe the same file multiple times
with different sizes (for example, while it is still growing), they'll emit
and read the file multiple times.
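
To make the failure mode concrete, here is a minimal standalone sketch. It does
not use Beam at all: FileMeta is a hypothetical stand-in for Metadata (just a
name and a size), and the two helper methods are invented for illustration.
Keying deduplication by filename is shown only as one possible direction, not
necessarily the fix that will be adopted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class GrowingFileDedup {
    /** Hypothetical stand-in for file metadata: equality covers name AND size. */
    static final class FileMeta {
        final String name;
        final long sizeBytes;

        FileMeta(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FileMeta)) {
                return false;
            }
            FileMeta other = (FileMeta) o;
            return name.equals(other.name) && sizeBytes == other.sizeBytes;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, sizeBytes);
        }
    }

    /** Counts emissions when "already seen" is decided on the whole metadata value. */
    static int emittedWhenKeyedByMetadata(List<FileMeta> observations) {
        Set<FileMeta> seen = new HashSet<>();
        int emitted = 0;
        for (FileMeta m : observations) {
            if (seen.add(m)) {
                emitted++; // a new size makes the same file look new
            }
        }
        return emitted;
    }

    /** Counts emissions when "already seen" is decided on the filename only. */
    static int emittedWhenKeyedByName(List<FileMeta> observations) {
        Set<String> seen = new HashSet<>();
        int emitted = 0;
        for (FileMeta m : observations) {
            if (seen.add(m.name)) {
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Three polls: log.txt is seen twice, having grown from 100 to 250 bytes.
        List<FileMeta> polls = Arrays.asList(
            new FileMeta("log.txt", 100),
            new FileMeta("log.txt", 250),
            new FileMeta("other.txt", 10));

        System.out.println(emittedWhenKeyedByMetadata(polls)); // prints 3
        System.out.println(emittedWhenKeyedByName(polls));     // prints 2
    }
}
```

With whole-value equality the growing log.txt is emitted twice; with a
name-only key it is emitted once, though anything appended after the first
observation would then not be picked up.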

This is likely not yet a problem for production users: these features require
SDF (splittable DoFn), which is supported only in the Dataflow runner, and
users of the Dataflow runner are likely to use only files on GCS, which doesn't
support appends. However, it still needs to be fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
