static-max created FLINK-16574:
----------------------------------

             Summary: StreamingFileSink should rename files or fail if 
destination file already exists
                 Key: FLINK-16574
                 URL: https://issues.apache.org/jira/browse/FLINK-16574
             Project: Flink
          Issue Type: Bug
          Components: Connectors / FileSystem
    Affects Versions: 1.9.1
         Environment: We're using Flink 1.9.1 on YARN with Horton HDP 2.7.3.
            Reporter: static-max


I switched from BucketingSink to StreamingFileSink so my state could not be 
restored after starting from a savepoint.

Upon start of the job there were already part-0-0 and part-0-1 files in the 
HDFS destination folder. The StreamingFileSink then creates a file like 
.part-0-0.inprogress.d1849354-39d4-4634-8fb3-dfb8e2083857{color:#172b4d}. When 
the file is rolled Flink tries to rename it to part-0-0, but that file already 
exists. NameNode logs "WARN hdfs.StateChange 
(FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* 
FSDirectory.unprotectedRenameTo: failed to rename XXXX to XXXXbeca
use destination exists".{color}

Flink does not care and creates a new file like .part-0-1.inprogress.d 
{color:#172b4d}for the next bucket and the game continues until the part index 
counter is so high the file can be renamed. But now I'm left with a lot of 
.part-xxx.inprogress.xxx that I need to rename by hand if I don't want to lose 
the data.{color}

 

I would expect Flink to either fail if the file cannot be renamed, or 
auto-rename it to filename that does not exists yet.

The same happens when not starting from a savepoint. IIRC the BucketingFileSink 
did not have this problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to