Stephan Ewen created FLINK-5789:
-----------------------------------

             Summary: Make Bucketing Sink independent of Hadoop's FileSystem
                 Key: FLINK-5789
                 URL: https://issues.apache.org/jira/browse/FLINK-5789
             Project: Flink
          Issue Type: Bug
          Components: Streaming Connectors
    Affects Versions: 1.1.4, 1.2.0
            Reporter: Stephan Ewen
             Fix For: 1.3.0


The {{BucketingSink}} is hard-wired to Hadoop's FileSystem, bypassing Flink's
file system abstraction.

This causes several issues:
  - The bucketing sink behaves differently from other file sinks with respect
to configuration.
  - Directly supported file systems (not accessed through Hadoop), such as the
MapR File System, do not work with the {{BucketingSink}} in the same way as
other file systems.
  - The previous point is all the more problematic given the effort to make
Hadoop an optional dependency and to run in other stacks (Mesos, Kubernetes,
AWS, GCE, Azure), ideally with no Hadoop dependency.

We should port the {{BucketingSink}} to use Flink's FileSystem classes.

To support the *truncate* functionality that is needed for the exactly-once
semantics of the Bucketing Sink, we should extend Flink's FileSystem
abstraction with the following methods:
  - {{boolean supportsTruncate()}}
  - {{void truncate(Path, long)}}
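
A minimal sketch of how a sink could use the two proposed methods when restoring from a checkpoint. This is only an illustration under stated assumptions, not Flink's actual API: {{TruncatingFileSystem}}, {{LocalTruncatingFileSystem}}, {{TruncateOnRestoreSketch}} and {{restoreToValidLength}} are hypothetical names, {{java.nio.file.Path}} stands in for Flink's own Path class, and the ".valid-length" marker file is one possible fallback for file systems that cannot truncate.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in for the two methods proposed for Flink's FileSystem. */
interface TruncatingFileSystem {
    boolean supportsTruncate();

    void truncate(Path file, long length) throws IOException;
}

/** Local-disk implementation, for illustration only. */
class LocalTruncatingFileSystem implements TruncatingFileSystem {
    @Override
    public boolean supportsTruncate() {
        return true;
    }

    @Override
    public void truncate(Path file, long length) throws IOException {
        // FileChannel.truncate drops all bytes beyond the given length.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.truncate(length);
        }
    }
}

class TruncateOnRestoreSketch {
    /** Rolls a part file back to the length recorded in the last checkpoint. */
    static void restoreToValidLength(TruncatingFileSystem fs, Path partFile, long validLength)
            throws IOException {
        if (fs.supportsTruncate()) {
            fs.truncate(partFile, validLength);
        } else {
            // Assumed fallback: record the valid length in a companion file
            // so that readers can ignore the trailing bytes.
            Path marker = partFile.resolveSibling(partFile.getFileName() + ".valid-length");
            Files.write(marker, Long.toString(validLength).getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-0-0", ".txt");
        Files.write(part, "committed|uncommitted".getBytes(StandardCharsets.UTF_8));
        // Pretend the last checkpoint covered only the first 9 bytes.
        restoreToValidLength(new LocalTruncatingFileSystem(), part, 9L);
        System.out.println(new String(Files.readAllBytes(part), StandardCharsets.UTF_8));
        Files.delete(part);
    }
}
```

Checking {{supportsTruncate()}} before calling {{truncate}} lets the sink keep a single restore path across file systems, with the capability check deciding between truncating in place and a fallback.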







--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
