Arnaud Linz created FLINK-2580:
----------------------------------

             Summary: HadoopDataOutputStream does not expose enough methods of 
org.apache.hadoop.fs.FSDataOutputStream
                 Key: FLINK-2580
                 URL: https://issues.apache.org/jira/browse/FLINK-2580
             Project: Flink
          Issue Type: Improvement
          Components: Hadoop Compatibility
            Reporter: Arnaud Linz
            Priority: Minor


I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write 
to an HDFS file, calling 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a 
HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream 
(under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).
 
However, FSDataOutputStream exposes many methods such as flush(), getPos(), 
etc., but HadoopDataOutputStream only wraps write() and close().
 
For instance, calling flush() on a HadoopDataOutputStream falls through to the 
default, empty implementation in OutputStream instead of the Hadoop one, which 
is confusing. Moreover, because of the restrictive OutputStream interface, 
hsync() and hflush() are not exposed to Flink.
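
The flush() behaviour can be reproduced without Hadoop on the classpath. The 
sketch below is only an illustration: CountingStream stands in for the Hadoop 
stream, and WriteCloseOnlyWrapper is a hypothetical class that, like the 
wrapper described above, forwards only write and close. Neither name exists 
in Flink or Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushDemo {

    /** Stand-in for the wrapped Hadoop stream: counts the flush() calls that reach it. */
    static class CountingStream extends ByteArrayOutputStream {
        int flushCalls = 0;

        @Override
        public void flush() throws IOException {
            flushCalls++;
            super.flush();
        }
    }

    /** Hypothetical wrapper that forwards only write and close, as described above. */
    static class WriteCloseOnlyWrapper extends OutputStream {
        private final OutputStream wrapped;

        WriteCloseOnlyWrapper(OutputStream wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void write(int b) throws IOException {
            wrapped.write(b);
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }
        // flush() is not overridden, so OutputStream's empty default is used.
    }

    public static void main(String[] args) throws IOException {
        CountingStream inner = new CountingStream();
        OutputStream out = new WriteCloseOnlyWrapper(inner);
        out.write(42);
        out.flush(); // never reaches the wrapped stream
        System.out.println("flush calls seen by wrapped stream: " + inner.flushCalls);
    }
}
```

The write goes through, but the wrapped stream never sees the flush.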

I see two options:

- complete the class to wrap all methods of OutputStream and add a 
getWrappedStream() to access other methods such as hsync().

- get rid of the Hadoop wrapping and directly use Hadoop file system objects.
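
A minimal sketch of the first option, using only the JDK so that it runs 
without Hadoop. The assumptions are: FakeHadoopStream is a stand-in for 
FSDataOutputStream, its syncToDisk() method is a hypothetical analogue of 
hsync(), and DelegatingOutputStream is an illustrative name, not the actual 
Flink class. Only getWrappedStream() is taken from the proposal above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DelegatingStreamSketch {

    /** Stand-in for FSDataOutputStream; syncToDisk() is a hypothetical analogue of hsync(). */
    static class FakeHadoopStream extends ByteArrayOutputStream {
        int flushCalls = 0;
        int syncCalls = 0;

        @Override
        public void flush() throws IOException {
            flushCalls++;
            super.flush();
        }

        void syncToDisk() {
            syncCalls++;
        }
    }

    /** Forwards every OutputStream method and exposes the wrapped stream. */
    static class DelegatingOutputStream<T extends OutputStream> extends OutputStream {
        private final T wrapped;

        DelegatingOutputStream(T wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void write(int b) throws IOException {
            wrapped.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            wrapped.write(b, off, len);
        }

        @Override
        public void flush() throws IOException {
            wrapped.flush();
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }

        /** Gives callers access to methods that OutputStream does not declare. */
        T getWrappedStream() {
            return wrapped;
        }
    }

    public static void main(String[] args) throws IOException {
        FakeHadoopStream inner = new FakeHadoopStream();
        DelegatingOutputStream<FakeHadoopStream> out = new DelegatingOutputStream<>(inner);
        out.write(new byte[] {1, 2, 3});
        out.flush();                         // now reaches the wrapped stream
        out.getWrappedStream().syncToDisk(); // stream-specific call via the accessor
        System.out.println("flush calls: " + inner.flushCalls + ", sync calls: " + inner.syncCalls);
    }
}
```

The generic parameter is just one way to keep the accessor typed; a real 
implementation could equally return FSDataOutputStream directly.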



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
