Arnaud Linz created FLINK-2580:
----------------------------------

             Summary: HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
                 Key: FLINK-2580
                 URL: https://issues.apache.org/jira/browse/FLINK-2580
             Project: Flink
          Issue Type: Improvement
          Components: Hadoop Compatibility
            Reporter: Arnaud Linz
            Priority: Minor

I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write to an HDFS file, calling org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream (under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).

However, FSDataOutputStream exposes many methods, such as flush() and getPos(), whereas HadoopDataOutputStream only wraps write() and close(). For instance, calling flush() on it runs the default, empty implementation from OutputStream instead of the Hadoop one, which is confusing. Moreover, because of the restrictive OutputStream interface, hsync() and hflush() are not exposed to Flink.

I see two options:

- complete the class to wrap all methods of OutputStream and add a getWrappedStream() to access other methods such as hsync();
- get rid of the Hadoop wrapping and use the Hadoop file system objects directly.
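
To make the flush() problem and the first option concrete, here is a minimal sketch that uses only java.io, so it runs without Hadoop or Flink on the classpath. The class names WriteOnlyWrapper, DelegatingWrapper and FlushDemo are hypothetical stand-ins, not the actual Flink classes: the first mimics a wrapper that forwards only write and close, and the second adds flush() delegation plus a getWrappedStream() accessor as proposed above. A BufferedOutputStream plays the role of the wrapped Hadoop stream, which is an assumption made purely for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Small driver: writes 3 bytes, calls flush(), and reports how many bytes
// actually reached the underlying sink.
class FlushDemo {
    static int visibleAfterFlush(boolean delegate) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Stand-in for the wrapped Hadoop stream: it buffers until flushed.
        OutputStream buffered = new BufferedOutputStream(sink, 1024);
        OutputStream out = delegate
                ? new DelegatingWrapper(buffered)
                : new WriteOnlyWrapper(buffered);
        try {
            out.write(new byte[] {1, 2, 3});
            out.flush();
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write-only wrapper: " + visibleAfterFlush(false) + " bytes visible");
        System.out.println("delegating wrapper: " + visibleAfterFlush(true) + " bytes visible");
    }
}

// Hypothetical wrapper that forwards only write and close, as described in the issue.
class WriteOnlyWrapper extends OutputStream {
    protected final OutputStream wrapped;

    WriteOnlyWrapper(OutputStream wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(int b) throws IOException {
        wrapped.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        wrapped.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        wrapped.close();
    }
    // flush() is not overridden, so the empty OutputStream.flush() is used.
}

// Hypothetical sketch of the first option: forward flush() and expose the wrapped stream.
class DelegatingWrapper extends WriteOnlyWrapper {
    DelegatingWrapper(OutputStream wrapped) {
        super(wrapped);
    }

    @Override
    public void flush() throws IOException {
        wrapped.flush();
    }

    // Gives callers access to methods the OutputStream interface does not offer.
    public OutputStream getWrappedStream() {
        return wrapped;
    }
}
```

With the write-only wrapper, flush() never reaches the buffered stream, so no bytes are visible in the sink; with the delegating wrapper, all three bytes are. In the real class, the accessor would presumably return the FSDataOutputStream so that callers could invoke hsync() or hflush() on it, but that return type is an assumption here.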