Hi Rob,

DFSInputStream: The InterfaceAudience for this class is Private, so you should not use it directly. It implements the core read functionality and is a DFS-specific implementation.

HdfsDataInputStream: The InterfaceAudience for this class is Public, so you can use it. In fact, the object you get back when you open a file for reading is an HdfsDataInputStream. This wrapper gives you some additional DFS-specific APIs, such as getVisibleLength, which are not intended to be part of the generic FS API.
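To make the private-core / public-wrapper relationship concrete, here is a minimal JDK-only sketch. It does not use Hadoop at all: CoreInputStream, PublicDataInputStream, and WrapperDemo are hypothetical stand-ins for DFSInputStream, HdfsDataInputStream, and the open call, and their method names (other than getVisibleLength, which mirrors the one mentioned above) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a private core stream (the DFSInputStream role):
// it does the actual reading and knows the filesystem-specific details.
class CoreInputStream extends InputStream {
    private final ByteArrayInputStream in;
    private final long fileLength;

    CoreInputStream(byte[] data) {
        this.in = new ByteArrayInputStream(data);
        this.fileLength = data.length;
    }

    @Override
    public int read() {
        return in.read();
    }

    long getFileLength() {
        return fileLength;
    }
}

// Hypothetical stand-in for the public wrapper (the HdfsDataInputStream role):
// a DataInputStream that also exposes one filesystem-specific extra.
class PublicDataInputStream extends DataInputStream {
    PublicDataInputStream(CoreInputStream core) {
        super(core);
    }

    long getVisibleLength() {
        // 'in' is the wrapped stream inherited from FilterInputStream.
        return ((CoreInputStream) in).getFileLength();
    }
}

class WrapperDemo {
    // Plays the role of "open": callers only see the generic stream type.
    static DataInputStream open(byte[] data) {
        return new PublicDataInputStream(new CoreInputStream(data));
    }

    // Callers that need the extra API check for the public wrapper type
    // and cast to it; they never touch the core class directly.
    static long visibleLengthOf(DataInputStream stream) {
        if (stream instanceof PublicDataInputStream) {
            return ((PublicDataInputStream) stream).getVisibleLength();
        }
        return -1L; // not backed by the filesystem-specific wrapper
    }

    public static void main(String[] args) throws IOException {
        try (DataInputStream stream = open(new byte[] {0, 0, 0, 42})) {
            System.out.println(stream.readInt());        // generic API
            System.out.println(visibleLengthOf(stream)); // wrapper-only API
        }
    }
}
```

In real Hadoop code the analogous step, assuming the path lives on HDFS, would be to cast the stream returned by FileSystem.open(path) to HdfsDataInputStream before calling getVisibleLength(), and to leave DFSInputStream alone.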
The same applies to the write side: DFSOutputStream is the private implementation, and HdfsDataOutputStream is the public wrapper you should use.

I hope this helps clarify your doubts.

Regards,
Uma

From: Rob Blah [mailto:tmp5...@gmail.com]
Sent: 01 October 2013 03:39
To: user@hadoop.apache.org
Subject: When to use DFSInputStream and HdfsDataInputStream

Hi

What is the use case difference between:
- DFSInputStream and HdfsDataInputStream
- DFSOutputStream and HdfsDataOutputStream

When should one be preferred over the other? From the sources I see they have similar functionality, only HdfsData*Stream "follows" Data*Stream instead of *Stream. Also, is DFS*Stream more general than HdfsData*Stream, in the sense that it works at a higher abstraction layer and can work with other distributed file systems (even though it contacts HDFS-specific components), or is it just a naming convention? Which one should I choose to read/write data from/to HDFS, and why (sounds like an academic question ;) )?

* -> means both Input and Output

regards
tmp