Hi Rob,

DFSInputStream: the InterfaceAudience for this class is private, so you should
not use this class directly. It implements the actual core read functionality,
and it is a DFS-specific implementation only.
HdfsDataInputStream: the InterfaceAudience for this class is public, so you can
use this class. In fact, the object you get back when you open a file for read
is an HdfsDataInputStream. This wrapper gives you some additional DFS-specific
APIs, such as getVisibleLength(), which may not make sense for a normal FS.
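To make the split concrete, here is a minimal, self-contained Java sketch of the same wrapper pattern using only java.io. The names ReadWrapperSketch, CoreInputStream, WrappedInputStream and open() are hypothetical stand-ins, not Hadoop classes. CoreInputStream plays the role of DFSInputStream (the implementation detail), and WrappedInputStream plays the role of HdfsDataInputStream, which, like the real class (via FSDataInputStream), extends DataInputStream and adds an extra getVisibleLength() call. In real code you would call FileSystem.open(path) and, when the FileSystem is HDFS, cast the result to HdfsDataInputStream to reach such extras.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadWrapperSketch {
    // Hypothetical stand-in for the private core stream (the role DFSInputStream
    // plays). Callers are not meant to construct or use it directly.
    static class CoreInputStream extends InputStream {
        private final ByteArrayInputStream data;
        private final long visibleLength;

        CoreInputStream(byte[] bytes) {
            this.data = new ByteArrayInputStream(bytes);
            this.visibleLength = bytes.length;
        }

        @Override
        public int read() throws IOException {
            return data.read();
        }

        long getVisibleLength() {
            return visibleLength;
        }
    }

    // Hypothetical public wrapper (the role HdfsDataInputStream plays): it
    // inherits the generic DataInputStream API and adds a filesystem-specific call.
    public static class WrappedInputStream extends DataInputStream {
        public WrappedInputStream(CoreInputStream core) {
            super(core);
        }

        public long getVisibleLength() {
            // `in` is the protected field inherited from FilterInputStream.
            return ((CoreInputStream) in).getVisibleLength();
        }
    }

    // Hypothetical factory standing in for FileSystem.open(): callers only
    // ever receive the public wrapper type.
    public static WrappedInputStream open(byte[] contents) {
        return new WrappedInputStream(new CoreInputStream(contents));
    }

    public static void main(String[] args) throws IOException {
        byte[] contents = {0, 0, 0, 42, 7};
        try (WrappedInputStream in = open(contents)) {
            System.out.println(in.readInt());          // generic DataInputStream API
            System.out.println(in.getVisibleLength()); // filesystem-specific extra
        }
    }
}
```

The caller programs against the public wrapper and gets both the generic readInt()-style API and the DFS-specific extras, while the core stream stays an implementation detail.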

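It works the same way for write: DFSOutputStream is the private, DFS-specific
implementation of the core write path, and HdfsDataOutputStream is the public
wrapper you get back when you create a file on HDFS.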
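A matching sketch for the write side, again using only java.io. WriteWrapperSketch, CoreOutputStream, WrappedOutputStream and create() are hypothetical stand-ins: CoreOutputStream plays the role of DFSOutputStream, and WrappedOutputStream plays the role of HdfsDataOutputStream, which (via FSDataOutputStream) extends DataOutputStream. The extra method name is borrowed from HdfsDataOutputStream's getCurrentBlockReplication(), but here it simply returns a value passed to the hypothetical create().

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WriteWrapperSketch {
    // Hypothetical stand-in for the private core writer (the role DFSOutputStream plays).
    static class CoreOutputStream extends OutputStream {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        private final int replication;

        CoreOutputStream(int replication) {
            this.replication = replication;
        }

        @Override
        public void write(int b) {
            sink.write(b);
        }

        int getReplication() {
            return replication;
        }

        byte[] contents() {
            return sink.toByteArray();
        }
    }

    // Hypothetical public wrapper (the role HdfsDataOutputStream plays).
    public static class WrappedOutputStream extends DataOutputStream {
        public WrappedOutputStream(CoreOutputStream core) {
            super(core);
        }

        // Filesystem-specific extra; here it just reports the configured value.
        public int getCurrentBlockReplication() {
            // `out` is the protected field inherited from FilterOutputStream.
            return ((CoreOutputStream) out).getReplication();
        }
    }

    // Hypothetical factory standing in for FileSystem.create().
    public static WrappedOutputStream create(int replication) {
        return new WrappedOutputStream(new CoreOutputStream(replication));
    }

    public static void main(String[] args) throws IOException {
        try (WrappedOutputStream out = create(3)) {
            out.writeInt(42);                                   // generic DataOutputStream API
            System.out.println(out.size());                     // bytes written so far
            System.out.println(out.getCurrentBlockReplication()); // filesystem-specific extra
        }
    }
}
```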
I hope this helps clarify your doubts.

Regards,
Uma

From: Rob Blah [mailto:tmp5...@gmail.com]
Sent: 01 October 2013 03:39
To: user@hadoop.apache.org
Subject: When to use DFSInputStream and HdfsDataInputStream

Hi
What is the use case difference between:
- DFSInputStream and HdfsDataInputStream
- DFSOutputStream and HdfsDataOutputStream
When should one be preferred over the other? From the sources I see they have
similar functionality, except that HdfsData*Stream "follows" Data*Stream instead
of *Stream. Also, is DFS*Stream more general than HdfsData*Stream, in the sense
that it works on a higher abstraction layer and can work with other distributed
filesystems (even though it contacts HDFS-specific components), or is it just a
naming convention?
Which one should I choose to read/write data from/to HDFS, and why (sounds like
an academic question ;) )?

* -> means both Input and Output

regards
tmp
