That is not true. HDFS writes are not staged to a local disk before being written to the DataNodes. The old architecture docs suggest that writes get staged to a local disk, but that is no longer true; see https://issues.apache.org/jira/browse/HDFS-1454.
Also worth noting: an HDFS client behaves the same way in almost all contexts, whether it's invoked from an MR framework or directly from the shell.

On Fri, May 17, 2013 at 3:38 AM, John Lilley <john.lil...@redpoint.net> wrote:
> I seem to recall reading that when a MapReduce task writes a file, the
> blocks of the file are always written to local disk, and replicated to other
> nodes. If this is true, is this also true for non-MR applications writing
> to HDFS from Hadoop worker nodes? What about clients outside of the cluster
> doing a file load?
>
> Thanks
>
> John

--
Harsh J