Hi Anfernee,

You will see improved performance with federation only if you stripe files 
across the multiple NNs.  Federation basically shares DN storage among multiple 
NNs, with the expectation that the namespace load will be distributed across 
them.  If everything writes to the exact same parent directory, then there is 
no benefit over a single NN.  You will need to partition your jobs so that 
some write to one NN and the others write to the other NN(s).
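
As a rough illustration of that partitioning, here is a minimal sketch that 
assigns each job an output directory on one of the federated NNs using a 
stable hash of the job name.  The NN URIs, the /jobs base path, and the 
function name are all made up for the example; they are not part of Hadoop, 
so substitute whatever fits your cluster.

```python
import hashlib

# Hypothetical NameNode URIs; replace with the addresses of your federated NNs.
NAMENODES = [
    "hdfs://nn1.example.com:8020",
    "hdfs://nn2.example.com:8020",
]


def output_dir_for(job_name, namenodes=NAMENODES, base="/jobs"):
    """Return an output directory for job_name on one of the NameNodes.

    A stable hash (MD5 of the job name) is used instead of Python's built-in
    hash() so the same job always maps to the same NameNode across runs.
    """
    digest = hashlib.md5(job_name.encode("utf-8")).hexdigest()
    index = int(digest, 16) % len(namenodes)
    return f"{namenodes[index]}{base}/{job_name}"


if __name__ == "__main__":
    for name in ["etl-daily", "report-weekly", "index-build"]:
        print(name, "->", output_dir_for(name))
```

Each job would then use the returned URI as its output path, so the 
namespace operations end up spread over the NNs rather than all landing on 
one parent directory in one namespace.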

I hope this helps!

Daryn

On Jan 28, 2014, at 12:04 PM, Anfernee Xu <anfernee...@gmail.com> wrote:

Hi,

Based on 
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits,
 the overall performance can be improved by federation, but I'm not sure 
federation addresses my use case. Could someone elaborate?

My use case is that I have a single NN and several DNs, and I have a bunch of 
concurrent MR jobs which will create new files (plain files and 
sub-directories) under the same parent directory. The questions are:

1) Will these concurrent writes (new plain files and sub-directories under 
the same parent directory) run sequentially because write-once control is 
governed by the single NN?

I need this answer to estimate the necessity of moving to HDFS federation.

Thanks

--
--Anfernee
