Dear Sir,

I have a question about Hadoop. When I use Hadoop and MapReduce to run a job 
(only one job here), can I control which node each file is processed on?

For example, I have only one job, and this job has 10 input files (so 10 
mappers need to run). On my servers, I have one head node and four worker 
nodes. My question is: can I control which node each of those 10 files is 
processed on? For example: file No.1 runs on node1, file No.3 on node2, file 
No.5 on node3, and file No.8 on node4.

If I can do this, it means I can control the tasks. Does it also mean I can 
control where each file is processed in the next round? (I have a loop on the 
head node, so I can run another MapReduce job.) For example, can I have file 
No.5 processed on node3 in the 1st round and on node2 in the 2nd round?
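To make the kind of control I am asking about concrete, here is a small sketch 
of the per-round assignment I have in mind. This is not Hadoop code, just an 
illustration; the node and file numbers are the ones from my example above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: the file-to-node mapping I would like to choose
// myself, per round, if Hadoop allowed it.
public class DesiredAssignment {
    // Round 1: file number -> node name
    public static Map<Integer, String> round1() {
        Map<Integer, String> m = new HashMap<>();
        m.put(1, "node1"); // file No.1 on node1
        m.put(3, "node2"); // file No.3 on node2
        m.put(5, "node3"); // file No.5 on node3
        m.put(8, "node4"); // file No.8 on node4
        return m;
    }

    // Round 2: file No.5 should move from node3 to node2
    public static Map<Integer, String> round2() {
        Map<Integer, String> m = new HashMap<>();
        m.put(5, "node2");
        return m;
    }

    public static void main(String[] args) {
        System.out.println(round1().get(5)); // prints node3
        System.out.println(round2().get(5)); // prints node2
    }
}
```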

If I cannot, does that mean that, for Hadoop, which node a file is processed 
on is a “black box”? That is, the user cannot control which node a file is 
processed on, because the design assumes the user does not need that control 
and should just let HDFS handle the parallel work. In other words, the user 
cannot control the tasks within a single job, but can control the multiple 
jobs themselves.

Thank you so much!



Fan Bai
PhD Candidate
Computer Science Department
Georgia State University
Atlanta, GA 30303