Dear Sir,

I have a question about Hadoop. When I use Hadoop and MapReduce to run a job (only one job here), can I control which node each file is processed on?
For example, suppose I have a single job with 10 input files (so 10 mappers need to run), and my cluster has one head node and four worker nodes. My question is: can I control which node each of those 10 files is processed on? For instance: file No. 1 on node 1, file No. 3 on node 2, file No. 5 on node 3, and file No. 8 on node 4. If I can do this, it means I can control task placement.

Does it also mean I can still control the same file in the next round? (I have a loop on the head node, so I can launch another MapReduce job.) For example, could I assign file No. 5 to node 3 in the first round and then assign it to node 2 in the second round?

If I cannot, does that mean that, in Hadoop, the node a file runs on is a "black box" to the user — the idea being that users do not need to control placement and should simply let HDFS handle the parallel work for them? In other words, Hadoop cannot control the tasks within a single job, but it can control multiple jobs.

Thank you so much!

Fan Bai
PhD Candidate
Computer Science Department
Georgia State University
Atlanta, GA 30303