The local filesystem has no sense of being 'distributed'. If you run Hadoop in distributed mode over file:// (LocalFS), then unless the file:// path being used is itself backed by a shared mount (such as NFS), your jobs will fail their tasks on every node where the referenced files cannot be found.
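As a rough sketch, assuming a NameNode at namenode:9000 (substitute your cluster's actual address) and illustrative paths, the same job can be pointed at either filesystem explicitly via qualified URIs:

  # LocalFS: on a multi-node cluster this only works if the path
  # exists on every node (e.g. via an NFS mount):
  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature \
      file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4

  # HDFS: copy the input into HDFS first, then reference it with
  # an hdfs:// URI; the framework handles replication/partitioning:
  bin/hadoop fs -put /usr/local/ncdcinput/sample.txt /ncdcinput/sample.txt
  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature \
      hdfs://namenode:9000/ncdcinput/sample.txt \
      hdfs://namenode:9000/out4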
Essentially, for a distributed operation, MR relies on a distributed filesystem, and the local filesystem is the opposite of that.

On Sat, Jan 26, 2013 at 9:19 PM, Sundeep Kambhampati
<kambh...@cse.ohio-state.edu> wrote:
> Hi Users,
> I am kind of new to MapReduce programming and I am trying to understand
> the integration between MapReduce and HDFS. I understand that MapReduce
> can use HDFS for data access, but is it possible not to use HDFS at all
> and still run MapReduce programs?
> HDFS does file replication and partitioning. But if I use the following
> command to run the example MaxTemperature:
>
> bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4
>
> instead of
>
> bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> usr/local/ncdcinput/sample.txt usr/local/out4   <-- this will use the
> HDFS filesystem
>
> it uses local filesystem files and writes to the local filesystem when I
> run in pseudo-distributed mode. Since it is a single node, there is no
> problem of non-local data.
> What happens in fully distributed mode? Will the files be copied to other
> machines, or will it throw errors? Will the files be replicated, and will
> they be partitioned for running MapReduce if I use the local filesystem?
>
> Can someone please explain.
>
> Regards
> Sundeep

--
Harsh J