Hi,
I recently followed a blog post to set up Hadoop on a single-node cluster. In a single-node setup, is it necessary to copy the data into HDFS before running a MapReduce (MR) job on it? Or can I run an MR job directly on my local file system, without copying the data to HDFS?

In the Hadoop source code I also saw implementations of other file systems, such as S3, KFS, and FTP. How exactly does an MR job run against an S3 data store? How do the JobTracker and TaskTracker work with S3?

I would be very thankful for a reply.

Thanks & Regards,
Nikhil