Thanks Devin, Yong, and Chris for your replies and suggestions. I will test the suggestions made by Yong and Devin and get back to you guys.
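For reference, the command Yong suggested (quoted below) can be wrapped in a small script. This is only a rough sketch: `build_wordcount_cmd` is a hypothetical helper name, not part of Hadoop, and the jar location and paths are simply the ones quoted in this thread. It prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: assemble the wordcount command with "-fs local" and
# "-jt local" dropped, using file:// URIs for the input and output paths.

# Jar path as quoted in this thread; adjust for your own install.
EXAMPLES_JAR=/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar

# build_wordcount_cmd is a hypothetical helper (not a Hadoop tool).
# It prefixes absolute local paths with file:// so that Hadoop treats
# them as local-filesystem paths rather than HDFS paths.
build_wordcount_cmd() {
    input_dir=$1
    output_dir=$2
    printf 'hadoop jar %s wordcount file://%s file://%s\n' \
        "$EXAMPLES_JAR" "$input_dir" "$output_dir"
}

# Print the command for review; it is not executed here.
build_wordcount_cmd /hduser/mount_point /results
```

As Yong notes, this only helps if the NFS export is mounted at the same path on every task node.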
As for the bottleneck issue, I agree, but I am trying to run a few MR jobs on a traditional NAS server. I can live with a few bottlenecks, so long as I don't have to move the data to a dedicated HDFS cluster.

On Sat, Dec 21, 2013 at 8:06 AM, Chris Mawata <chris.maw...@gmail.com> wrote:

> Yong raises an important issue: You have thrown out the I/O advantages
> of HDFS and also thrown out the advantages of data locality. It would be
> interesting to know why you are taking this approach.
> Chris
>
> On 12/20/2013 9:28 AM, java8964 wrote:
>
> I believe the "-fs local" should be removed too. The reason is that even
> if you have a dedicated JobTracker after removing "-jt local", with "-fs
> local" I believe that all the mappers will be run sequentially.
>
> "-fs local" will force MapReduce to run in "local" mode, which is
> really a test mode.
>
> What you can do is remove both "-fs local -jt local", but give the
> FULL URI of the input and output paths, to tell Hadoop that they are on
> the local filesystem instead of HDFS.
>
> "hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount file:///hduser/mount_point file:///results"
>
> Keep in mind the following:
>
> 1) The NFS mount needs to be available on all your Task Nodes, and
> mounted in the same way.
> 2) Even if you can do that, your shared storage will be your bottleneck.
> NFS won't work well for scalability.
>
> Yong
>
> ------------------------------
> Date: Fri, 20 Dec 2013 09:01:32 -0500
> Subject: Re: Running Hadoop v2 clustered mode MR on an NFS mounted filesystem
> From: dsui...@rdx.com
> To: user@hadoop.apache.org
>
> I think most of your problem is coming from the options you are setting:
>
> "hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount *-fs local -jt local* /hduser/mount_point/ /results"
>
> You appear to be directing your namenode to run jobs in the *LOCAL* job
> runner and directing it to read from the *LOCAL* filesystem. Drop the
> *-jt* argument and it should run in distributed mode if your cluster is
> set up right. You don't need to do anything special to point Hadoop towards
> an NFS location, other than set up the NFS location properly and make sure,
> if you are directing to it by name, that it will resolve to the right
> address. Hadoop doesn't care where it is, as long as it can read from and
> write to it. The fact that you are telling it to read/write from/to an NFS
> location that happens to be mounted as a local filesystem object doesn't
> matter - you could direct it to the local /hduser/ path and set the -fs
> local option, and it would end up on the NFS mount, because that's where
> the NFS mount actually exists, or you could direct it to the absolute
> network location of the folder that you want; it shouldn't make a
> difference.
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
> On Fri, Dec 20, 2013 at 5:27 AM, Atish Kathpal <atish.kath...@gmail.com> wrote:
>
> Hello
>
> The picture below describes the deployment architecture I am trying to
> achieve.
> However, when I run the wordcount example code with the below
> configuration, by issuing the command from the master node, I notice only
> the master node spawning map tasks and completing the submitted job.
> Below is the command I used:
>
> *hadoop jar
> /hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
> wordcount -fs local -jt local /hduser/mount_point/ /results*
>
> *Question: How can I leverage both the Hadoop nodes for running MR,
> while serving my data from the common NFS mount point running my filesystem
> at the backend? Has anyone tried such a setup before?*
> [image: Inline image 1]
>
> Thanks!