How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Agarwal, Nikhil
Hi, I have a Hadoop cluster up and running. I want to submit an MR job to it, but the input data is kept on an external server (outside the Hadoop cluster). Can anyone please suggest how I can tell my Hadoop cluster to load the input data from the external server and then run an MR job on it?

Re: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Nitin Pawar
You are looking at a two-step workflow here. The first unit of your workflow will download the file from the external server, write it to DFS, and return the file path. The second unit of your workflow will read the input path and process the data according to your business logic in MR. You can look at
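The shape of that two-step workflow can be sketched in plain Java. This is only an illustration under stated assumptions: a local staging directory stands in for DFS (on a real cluster the first step would write through Hadoop's FileSystem API instead), line counting stands in for the MR business logic, and the class name, method names, and "input.txt" file name are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class StageThenProcess {

    // Step 1: copy the external file into a staging location and return its path.
    // ASSUMPTION: a local directory stands in for DFS in this sketch.
    static Path stage(URL source, Path stagingDir, String name) throws IOException {
        Files.createDirectories(stagingDir);
        Path target = stagingDir.resolve(name);
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // Step 2: read the staged path and process it.
    // ASSUMPTION: counting lines is a placeholder for the real MR job.
    static long process(Path input) throws IOException {
        try (Stream<String> lines = Files.lines(input)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StageThenProcess <source-url> <staging-dir>");
            return;
        }
        Path staged = stage(URI.create(args[0]).toURL(), Paths.get(args[1]), "input.txt");
        System.out.println("staged at " + staged + ", lines: " + process(staged));
    }
}
```

The point of returning the path from the first step is that the second step needs no knowledge of the external server; it only sees an input location that the cluster can already read.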

RE: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Agarwal, Nikhil
Hi, Thanks for your reply. I do not know about cascading. Should I google it as "cascading in hadoop"? Also, what I was thinking is to implement a file system which overrides the functions provided by the fs.FileSystem interface in Hadoop. I tried to write some portions of the filesystem (for my

RE: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Azuryy Yu
Can you use addInputPath(hdfs://……)? Don't change fs.default.name; it cannot solve your problem. On Mar 26, 2013 7:03 PM, Agarwal, Nikhil nikhil.agar...@netapp.com wrote: Hi, Thanks for your reply. I do not know about cascading. Should I google it as “cascading in hadoop”? Also, what I was
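To illustrate why a fully qualified input path makes it unnecessary to touch the default file system setting, here is a rough plain-Java sketch of the resolution rule. It is not Hadoop code and does not call any Hadoop API: the class name, method name, and host names are hypothetical, and only absolute paths are handled.

```java
import java.net.URI;

public class InputPathResolution {

    // A path that already names a scheme (for example hdfs://host:port/...)
    // is used as given. An absolute path without a scheme falls back to the
    // default file system URI.
    static URI resolve(URI defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p;
        }
        return defaultFs.resolve(p);
    }

    public static void main(String[] args) {
        // ASSUMPTION: both host names below are made up for illustration.
        URI defaultFs = URI.create("hdfs://cluster-nn.example:8020/");
        System.out.println(resolve(defaultFs, "/data/in"));
        System.out.println(resolve(defaultFs, "hdfs://other-nn.example:8020/data/in"));
    }
}
```

Under this rule, only unqualified paths depend on the default, so an input path that spells out its own scheme and authority can point elsewhere while the cluster's default stays untouched.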

Re: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Hemanth Yamijala
The stack trace indicates the job client is trying to submit a job to the MR cluster and it is failing. Are you certain that the JobTracker is running (on localhost:54312) at the time of submitting the job? Regarding using a different file system: it depends a lot on what file system you are
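One quick way to check whether anything is listening on the JobTracker address is a plain TCP connection attempt. The sketch below uses only the Java standard library; the class and method names are hypothetical, and a successful connection shows only that the port is open, not that the JobTracker itself is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are the ones mentioned in the message above.
        boolean up = isListening("localhost", 54312, 2000);
        System.out.println("localhost:54312 listening: " + up);
    }
}
```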