Hi,
I have a Hadoop cluster up and running. I want to submit an MR job to it, but
the input data is kept on an external server (outside the Hadoop cluster). Can
anyone please suggest how I can tell my Hadoop cluster to load the input data
from the external server and then run MR on it?
You are looking at a two-step workflow here.
The first unit of your workflow will download the file from the external
server, write it to DFS, and return the file path.
The second unit of your workflow will read that input path and process the
data according to your business logic in MR.
You can look at Cascading for chaining the two steps together.
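
Here is a minimal sketch of the first unit, assuming the external server
exposes the file over plain HTTP; the URL, the DFS destination path, and the
class name are all placeholders:

    import java.io.InputStream;
    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class FetchToDfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Placeholder URL of the file on the external server.
            InputStream in =
                new URL("http://external-server.example.com/data/input.txt").openStream();

            // Placeholder DFS destination; this becomes the MR input path.
            Path dst = new Path("/user/hadoop/input/input.txt");
            FSDataOutputStream out = fs.create(dst);

            // Stream the remote bytes into DFS; 'true' closes both streams.
            IOUtils.copyBytes(in, out, 4096, true);

            System.out.println("Input path for the MR job: " + dst);
        }
    }

The second unit is then an ordinary MR job whose input path is the DFS path
printed above.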
Hi,
Thanks for your reply. I do not know about Cascading. Should I Google it as
"Cascading in Hadoop"? Also, what I was thinking is to implement a file
system that overrides the methods of the abstract fs.FileSystem class in
Hadoop. I tried to write some portions of the file system (for my
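
For reference, a custom file system plugs into Hadoop by extending the
abstract class org.apache.hadoop.fs.FileSystem and registering it in
core-site.xml under fs.<scheme>.impl. A bare, read-only skeleton follows; the
class name and the "ext" scheme are made up here, and the exact set of
abstract methods shifts a little between Hadoop versions:

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.util.Progressable;

    // Hypothetical scheme "ext": register as fs.ext.impl = ExternalFileSystem.
    public class ExternalFileSystem extends FileSystem {
        private URI uri;

        @Override
        public void initialize(URI name, Configuration conf) throws IOException {
            super.initialize(name, conf);
            this.uri = name;
        }

        @Override
        public URI getUri() { return uri; }

        // open() is what the record reader calls; wire it to the external server.
        @Override
        public FSDataInputStream open(Path f, int bufferSize) throws IOException {
            throw new UnsupportedOperationException("TODO: stream from external server");
        }

        // listStatus()/getFileStatus() let the client compute input splits.
        @Override
        public FileStatus[] listStatus(Path f) throws IOException {
            throw new UnsupportedOperationException("TODO: list remote files");
        }

        @Override
        public FileStatus getFileStatus(Path f) throws IOException {
            throw new UnsupportedOperationException("TODO: stat a remote file");
        }

        // A read-only input source can leave all mutating operations unsupported.
        @Override
        public FSDataOutputStream create(Path f, FsPermission permission,
                boolean overwrite, int bufferSize, short replication,
                long blockSize, Progressable progress) throws IOException {
            throw new UnsupportedOperationException("read-only");
        }

        @Override
        public FSDataOutputStream append(Path f, int bufferSize,
                Progressable progress) throws IOException {
            throw new UnsupportedOperationException("read-only");
        }

        @Override
        public boolean rename(Path src, Path dst) throws IOException {
            throw new UnsupportedOperationException("read-only");
        }

        @Override
        public boolean delete(Path f, boolean recursive) throws IOException {
            throw new UnsupportedOperationException("read-only");
        }

        // Deprecated variant; still abstract in Hadoop 1.x.
        @Override
        public boolean delete(Path f) throws IOException {
            return delete(f, true);
        }

        @Override
        public boolean mkdirs(Path f, FsPermission permission) throws IOException {
            throw new UnsupportedOperationException("read-only");
        }

        @Override
        public void setWorkingDirectory(Path dir) { }

        @Override
        public Path getWorkingDirectory() { return new Path("/"); }
    }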
You can use addInputPath(hdfs://……) directly; don't change fs.default.name,
as that cannot solve your problem.
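
That is, an input path can carry its own fully qualified URI, so
fs.default.name stays pointed at the cluster's default file system. A sketch,
with the namenode host, port, and paths as placeholders:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubmitWithQualifiedPath {
        public static void main(String[] args) throws Exception {
            Job job = new Job();  // Job.getInstance(conf) on newer Hadoop
            job.setJobName("external-input-example");

            // Fully qualified input URI; fs.default.name is left untouched.
            FileInputFormat.addInputPath(job,
                new Path("hdfs://namenode.example.com:8020/data/input"));
            FileOutputFormat.setOutputPath(job,
                new Path("hdfs://namenode.example.com:8020/data/output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }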
The stack trace indicates the job client is trying to submit a job to the
MR cluster and failing. Are you certain that the JobTracker is running (on
localhost:54312) at the time you submit the job?
Regarding using a different file system - it depends a lot on what file
system you are using.
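
As a quick sanity check, the address the client dials comes from
mapred.job.tracker in the job configuration; a small sketch (old mapred API)
that prints it and fails fast if nothing is listening there:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CheckJobTracker {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            // The address the job client will connect to on submit.
            System.out.println("mapred.job.tracker = "
                + conf.get("mapred.job.tracker"));

            // Constructing a JobClient connects to the JobTracker.
            JobClient client = new JobClient(conf);
            System.out.println("Live task trackers: "
                + client.getClusterStatus().getTaskTrackers());
        }
    }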