Re: Combine data from different HDFS FS

2013-04-08 Thread Pedro Sá da Costa
Maybe there is some FileInputFormat class that allows defining input files from different locations. What I would like to know is whether it's possible to read input data from different HDFS FS. E.g., run the wordcount with the input files from the HDFS FS in HOST1 and HOST2 (the FS in HOST1 and HOST2 are

Re: Combine data from different HDFS FS

2013-04-08 Thread Pedro Sá da Costa
I'm invoking the wordcount example on host1 with this command, but I get an error.

HOST1:$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount hdfs://HOST2:54310/gutenberg gutenberg-output

13/04/08 22:02:55 ERROR security.UserGroupInformation: PriviledgedActionException as:ubuntu cause:org.apache
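One detail worth checking in the command above: the output path `gutenberg-output` is relative, so it resolves against whatever default FS host1 is configured with, while the input is on HOST2. A minimal sketch of a fully qualified invocation (hostnames and port 54310 are taken from the thread; the exact error here is cut off, so this addresses only the path ambiguity, not necessarily the root cause):

```shell
# Qualify both input and output so neither depends on the local default FS.
# Assumes non-secure clusters and that HOST2's NameNode listens on 54310.
bin/hadoop jar hadoop-examples-1.0.4.jar wordcount \
  hdfs://HOST2:54310/gutenberg \
  hdfs://HOST1:54310/gutenberg-output
```
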

Re: Combine data from different HDFS FS

2013-04-08 Thread Harsh J
You should be able to add fully qualified HDFS paths from N clusters into the same job via FileInputFormat.addInputPath(…) calls. Caveats may apply for secure environments, but for non-secure mode this should work just fine. Did you try this and did it not work? On Mon, Apr 8, 2013 at 9:56 PM, Ped
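The suggestion above can be sketched as a small job driver. This is a minimal sketch, assuming non-secure clusters; the class name MultiClusterWordCount is hypothetical, and the hostnames, port 54310, and paths are taken from the thread and would need adjusting. Mapper/reducer wiring is omitted since it is identical to the stock WordCount example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiClusterWordCount {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "multi-cluster wordcount");
    job.setJarByClass(MultiClusterWordCount.class);

    // Fully qualified URIs let one job take input splits from two clusters.
    FileInputFormat.addInputPath(job, new Path("hdfs://HOST1:54310/gutenberg"));
    FileInputFormat.addInputPath(job, new Path("hdfs://HOST2:54310/gutenberg"));

    // Output goes to whichever FS is named here; qualifying it avoids
    // accidentally resolving against the submitting host's default FS.
    FileOutputFormat.setOutputPath(job,
        new Path("hdfs://HOST1:54310/gutenberg-output"));

    // Mapper, reducer, and key/value classes omitted; set them up
    // exactly as in the standard WordCount example.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
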

Combine data from different HDFS FS

2013-04-08 Thread Pedro Sá da Costa
Hi, I want to combine data that lives in different HDFS filesystems so that it can be processed in one job. Is it possible to do this with MR, or is there another Apache tool that allows me to do this? E.g., HDFS data in Cluster1 and HDFS data in Cluster2 -> this job reads the data from Cluster1,