hey Rahul, thanks for pointing me to that page. It's definitely worth a read. Do both clusters need to be at least v2.3 for that?
I was also digging a little further. The property fs.defaultFS might be exactly the setting I was looking for. Unfortunately MapR restricts access to the CLDB and does not expose a NameNode directly, which makes this approach useless for us right now (we have a lot of data in a MapR cluster, but want to access it another way). Thanks to everyone who helped here.

Cheers
Wolli

2014-07-03 18:33 GMT+02:00 Rahul Chaudhari <rahulchaudhari0...@gmail.com>:

> Fabian,
> I see this as the classic case of federation of Hadoop clusters. The MR
> job can refer to a specific hdfs://<file location> as input while at the
> same time running on another cluster.
> You can refer to the following link for further details on federation:
>
> http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
>
> Regards,
> Rahul Chaudhari
>
>
> On Thu, Jul 3, 2014 at 9:06 PM, fab wol <darkwoll...@gmail.com> wrote:
>
>> Hey Nitin,
>>
>> I'm not talking about the concept. I'm talking about how to actually do
>> it technically and how to set it up. Imagine this: I have two clusters,
>> both running fine, and they are (setup-wise) the same, except that one
>> has far more TaskTrackers/NodeManagers than the other. Now I want to
>> incorporate some data from the small cluster into an analysis on the big
>> cluster. How could I access that data natively (just giving the job
>> another HDFS folder as input)? In MapR I configure the specified file and
>> then I have another folder in MapRFS with all the content from the other
>> cluster ... Could I somehow tell one NameNode to look up another NameNode
>> and incorporate all the files it doesn't have?
>>
>> Cheers
>> Fabian
>>
>>
>> 2014-07-03 17:09 GMT+02:00 Nitin Pawar <nitinpawar...@gmail.com>:
>>
>>> Nothing is stopping you from building the cluster the way you want.
>>> You can have storage-only nodes for your HDFS and not run TaskTrackers
>>> on them.
>>>
>>> Then start a bunch of machines with high RAM and high CPU but no
>>> storage.
>>>
>>> The only thing to worry about then would be the network bandwidth to
>>> carry data from HDFS to the tasks and back to HDFS.
>>>
>>>
>>> On Thu, Jul 3, 2014 at 8:29 PM, fab wol <darkwoll...@gmail.com> wrote:
>>>
>>>> hey everyone,
>>>>
>>>> MapR offers the possibility to access another cluster's HDFS/MapRFS
>>>> from one cluster (e.g. a compute-only cluster without much storage
>>>> capacity); see http://doc.mapr.com/display/MapR/mapr-clusters.conf.
>>>> In times of Hadoop-as-a-Service this becomes very interesting. Is this
>>>> somehow possible with the "normal" Hadoop distributions (CDH and HDP,
>>>> I'm looking at you ;-) ), or even without help from those distributors?
>>>> Any hacks and tricks or even specific functions are welcome. If this
>>>> is not possible, has anyone filed a ticket for it? Ticket number
>>>> forwarding is also appreciated ...
>>>>
>>>> Cheers
>>>> Wolli
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>
>>
>
>
> --
> Regards,
> Rahul Chaudhari
>
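PS for the archive: the "extra folder containing the other cluster's content" behaviour Fabian describes has a rough stock-Hadoop counterpart in a client-side ViewFs mount table. A minimal sketch for core-site.xml on the compute cluster, with hypothetical hostnames and mount points (only the fs.viewfs.mounttable.* property names and the viewfs:// scheme come from the Hadoop docs; everything else here is made up for illustration):

```xml
<!-- core-site.xml on the compute cluster (sketch, hostnames hypothetical).
     /local stays on this cluster's NameNode; /remote is mounted from the
     other cluster's NameNode. The authority in viewfs://cluster/ selects
     the mount table named "cluster" below. -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://cluster/</value>
</property>
<property>
  <name>fs.viewfs.mounttable.cluster.link./local</name>
  <value>hdfs://local-nn.example.com:8020/</value>
</property>
<property>
  <name>fs.viewfs.mounttable.cluster.link./remote</name>
  <value>hdfs://remote-nn.example.com:8020/data</value>
</property>
```

Alternatively, with no config change at all, a job can take a fully qualified URI such as hdfs://remote-nn.example.com:8020/data as its input path, as Rahul notes; either way the remote NameNode's RPC port has to be reachable from the compute cluster, which is exactly what MapR's CLDB-only access prevents.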