Re: Cross-data centre DFS communication?

2008-02-28 Steve Sapovits
Owen O'Malley wrote:
> Sure, the info server on the name node of HDFS has a read-only interface that lists directories in XML and allows the client to read files over HTTP. There is a FileSystem implementation that provides the client-side interface to the XML/HTTP access. To use it, you need …
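To make the "xml/http access" a bit more concrete, here is a minimal sketch of how an hftp:// path could be turned into plain HTTP URLs on the name node's info server. The servlet prefixes /listPaths (XML directory listing) and /data (file contents) and the 50070 port are my assumptions based on Hadoop releases from around this time, and the host name is made up; check HftpFileSystem in your version, which may also add query parameters.

```java
import java.net.URI;

public class HftpUrls {
    // Assumption: servlet prefixes on the name node's info server; verify
    // against the HftpFileSystem source for your Hadoop version.
    static final String LIST_SERVLET = "/listPaths";
    static final String DATA_SERVLET = "/data";

    // hftp://host:port/path -> HTTP URL that returns an XML directory listing.
    static URI listingUrl(URI hftp) {
        return toHttp(hftp, LIST_SERVLET);
    }

    // hftp://host:port/path -> HTTP URL that streams the file's bytes.
    static URI dataUrl(URI hftp) {
        return toHttp(hftp, DATA_SERVLET);
    }

    private static URI toHttp(URI hftp, String servlet) {
        if (!"hftp".equals(hftp.getScheme())) {
            throw new IllegalArgumentException("not an hftp URI: " + hftp);
        }
        // Same host and port (the info server's HTTP port); only the scheme
        // changes and the servlet prefix is prepended to the path.
        return URI.create("http://" + hftp.getAuthority() + servlet + hftp.getPath());
    }

    public static void main(String[] args) {
        // Hypothetical host name, assumed default info-server port.
        URI src = URI.create("hftp://namenode-a:50070/user/logs/part-00000");
        System.out.println(listingUrl(src));
        System.out.println(dataUrl(src));
    }
}
```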

2008-02-28 Owen O'Malley
On Feb 28, 2008, at 8:20 AM, Steve Sapovits wrote:
> Can you further explain the hftp part of this? I'm not familiar with that. We have a similar need to go cross-data center.

Sure, the info server on the name node of HDFS has a read-only interface that lists directories in XML and allows the client to read files over HTTP. …
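On the listing side, the client just has to parse the XML the info server returns. The sketch below assumes a shape like a <listing> root with <directory> and <file> children carrying path and size attributes; that format and the sample document are illustrative assumptions, not captured output from a real name node. It uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class HftpListing {
    // One listing entry: its path, whether it is a directory, and its size.
    record Entry(String path, boolean isDirectory, long size) {}

    // Parses an XML listing in the ASSUMED <listing>/<directory>/<file> format.
    static List<Entry> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse listing XML", ex);
        }
        List<Entry> entries = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            Element e = (Element) n;
            boolean dir = e.getTagName().equals("directory");
            if (!dir && !e.getTagName().equals("file")) continue; // ignore other elements
            // getAttribute returns "" when absent; treat a missing size as 0.
            String size = e.getAttribute("size");
            entries.add(new Entry(e.getAttribute("path"), dir,
                    size.isEmpty() ? 0L : Long.parseLong(size)));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical sample listing, not real name node output.
        String sample = """
            <listing path="/user/logs" recursive="no">
              <directory path="/user/logs/2008"/>
              <file path="/user/logs/part-00000" size="1048576"/>
            </listing>
            """;
        for (Entry e : parse(sample)) {
            System.out.println((e.isDirectory() ? "d " : "- ") + e.path() + " " + e.size());
        }
    }
}
```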

2008-02-28 Steve Sapovits
Owen O'Malley wrote:
> To copy between clusters, there is a tool called distcp. Look at "bin/hadoop distcp". It runs a map/reduce job that copies a group of files. It can also be used to copy between versions of Hadoop if the source file system is hftp, which uses XML to read HDFS.

Can you further explain the hftp part of this? …
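The distcp idea of "a map/reduce job that copies a group of files" boils down to handing each map task a share of the file list. As I understand it, distcp balances those shares by bytes rather than by file count; the following is a simplified, hypothetical sketch of that grouping step only (not DistCp's actual code), with made-up paths and sizes. The copy itself would be launched with something like bin/hadoop distcp hftp://<source-namenode>:<http-port>/src hdfs://<dest-namenode>:<port>/dst, where the hosts and ports are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

public class CopySplits {
    // A file to copy and its length in bytes (hypothetical sample data).
    record SrcFile(String path, long bytes) {}

    // Greedily packs files, in list order, into at most numMaps groups of
    // roughly totalBytes / numMaps bytes; each group is one map task's work.
    static List<List<SrcFile>> split(List<SrcFile> files, int numMaps) {
        if (numMaps < 1) throw new IllegalArgumentException("numMaps must be >= 1");
        long total = files.stream().mapToLong(SrcFile::bytes).sum();
        long target = Math.max(1, total / numMaps);
        List<List<SrcFile>> groups = new ArrayList<>();
        List<SrcFile> current = new ArrayList<>();
        long currentBytes = 0;
        for (SrcFile f : files) {
            // Close the current group if f would overshoot the target, unless
            // this is already the last allowed group.
            if (!current.isEmpty() && currentBytes + f.bytes() > target
                    && groups.size() < numMaps - 1) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(f);
            currentBytes += f.bytes();
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }

    public static void main(String[] args) {
        List<SrcFile> files = List.of(
                new SrcFile("/logs/a", 400), new SrcFile("/logs/b", 100),
                new SrcFile("/logs/c", 300), new SrcFile("/logs/d", 200));
        List<List<SrcFile>> groups = split(files, 2);
        for (int i = 0; i < groups.size(); i++) {
            long sum = groups.get(i).stream().mapToLong(SrcFile::bytes).sum();
            System.out.println("map " + i + ": " + groups.get(i).size() + " files, " + sum + " bytes");
        }
    }
}
```

Greedy in-order packing keeps the sketch short; it can leave groups uneven when a few files are much larger than the per-map target.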

2008-02-28 Owen O'Malley
On Feb 28, 2008, at 2:43 AM, Miles Osborne wrote:
> Currently, we have the following setup:
> -- cluster A, running Nutch: small RAM per node
> -- cluster B, just running Hadoop: lots of RAM per node
> At some point in the future we will want cluster B to talk to cluster A, and ideally this should …