Circumventing Hadoop's data placement policy

2009-05-23 Thread Brian Bockelman
Hey all, I had a problem I wanted to ask advice on. The Caltech site I work with currently has a few GridFTP servers that are on the same physical machines as the Hadoop datanodes, and a few that aren't. The GridFTP server has a libhdfs backend which writes incoming network data into HDFS…
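For context, the skew in question comes from HDFS's default rule for the first replica: when the writer itself runs on a datanode, that datanode receives the first replica of every block the writer creates; otherwise a datanode is chosen at random (subject to space and load checks). The Java sketch below models only that first-replica rule, with no racks, no second or third replica, and no load checks. The cluster shape (10 datanodes, GridFTP servers co-located on nodes 0 to 2, plus two standalone GridFTP servers marked -1) and the block counts are hypothetical, chosen only to illustrate the effect.

```java
import java.util.Arrays;
import java.util.Random;

public class FirstReplicaSkew {
    // Simplified first-replica rule: a writer on a datanode (localNode >= 0) keeps the
    // replica on its own node; a writer elsewhere (localNode == -1) gets a random node.
    static int chooseFirstReplica(int localNode, int numNodes, Random rng) {
        return localNode >= 0 ? localNode : rng.nextInt(numNodes);
    }

    // Counts first replicas per datanode when each writer writes blocksPerWriter blocks.
    static int[] simulate(int numNodes, int[] writerNodes, int blocksPerWriter, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[numNodes];
        for (int writer : writerNodes) {
            for (int b = 0; b < blocksPerWriter; b++) {
                counts[chooseFirstReplica(writer, numNodes, rng)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: three co-located GridFTP servers and two standalone ones.
        int[] writers = {0, 1, 2, -1, -1};
        System.out.println(Arrays.toString(simulate(10, writers, 1000, 7L)));
    }
}
```

Each co-located node ends up holding a replica of every block its own GridFTP server writes, while the remaining nodes share only the blocks coming from the standalone servers. That is the imbalance the workarounds in this thread try to avoid.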

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread jason hadoop
Can you give your machines multiple IP addresses and bind the grid server to a different IP than the datanode? With Solaris, you could put it in a different zone.
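This suggestion hinges on which address the namenode sees: it treats a writer as local when the writer's RPC connection arrives from an address registered to a datanode. So what matters is the *source* address of the libhdfs client's outgoing connection, not only the address the GridFTP server listens on. At the socket level, a process pins its source address by binding before connecting. The Java sketch below shows only that OS-level mechanism, using loopback so it runs anywhere; it does not show how libhdfs or the GridFTP server would be configured to do the same, which is left as an open assumption. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SourceAddressDemo {
    // Opens a TCP connection whose source IP is pinned to localAddr (port 0 = any free port).
    static Socket connectFrom(InetAddress localAddr, InetSocketAddress remote) throws IOException {
        Socket s = new Socket();
        s.bind(new InetSocketAddress(localAddr, 0));
        s.connect(remote, 5000);
        return s;
    }

    // Connects from sourceIp to a throwaway listener (standing in for the namenode)
    // and returns the peer address that listener observed.
    static String observedSourceAddress(String sourceIp) {
        try {
            InetAddress src = InetAddress.getByName(sourceIp);
            try (ServerSocket listener = new ServerSocket(0, 50, src);
                 Socket client = connectFrom(src, new InetSocketAddress(src, listener.getLocalPort()));
                 Socket accepted = listener.accept()) {
                return accepted.getInetAddress().getHostAddress();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("listener saw " + observedSourceAddress("127.0.0.1"));
    }
}
```

On a machine with a second interface, passing that interface's address instead of loopback would make the peer see the second IP, which is the property the multiple-IP approach relies on.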

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Tom White
You can't use it yet, but https://issues.apache.org/jira/browse/HADOOP-3799 (Design a pluggable interface to place replicas of blocks in HDFS) would enable you to write your own policy so that blocks are never placed locally. It might be worth following its development to check whether it can meet your needs.
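Since HADOOP-3799 was still being designed, the interface below is purely hypothetical: it is not that issue's API or any Hadoop class. It only sketches the shape of a pluggable policy that never places a replica on the writer's own host, picking the remaining targets uniformly at random and ignoring the rack awareness, free space, and load that a real placement policy would weigh.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical plug-in point; NOT the API proposed in HADOOP-3799.
interface ReplicaPlacementPolicy {
    // Returns up to `replication` distinct datanodes to hold a new block.
    List<String> chooseTargets(String writerHost, int replication, List<String> liveNodes);
}

// Never places a replica on the writer's own host; the rest are chosen at random.
class NeverLocalPolicy implements ReplicaPlacementPolicy {
    private final Random rng;

    NeverLocalPolicy(Random rng) {
        this.rng = rng;
    }

    @Override
    public List<String> chooseTargets(String writerHost, int replication, List<String> liveNodes) {
        List<String> candidates = new ArrayList<>(liveNodes);
        candidates.remove(writerHost);        // drop the writer's datanode, if it has one
        Collections.shuffle(candidates, rng); // uniform random order over the rest
        int n = Math.min(replication, candidates.size());
        return new ArrayList<>(candidates.subList(0, n));
    }
}

public class NeverLocalDemo {
    public static void main(String[] args) {
        ReplicaPlacementPolicy policy = new NeverLocalPolicy(new Random(42));
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4", "dn5"); // hypothetical hosts
        System.out.println(policy.chooseTargets("dn1", 3, nodes));
    }
}
```

Writers on hosts without a datanode are unaffected, since `remove` simply finds nothing to drop; this mirrors the random placement they already get by default.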

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Raghu Angadi
As a hack, you could tunnel NN traffic from GridFTP clients through a different machine (by changing fs.default.name). Alternatively, these clients could use a SOCKS proxy. The amount of traffic to the NN is small, so tunneling should not affect performance. Raghu.
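The tunnel works because the namenode judges locality by the address a request arrives from. If GridFTP clients reach the namenode through a relay on a machine that runs no datanode, the namenode sees the relay's address and falls back to random placement, while block data still flows directly between clients and datanodes, since clients connect to the datanode addresses the namenode returns. Below is a minimal Java TCP relay sketch, assuming it runs on such a host and that fs.default.name on the GridFTP machines points at it (for example hdfs://relay-host:PORT, where relay-host, PORT, and the namenode address are placeholders). SSH port forwarding would be an equivalent off-the-shelf option.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class TcpRelay {
    private static void closeQuietly(Socket s) {
        try { s.close(); } catch (IOException ignored) { }
    }

    private static void daemon(Runnable r) {
        Thread t = new Thread(r);
        t.setDaemon(true);
        t.start();
    }

    // Copies bytes one way until EOF, then half-closes so the far side also sees EOF.
    private static void pump(Socket from, Socket to) {
        byte[] buf = new byte[8192];
        try {
            InputStream in = from.getInputStream();
            OutputStream out = to.getOutputStream();
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                out.flush();
            }
            to.shutdownOutput();
        } catch (IOException e) {
            // One side failed: close both so the opposite pump unblocks too.
            closeQuietly(from);
            closeQuietly(to);
        }
    }

    // One pump per direction; both sockets are closed once both directions finish.
    private static void relay(Socket a, Socket b) {
        AtomicInteger running = new AtomicInteger(2);
        Runnable finish = () -> {
            if (running.decrementAndGet() == 0) { closeQuietly(a); closeQuietly(b); }
        };
        daemon(() -> { pump(a, b); finish.run(); });
        daemon(() -> { pump(b, a); finish.run(); });
    }

    // Listens on listenPort (0 = any free port) and relays each connection to targetHost:targetPort.
    public static ServerSocket start(int listenPort, String targetHost, int targetPort) throws IOException {
        ServerSocket listener = new ServerSocket(listenPort);
        daemon(() -> {
            while (true) {
                Socket client;
                try {
                    client = listener.accept();
                } catch (IOException e) {
                    return; // listener was closed
                }
                try {
                    relay(client, new Socket(targetHost, targetPort));
                } catch (IOException e) {
                    closeQuietly(client); // upstream unreachable: drop this client
                }
            }
        });
        return listener;
    }

    // Loopback self-check: sends msg through a relay to a one-shot echo server.
    static String echoViaRelay(String msg) {
        try (ServerSocket echo = new ServerSocket(0)) {
            daemon(() -> {
                try (Socket s = echo.accept()) {
                    s.getInputStream().transferTo(s.getOutputStream());
                } catch (IOException ignored) { }
            });
            try (ServerSocket relayPort = start(0, "127.0.0.1", echo.getLocalPort());
                 Socket c = new Socket("127.0.0.1", relayPort.getLocalPort())) {
                c.getOutputStream().write(msg.getBytes(StandardCharsets.UTF_8));
                c.shutdownOutput();
                return new String(c.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage (placeholders): java TcpRelay <listenPort> <namenodeHost> <namenodePort>
        start(Integer.parseInt(args[0]), args[1], Integer.parseInt(args[2]));
        Thread.currentThread().join(); // block forever; daemon threads do the relaying
    }
}
```

The relay only needs to carry namenode RPC, which is light, so a single small host is enough; datanode traffic never touches it. This sketch uses a thread pair per connection for simplicity rather than non-blocking I/O.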

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Raghu Angadi
Raghu Angadi wrote: > Alternatively, these clients could use a SOCKS proxy. A SOCKS proxy would not be useful, though, since you don't want datanode traffic to go through the proxy. Raghu