Re: Could hadoop do word count in all files under two-level sub folders?

2009-05-23 Thread Zhengguo 'Mike' SUN
Check TextInputFormat. You could override it to achieve that. From: Kunsheng Chen To: core-user@hadoop.apache.org Sent: Saturday, May 23, 2009 5:04:50 PM Subject: Could hadoop do word count in all files under two-level sub folders? Hello everyone, I referre
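For readers following this thread, here is a minimal sketch of the directory walk that a customized input format would have to do, written as plain Python against the local filesystem rather than the Hadoop API. The function name `count_words_two_levels` and the layout `root/<level1>/<level2>/<file>` are assumptions made for illustration; the original question is truncated, so the exact folder layout the poster had in mind is not known.

```python
from collections import Counter
from pathlib import Path


def count_words_two_levels(root):
    """Count whitespace-separated words in files exactly two folders below root.

    Hypothetical helper: illustrates the traversal only, not Hadoop's API.
    """
    counts = Counter()
    # The pattern "*/*/*" matches root/<level1>/<level2>/<entry>.
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file():  # skip any deeper directories
            with path.open(encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    counts.update(line.split())
    return counts
```

Files sitting directly in `root` or only one folder down are deliberately ignored here; widening the pattern (or walking recursively) would be the change to make if those should be counted too.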

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Raghu Angadi
Raghu Angadi wrote: As a hack, you could tunnel NN traffic from GridFTP clients through a different machine (by changing fs.default.name). Alternatively, these clients could use a SOCKS proxy. A SOCKS proxy would not be useful, since you don't want datanode traffic to go through the proxy. Raghu

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Raghu Angadi
As a hack, you could tunnel NN traffic from GridFTP clients through a different machine (by changing fs.default.name). Alternatively, these clients could use a SOCKS proxy. The amount of traffic to the NN is not much, and tunneling should not affect performance. Raghu. Brian Bockelman wrote: Hey all

Could hadoop do word count in all files under two-level sub folders?

2009-05-23 Thread Kunsheng Chen
Hello everyone, I referred to the Hadoop tutorial online and found the wordcount example. It seems to me that all files have to be under a certain folder to make it work. I am not sure whether that wordcount example could work for multiple subfolders. For example, if the input folder is 'in

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Tom White
You can't use it yet, but https://issues.apache.org/jira/browse/HADOOP-3799 (Design a pluggable interface to place replicas of blocks in HDFS) would enable you to write your own policy so blocks are never placed locally. Might be worth following its development to check it can meet your need? Chee

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread jason hadoop
Can you give your machines multiple IP addresses, and bind the grid server to a different IP than the datanode? With Solaris you could put it in a different zone. On Sat, May 23, 2009 at 10:13 AM, Brian Bockelman wrote: > Hey all, > > Had a problem I wanted to ask advice on. The Caltech site I wo

Circumventing Hadoop's data placement policy

2009-05-23 Thread Brian Bockelman
Hey all, Had a problem I wanted to ask advice on. The Caltech site I work with currently has a few GridFTP servers which are on the same physical machines as the Hadoop datanodes, and a few that aren't. The GridFTP server has a libhdfs backend which writes incoming network data into HD