Check TextInputFormat. You could override it to achieve that.
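Besides overriding the input format, FileInputFormat already expands glob patterns in input paths, so passing 'in/*' instead of 'in' turns each first-level subfolder into an input directory. A minimal sketch (the directory layout and the example-jar name are assumptions, not from the original thread):

```shell
# Hypothetical layout from the question: text files inside two-level subfolders.
mkdir -p in/sub1 in/sub2
echo "hello hadoop" > in/sub1/a.txt
echo "hello world"  > in/sub2/b.txt

# FileInputFormat expands glob patterns in its input paths, so the wordcount
# example can be pointed at every subfolder at once (jar name is an assumption):
#   bin/hadoop jar hadoop-*-examples.jar wordcount 'in/*' out
# The quoted glob matches in/sub1 and in/sub2; every file inside each matched
# directory then becomes a map input.
ls -d in/*
```

Quoting the glob matters: it should be expanded by Hadoop against HDFS paths, not by the local shell.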
From: Kunsheng Chen
To: core-user@hadoop.apache.org
Sent: Saturday, May 23, 2009 5:04:50 PM
Subject: Could hadoop do word count in all files under two-level sub folders?
Raghu Angadi wrote:
> As a hack, you could tunnel NN traffic from GridFTP clients through a
> different machine (by changing fs.default.name). Alternately, these
> clients could use a SOCKS proxy.
A SOCKS proxy would not be useful, since you don't want datanode traffic
going through the proxy.
Raghu
As a hack, you could tunnel NN traffic from GridFTP clients through a
different machine (by changing fs.default.name). Alternately, these
clients could use a SOCKS proxy.
The amount of traffic to the NN is small, so tunneling should not affect
performance.
Raghu.
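The fs.default.name change Raghu suggests would go in hadoop-site.xml on the GridFTP client machines; the tunnel host and port below are hypothetical stand-ins for whatever endpoint forwards to the real NN:

```xml
<!-- hadoop-site.xml on the GridFTP client machine -->
<!-- tunnel-host:9000 is a hypothetical endpoint forwarding to the real NN -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://tunnel-host:9000/</value>
</property>
```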
Brian Bockelman wrote:
> Hey all,
Hello everyone,
I referred to the Hadoop tutorial online and found the wordcount example; it
seems to me that all input files have to be directly under one folder for it
to work. I am not sure whether the wordcount example works across multiple
subfolders.
For example, if the input folder is 'in
You can't use it yet, but
https://issues.apache.org/jira/browse/HADOOP-3799 (Design a pluggable
interface to place replicas of blocks in HDFS) would enable you to
write your own policy so that blocks are never placed locally. It might be
worth following its development to check whether it can meet your needs.
Cheers
Can you give your machines multiple IP addresses, and bind the grid server
to a different IP than the datanode? With Solaris you could put it in a
different zone.
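A sketch of the multiple-address idea: the datanode can be pinned to one interface in hadoop-site.xml, leaving the other IP free for the grid server. The addresses below are hypothetical; the ports are the usual datanode defaults:

```xml
<!-- hadoop-site.xml on the datanode; 10.0.0.2 is a hypothetical HDFS-side IP,
     with the grid server bound to a different address on the same machine -->
<property>
  <name>dfs.datanode.address</name>
  <value>10.0.0.2:50010</value>
</property>
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>10.0.0.2:50020</value>
</property>
```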
On Sat, May 23, 2009 at 10:13 AM, Brian Bockelman wrote:
> Hey all,
>
> Had a problem I wanted to ask advice on. The Caltech site I wo
Hey all,
Had a problem I wanted to ask advice on. The Caltech site I work with
currently has a few GridFTP servers on the same physical machines as the
Hadoop datanodes, and a few that aren't. The GridFTP
server has a libhdfs backend which writes incoming network data into
HDFS.