data locality in HDFS

2008-06-18 Thread Ian Holsman (Lists)
Hi. I want to run a distributed cluster where I have, say, 20 machines/slaves in 3 separate data centers that belong to the same cluster. Ideally I would like the other machines in the data center to be able to upload files (Apache log files in this case) onto the local slaves and then have

Re: dfs put fails

2008-06-18 Thread Alexander Arimond
Thank you. I first tried the put from the master machine, which leads to the error. The put from the slave machine works. I guess you're right about the configuration parameters. It seems a bit strange to me, because the firewall settings and the hadoop-site.xml on both machines are identical. On Tue,

Re: is there a way to debug hadoop from Eclipse

2008-06-18 Thread Brian Vargas
JMock also works rather well, using its cglib extensions, for mocking out fake FileSystem implementations, if you're expecting your code to make calls directly to the filesystem for some reason. Brian Matt Kent wrote: JMock is a unit testing tool for creating mock objects. I use it to mock

hadoop file system error

2008-06-18 Thread 晋光峰
Dear all, I use hadoop-0.16.4 to do some work and found an error whose cause I can't determine. The scenario is this: in the reduce step, instead of using OutputCollector to write the result, I use FSDataOutputStream to write the result to files on HDFS (because I want to split the result by some

Re: data locality in HDFS

2008-06-18 Thread Dhruba Borthakur
HDFS uses the network topology to distribute and replicate data. An admin has to configure a script that describes the network topology to HDFS. This is specified by using the parameter topology.script.file.name in the Configuration file. This has been tested when nodes are on different subnets in
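
A minimal sketch of what such a topology script could look like, assuming Hadoop invokes it with one or more IP addresses or host names as arguments and reads back one rack path per argument, separated by whitespace. The subnet-to-rack table, the rack names, and the resolve() helper are hypothetical and only illustrate the idea; they are not taken from this thread.

```python
#!/usr/bin/env python3
"""Illustrative topology script: maps each argument to a rack path."""
import ipaddress
import sys

# Hypothetical subnet-to-rack table (assumption); replace with the real layout.
RACKS = {
    "10.1.0.0/16": "/dc1/rack1",
    "10.2.0.0/16": "/dc2/rack1",
    "10.3.0.0/16": "/dc3/rack1",
}

# Fallback for anything the table does not cover (assumed default name).
DEFAULT_RACK = "/default-rack"


def resolve(host):
    """Return the rack path for an IP address, or the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a host name): this sketch does no DNS lookup.
        return DEFAULT_RACK
    for subnet, rack in RACKS.items():
        if addr in ipaddress.ip_network(subnet):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack path per argument, in the same order as the arguments.
    print(" ".join(resolve(h) for h in sys.argv[1:]))
```

The script would need to be executable, and topology.script.file.name would point at its path in hadoop-site.xml. Grouping by data center in the first path component is one possible choice for the three-data-center layout asked about in this thread.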

how can i save the JobClient info?

2008-06-18 Thread Daniel
Hi all, I'm new to the Hadoop framework. When a MapReduce task finishes, is there any easy way to save the total number of input/output records to a file or variable? Thanks.

Re: Internet-Based Secure Clustered FS?

2008-06-18 Thread Chris Collins
Have you considered Amazon S3? I don't know how strict your security requirements are. There are lots of companies using this for just offsite data storage and also with EC2. C On Jun 17, 2008, at 6:48 PM, Kenneth Miller wrote: All, I'm looking for a solution that would allow me to securely

Re: dfs put fails

2008-06-18 Thread Alexander Arimond
Got a similar error when running a MapReduce job on the master machine. The map phase is OK and in the end the right results are in my output folder, but the reduce hangs at 17% for a very long time. Found this in one of the task logs a few times: ... 2008-06-18 17:31:02,297 INFO

Re: hadoop file system error

2008-06-18 Thread Konstantin Shvachko
Did you close those files? If not they may be empty. 晋光峰 wrote: Dear all, I use hadoop-0.16.4 to do some work and found an error whose cause I can't determine. The scenario is this: in the reduce step, instead of using OutputCollector to write the result, I use FSDataOutputStream to write

Re: hadoop file system error

2008-06-18 Thread 晋光峰
I'm sure I close all the files in the reduce step. Could anything else cause this problem? 2008/6/18 Konstantin Shvachko [EMAIL PROTECTED]: Did you close those files? If not they may be empty. 晋光峰 wrote: Dear all, I use hadoop-0.16.4 to do some work and found an error which I can't get the