Re: Loading data into HDFS

2007-08-02 Thread Dennis Kubes
You can copy data from any node, so if you can do it from multiple nodes your performance will be better (although be sure not to overlap files). The master node is updated once a block has been copied its replication number of times. So if the default replication is 3, then the 3 replicas must be
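The parallel-load advice above can be sketched with the HDFS shell. The paths and directory layout here are hypothetical, and the `hadoop dfs` form matches the 0.x-era command line; check the flags against your release:

```shell
# On each loading node, copy a disjoint subset of the input files.
# Node A:
hadoop dfs -put /local/data/part-a /user/hadoop/input/part-a

# Node B (run concurrently; note the non-overlapping source and target paths):
hadoop dfs -put /local/data/part-b /user/hadoop/input/part-b

# Afterwards, check block health and replication from any node:
hadoop fsck /user/hadoop/input -blocks
```

Keeping each node's source files disjoint avoids the overlap problem mentioned above, and the namenode only reports a block as complete once it has reached the configured replication count.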

Re: Loading data into HDFS

2007-08-02 Thread Dmitry
I am not sure that you are following the right techniques. We have the same issue concerning loading master/slave; we are still trying to find more details on how to do it better, so I can't advise you now... keep posting, probably somebody can give you the correct answer. Good questions, actually th

Re: Loading data into HDFS

2007-08-02 Thread Venkates .P.B.
Am I missing something very fundamental? Can someone comment on these queries? Thanks, Venkates P B On 8/1/07, Venkates .P.B. <[EMAIL PROTECTED]> wrote: > > > A few queries regarding the way data is loaded into HDFS. > > - Is it a common practice to load the data into HDFS only through the > maste

Re: problem scaling cluster up to 2 slaves

2007-08-02 Thread Jeff Hammerbacher
"HADOOP-1374" refers to the JIRA (Hadoop's issue-tracking software of choice) ticket that details a bug that may be related to yours. Check it out here: http://issues.apache.org/jira/browse/HADOOP-1374. Regards, Jeff On 8/2/07, Samuel LEMOINE <[EMAIL PROTECTED]> wrote: > > Thanks for your answer

Pig Latin - Store command?

2007-08-02 Thread Shane Butler
Hi, Page 3 of the "Hacking Pig" documentation suggests it is possible to LOAD a file, do stuff with it and then STORE it out... Is this correct or does it have to be done in Java? Regards, Shane PS. Apologies if this is the wrong list.
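To answer the question above: yes, a LOAD/transform/STORE pipeline can be written entirely in Pig Latin, without dropping to Java. A minimal sketch (file names and schema are made up for illustration; check the exact built-in names against your Pig release):

```pig
-- Load tab-separated records with a two-field schema.
raw = LOAD 'input/records.txt' USING PigStorage('\t') AS (word, num);

-- Do stuff with it: keep only rows with a positive count.
filtered = FILTER raw BY num > 0;

-- Write the result back out.
STORE filtered INTO 'output/filtered';
```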

Re: problem scaling cluster up to 2 slaves

2007-08-02 Thread Konstantin Shvachko
Samuel LEMOINE wrote: Thanks for your answer, you give me hope :) To answer your questions, I use the hadoop script "start-all.sh" to launch hadoop, and each platform is a Java 1.5.0 JDK running on Ubuntu 7.04. Thank you, this was important. Your few pieces of advice raise as many questions fo

Re: Error reporting from map function

2007-08-02 Thread ojh06
Hi Michael, Thanks for the reply. I've tried to write some code to do this now but it's not working. I was wondering if there's anything obviously wrong? After my runJob() I put (just as a test): JobClient aJC = new JobClient(); String jobid = jobConf.get("mapred.job.id"); aJC.setConf(jobConf

Re: Running tasktrackers on datanodes

2007-08-02 Thread Lucas Nazário dos Santos
Hello, There really was a problem with SSH, but it's OK now. When I issue the command start-all.sh from the master node (after formatting the namenode), it properly connects to the slave. The problem is that the datanode, as well as the tasktracker, are not initialized. SSH is working (at least, it

Re: Error reporting from map function

2007-08-02 Thread Michael Bieniosek
On 8/2/07 5:20 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > I've found the > getMapTaskReports method in the JobClient class, but can't work out > how to access it other than by creating a new instance of JobClient - > but then that JobClient would be a different one to the one that was
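Piecing together the advice in this thread, one way to get at the task reports is to submit the job yourself (instead of the static runJob()) so you keep a JobClient and a job handle. This is only a sketch against the 0.13-era API; method names such as getMapTaskReports(String) and getJobID() changed in later releases, so verify them against your version's javadoc:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskReport;

public class MapDiagnostics {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapDiagnostics.class);
        // ... set mapper, input/output paths, etc. ...

        // Submit through an explicit JobClient so the same instance
        // that submitted the job can query it afterwards.
        JobClient client = new JobClient(conf);
        RunningJob job = client.submitJob(conf);
        job.waitForCompletion();

        // One TaskReport per map task; diagnostics carry any error strings
        // reported by the task trackers.
        TaskReport[] reports = client.getMapTaskReports(job.getJobID());
        for (TaskReport report : reports) {
            for (String diag : report.getDiagnostics()) {
                System.err.println(report.getTaskId() + ": " + diag);
            }
        }
    }
}
```

This avoids the problem quoted above of constructing a second, unrelated JobClient: the job id comes from the RunningJob handle rather than from a config property.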

Re: mapred.compress.map.output

2007-08-02 Thread Marco Nicosia
I have some purely subjective experience. I invite anyone with empirical evidence to pipe up if possible. It can be used, but there are a couple of current important caveats: 1] If your maps have a tremendous amount of output, the TaskTrackers will start producing OutOfMemory exceptions (and depe

mapred.compress.map.output

2007-08-02 Thread Emmanuel
I notice that the reduce > copy phase is very slow. I would like to configure hadoop to compress the map output: mapred.compress.map.output = true, map.output.compression.type = RECORD. I'm wondering if someone has already used it, or if you have some statistics about the improvement. A
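The settings mentioned above correspond to this hadoop-site.xml fragment (property names as given in the message; check them against your release's hadoop-default.xml):

```xml
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>map.output.compression.type</name>
  <value>RECORD</value>
</property>
```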

Re: Running tasktrackers on datanodes

2007-08-02 Thread Lucas Nazário dos Santos
Your response was very insightful. The problem was with the SSH server: it wasn't working. Now everything seems to be running OK. Thanks Ollie, Lucas On 7/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > Hi Lucas, > > Sounds strange, it should work. As long as you have all the names of > the

Re: Error reporting from map function

2007-08-02 Thread ojh06
Hi Doug, Thanks for the reply. Could you possibly explain how my program would get access to the task reports from each tracker? I've found the getMapTaskReports method in the JobClient class, but can't work out how to access it other than by creating a new instance of JobClient - but the

Re: problem scaling cluster up to 2 slaves

2007-08-02 Thread Samuel LEMOINE
Thanks for your answer, you give me hope :) To answer your questions, I use the hadoop script "start-all.sh" to launch hadoop, and each platform is a Java 1.5.0 JDK running on Ubuntu 7.04. Your few pieces of advice raise as many questions for me: you mention a discussion named "HADOOP-1374", w