Re: Logging in Hadoop Stream jobs

2009-05-10 Thread Billy Pearson
When I was looking to capture debugging data about my scripts, I would just write to the stderr stream in PHP, like fwrite(STDERR, "message you want here"); it then gets captured in the task logs when you view the details of each task. Billy. Mayuran Yogarajah mayuran.yogara...@casalemedia.com wrote
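A minimal sketch of the same idea from the Java API side (hypothetical mapper, old 0.20 "mapred" API, not from the original post): anything written to stderr ends up in the per-task stderr log, just like fwrite(STDERR, ...) does for a streaming PHP script.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class DebugMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, LongWritable> output, Reporter reporter)
          throws IOException {
        // Debug output: captured in the task's stderr log and viewable per
        // task in the JobTracker web UI.
        System.err.println("processing offset " + key + ": " + value);

        output.collect(value, key);
      }
    }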

Re: Namenode failed to start with FSNamesystem initialization failed error

2009-05-10 Thread Tamir Kamara
Filed HADOOP-5798. On Wed, May 6, 2009 at 9:53 PM, Raghu Angadi rang...@yahoo-inc.com wrote: Tamir Kamara wrote: Hi Raghu, The thread you posted is my original post written when this problem first happened on my cluster. I can file a JIRA but I wouldn't be able to provide information

Can I run the testcase in local

2009-05-10 Thread zjffdu
Hi all, I'd like to know more about Hadoop, so I want to debug the test cases locally. But I found the errors below: Can anyone help solve this problem? Thank you very much.

Re: Can I run the testcase in local

2009-05-10 Thread zhang jianfeng
PS, I run it on a Windows machine. On Sun, May 10, 2009 at 4:11 PM, zjffdu zjf...@gmail.com wrote: Hi all, I'd like to know more about Hadoop, so I want to debug the test cases locally. But I found the errors below: Can anyone help solve this problem? Thank you very much.

Native (GZIP) decompress not faster than builtin

2009-05-10 Thread Jens Riboe
Hi, during the past week I decided to use native decompression for a Hadoop job (using 0.20.0). But before implementing it, I wrote a small benchmark just to understand how much faster (better) it was. The result came as a surprise. May 6, 2009 10:56:47 PM
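A hedged sketch of the kind of benchmark described above (class name, buffer size, and iteration count are illustrative assumptions, not taken from the original post): it gzips a buffer once, then times repeated decompression with the JDK's built-in GZIPInputStream and with Hadoop's GzipCodec, which uses the native zlib library when it is loaded.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class GzipDecompressBench {

      public static void main(String[] args) throws IOException {
        // Compressible test data: 8 MB of a repeating byte.
        byte[] raw = new byte[8 * 1024 * 1024];
        Arrays.fill(raw, (byte) 'x');

        // Compress once with the JDK so both decoders read identical input.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gzOut = new GZIPOutputStream(bos);
        gzOut.write(raw);
        gzOut.close();
        byte[] compressed = bos.toByteArray();

        Configuration conf = new Configuration();
        CompressionCodec codec =
            ReflectionUtils.newInstance(GzipCodec.class, conf);

        // Built-in (pure Java) decompression.
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
          drain(new GZIPInputStream(new ByteArrayInputStream(compressed)));
        }
        System.out.println("builtin: " + (System.currentTimeMillis() - t0) + " ms");

        // Hadoop codec (native zlib if available, otherwise a Java fallback).
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < 100; i++) {
          drain(codec.createInputStream(new ByteArrayInputStream(compressed)));
        }
        System.out.println("codec:   " + (System.currentTimeMillis() - t1) + " ms");
      }

      // Read a stream to the end and discard the bytes.
      private static void drain(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024];
        while (in.read(buf) != -1) {
          // discard
        }
        in.close();
      }
    }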

RE: Can I run the testcase in local

2009-05-10 Thread zjffdu
I found it can only work on Linux, not Windows. So is there any way I can run it on Windows? From: zhang jianfeng [mailto:zjf...@gmail.com] Sent: May 10, 2009 16:39 To: core-user@hadoop.apache.org Subject: Re: Can I run the testcase in local PS, I run it on a Windows machine. On

Re: Can I run the testcase in local

2009-05-10 Thread Iman
Zhang, you will need Cygwin. There is also a Hadoop virtual machine that you can use. Check this tutorial for more details: http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html zjffdu wrote: I found it can only work on Linux, not Windows. So is there any way I can run it on

Re: Native (GZIP) decompress not faster than builtin

2009-05-10 Thread Stefan Podkowinski
Jens, as your test shows, using a native codec won't make much sense for small files, since the JNI overhead involved will likely outweigh any possible gains. With all the performance improvements in Java 5 and 6, it's reasonable to ask whether the native implementation really improves
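A small check worth running alongside such a benchmark (my suggestion, not from the original thread): confirm that the native hadoop/zlib libraries are actually on java.library.path. If they are not, GzipCodec silently falls back to a pure-Java implementation, so a "native" measurement may not be exercising native code at all.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.zlib.ZlibFactory;
    import org.apache.hadoop.util.NativeCodeLoader;

    public class NativeLibCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // True only if libhadoop was found and loaded by this JVM.
        System.out.println("libhadoop loaded:   " + NativeCodeLoader.isNativeCodeLoaded());
        // True only if the native zlib bindings are usable.
        System.out.println("native zlib loaded: " + ZlibFactory.isNativeZlibLoaded(conf));
      }
    }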

Re: large files vs many files

2009-05-10 Thread Stefan Podkowinski
You just can't have many distributed jobs write into the same file without locking/synchronizing those writes, even with append(). It's no different from using a regular file from multiple processes in this respect. Maybe you need to collect your data up front before processing it in Hadoop?
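One hedged sketch of that "collect first" approach (paths and class name are illustrative assumptions): let each job write its own output directory as usual, so no file ever has more than one writer, then concatenate the part files afterwards.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeOutputs {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Each job/task writes under its own directory, e.g. /data/out/job-123/.
        Path srcDir = new Path("/data/out/job-123");
        Path merged = new Path("/data/merged/job-123.dat");

        // Concatenate the part files into a single destination file.
        FileUtil.copyMerge(fs, srcDir, fs, merged, false, conf, null);
      }
    }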

Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-10 Thread stack
Writing a file, our application spends a load of time here: at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2964) - locked 0x7f11054c2b68 (a

Re: sub 60 second performance

2009-05-10 Thread jason hadoop
You can cache the block in your task, in a pinned static variable, when you are reusing the JVMs. On Sun, May 10, 2009 at 2:30 PM, Matt Bowyer mattbowy...@googlemail.com wrote: Hi, I am trying to do 'on-demand map reduce' - something which will return in a reasonable time (a few seconds). My
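A minimal sketch of the "pinned static variable" idea (hypothetical class and field names, old 0.20 mapred API): with JVM reuse enabled (mapred.job.reuse.jvm.num.tasks set to -1 or greater than 1), a static field initialized by the first task in a JVM stays in memory for later tasks run in that same JVM.

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CachingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // Survives across tasks only when the framework reuses the JVM.
      private static volatile Map<String, String> cache;

      @Override
      public void configure(JobConf job) {
        if (cache == null) {
          synchronized (CachingMapper.class) {
            if (cache == null) {
              cache = new ConcurrentHashMap<String, String>();
              // Expensive one-time load would go here (e.g. reading a lookup
              // block from HDFS); later tasks in the same JVM skip it.
            }
          }
        }
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        String hit = cache.get(value.toString());
        output.collect(value, new Text(hit == null ? "" : hit));
      }
    }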

Re: sub 60 second performance

2009-05-10 Thread Matt Bowyer
Thanks Jason, how can I get access to the particular block? Do you mean creating a static map inside the task (adding the values) and checking if it is populated on the next run? Or is there a more elegant/tried-and-tested solution? Thanks again. On Mon, May 11, 2009 at 12:41 AM, jason hadoop

Re: Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-10 Thread Raghu Angadi
It should not be waiting unnecessarily. But the client has to, if any of the datanodes in the pipeline is not able to receive the data as fast as the client is writing. IOW, writing goes as fast as the slowest of the nodes involved in the pipeline (1 client and 3 datanodes). But based on what your case

Re: Huge DataNode Virtual Memory Usage

2009-05-10 Thread Raghu Angadi
What do 'jmap' and 'jmap -histo:live' show? Raghu. Stefan Will wrote: Chris, thanks for the tip ... However, I'm already running 1.6.0_10: java version 1.6.0_10 Java(TM) SE Runtime Environment (build 1.6.0_10-b33) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode) Do you know of a