Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-09-27 Thread Vitaliy Semochkin
Hi, [..]if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap
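
For reference: the check described above can be disabled with the -XX:-UseGCOverheadLimit JVM flag, though that usually only converts the error into a plain java.lang.OutOfMemoryError; giving the JVM more heap is the more common fix. A minimal sketch, with an illustrative -Xmx value and a hypothetical jar name:

    # Disables the 98%/2% overhead check; a larger -Xmx is usually the real fix.
    java -Xmx2048m -XX:-UseGCOverheadLimit -jar my-job.jar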

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-09-27 Thread Bradford Stephens
It turned out to be a deployment issue of an old version. Ted and Chris's suggestions were spot-on. I can't believe how BRILLIANT these combiners from Cascading are. It's cut my processing time down from 20 hours to 50 minutes. AND I cut out about 80% of my hand-crafted code. Bravo. I look smart

Re: hd fs -head?

2010-09-27 Thread Edward Capriolo
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote: Is there a particularly good reason for why the hadoop fs command supports -cat and -tail, but not -head? Keith Wiley    
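
For reference: -head is not a built-in hadoop fs option, but the usual workaround is to stream with -cat and truncate on the client side; the path below is illustrative:

    # Emulate "hadoop fs -head": stream the file and cut it off locally.
    hadoop fs -cat /user/example/data.txt | head -n 10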

Re: A new way to merge up those small files!

2010-09-27 Thread Edward Capriolo
Ted, Good point. Patches are welcome :) I will add it onto my to-do list. Edward On Sat, Sep 25, 2010 at 12:05 PM, Ted Yu yuzhih...@gmail.com wrote: Edward: Thanks for the tool. I think the last parameter can be omitted if you follow what hadoop fs -text does. It looks at a file's magic
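
For reference, the behaviour Ted is pointing at: hadoop fs -text inspects the file's magic number (e.g. SequenceFile or gzip) and decodes it to plain text, so no explicit format parameter is needed. A sketch with an illustrative path:

    # -text picks a decoder from the file's magic number instead of a flag.
    hadoop fs -text /user/example/part-00000.gz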

Re: Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-27 Thread pig
Hi Sriguru, Thank you for the tips. Just to clarify a few things. Our machines have 32 GB of RAM. I'm planning on setting each machine to run 12 mappers and 2 reducers with the heap size set to 2048 MB, so total heap usage is 28 GB. If this is the case, should io.sort.mb be set to
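
For reference, the arithmetic behind the message: (12 map slots + 2 reduce slots) x 2048 MB = 28 GB of task heap, leaving roughly 4 GB of the 32 GB for the DataNode, TaskTracker and the OS. A mapred-site.xml sketch with the 0.20-era property names and the values from the message (not general recommendations):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>12</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>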

Re: hd fs -head?

2010-09-27 Thread Keith Wiley
On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote: On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote: Is there a particularly good reason for why the hadoop fs command supports -cat and -tail, but not -head? Tail needs to be done efficiently, but head you can
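
For reference, the distinction being made: -tail has to seek relative to the end of the file to be efficient, while a head only needs to read from offset 0 and stop. A minimal sketch against the stock org.apache.hadoop.fs API (class name and the 4 KB limit are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHead {
      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path(args[0]));
        try {
          byte[] buf = new byte[4096];
          int remaining = 4096;   // "head": only the first 4 KB of the file
          int read;
          while (remaining > 0
                 && (read = in.read(buf, 0, Math.min(buf.length, remaining))) > 0) {
            System.out.write(buf, 0, read);
            remaining -= read;
          }
          System.out.flush();
        } finally {
          in.close();
        }
      }
    }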

Re: Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-27 Thread Ted Yu
The setting should be fs.inmemory.size.mb On Mon, Sep 27, 2010 at 7:15 AM, pig hadoopn...@gmail.com wrote: Hi Sriguru, Thank you for the tips. Just to clarify a few things. Our machines have 32 GB of RAM. I'm planning on setting each machine to run 12 mappers and 2 reducers with the

Re: Can not upload local file to HDFS

2010-09-27 Thread He Chen
Thanks, but I think you go too far from focusing on the problem itself. On Sun, Sep 26, 2010 at 11:43 AM, Nan Zhu zhunans...@gmail.com wrote: Have you ever checked the log file in the directory? I always find some important information there. I suggest you recompile hadoop with ant since
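
For reference, where those logs normally live: the daemon logs sit under $HADOOP_HOME/logs (or $HADOOP_LOG_DIR if set), named hadoop-<user>-<daemon>-<hostname>.log. A sketch for skimming the ones relevant to a failed upload:

    # Check the DataNode and NameNode logs for errors around the failed put.
    tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
    tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log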

Re: Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-27 Thread ed
Ah okay, I did not see the fs.inmemory.size.mb setting in any of the default config files located here: http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html http://hadoop.apache.org/common/docs/r0.20.2/core-default.html http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html

Re: Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-27 Thread Ted Yu
The default is 100MB for InMemory File System: int size = Integer.parseInt(conf.get("fs.inmemory.size.mb", "100")); ./src/core/org/apache/hadoop/fs/InMemoryFileSystem.java If you want to change its value, you can put it in core-site.xml On Mon, Sep 27, 2010 at 9:29 AM, ed hadoopn...@gmail.com
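
For reference, a core-site.xml sketch for overriding that 100 MB default; the 1400 MB value is the figure discussed in this thread, not a general recommendation:

    <property>
      <name>fs.inmemory.size.mb</name>
      <value>1400</value>
    </property>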

Re: Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-27 Thread Srigurunath Chakravarthi
Ed, Your math is right - 1400 MB would be a good setting for io.sort.mb. fs.inmemory.size.mb - what Hadoop version are you using? I suspect that this may be deprecated. If it is supported, you can set it to 1400 MB too. I know it is recognized by 0.21 (and older versions). Increasing
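
For reference, where io.sort.mb goes (0.20-era name; later versions call it mapreduce.task.io.sort.mb): either mapred-site.xml or the per-job configuration. The sort buffer is allocated inside the task JVM heap, so it has to fit under the -Xmx set via mapred.child.java.opts:

    <property>
      <name>io.sort.mb</name>
      <value>1400</value>
    </property>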

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2010-09-27 Thread Bharath Mundlapudi
Couple of things you can try. 1. Increase the heap size for the tasks. 2. Since your OOM is happening randomly, try setting -XX:+HeapDumpOnOutOfMemoryError for your child JVM parameters. At least you can detect why your heap is growing - is it due to a leak? Or if you need to increase the heap
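
For reference, a mapred-site.xml sketch combining both suggestions for the child JVMs; the -Xmx value and the dump path are illustrative:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp</value>
    </property>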

Too many fetch-failures

2010-09-27 Thread Pramy Bhats
Hello, I am trying to run a bigram count on a 12-node cluster setup. For an input file of 135 splits (around 7.5 GB), the job fails for some of the runs. The error I get on the jobtracker is that out of 135 mappers, 1 of the mappers fails because of Too many fetch-failures Too many

Re: Too many fetch-failures

2010-09-27 Thread Joe Stein
I have seen this before if your hosts are not set up so every data node can contact every other data node by the hostname it registered. Typically when adding a new node (or during initial cluster setup), make sure the internal DNS is updated or that the hosts file is updated on all data nodes as such
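
For reference, a /etc/hosts sketch (hypothetical addresses and hostnames): every node should resolve every other node by the hostname it registered with, and the file should be identical across the cluster:

    10.0.0.11   hadoop-dn1
    10.0.0.12   hadoop-dn2
    10.0.0.13   hadoop-dn3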

Re: Too many fetch-failures

2010-09-27 Thread Pramy Bhats
Hi Jon, I am not sure if I get the last part correctly. The job doesn't fail consistently. However, the DNS setup seems perfectly fine. Could you give me some pointers to debug the problem and find the exact cause? Thanks in advance. On Mon, Sep 27, 2010 at 9:01 PM, Joe Stein

Re: hd fs -head?

2010-09-27 Thread Edward Capriolo
On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley kwi...@keithwiley.com wrote: On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote: On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote: Is there a particularly good reason for why the hadoop fs command supports -cat and

How to change logging from DRFA to RFA? Is it a good idea?

2010-09-27 Thread Leo Alekseyev
We are looking for ways to prevent Hadoop daemon logs from piling up (over time they can reach several tens of GB and become a nuisance). Unfortunately, the log4j DRFA class doesn't seem to provide an easy way to limit the number of files it creates. I would like to try switching to RFA with set
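
For reference, a conf/log4j.properties sketch based on the stock Hadoop file, which typically already defines an RFA appender alongside DRFA; the size and count values are illustrative:

    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.RFA.MaxFileSize=256MB
    log4j.appender.RFA.MaxBackupIndex=20
    log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

    # Then point the daemons at it, e.g. in conf/hadoop-env.sh:
    # export HADOOP_ROOT_LOGGER=INFO,RFA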

Re: hd fs -head?

2010-09-27 Thread Keith Wiley
On Sep 27, 2010, at 13:46 , Edward Capriolo wrote: On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley kwi...@keithwiley.com wrote: On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote: On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote: Is there a particularly good reason

Re: Questions about BN and CN

2010-09-27 Thread Todd Lipcon
On Thu, Sep 23, 2010 at 6:20 PM, Konstantin Shvachko s...@yahoo-inc.com wrote: Hi Shen, Why do we need CheckpointNode? 1. First of all it is a compatible replacement of SecondaryNameNode. 2. Checkpointing is also needed for periodically compacting edits. You can do it with CN or BN, but CN
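
For context, the 0.21-era startup commands as described in the HDFS user guide (treat as a sketch; paths depend on the install):

    bin/hdfs namenode -checkpoint   # CheckpointNode: periodically merges fsimage and edits
    bin/hdfs namenode -backup       # BackupNode: additionally keeps an up-to-date in-memory namespace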