Hi,
[..] if more than 98% of the total time is spent in garbage collection
and less than 2% of the heap is recovered, an OutOfMemoryError will be
thrown. This feature is designed to prevent applications from running
for an extended period of time while making little or no progress
because the heap is too small.
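For reference, that 98%/2% check is the HotSpot parallel collector's GC overhead limit, and it can be disabled (or a bigger heap given) through the child task JVM options. A sketch, assuming 0.20-era Hadoop property names and placeholder sizes; raising -Xmx or fixing the leak is usually better than turning the check off:

```xml
<!-- mapred-site.xml (sketch): -XX:-UseGCOverheadLimit disables the
     98%/2% OutOfMemoryError described above -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -XX:-UseGCOverheadLimit</value>
</property>
```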
It turned out to be a deployment issue of an old version. Ted and
Chris's suggestions were spot-on.
I can't believe how BRILLIANT these combiners from Cascading are. It's
cut my processing time down from 20 hours to 50 minutes. AND I cut out
about 80% of my hand-crafted code.
Bravo. I look smart
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com wrote:
Is there a particularly good reason for why the hadoop fs command supports
-cat and -tail, but not -head?
Keith Wiley
Ted,
Good point. Patches are welcome :) I will add it onto my to-do list.
Edward
On Sat, Sep 25, 2010 at 12:05 PM, Ted Yu yuzhih...@gmail.com wrote:
Edward:
Thanks for the tool.
I think the last parameter can be omitted if you follow what hadoop fs -text
does.
It looks at a file's magic number to determine the format.
Hi Sriguru,
Thank you for the tips. Just to clarify a few things.
Our machines have 32 GB of RAM.
I'm planning on setting each machine to run 12 mappers and 2 reducers with
the heap size set to 2048 MB, so total memory usage for the heaps is 28 GB.
If this is the case, should io.sort.mb be set to
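A quick sanity check of that per-node memory budget (figures taken from the message above; a sketch, not a sizing recommendation):

```python
# Per-node heap budget using the numbers quoted in the message.
mappers = 12
reducers = 2
heap_mb_per_task = 2048              # -Xmx2048m per child JVM

total_heap_mb = (mappers + reducers) * heap_mb_per_task
total_heap_gb = total_heap_mb / 1024

print(total_heap_mb, "MB =", total_heap_gb, "GB")   # 28672 MB = 28.0 GB
```

That leaves roughly 4 GB of the 32 GB for the DataNode, TaskTracker, and OS, which is the margin the sizing question hinges on.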
On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com
wrote:
Is there a particularly good reason for why the hadoop fs command
supports
-cat and -tail, but not -head?
Tail needs to be implemented efficiently, but head you can
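One common workaround (a sketch, not something the thread prescribes): pipe -cat through head. head exits after N lines, -cat then gets a broken pipe and stops reading, so only the start of the file is streamed. On a cluster that would look like `hadoop fs -cat /path/to/file | head -n 10` (path hypothetical); the local pipeline below has the same shape:

```shell
# Cluster form (hypothetical path):
#   hadoop fs -cat /user/keith/part-00000 | head -n 10
# head closes the pipe after 2 lines here, so the producer stops early.
printf 'line1\nline2\nline3\n' | head -n 2
```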
The setting should be fs.inmemory.size.mb
On Mon, Sep 27, 2010 at 7:15 AM, pig hadoopn...@gmail.com wrote:
Hi Sriguru,
Thank you for the tips. Just to clarify a few things.
Our machines have 32 GB of RAM.
I'm planning on setting each machine to run 12 mappers and 2 reducers with
the
Thanks, but I think you go too far in focusing on the problem itself.
On Sun, Sep 26, 2010 at 11:43 AM, Nan Zhu zhunans...@gmail.com wrote:
Have you ever checked the log files in the directory?
I always find some important information there.
I suggest you recompile Hadoop with ant, since
Ah okay,
I did not see the fs.inmemory.size.mb setting in any of the default config files
located here:
http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html
http://hadoop.apache.org/common/docs/r0.20.2/core-default.html
http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
The default is 100MB for InMemory File System:
int size = Integer.parseInt(conf.get("fs.inmemory.size.mb", "100"));
./src/core/org/apache/hadoop/fs/InMemoryFileSystem.java
If you want to change its value, you can put it in core-site.xml
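For example (a sketch; the value is in MB, matching the parse quoted above, and 200 is just a placeholder):

```xml
<!-- core-site.xml -->
<property>
  <name>fs.inmemory.size.mb</name>
  <value>200</value> <!-- raises the in-memory FS from the 100 MB default -->
</property>
```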
On Mon, Sep 27, 2010 at 9:29 AM, ed hadoopn...@gmail.com
Ed,
Your math is right - 1400 MB would be a good setting for io.sort.mb.
fs.inmemory.size.mb - what Hadoop version are you using? I suspect that this
may be deprecated. If it is supported, you can set it to 1400 MB too. I know it
is recognized by .21 (and older versions).
Increasing
A couple of things you can try:
1. Increase the heap size for the tasks.
2. Since your OOM is happening randomly, try setting
-XX:+HeapDumpOnOutOfMemoryError in your child JVM parameters. At least you can
detect why your heap is growing: is it due to a leak, or do you need to increase
the heap
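Both suggestions land in the same place. A sketch combining them, assuming 0.20-era property names and placeholder sizes/paths:

```xml
<!-- mapred-site.xml (sketch): bigger heap plus a dump on OOM -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- HeapDumpPath must be writable on every task node -->
  <value>-Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp</value>
</property>
```

The resulting .hprof file can then be opened in jhat or a similar heap analyzer to tell a leak apart from a genuinely undersized heap.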
Hello,
I am trying to run a bigram count on a 12-node cluster setup. For an input
file of 135 splits (around 7.5 GB), the job fails for some of the runs.
The error that I get on the jobtracker is that out of 135 mappers, 1 of the
mappers fails because of
Too many fetch-failures
Too many
I have seen this before if your hosts are not set up so that every data node can
contact every other data node by its registered host name. Typically, when
adding a new node (or at initial cluster setup), make sure the internal DNS is
updated, or that the hosts file is updated on all data nodes, as such
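In the hosts-file case, that means every node carries identical entries. A sketch with made-up names and addresses:

```
# /etc/hosts (kept the same on all nodes; names/addresses hypothetical)
10.0.0.10  master1
10.0.0.11  datanode1
10.0.0.12  datanode2
10.0.0.13  datanode3
```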
Hi Jon,
I am not sure if I got the last part correctly. The job doesn't fail
consistently. However, the DNS setup seems perfectly fine.
Could you give me some pointers to debug the problem and find the exact cause?
Thanks in advance.
On Mon, Sep 27, 2010 at 9:01 PM, Joe Stein
On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley kwi...@keithwiley.com wrote:
On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com
wrote:
Is there a particularly good reason for why the hadoop fs command
supports
-cat and
We are looking for ways to prevent Hadoop daemon logs from piling up
(over time they can reach several tens of GB and become a nuisance).
Unfortunately, the log4j DRFA class doesn't seem to provide an easy
way to limit the number of files it creates. I would like to try
switching to RFA with set
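The switch would look roughly like this (a sketch against stock log4j; MaxFileSize/MaxBackupIndex are standard RollingFileAppender options, and the sizes here are placeholders):

```properties
# log4j.properties: cap daemon logs at 10 files x 256 MB each
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

Unlike DRFA, RFA deletes the oldest backup once the index limit is reached, which bounds total disk usage.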
On Sep 27, 2010, at 13:46 , Edward Capriolo wrote:
On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley kwi...@keithwiley.com wrote:
On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley kwi...@keithwiley.com
wrote:
Is there a particularly good reason
On Thu, Sep 23, 2010 at 6:20 PM, Konstantin Shvachko s...@yahoo-inc.com wrote:
Hi Shen,
Why do we need CheckpointNode?
1. First of all it is a compatible replacement of SecondaryNameNode.
2. Checkpointing is also needed for periodically compacting edits.
You can do it with CN or BN, but CN