After adding nodes to 0.20.2 cluster, getting "Could not complete file" errors and hung JobTracker

2010-10-15 Thread Bobby Dennett
Hi all, We are currently in the process of replacing the servers in our Hadoop 0.20.2 production cluster and in the last couple of days have experienced an error similar to the following (from the JobTracker log) several times, which then appears to hang the JobTracker: 2010-10-15 04:13:38,980 IN

Scheduler recommendation

2010-08-11 Thread Bobby Dennett
Hi all, >From what I've read/seen, it appears that, if not the "default" scheduler, most installations are using Hadoop's Fair Scheduler. Based on features and our requirements, we're leaning towards using the Capacity Scheduler; however, there is some concern that it may not be as "stable" as the

Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Bobby Dennett
> Outside of making the changes to the mapred-site.xml file, with your setup, what do you view as the biggest pain point? > > Josh Patterson > Cloudera > > On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett wrote: >> We are looking to enable LZO compression of the map

Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Bobby Dennett
We are looking to enable LZO compression of the map outputs on our Cloudera 0.20.1 cluster. It seems there are various sets of instructions available and I am curious what your thoughts are regarding which one would be best for our Hadoop distribution and OS (Ubuntu 8.04 64-bit). In particular, had
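
For reference, the map-output half of this change usually comes down to two properties in mapred-site.xml. The sketch below generates those entries as an XML fragment. It assumes the 0.20-era property names `mapred.compress.map.output` and `mapred.map.output.compression.codec`, and the `com.hadoop.compression.lzo.LzoCodec` class from the separately distributed hadoop-lzo / hadoop-gpl-compression package. None of these names appear in the post above. The sketch does not install the native LZO libraries or register the codec in core-site.xml; those steps depend on the distribution and OS being asked about.

```python
import xml.etree.ElementTree as ET

# Assumption: Hadoop 0.20-era property names for map-output compression.
# The codec class assumes hadoop-lzo (or hadoop-gpl-compression) is installed
# on every TaskTracker; it is not part of stock Hadoop 0.20.
MAP_OUTPUT_LZO_PROPERTIES = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec": "com.hadoop.compression.lzo.LzoCodec",
}


def property_elements(props):
    """Build a <configuration> element with one <property> per name/value pair."""
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return config


def render(props):
    """Serialize the properties as a mapred-site.xml-style XML string."""
    return ET.tostring(property_elements(props), encoding="unicode")


if __name__ == "__main__":
    print(render(MAP_OUTPUT_LZO_PROPERTIES))
```

Generating the fragment, rather than hand-editing each node, is only a convenience for keeping every TaskTracker's copy identical. The property values themselves are what matter.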

Client can override "final" dfs.replication value

2010-08-03 Thread Bobby Dennett
We have recently decreased the dfs.replication value on our cluster from 3 to 2 and see behavior similar to that described in issue HADOOP-2270 (https://issues.apache.org/jira/browse/HADOOP-2270?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel). Even though the parameter dfs.rep

Re: Preventing/Limiting NotReplicatedYetException exceptions

2010-07-30 Thread Bobby Dennett
ov" wrote: Hi Bobby, On Mon, Jul 26, 2010 at 10:32 AM, Bobby Dennett <softw...@bobby.fastmail.us> wrote: Just following up again as this issue is becoming a high priority for us since it is affecting a critical process... Can anyone provide some insight as to what you wo

Re: Preventing/Limiting NotReplicatedYetException exceptions

2010-07-26 Thread Bobby Dennett
K On Wed, Jul 21, 2010 at 5:02 PM, Bobby Dennett <softw...@bobby.fastmail.us> wrote: Hi all, We recently finished migrating from a modified v0.19.1 Apache Hadoop cluster to a v0.20.1+169.68 Cloudera Hadoop cluster and now encounter org.apache.hadoop.hdfs.server.namenode.

Fwd: Re: Preventing/Limiting NotReplicatedYetException exceptions

2010-07-21 Thread Bobby Dennett
fied while a client is writing to it (as in the case where speculative execution is not implemented correctly and two tasks are writing into the same file) - Network problems (like dropped frames) Hope this helps to debug your specific issue, Alex K On Wed, Jul 21, 2010 at 5:

Is it safe to set default/minimum replication to 2?

2010-07-21 Thread Bobby Dennett
The team that manages our Hadoop clusters is currently being pressured to reduce block replication from 3 to 2 in our production cluster. This request is for various reasons -- particularly the reduction of used space in the cluster and potential of reduced write operations -- but from what I've re
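
The space argument in this request is simple arithmetic: HDFS stores each block `replication` times, so raw usage scales linearly with the replication factor. The following minimal sketch shows the trade-off. The 100 TB figure is hypothetical, and `replica_losses_tolerated` assumes each replica sits on a distinct DataNode, which is how HDFS places replicas under normal conditions.

```python
from fractions import Fraction


def raw_storage(logical_tb, replication):
    """Raw capacity used (same unit as logical_tb) when every block has `replication` copies."""
    return logical_tb * replication


def space_saved(old_rep, new_rep):
    """Fraction of raw block storage freed by lowering replication from old_rep to new_rep."""
    return Fraction(old_rep - new_rep, old_rep)


def replica_losses_tolerated(replication):
    """Replica-holding nodes a block can lose and still be readable (assumes distinct nodes)."""
    return replication - 1


if __name__ == "__main__":
    logical = 100  # hypothetical: 100 TB of logical (pre-replication) data
    print(raw_storage(logical, 3), raw_storage(logical, 2))  # 300 200
    print(space_saved(3, 2))  # 1/3
    print(replica_losses_tolerated(3), replica_losses_tolerated(2))  # 2 1
```

Going from 3 to 2 frees one third of raw block storage. It also means a block becomes unavailable after losing two replica-holding nodes instead of three, which is the safety concern behind the question.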

Preventing/Limiting NotReplicatedYetException exceptions

2010-07-21 Thread Bobby Dennett
Hi all, We recently finished migrating from a modified v0.19.1 Apache Hadoop cluster to a v0.20.1+169.68 Cloudera Hadoop cluster and now encounter org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException exceptions periodically, which end up affecting at least one of our production process

Re: Hadoop JobTracker Hanging

2010-06-23 Thread Bobby Dennett
Thanks for the latest round of suggestions. We will definitely check out compressed object pointers and are looking into what we can do regarding the JT history. As I mentioned previously, we are working on getting stronger servers for the NN/JT node and the secondary NN node (similar to worka

RE: Hadoop JobTracker Hanging

2010-06-21 Thread Bobby Dennett
Thanks all for your suggestions (please note that Tan is my co-worker; we are both working to try and resolve this issue)... we experienced another hang this weekend and increased the HADOOP_HEAPSIZE setting to 6000 (MB) as we do periodically see "java.lang.OutOfMemoryError: Java heap space" errors