NameNode - didn't persist the edit log

2011-12-15 Thread Guy Doulberg
Hi guys, We recently had the following problem on our production cluster: The filesystem containing the editlog and fsimage had no free inodes. As a result the namenode wasn't able to obtain an inode for the fsimage and editlog after a checkpoint had been reached, while the previous files we
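
[Editor's sketch] The failure described above starts with a filesystem running out of inodes, which ordinary free-space monitoring does not catch. The following is a minimal sketch of an inode check, assuming a Unix host and Python's standard `os.statvfs`. The function names and the `min_free` threshold are hypothetical and are not taken from the thread or from Hadoop.

```python
import os


def inode_usage(path):
    """Return (total, free, fraction_used) inodes for the filesystem holding `path`."""
    st = os.statvfs(path)          # Unix-only call
    total = st.f_files             # total inodes on the filesystem
    free = st.f_favail             # inodes available to unprivileged users
    if total == 0:
        # Some filesystems allocate inodes dynamically and report 0 here.
        return total, free, 0.0
    used = 1.0 - free / total
    return total, free, max(0.0, min(1.0, used))


def inodes_low(path, min_free=1000):
    """True if the filesystem reports a fixed inode table with fewer than `min_free` free."""
    total, free, _ = inode_usage(path)
    return total > 0 and free < min_free


if __name__ == "__main__":
    # The directory to check is an assumption; point it at each name directory.
    print(inode_usage("."), inodes_low("."))
```

Run periodically against each directory that holds the fsimage and edit log, a check like this could raise an alert before the namenode fails to create new files.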

Re: NameNode - didn't persist the edit log

2011-12-15 Thread Todd Lipcon
Hi Guy, Several questions come to mind here: - What was the exact WARN level message you saw? - Did you have multiple dfs.name.dirs configured as recommended by most setup guides? - Did you try entering safemode and then running saveNamespace to persist the image before shutting down the NN? This
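
[Editor's sketch] The safemode-then-saveNamespace sequence mentioned above can be scripted. This is a minimal sketch, assuming the `hadoop` launcher is on the PATH and that the installed release supports `dfsadmin -saveNamespace`. The helper name `persist_namespace` and its `dry_run` and `runner` parameters are hypothetical.

```python
import subprocess

# Both are dfsadmin subcommands; -saveNamespace requires the namenode to be in safe mode.
SAFEMODE_ENTER = ["hadoop", "dfsadmin", "-safemode", "enter"]
SAVE_NAMESPACE = ["hadoop", "dfsadmin", "-saveNamespace"]


def persist_namespace(dry_run=True, runner=subprocess.check_call):
    """Enter safe mode, then ask the namenode to write a fresh image.

    With dry_run=True nothing is executed; the planned commands are returned.
    """
    commands = [SAFEMODE_ENTER, SAVE_NAMESPACE]
    if not dry_run:
        for cmd in commands:
            runner(cmd)  # check_call raises CalledProcessError on a non-zero exit
    return commands


if __name__ == "__main__":
    for cmd in persist_namespace(dry_run=True):
        print(" ".join(cmd))
```

Because `subprocess.check_call` stops at the first failing command, the save is not attempted if entering safe mode fails.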

Re: NameNode - didn't persist the edit log

2011-12-15 Thread Guy Doulberg
Hi Todd, you are right, I should be more specific: 1. from the namenode log: 2011-12-11 08:57:23,245 WARN org.apache.hadoop.hdfs.server.common.Storage: rollEdidLog: removing storage /srv/hadoop/hdfs/edit 2011-12-11 08:57:23,311 WARN org.apache.hadoop.hdfs.server.common.Storage: incrementCheckpo

Re: NameNode - didn't persist the edit log

2011-12-17 Thread Todd Lipcon
Hi Guy, Eli has been looking into these issues and it looks like you found a nasty bug. You can follow these JIRAs to track resolution: HDFS-2701, HDFS-2702, HDFS-2703. I think in particular HDFS-2703 is the one that bit you here. -Todd On Thu, Dec 15, 2011 at 2:06 AM, Guy Doulberg wrote: > Hi