Tom, I would file a JIRA if I were you and my Hadoop version were recent enough. It should be pretty easy to reproduce.
Jens

On Thursday, 26 September 2013, Tom Brown wrote:
> They were created and deleted in quick succession. I thought that meant the edits for both the create and delete would be logically next to each other in the file, allowing it to release the memory almost as soon as it had been allocated.
>
> In any case, after finding a VM host that could give me more RAM, I was able to get the namenode started. The process used 25GB at its peak.
>
> Thanks for your help!
>
>
> On Thu, Sep 26, 2013 at 11:07 AM, Harsh J <ha...@cloudera.com> wrote:
>
> Tom,
>
> That is valuable info. When we "replay" edits, we would be creating and then deleting those files - so memory would grow in between until the delete events begin appearing in the edit log segment.
>
> On Thu, Sep 26, 2013 at 10:07 PM, Tom Brown <tombrow...@gmail.com> wrote:
> > A simple estimate puts the total number of blocks somewhere around 500,000. Due to an HBase bug (HBASE-9648), there were approximately 50,000,000 files that were created and quickly deleted (about 10/sec for 6 weeks) in the cluster, and that activity is what is contained in the edits.
> >
> > Since those files don't exist (quickly created and deleted), shouldn't they be inconsequential to the memory requirements of the namenode as it starts up?
> >
> > --Tom
> >
> >
> > On Thu, Sep 26, 2013 at 10:25 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> >> Can you share how many blocks does your cluster have? How many directories? How many files?
> >>
> >> There is a JIRA, https://issues.apache.org/jira/browse/HADOOP-1687, which explains how much RAM will be used by your namenode. It's pretty old by Hadoop version, but it's a good starting point.
> >>
> >> According to Cloudera's blog, "A good rule of thumb is to assume 1GB of NameNode memory for every 1 million blocks stored in the distributed file system":
> >> http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
> >>
> >>
> >> On Thu, Sep 26, 2013 at 9:26 PM, Tom Brown <tombrow...@gmail.com> wrote:
> >>> It ran again for about 15 hours before dying again. I'm seeing what extra RAM resources we can throw at this VM (maybe up to 32GB), but until then I'm trying to figure out if I'm hitting some strange bug.
> >>>
> >>> When the edits were originally made (over the course of 6 weeks), the namenode only had 512MB and was able to contain the filesystem completely in memory. I don't understand why it's running out of memory. If 512MB was enough while the edits were first made, shouldn't it be enough to process them again?
> >>>
> >>> --Tom
> >>>
> >>>
> >>> On Thu, Sep 26, 2013 at 6:05 AM, Harsh J <ha...@cloudera.com> wrote:
> >>>> Hi Tom,
> >>>>
> >>>> The edits are processed sequentially, and aren't all held in memory. Right now there's no mid-way checkpoint when they are loaded, such that it could resume with only the remaining work if interrupted. Normally this is not a problem in deployments, given that the SNN or SBN runs periodically to checkpoint the image and keep the edits collection small.
> >>>>
> >>>> If your NameNode is running out of memory _applying_ the edits, then the cause is not the edits but a growing namespace. You most likely have more files now than before, and that's going to take up permanent memory from the NameNode heap size.
> >>>>
> >>>> On Thu, Sep 26, 2013 at 3:00 AM, Tom Brown <tombrow...@gmail.com> wrote:
> >>>> > Unfortunately, I cannot give it that much RAM. The machine has 4GB total (though could be expanded somewhat -- it's a VM).
> >>>> >
> >>>> > Though if each edit is processed sequentially (in a
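A rough sketch of the heap arithmetic behind the numbers in this thread, assuming the commonly quoted heuristic of roughly 150 bytes of NameNode heap per namespace object (file, directory, or block); that per-object figure is an assumption here and the real cost varies by Hadoop version and JVM settings:

    # Back-of-envelope NameNode heap estimate (Python).
    # ASSUMPTION: ~150 bytes of heap per namespace object; a rule-of-thumb
    # figure only, not an exact number for any particular Hadoop release.
    BYTES_PER_OBJECT = 150

    # Steady-state namespace: ~500,000 blocks plus roughly as many files.
    steady_state_bytes = (500_000 + 500_000) * BYTES_PER_OBJECT
    print("steady state: ~%.2f GiB" % (steady_state_bytes / 2.0**30))  # ~0.14 GiB

    # During replay, each transient file (1 inode + 1 block) stays in memory
    # until its delete edit is applied; if most of the ~50,000,000 creates
    # are replayed before the corresponding deletes, they pile up at once.
    replay_peak_bytes = 50_000_000 * 2 * BYTES_PER_OBJECT
    print("replay peak:  ~%.2f GiB" % (replay_peak_bytes / 2.0**30))   # ~14 GiB

With JVM and object overhead on top of that raw estimate, a replay peak in the 15-25GB range is plausible, while the steady-state namespace fits easily in 512MB; that lines up with both the 512MB that sufficed while the edits were being written and the 25GB peak observed during replay.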