Re: How to rebuild the shared edits directory

2012-05-08 Thread Todd Lipcon
Hi Jeff, Check out HDFS-3077. We'll probably need the most help when it comes time to do testing. Any testing you can do on the current HA solution, non-ideal as it may be, is also immensely valuable. For example, if you can reproduce the case where it didn't exit upon loss of shared edits, that w

Re: How to rebuild the shared edits directory

2012-05-08 Thread Jeff Whiting
Thanks for being patient and listening to my rants. I'm excited to see HDFS continue to move forward. If the organization I'm working for was willing to spend some resources to help speed this process up, where should we start looking? I'm sure there are quite a few JIRAs on these issues. Most

Re: How to rebuild the shared edits directory

2012-05-08 Thread Todd Lipcon
On Tue, May 8, 2012 at 12:38 PM, Jeff Whiting wrote: > It seems the NN was originally written with the assumption that disks fail > and stuff happens.  Hence the ability to have multiple directories store > your NN data even though each directory is most likely redundant / HA. > > [start rant] >

Re: How to rebuild the shared edits directory

2012-05-08 Thread Jeff Whiting
It seems the NN was originally written with the assumption that disks fail and stuff happens. Hence the ability to have multiple directories store your NN data even though each directory is most likely redundant / HA. [start rant] My opinion is that it is a step backwards that the shared ed

Re: How to rebuild the shared edits directory

2012-05-08 Thread Nathaniel Cook
On Tue, May 8, 2012 at 11:44 AM, Todd Lipcon wrote: > On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook > wrote: >> We ran the initializeSharedEdits command and it didn't have any >> effect, but that may be because of the weird state we got it in. >> >> So help me understand: I was under the assumpt

Re: How to rebuild the shared edits directory

2012-05-08 Thread Todd Lipcon
On Tue, May 8, 2012 at 10:33 AM, Nathaniel Cook wrote: > We ran the initializeSharedEdits command and it didn't have any > effect, but that may be because of the weird state we got it in. > > So help me understand: I was under the assumption that if shared edits > went away you would lose the abili

Re: How to rebuild the shared edits directory

2012-05-08 Thread Nathaniel Cook
We ran the initializeSharedEdits command and it didn't have any effect, but that may be because of the weird state we got it in. So help me understand: I was under the assumption that if shared edits went away you would lose the ability to failover and that is it. The active namenode would still fu

Re: Does NameNode directory need fine-grained path lock?

2012-05-08 Thread Todd Lipcon
Hi Denny, Do you see an issue in practice? Fine-grained locking makes sense if you hold the lock while doing disk IO. But since the structure is all in-memory, it's far less important, and in fact could reduce performance in some cases. Until you hit a few thousand nodes in your cluster, lock co

Re: How to rebuild the shared edits directory

2012-05-08 Thread Todd Lipcon
On Tue, May 8, 2012 at 7:46 AM, Nathaniel Cook wrote: > We have been working with an HA hdfs cluster, testing several failover > scenarios.  We have a small cluster of 4 machines spun up for testing. > We run a namenode on two of the machines and host an nfs share on > the third for the shared edi

How to rebuild the shared edits directory

2012-05-08 Thread Nathaniel Cook
We have been working with an HA hdfs cluster, testing several failover scenarios. We have a small cluster of 4 machines spun up for testing. We run a namenode on two of the machines and host an nfs share on the third for the shared edits directory. The fourth machine is just a datanode. We configu

Does NameNode directory need fine-grained path lock?

2012-05-08 Thread Denny Ye
Hi guys, Currently, the NameNode uses a single read-write lock at the root of the namespace. In my opinion, that is too coarse for a whole namespace with millions of files and folders. Meanwhile, the top few levels of the namespace directory are assigned to different applications. Do we need a finer-grained lock level for such ap