The NFS mount should be soft-mounted, so if the NFS server goes down, the NN ejects that directory and continues with the local disk. If auto-restore is configured, it will re-add the NFS directory if it's detected to be good again later.
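
A minimal sketch of how this might look in hdfs-site.xml on a 1.x release. The two directory paths are hypothetical placeholders. The property names are assumed to be the 1.x ones (2.x renames them to dfs.namenode.name.dir and dfs.namenode.name.dir.restore), so verify them against the hdfs-default.xml shipped with your release.

```xml
<!-- hdfs-site.xml fragment; both directory paths are hypothetical examples -->
<property>
  <name>dfs.name.dir</name>
  <!-- Comma-separated list: the NN writes its metadata to every directory.
       Here one is a local disk and one is a soft-mounted NFS directory. -->
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
<property>
  <name>dfs.name.dir.restore</name>
  <!-- Assumed name of the auto-restore switch on 1.x: when true, the NN
       tries to re-add a previously failed directory once it is usable. -->
  <value>true</value>
</property>
```

The soft mount itself is set in the NFS mount options on the NN host, not in Hadoop's configuration.
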
On Wed, Jan 16, 2013 at 7:04 AM, randy <randy...@comcast.net> wrote:
> What happens to the NN and/or performance if there's a problem with the
> NFS server? Or the network?
>
> Thanks,
> randy
>
>
> On 01/14/2013 11:36 PM, Harsh J wrote:
>
>> It's very rare to observe an NN crash due to a software bug in
>> production. Most of the time it's a hardware fault you should worry about.
>>
>> On 1.x, or any non-HA-carrying release, the best you can get to
>> safeguard against a total loss is to have redundant disk volumes
>> configured, one preferably over a dedicated remote NFS mount. This way
>> the NN is recoverable after the node goes down, since you can retrieve a
>> current copy from another machine (i.e. via the NFS mount) and set a new
>> node up to replace the older NN and continue along.
>>
>> A load balancer will not work as the NN is not a simple webserver - it
>> maintains state which you cannot sync. We wrote HA-HDFS features to
>> address the very concern you have.
>>
>> If you want true, painless HA, branch-2 is your best bet at this point.
>> An upcoming 2.0.3 release should include the QJM based HA features that
>> are painless to set up and very reliable to use (over other options), and
>> work with commodity level hardware. FWIW, we've (my team and I) been
>> supporting several users and customers who're running the 2.x based HA
>> in production and other types of environments and it has been very
>> stable in our experience. There are also some folks in the community
>> running 2.x based HDFS for HA/else.
>>
>>
>> On Tue, Jan 15, 2013 at 6:55 AM, Panshul Whisper <ouchwhis...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Is there a standard way to prevent a Namenode crash in
>>> a Hadoop cluster?
>>> Or what is the standard or best practice for overcoming the single
>>> point of failure problem of Hadoop?
>>>
>>> I am not ready to take chances on a production server with the Hadoop
>>> 2.0 Alpha release, which claims to have solved the problem. Are
>>> there any other things I can do to either prevent the failure or
>>> recover from the failure in a very short time?
>>>
>>> Thanking You,
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>
>>
>> --
>> Harsh J
>>
>

--
Harsh J