Hello Vamsi,

A general-purpose HTTP client like curl has no knowledge of the HA failover 
mechanism, so unfortunately there is no way to craft the URL so that the 
client fails over automatically.

However, Hadoop ships with the WebHdfsFileSystem class, which is aware of HA 
failover.  If your web application is written in Java, or has a reasonable 
way to bridge to Java, then you could take advantage of that class.  This 
class is used when running Hadoop shell commands that reference a URI with 
the webhdfs: scheme.  For example:

hdfs dfs -ls webhdfs://127.0.0.1:50070/
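
The host:port form above points at a single NameNode.  If the client's 
hdfs-site.xml defines your HA nameservice (dfs.nameservices, 
dfs.ha.namenodes.<nameservice>, and the per-NameNode HTTP address 
properties), then the logical nameservice name should also work in the 
webhdfs: URI, and the client takes care of finding the active NameNode.  A 
sketch, assuming the nameservice is named "mycluster":

hdfs dfs -ls webhdfs://mycluster/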

You could also get an instance of WebHdfsFileSystem by calling FileSystem#get 
with a Configuration object that sets fs.defaultFS to a webhdfs: URI, or by 
calling the overload of FileSystem#get that accepts an explicit URI argument.
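
Here is a minimal sketch of both approaches, assuming the HA client 
configuration (hdfs-site.xml with the nameservice and per-NameNode HTTP 
addresses) is on the classpath, and that the nameservice is named 
"mycluster" (substitute your own dfs.nameservices value):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsHaExample {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml / hdfs-site.xml from the classpath, including the
    // HA nameservice and per-NameNode HTTP address properties.
    Configuration conf = new Configuration();

    // Option 1: point fs.defaultFS at the webhdfs: nameservice URI.
    // "mycluster" is an assumed nameservice name.
    conf.set("fs.defaultFS", "webhdfs://mycluster");
    FileSystem fs = FileSystem.get(conf);

    // Option 2: pass the webhdfs: URI explicitly.
    FileSystem fs2 = FileSystem.get(URI.create("webhdfs://mycluster"), conf);

    // Either instance is a WebHdfsFileSystem configured for HA failover.
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}

With either instance, the application never needs to know which NameNode is 
currently active.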

--Chris Nauroth

From: Vamsi Krishna <vamsi.attl...@gmail.com>
Date: Wednesday, June 8, 2016 at 10:35 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Namenode automatic failover - how to handle WebHDFS URL?


Hi,

How do I handle the WebHDFS URL in the case of Namenode automatic failover in 
an HA HDFS cluster?


HDFS CLI:

HDFS URI: hdfs://<HOST>:<RPC_PORT>/<PATH>

When working with the HDFS CLI, replacing '<HOST>:<RPC_PORT>' in the HDFS URI 
with the 'dfs.nameservices' value (from hdfs-site.xml) gives me the same 
result as using '<HOST>:<RPC_PORT>'.

By using the 'dfs.nameservices' value in the HDFS URI, I do not need to 
change my HDFS CLI commands when Namenode automatic failover occurs.

Example:

hdfs dfs -ls hdfs://<HOST>:<RPC_PORT>/<PATH>

hdfs dfs -ls hdfs://<DFS.NAMESERVICES>/<PATH>


WebHDFS:

WebHDFS URL: http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...

Is there a way to construct the WebHDFS URL so that we don't have to change 
the URL (host) in the case of Namenode automatic failover (failover from 
namenode-1 to namenode-2)?

http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS

Scenario:
I have a web application which uses WebHDFS HTTP requests to read data files 
from a Hadoop cluster.
I would like to know if there is a way to make the web application work 
without any downtime in the case of Namenode automatic failover (failover 
from namenode-1 to namenode-2).

Thanks,
Vamsi Attluri
