Author: kminder
Date: Wed Sep 17 17:06:34 2014
New Revision: 1625685

URL: http://svn.apache.org/r1625685
Log:
Updates for HDFS HA support.

Modified:
    knox/site/books/knox-0-4-0/deployment-overview.png
    knox/site/books/knox-0-4-0/deployment-provider.png
    knox/site/books/knox-0-4-0/deployment-service.png
    knox/site/books/knox-0-4-0/runtime-overview.png
    knox/site/books/knox-0-4-0/runtime-request-processing.png
    knox/site/books/knox-0-5-0/knox-0-5-0.html
    knox/trunk/books/0.5.0/service_webhdfs.md

Modified: knox/site/books/knox-0-4-0/deployment-overview.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-overview.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/deployment-provider.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-provider.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/deployment-service.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-service.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/runtime-overview.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/runtime-overview.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/runtime-request-processing.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/runtime-request-processing.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-5-0/knox-0-5-0.html
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-0-5-0/knox-0-5-0.html?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
--- knox/site/books/knox-0-5-0/knox-0-5-0.html (original)
+++ knox/site/books/knox-0-5-0/knox-0-5-0.html Wed Sep 17 17:06:34 2014
@@ -1625,7 +1625,7 @@ dep/commons-codec-1.7.jar
   </tbody>
 </table><p>However, there is a subtle difference to URLs that are returned by 
WebHDFS in the Location header of many requests. Direct WebHDFS requests may 
return Location headers that contain the address of a particular Data Node. The 
gateway will rewrite these URLs to ensure subsequent requests come back through 
the gateway and internal cluster details are protected.</p><p>A WebHDFS request 
to the Name Node to retrieve a file will return a URL of the form below in the 
Location header.</p>
 <pre><code>http://{datanode-host}:{data-node-port}/webhdfs/v1/{path}?...
-</code></pre><p>Note that this URL contains the newtwork location of a Data 
Node. The gateway will rewrite this URL to look like the URL below.</p>
+</code></pre><p>Note that this URL contains the network location of a Data 
Node. The gateway will rewrite this URL to look like the URL below.</p>
 
<pre><code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
 </code></pre><p>The <code>{encrypted-query-parameters}</code> will contain the 
<code>{datanode-host}</code> and <code>{datanode-port}</code> information. This 
information along with the original query parameters are encrypted so that the 
internal Hadoop details are protected.</p><h4><a 
id="WebHDFS+Examples"></a>WebHDFS Examples</h4><p>The examples below upload a 
file, download the file and list the contents of the directory.</p><h5><a 
id="WebHDFS+via+client+DSL"></a>WebHDFS via client DSL</h5><p>You can use the 
Groovy example scripts and interpreter provided with the distribution.</p>
 <pre><code>java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
@@ -1763,7 +1763,30 @@ session.shutdown()
   <ul>
     <li><code>Hdfs.rm( session ).file( &quot;/user/guest/example&quot; 
).recursive().now()</code></li>
   </ul></li>
-</ul><h3><a id="WebHCat"></a>WebHCat</h3><p>WebHCat is a related but separate 
service from Hive. As such it is installed and configured independently. The <a 
href="https://cwiki.apache.org/confluence/display/Hive/WebHCat";>WebHCat wiki 
pages</a> describe this processes. In sandbox this configuration file for 
WebHCat is located at /etc/hadoop/hcatalog/webhcat-site.xml. Note the 
properties shown below as they are related to configuration required by the 
gateway.</p>
+</ul><h3><a id="WebHDFS+HA"></a>WebHDFS HA</h3><p>Knox provides basic failover 
and retry functionality for REST API calls made to WebHDFS when HDFS HA has 
been configured and enabled.</p><p>To enable HA functionality for WebHDFS in 
Knox, the following configuration must be added to the topology file.</p>
+<pre><code>&lt;provider&gt;
+   &lt;role&gt;ha&lt;/role&gt;
+   &lt;name&gt;HaProvider&lt;/name&gt;
+   &lt;enabled&gt;true&lt;/enabled&gt;
+   &lt;param&gt;
+       &lt;name&gt;WEBHDFS&lt;/name&gt;
+       
&lt;value&gt;maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true&lt;/value&gt;
+   &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre><p>The role and name of the provider above must be as shown. The 
name in the &lsquo;param&rsquo; section must match the role name of the service 
being configured for HA, and the value in the &lsquo;param&rsquo; 
section is the HA configuration for that particular service. In this 
case the name is &lsquo;WEBHDFS&rsquo;.</p><p>The various configuration 
parameters are described below:</p>
+<ul>
+  <li><p>maxFailoverAttempts - This is the maximum number of times a failover 
will be attempted. The failover strategy at this time is very simplistic: the 
next URL in the list of URLs provided for the service is used, and the one 
that failed is put at the bottom of the list. If the list is exhausted and 
the maximum number of attempts has not been reached, the first URL that failed 
will be tried again (the list starts again from the original top 
entry).</p></li>
+  <li><p>failoverSleep - The amount of time in milliseconds that the process 
will wait or sleep before attempting to failover.</p></li>
+  <li><p>maxRetryAttempts - This is the maximum number of times that a retry 
request will be attempted. Unlike failover, the retry is done on the same URL 
that failed. This covers a special case in HDFS where the node is in safe mode. 
The expectation is that the node will come out of safe mode, so a retry is 
desirable here as opposed to a failover.</p></li>
+  <li><p>retrySleep - The amount of time in milliseconds that the process will 
wait or sleep before a retry is issued.</p></li>
+  <li><p>enabled - Flag to turn HA on or off for the particular 
service.</p></li>
+</ul><p>For the service configuration itself, the additional URLs of the 
standby nodes should be added to the list. The active URL (at the time of 
configuration) should ideally be added to the top of the list.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://{host1}:50070/webhdfs&lt;/url&gt;
+    &lt;url&gt;http://{host2}:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre><h3><a id="WebHCat"></a>WebHCat</h3><p>WebHCat is a related but 
separate service from Hive. As such it is installed and configured 
independently. The <a 
href="https://cwiki.apache.org/confluence/display/Hive/WebHCat";>WebHCat wiki 
pages</a> describe this processes. In sandbox this configuration file for 
WebHCat is located at /etc/hadoop/hcatalog/webhcat-site.xml. Note the 
properties shown below as they are related to configuration required by the 
gateway.</p>
 <pre><code>&lt;property&gt;
     &lt;name&gt;templeton.port&lt;/name&gt;
     &lt;value&gt;50111&lt;/value&gt;

Modified: knox/trunk/books/0.5.0/service_webhdfs.md
URL: 
http://svn.apache.org/viewvc/knox/trunk/books/0.5.0/service_webhdfs.md?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
--- knox/trunk/books/0.5.0/service_webhdfs.md (original)
+++ knox/trunk/books/0.5.0/service_webhdfs.md Wed Sep 17 17:06:34 2014
@@ -77,7 +77,7 @@ A WebHDFS request to the Node Node to re
 
     http://{datanode-host}:{data-node-port}/webhdfs/v1/{path}?...
 
-Note that this URL contains the newtwork location of a Data Node.
+Note that this URL contains the network location of a Data Node.
 The gateway will rewrite this URL to look like the URL below.
 
     
https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
@@ -234,6 +234,61 @@ Use can use cURL to directly invoke the 
     * `Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()`
 
 
+### WebHDFS HA ###
+
+Knox provides basic failover and retry functionality for REST API calls made 
to WebHDFS when HDFS HA has been 
+configured and enabled.
+
+To enable HA functionality for WebHDFS in Knox, the following configuration 
must be added to the topology file.
+
+    <provider>
+       <role>ha</role>
+       <name>HaProvider</name>
+       <enabled>true</enabled>
+       <param>
+           <name>WEBHDFS</name>
+           
<value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
+       </param>
+    </provider>
+    
+The role and name of the provider above must be as shown. The name in the 
'param' section must match the role name of 
+the service being configured for HA, and the value in the 'param' section is 
the HA configuration for that particular
+service. In this case the name is 'WEBHDFS'.
+
+The various configuration parameters are described below:
+     
+* maxFailoverAttempts - 
+This is the maximum number of times a failover will be attempted. The failover 
strategy at this time is very simplistic:
+the next URL in the list of URLs provided for the service is used, and the one 
that failed is put at the bottom 
+of the list. If the list is exhausted and the maximum number of attempts has 
not been reached, the first URL that failed 
+will be tried again (the list starts again from the original top entry). See 
the illustration after this list.
+
+* failoverSleep - 
+The amount of time in milliseconds that the process will wait or sleep before 
attempting to failover.
+
+* maxRetryAttempts - 
+This is the maximum number of times that a retry request will be attempted. 
Unlike failover, the retry is done on the 
+same URL that failed. This covers a special case in HDFS where the node is in 
safe mode. The expectation is that the node will
+come out of safe mode, so a retry is desirable here as opposed to a failover.
+
+* retrySleep - 
+The amount of time in milliseconds that the process will wait or sleep before 
a retry is issued.
+
+* enabled - 
+Flag to turn HA on or off for the particular service.
+
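+As a rough, hypothetical illustration of the failover strategy described above 
(the exact attempt accounting may differ), with two URLs configured and 
maxFailoverAttempts=3, a request that keeps failing would be routed roughly as 
follows, sleeping failoverSleep milliseconds between attempts:
+
+    attempt 1 -> http://{host1}:50070/webhdfs   (fails; host1 moved to bottom of list)
+    attempt 2 -> http://{host2}:50070/webhdfs   (fails; host2 moved to bottom of list)
+    attempt 3 -> http://{host1}:50070/webhdfs
+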
+For the service configuration itself, the additional URLs of the standby 
nodes should be added to the list. The active 
+URL (at the time of configuration) should ideally be added to the top of the 
list.
+
+
+    <service>
+        <role>WEBHDFS</role>
+        <url>http://{host1}:50070/webhdfs</url>
+        <url>http://{host2}:50070/webhdfs</url>
+    </service>
+    
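+With HA enabled, REST calls are made through the gateway exactly as before, and 
the HaProvider fails over between the configured URLs on the client's behalf. As 
a minimal sketch (assuming the default gateway address, the sandbox topology, 
and the demo guest credentials used elsewhere in this guide), a directory 
listing via cURL would look like this:
+
+    curl -i -k -u guest:guest-password -X GET \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS'
+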
+
+
 
 
 

