Author: lmccay
Date: Mon Apr 13 00:18:34 2020
New Revision: 1876435

URL: http://svn.apache.org/viewvc?rev=1876435&view=rev
Log:
add the docs for KnoxShell User Guide part2

Added:
    knox/site/books/knox-1-4-0/covid19-nj-agg-from-webhdfs-1.png   (with props)
    knox/site/books/knox-1-4-0/covid19-persistence.png   (with props)
    knox/site/books/knox-1-4-0/covid19csv-1.png   (with props)
    knox/site/books/knox-1-4-0/covid19nj-1.png   (with props)
    knox/site/books/knox-1-4-0/covid19nj-aggregate-1.png   (with props)
    knox/site/books/knox-1-4-0/covid19nj-put-webhdfs-1.png   (with props)
    knox/site/books/knox-1-4-0/covid19nj.png   (with props)
    knox/site/books/knox-1-4-0/fs-mount-login-1.png   (with props)
    knox/site/books/knox-1-4-0/knoxline-splash-2.png   (with props)
    knox/site/books/knox-1-4-0/knoxshell-help.png   (with props)
    knox/site/books/knox-1-4-0/knoxshell_user_guide.html

Added: knox/site/books/knox-1-4-0/covid19-nj-agg-from-webhdfs-1.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/covid19-nj-agg-from-webhdfs-1.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/covid19-nj-agg-from-webhdfs-1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/covid19-persistence.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/covid19-persistence.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/covid19-persistence.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/covid19csv-1.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/covid19csv-1.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/covid19csv-1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/covid19nj-1.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/covid19nj-1.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/covid19nj-1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/covid19nj-aggregate-1.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/covid19nj-aggregate-1.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/covid19nj-aggregate-1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/covid19nj-put-webhdfs-1.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/covid19nj-put-webhdfs-1.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/covid19nj-put-webhdfs-1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/covid19nj.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/covid19nj.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/covid19nj.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/fs-mount-login-1.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/fs-mount-login-1.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/fs-mount-login-1.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/knoxline-splash-2.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/knoxline-splash-2.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/knoxline-splash-2.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/knoxshell-help.png
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/knoxshell-help.png?rev=1876435&view=auto
==============================================================================
Binary file - no diff available.

Propchange: knox/site/books/knox-1-4-0/knoxshell-help.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: knox/site/books/knox-1-4-0/knoxshell_user_guide.html
URL: 
http://svn.apache.org/viewvc/knox/site/books/knox-1-4-0/knoxshell_user_guide.html?rev=1876435&view=auto
==============================================================================
--- knox/site/books/knox-1-4-0/knoxshell_user_guide.html (added)
+++ knox/site/books/knox-1-4-0/knoxshell_user_guide.html Mon Apr 13 00:18:34 
2020
@@ -0,0 +1,242 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<link href="book.css" rel="stylesheet"/>
+<img src="knox-logo.gif" alt="Knox"/>
+<img src="apache-logo.gif" align="right" alt="Apache"/>
+<h1><a id="KnoxShell+User+Guide">KnoxShell User Guide</a> <a 
href="#KnoxShell+User+Guide"><img src="markbook-section-link.png"/></a></h1>
+<p>Apache Knox version: 1.4.0</p>
+<ul>
+  <li><a href="#Introduction">Introduction</a></li>
+  <li><a href="#Representing+and+Working+with+Tabular+Data">Representing and 
Working with Tabular Data</a></li>
+  <li><a href="#KnoxShellTable">KnoxShellTable</a>
+    <ul>
+      <li><a href="#Builders">Builders</a></li>
+    </ul>
+  </li>
+  <li><a href="#Usecases">Usecases</a>
+    <ul>
+      <li><a href="#JDBC+Resultset+Representations">JDBC Resultset 
Representations</a></li>
+      <li><a href="#CSV+Representations">CSV Representations</a></li>
+      <li><a href="#General+Table+Operations">General Table Operations</a></li>
+      <li><a href="#Persistence+and+Publishing">Persistence and 
Publishing</a></li>
+      <li><a href="#KnoxLine+SQL+Shell">KnoxLine SQL Shell</a></li>
+      <li><a href="#Custom+GroovySh+Commands">Custom GroovySh Commands</a></li>
+    </ul>
+  </li>
+  <li><a href="#JDBC+Resultset+Representations">JDBC Resultset 
Representations</a></li>
+  <li><a href="#CSV+Representations">CSV Representations</a></li>
+  <li><a href="#General+Table+Operations">General Table Operations</a>
+    <ul>
+      <li><a href="#Sorting">Sorting</a></li>
+      <li><a href="#Selecting">Selecting</a></li>
+      <li><a href="#Filtering">Filtering</a></li>
+      <li><a href="#Fluent+API">Fluent API</a></li>
+      <li><a href="#Aggregating">Aggregating</a></li>
+    </ul>
+  </li>
+  <li><a href="#KnoxLine+SQL+Shell">KnoxLine SQL Shell</a></li>
+  <li><a href="#Custom+GroovySh+Commands">Custom GroovySh Commands</a>
+    <ul>
+      <li><a href="#KnoxShell+Commands:">KnoxShell Commands:</a></li>
+    </ul>
+  </li>
+  <li><a href="#EXAMPLE:+COVID19+Data+Flow+into+DataLake">EXAMPLE: COVID19 
Data Flow into DataLake</a>
+    <ul>
+      <li><a href="#Build+Table+from+Public+CSV+File">Build Table from Public 
CSV File</a></li>
+      <li><a href="#Select+Columns,+Filter+and+Sort+by+Column">Select Columns, 
Filter and Sort by Column</a></li>
+      <li><a href="#Aggregate+Calculations+on+Columns+of+Table">Aggregate 
Calculations on Columns of Table</a></li>
+      <li><a href="#Persist+Tables+to+Local+Disk">Persist Tables to Local 
Disk</a></li>
+      <li><a href="#Add+Tables+to+DataLake">Add Tables to DataLake</a></li>
+      <li><a href="#Building+KnoxShell+Truststore">Building KnoxShell 
Truststore</a></li>
+      <li><a href="#Mount+a+WebHDFS+Filesystem">Mount a WebHDFS 
Filesystem</a></li>
+      <li><a href="#Accessing+a+Filesystem">Accessing a Filesystem</a></li>
+      <li><a href="#Put+Tables+into+DataLake">Put Tables into DataLake</a></li>
+      <li><a href="#Pull+CSV+Files+from+WebHDFS+and+Create+Tables">Pull CSV 
Files from WebHDFS and Create Tables</a></li>
+    </ul>
+  </li>
+</ul>
+<h2><a id="Introduction">Introduction</a> <a href="#Introduction"><img 
src="markbook-section-link.png"/></a></h2>
+<p>The KnoxShell environment has been extended to provide more of an interactive experience through the use of custom commands and the newly added KnoxShellTable rendering and dataset representation class. This is provided by integrating the power of groovysh extensions with the KnoxShell client classes/SDK, making for some really powerful command line capabilities that would otherwise require the user to SSH to a node within the cluster and use the CLIs of different tools or components.</p>
+<p>This document will cover the various KnoxShell extensions, how to use them on their own, and describe combinations of them as flows for working with tabular data from various sources.</p>
+<h2><a id="Representing+and+Working+with+Tabular+Data">Representing and 
Working with Tabular Data</a> <a 
href="#Representing+and+Working+with+Tabular+Data"><img 
src="markbook-section-link.png"/></a></h2>
+<p>The ability to read, write and work with tabular data formats such as CSV files, JDBC resultsets and others is core to the motivations of this KnoxShell oriented work. Intentions include: the ability to read arbitrary data from inside a proxied cluster or from external sources, the ability to render the resulting tables, sort the table, filter it for specific subsets of the data and do some interesting calculations that can provide simple insights into your data.</p>
+<p>KnoxShellTable represents those core capabilities with its simple representation of a table, operation methods and builder classes.</p>
+<h2><a id="KnoxShellTable">KnoxShellTable</a> <a href="#KnoxShellTable"><img 
src="markbook-section-link.png"/></a></h2>
+<p>KnoxShellTable has a number of dedicated builders that have a fluent API 
for building table representations from various sources.</p>
+<h3><a id="Builders">Builders</a> <a href="#Builders"><img 
src="markbook-section-link.png"/></a></h3>
+<p>The following builders aid in the creation of tables from various types of 
data sources.</p>
+<h4><a id="JDBC">JDBC</a> <a href="#JDBC"><img 
src="markbook-section-link.png"/></a></h4>
+<pre><code>ports = KnoxShellTable.builder().jdbc().
+  connect(&quot;jdbc:hive2://knox-host:8443/;ssl=true;transportMode=http;httpPath=topology/cdp-proxy-api/hive&quot;).
+  driver(&quot;org.apache.hive.jdbc.HiveDriver&quot;).
+  username(&quot;lmccay&quot;).pwd(&quot;xxxx&quot;).
+  sql(&quot;select * FROM ports&quot;);
+</code></pre>
+<p>Running the above within KnoxShell will submit the provided SQL to HS2, then create and assign a new KnoxShellTable instance to the &ldquo;ports&rdquo; variable representing the border ports of entry data.</p>
+<h4><a id="CSV">CSV</a> <a href="#CSV"><img 
src="markbook-section-link.png"/></a></h4>
+<pre><code>crossings = KnoxShellTable.builder().csv().
+       withHeaders().
+       url(&quot;file:///home/lmccay/Border_Crossing_Entry_Data.csv&quot;)
+</code></pre>
+<p>Running the above within KnoxShell will import a CSV file from local disk, then create and assign a new KnoxShellTable instance to the &ldquo;crossings&rdquo; variable.</p>
+<p>A higher level KnoxShell Custom Command allows for easier use of the 
builder through more natural syntax and hides the use of the lower level 
classes and syntax.</p>
+<h4><a id="Join">Join</a> <a href="#Join"><img 
src="markbook-section-link.png"/></a></h4>
+<pre><code>crossings = KnoxShellTable.builder().join().
+  left(ports).
+  right(crossings).
+  on(&quot;code&quot;,&quot;Port Code&quot;)
+</code></pre>
+<p>Running the above within KnoxShell will join the two tables with a simple match of the values in the left and right tables on each row that matches.</p>
+<h4><a id="JSON">JSON</a> <a href="#JSON"><img 
src="markbook-section-link.png"/></a></h4>
+<pre><code>tornados = KnoxShellTable.builder().json().
+  url(&quot;file:///home/lmccay/.knoxshell/.tables/tornados.json&quot;)
+</code></pre>
+<p>Running the above within KnoxShell will rematerialize a table that was 
persisted as JSON and assign it to a local &ldquo;tornados&rdquo; variable.</p>
+<h4><a id="Persistence+and+Publishing">Persistence and Publishing</a> <a 
href="#Persistence+and+Publishing"><img 
src="markbook-section-link.png"/></a></h4>
+<p>Being able to create tables, combine them with other datasets, filter them and add new columns based on calculations between columns, etc., is all great for creating tables in memory and working with them.</p>
+<p>We also want to be able to persist these tables in a KnoxShellTable 
canonical JSON format of its own and be able to reload the same datasets 
later.</p>
+<p>We also want to be able to take a given dataset and publish it as a brand 
new CSV file that can be pushed into HDFS, saved to local disk, written to 
cloud storage, etc.</p>
+<p>In addition, we may want to be able to write it directly to Hive or another 
JDBC datasource.</p>
+<h5><a id="JSON">JSON</a> <a href="#JSON"><img 
src="markbook-section-link.png"/></a></h5>
+<pre><code>tornados.toJSON()
+</code></pre>
+<p>The above will return and render a JSON representation of the tornados 
KnoxShellTable including: headers, rows, optionally title and optionally 
callHistory.</p>
+<h5><a id="CSV">CSV</a> <a href="#CSV"><img 
src="markbook-section-link.png"/></a></h5>
+<pre><code>tornados.toCSV()
+</code></pre>
+<p>The above will return and render a CSV representation of the tornados KnoxShellTable including headers (if present) and all rows.</p>
+<p>Note that title and callHistory, which are KnoxShellTable specifics, are excluded and lost unless the table is also saved as JSON.</p>
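+<p>As an illustrative sketch (not a dedicated persistence API), the JSON rendering can be written out with plain Groovy and later rematerialized with the JSON builder shown earlier. The path below simply mirrors the tornados.json example and is otherwise a hypothetical choice.</p>
+<pre><code>// a minimal sketch, assuming write access to ~/.knoxshell/.tables
+def jsonFile = new File(System.getProperty(&quot;user.home&quot;), &quot;.knoxshell/.tables/tornados.json&quot;)
+jsonFile.getParentFile().mkdirs()
+jsonFile.text = tornados.toJSON()
+
+// later, rebuild the same table from the persisted JSON
+tornados = KnoxShellTable.builder().json().
+  url(&quot;file://&quot; + jsonFile.getAbsolutePath())
+</code></pre>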
+<h2><a id="Usecases">Usecases</a> <a href="#Usecases"><img 
src="markbook-section-link.png"/></a></h2>
+<ul>
+  <li>JDBC Resultset Representations</li>
+  <li>CSV Representations</li>
+  <li>General Table Operations</li>
+  <li>Joining</li>
+  <li>Sorting, Selecting, Filtering, Calculations</li>
+  <li>Persistence and Publishing</li>
+  <li>KnoxLine SQL Shell</li>
+  <li>Custom GroovySh Commands</li>
+</ul>
+<p>Let&rsquo;s take a look at each usecase.</p>
+<h2><a id="JDBC+Resultset+Representations">JDBC Resultset Representations</a> 
<a href="#JDBC+Resultset+Representations"><img 
src="markbook-section-link.png"/></a></h2>
+<p>KnoxLine SQL Client requires a tabular representation of the data from a 
SQL/JDBC Resultset. This requirement led to the creation of the KnoxShellTable 
JDBC Builder. It may be used outside of KnoxLine within your own Java clients 
or groovy scripts leveraging the KnoxShell classes.</p>
+<pre><code>ports = KnoxShellTable.builder().jdbc().
+  connect(&quot;jdbc:hive2://knox-host:8443/;ssl=true;transportMode=http;httpPath=topology/datalake-api/hive&quot;).
+  driver(&quot;org.apache.hive.jdbc.HiveDriver&quot;).
+  username(&quot;lmccay&quot;).pwd(&quot;xxxx&quot;).
+  sql(&quot;select * FROM ports&quot;);
+</code></pre>
+<p>It can create the columns based on the metadata of the resultset, accurately represent the data and perform type-specific operations, sorts, etc.</p>
+<p>A higher level KnoxShell Custom Command allows for the use of this builder with Datasources that are managed within the KnoxShell environment and persisted to the user&rsquo;s home directory to allow continued use across sessions. This command hides the use of the underlying classes and syntax and allows the user to concentrate on SQL.</p>
+<h2><a id="CSV+Representations">CSV Representations</a> <a 
href="#CSV+Representations"><img src="markbook-section-link.png"/></a></h2>
+<p>Another dedicated table builder is provided for creating a table from a CSV 
file that is imported via URL.</p>
+<p>Combined with all the general table operations and ability to join them 
with other KnoxShellTable representations, this allows for CSV data to be 
combined with JDBC datasets, filtered and republished as a new dataset or 
report to be rendered or even reexecuted later.</p>
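+<p>As a hedged sketch of that flow, using only the builders and operations shown in this guide and the ports and crossings tables from the earlier examples, CSV and JDBC data might be combined and republished as follows. The &ldquo;Value&rdquo; and &ldquo;Port Name&rdquo; column names are hypothetical placeholders for columns in the crossings CSV.</p>
+<pre><code>// join the JDBC-backed ports table with the CSV-backed crossings table
+report = KnoxShellTable.builder().join().
+  left(ports).
+  right(crossings).
+  on(&quot;code&quot;,&quot;Port Code&quot;)
+
+// keep only rows with a value greater than 0, sort and publish as CSV text
+report.filter().name(&quot;Value&quot;).greaterThan(0).
+  sort(&quot;Port Name&quot;).
+  toCSV()
+</code></pre>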
+<h2><a id="General+Table+Operations">General Table Operations</a> <a 
href="#General+Table+Operations"><img src="markbook-section-link.png"/></a></h2>
+<p>In addition to the builders described above, there are a number of 
operations that may be executed on the table itself.</p>
+<h3><a id="Sorting">Sorting</a> <a href="#Sorting"><img 
src="markbook-section-link.png"/></a></h3>
+<pre><code>tornados.sort(&quot;state&quot;)
+</code></pre>
+<p>The above will sort the tornados table by the &ldquo;state&rdquo; column.</p>
+<p>When a column contains String values that are actually numeric, you may also sort numerically.</p>
+<pre><code>tornados.sortNumeric(&quot;count&quot;)
+</code></pre>
+<h3><a id="Selecting">Selecting</a> <a href="#Selecting"><img 
src="markbook-section-link.png"/></a></h3>
+<pre><code>tornados.select(&quot;state,cat,inj,fat,date,month,day,year&quot;)
+</code></pre>
+<p>The above will return and render a new table with only the subset of columns selected.</p>
+<h3><a id="Filtering">Filtering</a> <a href="#Filtering"><img 
src="markbook-section-link.png"/></a></h3>
+<pre><code>tornados.filter().name(&quot;fat&quot;).greaterThan(0)
+</code></pre>
+<p>The above will return and render a table with only those tornados that 
resulted in one or more fatalities.</p>
+<h3><a id="Fluent+API">Fluent API</a> <a href="#Fluent+API"><img 
src="markbook-section-link.png"/></a></h3>
+<p>The above operations can be combined in a natural, fluent manner:</p>
+<pre><code>tornados.select(&quot;state,cat,inj,fat,date,month,day,year&quot;).
+
+  filter().name(&quot;fat&quot;).greaterThan(0).
+
+  sort(&quot;state&quot;)
+</code></pre>
+<p>The above streams the operations into each other in one line: the selection of only certain columns, the filtering of only those events with more than 0 fatalities, and the sort of the resulting (now much smaller) table.</p>
+<h3><a id="Aggregating">Aggregating</a> <a href="#Aggregating"><img src="markbook-section-link.png"/></a></h3>
+<p>The following method allows for the use of table column calculations to build an aggregate view of helpful calculations for multiple columns in a table and summarizes them in a new table representation.</p>
+<pre><code>table.aggregate().columns(&quot;col1, col2, col3&quot;).functions(&quot;min,max,mean,median,mode,sum&quot;)
+</code></pre>
+<h2><a id="KnoxLine+SQL+Shell">KnoxLine SQL Shell</a> <a 
href="#KnoxLine+SQL+Shell"><img src="markbook-section-link.png"/></a></h2>
+<p>KnoxLine is a beeline-like facility built into the KnoxShell client toolbox with basic datasource management and simple SQL client capabilities. ResultSets are rendered via KnoxShellTable but further table-based manipulations are not available within the knoxline shell. This is purely dedicated to SQL interactions and table renderings.</p>
+<p>For leveraging the SQL builder of KnoxShellTable to be able to operate on 
the results locally, see the custom KnoxShell command &lsquo;SQL&rsquo;.</p>
+<p><img src="knoxline-splash-2.png" /></p>
+<p>Once connected to the datasource, SQL commands may be invoked via the 
command line directly.</p>
+<h2><a id="Custom+GroovySh+Commands">Custom GroovySh Commands</a> <a 
href="#Custom+GroovySh+Commands"><img src="markbook-section-link.png"/></a></h2>
+<p>Groovy shell has the ability to extend the commands available to help 
automate scripting or coding that you would otherwise need to do 
programmatically over and over.</p>
+<p>By providing custom commands for KnoxShellTable operations, builders and 
manipulation we can greatly simplify what would need to be done with the fluent 
API of KnoxShellTable and groovy/java code for saving state, etc.</p>
+<h3><a id="KnoxShell+Commands:">KnoxShell Commands:</a> <a 
href="#KnoxShell+Commands:"><img src="markbook-section-link.png"/></a></h3>
+<ol>
+  <li><strong>Datasources</strong> (:datasource|:ds) CRUD and select 
operations for a set of JDBC datasources that are persisted to disk 
(KNOX-2128)</li>
+  <li><strong>SQL</strong> (:SQL|:sql) SQL query execution with persisted SQL 
history per datasource (KNOX-2128)</li>
+  <li><strong>CSV</strong> (:CSV|:csv) Import and Export from CSV and JSON 
formats</li>
+  <li><strong>Filesystem</strong> (:Filesystem|:fs) POSIX style commands for 
HDFS and cloud storage (mount, unmount, mounts, ls, rm, mkdir, cat, put, 
etc)</li>
+</ol>
+<p><img src="knoxshell-help.png" /></p>
+<h2><a id="EXAMPLE:+COVID19+Data+Flow+into+DataLake">EXAMPLE: COVID19 Data 
Flow into DataLake</a> <a href="#EXAMPLE:+COVID19+Data+Flow+into+DataLake"><img 
src="markbook-section-link.png"/></a></h2>
+<p>Let&rsquo;s start to put the commands and table capabilities together to 
consume some public tabular data and usher it into our datalake or cluster.</p>
+<h3><a id="Build+Table+from+Public+CSV+File">Build Table from Public CSV 
File</a> <a href="#Build+Table+from+Public+CSV+File"><img 
src="markbook-section-link.png"/></a></h3>
+<p><img src="covid19csv-1.png" /></p>
+<p>The use of the CSV KnoxShell command above can be easily correlated to the CSV builder of KnoxShellTable. It is obviously less verbose and more natural than using the fluent API of KnoxShellTable directly and also leverages a separate capability for KnoxShell to assign the resulting table to a KnoxShell variable that can be referenced and manipulated afterward.</p>
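+<p>For reference, the builder form that the CSV command corresponds to resembles the following sketch; the URL and the covid19 variable name are hypothetical placeholders for the public CSV file shown in the screenshot.</p>
+<pre><code>covid19 = KnoxShellTable.builder().csv().
+  withHeaders().
+  url(&quot;https://example.com/covid19-daily-04-10-2020.csv&quot;)
+</code></pre>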
+<p>As you can see, the result of creating the table from a CSV file is a rendering of the entire table, which often does not fit the screen properly. This is where the operations on the resulting table come in handy for exploring the dataset. Let&rsquo;s filter the above dataset of COVID19 across the world down to a subset of columns and to only New Jersey by selecting, filtering and sorting numerically by number of Confirmed cases.</p>
+<h3><a id="Select+Columns,+Filter+and+Sort+by+Column">Select Columns, Filter 
and Sort by Column</a> <a 
href="#Select+Columns,+Filter+and+Sort+by+Column"><img 
src="markbook-section-link.png"/></a></h3>
+<p>First we will interrogate the table for its column names or headers. Then 
we will select only those columns that we want in order to fit it to the 
screen, filter it for only New Jersey information and sort numerically by the 
number of Confirmed cases per county.</p>
+<p><img src="covid19nj-1.png" alt="COVID19NJ-1" /></p>
+<p>From the above operation, we can now see the COVID19 data for New Jersey counties for 4/10/2020 sorted by the number of Confirmed cases, with the subset of columns of most interest tailored to fit our screen. From the above table, we can visually see a number of insights in terms of the most affected counties across the state of New Jersey, but it may be more interesting to see an aggregation of some of the calculations available for numeric columns through KnoxShellTable. Let&rsquo;s take a look at an aggregate table for this dataset.</p>
+<h3><a id="Aggregate+Calculations+on+Columns+of+Table">Aggregate Calculations 
on Columns of Table</a> <a 
href="#Aggregate+Calculations+on+Columns+of+Table"><img 
src="markbook-section-link.png"/></a></h3>
+<p>Since the KnoxShellTable fluent API allows us to chain such operations 
together easily, we will just hit the up arrow to get the previous table 
operation command and add the aggregate operation to the chain.</p>
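+<p>In text form, appending the aggregate step to a table expression resembles the following sketch; the covid19nj variable and the Deaths column name are hypothetical placeholders for those visible in the screenshots.</p>
+<pre><code>covid19nj.aggregate().
+  columns(&quot;Confirmed,Deaths&quot;).
+  functions(&quot;min,max,mean,median,mode,sum&quot;)
+</code></pre>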
+<p><img src="covid19nj-aggregate-1.png" /></p>
+<p>Now, by using both tables above, we can see not only that my county of Camden is visually in approximately the center of the counties in terms of Confirmed case numbers, but also how it stands relative to both the average and the median calculations. You can also see the sum for all of New Jersey and the number of those cases that belong to my county.</p>
+<h3><a id="Persist+Tables+to+Local+Disk">Persist Tables to Local Disk</a> <a 
href="#Persist+Tables+to+Local+Disk"><img 
src="markbook-section-link.png"/></a></h3>
+<p>Next, we will persist these tables to our local disk and then push them 
into our HDFS based datalake for access by cluster resources and other 
users.</p>
+<p><img src="covid19-persistence.png" /></p>
+<h3><a id="Add+Tables+to+DataLake">Add Tables to DataLake</a> <a 
href="#Add+Tables+to+DataLake"><img src="markbook-section-link.png"/></a></h3>
+<p>Now that we have these tables persisted to local disk, we can use our 
KnoxShell Filesystem commands to add them to the datalake.</p>
+<h3><a id="Building+KnoxShell+Truststore">Building KnoxShell Truststore</a> <a 
href="#Building+KnoxShell+Truststore"><img 
src="markbook-section-link.png"/></a></h3>
+<p>Before we can access resources from the datalake behind Knox we need to ensure that the cert presented by the Knox instance is trusted. If the deployment is using certs signed by a well-known CA, then we generally don&rsquo;t have to do anything. If we are using Knox self-signed certs or certs signed by an internal CA of some sort then we must import them into the KnoxShell truststore. While this truststore can be located in arbitrary places and configured via system properties and environment variables, the most common approach is to use the default location.</p>
+<pre><code>// exit knoxshell
+^C
+
+bin/knoxshell.sh buildTrustStore https://nightly7x-1.nightly7x.root.hwx.site:8443/gateway/datalake-api
+
+ls -l ~/gateway-client-trust.jks
+
+// to reenter knoxshell
+bin/knoxshell.sh
+</code></pre>
+<p>The truststore is created with the default password of &lsquo;changeit&rsquo;.</p>
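+<p>To double-check what was imported, the standard JDK keytool can list the truststore contents; this is a plain keytool invocation rather than a KnoxShell command:</p>
+<pre><code>keytool -list -keystore ~/gateway-client-trust.jks -storepass changeit
+</code></pre>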
+<h3><a id="Mount+a+WebHDFS+Filesystem">Mount a WebHDFS Filesystem</a> <a 
href="#Mount+a+WebHDFS+Filesystem"><img 
src="markbook-section-link.png"/></a></h3>
+<p>We may now mount a filesystem from the remote Knox instance by mounting the 
topology that hosts the WebHDFS API endpoint.</p>
+<pre><code>:fs mount https://nightly7x-1.nightly7x.root.hwx.site:8443/gateway/datalake-api nightly
+</code></pre>
+<h3><a id="Accessing+a+Filesystem">Accessing a Filesystem</a> <a 
href="#Accessing+a+Filesystem"><img src="markbook-section-link.png"/></a></h3>
+<p>Once we have the desired mount, we may now access it by specifying the mountpoint name as the path prefix into the HDFS filesystem. Upon mounting or first access, the KnoxShell will prompt for user credentials for use as HTTP Basic credentials while accessing the WebHDFS API.</p>
+<p><img src="fs-mount-login-1.png" /></p>
+<p>Once we authenticate to the mounted filesystem, we reference it by 
mountpoint and never concern ourselves with the actual URL to the endpoint.</p>
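+<p>For example, listing a directory through the mount might look like the following; the tmp path is just illustrative:</p>
+<pre><code>:fs ls nightly/tmp
+</code></pre>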
+<h3><a id="Put+Tables+into+DataLake">Put Tables into DataLake</a> <a 
href="#Put+Tables+into+DataLake"><img src="markbook-section-link.png"/></a></h3>
+<p><img src="covid19nj-put-webhdfs-1.png" /></p>
+<p>Above, we have put the previously persisted CSV files into the tmp 
directory of the mounted filesystem to be available to other datalake users.</p>
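+<p>As a sketch of the put subcommand shown in the screenshot, and assuming a cp-like local-then-remote argument order, such an invocation might resemble the following; the file name is a hypothetical placeholder:</p>
+<pre><code>:fs put covid19nj.csv nightly/tmp/covid19nj.csv
+</code></pre>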
+<p>We can now also access them from any other KnoxShell instance that has 
mounted this filesystem with appropriate credentials. Let&rsquo;s now cat the 
contents of one of the CSV files into the KnoxShell and then render it as a 
table from the raw CSV format.</p>
+<h3><a id="Pull+CSV+Files+from+WebHDFS+and+Create+Tables">Pull CSV Files from 
WebHDFS and Create Tables</a> <a 
href="#Pull+CSV+Files+from+WebHDFS+and+Create+Tables"><img 
src="markbook-section-link.png"/></a></h3>
+<p><img src="covid19-nj-agg-from-webhdfs-1.png" /></p>
+<p>Note that the cat command returns the CSV file contents as a string to the KnoxShell environment as a variable called &lsquo;_&rsquo;.</p>
+<p>This is true of any command in groovysh or KnoxShell. The previous result 
is always available as this variable. Here we pass the contents of the variable 
to the CSV KnoxShellTable builder string() method. This is a very convenient 
way to render tabular data from a cat&rsquo;d file from your remote datalake. 
</p>
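+<p>In text form, that flow resembles the following sketch; the file name is a hypothetical placeholder for one of the CSV files put into the datalake earlier:</p>
+<pre><code>:fs cat nightly/tmp/covid19nj.csv
+
+covid19nj = KnoxShellTable.builder().csv().
+  withHeaders().
+  string(_)
+</code></pre>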
+<p>Also note that tables that are assigned to variables within KnoxShell will 
render themselves just by typing the variable name.</p>
\ No newline at end of file

