This is an automated email from the ASF dual-hosted git repository.

jmark99 pushed a commit to branch 2.1
in repository https://gitbox.apache.org/repos/asf/accumulo-examples.git

The following commit(s) were added to refs/heads/2.1 by this push:
     new ae45b3d  Add documentation for uniquecols example (#115)
ae45b3d is described below

commit ae45b3db10ed71576910eedc1c169331f4fdefbe
Author: Mark Owens <jmar...@apache.org>
AuthorDate: Thu Jan 26 16:00:46 2023 -0500

    Add documentation for uniquecols example (#115)

    Add documentation for the uniquecols examples for the 2.1 branch.
---
 docs/uniquecols.md | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/docs/uniquecols.md b/docs/uniquecols.md
index 46b6a30..714fcb2 100644
--- a/docs/uniquecols.md
+++ b/docs/uniquecols.md
@@ -17,7 +17,74 @@ limitations under the License.
 # Apache Accumulo Unique Columns example
 
 The UniqueColumns examples ([UniqueColumns.java]) computes the unique set
-of columns in a table and shows how a map reduce job can directly read a
-tables files from HDFS.
+of column family and column qualifiers in a table. It also demonstrates
+how a mapReduce job can directly read a tables files from HDFS.
+
+Create a table and add rows that all have identical column family and column
+qualifiers.
+
+```
+$ /path/to/accumulo shell -u username -p secret
+username@instance> createnamespace examples
+username@instance> createtable examples.unique
+username@instance> examples.unique> insert row1 fam1 qual1 v1
+username@instance> examples.unique> insert row2 fam1 qual1 v2
+username@instance> examples.unique> insert row3 fam1 qual1 v3
+```
+
+Exit the Accumulo shell and run the uniqueColumns mapReduce job against
+this table. Note that if the output file already exists in HDFS, it will
+need to be deleted.
+
+```
+$ ./bin/runmr mapreduce.UniqueColumns --table examples.unique --reducers 1 --output /tmp/unique
+```
+
+When the mapReduce job completes, examine the output.
+
+```
+$ hdfs dfs -cat /tmp/unique/part-r-00000
+cf:fam1
+cq:qual1
+```
+
+The output displays the unique column family and column qualifier values. In
+this case since all rows use the same values, there are only two values output.
+
+Note that since the example used only one reducer all output will be contained
+within the single `part-r-00000` file. If more than one reducer is used the output
+will be spread among various `part-r-xxxxx` files.
+
+Go back to the shell and add some additional entries.
+
+```text
+$ /path/to/accumulo shell -u username -p secret
+username@instance> table unique
+username@instance example.unique> insert row1 fam2 qual2 v2
+username@instance example.unique> insert row1 fam3 qual2 v2
+username@instance example.unique> insert row1 fam2 qual2 v2
+username@instance example.unique> insert row2 fam2 qual2 v2
+username@instance example.unique> insert row3 fam2 qual2 v2
+username@instance example.unique> insert row3 fam3 qual3 v2
+username@instance example.unique> insert row3 fam3 qual4 v2
+```
+
+Re-running the command will now find any additional unique column values.
+
+```text
+$ hdfs dfs -rm -r -f /tmp/unique
+$ ./bin/runmr mapreduce.UniqueColumns --table examples.unique --reducers 1 --output /tmp/unique
+$ hdfs dfs -cat /tmp/unique/part-r-00000
+cf:fam1
+cf:fam2
+cf:fam3
+cq:qual1
+cq:qual2
+cq:qual3
+cq:qual4
+```
+
+The output now includes the additional column values that were added during the last batch of inserts.
+
 
 [UniqueColumns.java]: ../src/main/java/org/apache/accumulo/examples/mapreduce/UniqueColumns.java
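
For readers who want to check the expected output without a cluster, the computation described in the documentation above can be approximated in a few lines of Python. This is only a minimal sketch of the result the job produces, not the actual UniqueColumns.java implementation: the function name `unique_columns` and the `(row, family, qualifier)` tuple layout are hypothetical, and the sorted ordering assumes a single reducer, as in the walkthrough.

```python
# Minimal sketch (NOT the real MapReduce job): reproduce the set of
# "cf:<family>" and "cq:<qualifier>" lines for a list of entries.
# The function name and the tuple layout are illustrative assumptions.

def unique_columns(entries):
    """Return the de-duplicated, sorted cf:/cq: lines for the given entries."""
    seen = set()
    for _row, family, qualifier in entries:
        seen.add("cf:" + family)      # one line per distinct column family
        seen.add("cq:" + qualifier)   # one line per distinct column qualifier
    return sorted(seen)

# Entries from the first batch of inserts in the walkthrough.
first_batch = [
    ("row1", "fam1", "qual1"),
    ("row2", "fam1", "qual1"),
    ("row3", "fam1", "qual1"),
]

# Entries from the second batch of inserts.
second_batch = [
    ("row1", "fam2", "qual2"),
    ("row1", "fam3", "qual2"),
    ("row1", "fam2", "qual2"),
    ("row2", "fam2", "qual2"),
    ("row3", "fam2", "qual2"),
    ("row3", "fam3", "qual3"),
    ("row3", "fam3", "qual4"),
]

for line in unique_columns(first_batch + second_batch):
    print(line)
```

Run on both batches together, the sketch prints the same seven `cf:` and `cq:` lines as the final `hdfs dfs -cat` listing in the diff. Run on the first batch alone, it yields only `cf:fam1` and `cq:qual1`.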