Added: nifi/site/trunk/docs/nifi-docs/html/nifi-in-depth.html
URL: 
http://svn.apache.org/viewvc/nifi/site/trunk/docs/nifi-docs/html/nifi-in-depth.html?rev=1794596&view=auto
==============================================================================
--- nifi/site/trunk/docs/nifi-docs/html/nifi-in-depth.html (added)
+++ nifi/site/trunk/docs/nifi-docs/html/nifi-in-depth.html Tue May  9 15:27:39 
2017
@@ -0,0 +1,856 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge"><![endif]-->
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<meta name="generator" content="Asciidoctor 1.5.2">
+<meta name="author" content="Apache NiFi Team">
+<title>Apache NiFi In Depth</title>
+<style>
+/* Asciidoctor default stylesheet | MIT License | http://asciidoctor.org */
+/* Copyright (C) 2012-2015 Dan Allen, Ryan Waldron and the Asciidoctor Project
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE. */
+/* Remove the comments around the @import statement below when using this as a 
custom stylesheet */
+@import "https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400";
+article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}
+audio,canvas,video{display:inline-block}
+audio:not([controls]){display:none;height:0}
+[hidden],template{display:none}
+script{display:none!important}
+html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}
+body{margin:0}
+a{background:transparent}
+a:focus{outline:thin dotted}
+a:active,a:hover{outline:0}
+h1{font-size:2em;margin:.67em 0}
+abbr[title]{border-bottom:1px dotted}
+b,strong{font-weight:bold}
+dfn{font-style:italic}
+hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}
+mark{background:#ff0;color:#000}
+code,kbd,pre,samp{font-family:monospace;font-size:1em}
+pre{white-space:pre-wrap}
+q{quotes:"\201C" "\201D" "\2018" "\2019"}
+small{font-size:80%}
+sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}
+sup{top:-.5em}
+sub{bottom:-.25em}
+img{border:0}
+svg:not(:root){overflow:hidden}
+figure{margin:0}
+fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}
+legend{border:0;padding:0}
+button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}
+button,input{line-height:normal}
+button,select{text-transform:none}
+button,html 
input[type="button"],input[type="reset"],input[type="submit"]{-webkit-appearance:button;cursor:pointer}
+button[disabled],html input[disabled]{cursor:default}
+input[type="checkbox"],input[type="radio"]{box-sizing:border-box;padding:0}
+input[type="search"]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}
+input[type="search"]::-webkit-search-cancel-button,input[type="search"]::-webkit-search-decoration{-webkit-appearance:none}
+button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}
+textarea{overflow:auto;vertical-align:top}
+table{border-collapse:collapse;border-spacing:0}
+*,*:before,*:after{-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box}
+html,body{font-size:100%}
+body{background:#fff;color:rgba(0,0,0,.8);padding:0;margin:0;font-family:"Noto Serif","DejaVu Serif",serif;font-weight:400;font-style:normal;line-height:1;position:relative;cursor:auto}
+a:hover{cursor:pointer}
+img,object,embed{max-width:100%;height:auto}
+object,embed{height:100%}
+img{-ms-interpolation-mode:bicubic}
+#map_canvas img,#map_canvas embed,#map_canvas object,.map_canvas 
img,.map_canvas embed,.map_canvas object{max-width:none!important}
+.left{float:left!important}
+.right{float:right!important}
+.text-left{text-align:left!important}
+.text-right{text-align:right!important}
+.text-center{text-align:center!important}
+.text-justify{text-align:justify!important}
+.hide{display:none}
+.antialiased,body{-webkit-font-smoothing:antialiased}
+img{display:inline-block;vertical-align:middle}
+textarea{height:auto;min-height:50px}
+select{width:100%}
+p.lead,.paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type 
p{font-size:1.21875em;line-height:1.6}
+.subheader,.admonitionblock 
td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{line-height:1.45;color:#7a2518;font-weight:400;margin-top:0;margin-bottom:.25em}
+div,dl,dt,dd,ul,ol,li,h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6,pre,form,p,blockquote,th,td{margin:0;padding:0;direction:ltr}
+a{color:#2156a5;text-decoration:underline;line-height:inherit}
+a:hover,a:focus{color:#1d4b8f}
+a img{border:none}
+p{font-family:inherit;font-weight:400;font-size:1em;line-height:1.6;margin-bottom:1.25em;text-rendering:optimizeLegibility}
+p aside{font-size:.875em;line-height:1.35;font-style:italic}
+h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{font-family:"Open Sans","DejaVu Sans",sans-serif;font-weight:300;font-style:normal;color:#ba3925;text-rendering:optimizeLegibility;margin-top:1em;margin-bottom:.5em;line-height:1.0125em}
+h1 small,h2 small,h3 small,#toctitle small,.sidebarblock>.content>.title 
small,h4 small,h5 small,h6 small{font-size:60%;color:#e99b8f;line-height:0}
+h1{font-size:2.125em}
+h2{font-size:1.6875em}
+h3,#toctitle,.sidebarblock>.content>.title{font-size:1.375em}
+h4,h5{font-size:1.125em}
+h6{font-size:1em}
+hr{border:solid #ddddd8;border-width:1px 0 0;clear:both;margin:1.25em 0 
1.1875em;height:0}
+em,i{font-style:italic;line-height:inherit}
+strong,b{font-weight:bold;line-height:inherit}
+small{font-size:60%;line-height:inherit}
+code{font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:400;color:rgba(0,0,0,.9);padding-right: 1px;}
+ul,ol,dl{font-size:1em;line-height:1.6;margin-bottom:1.25em;list-style-position:outside;font-family:inherit}
+ul,ol,ul.no-bullet,ol.no-bullet{margin-left:1.5em}
+ul li ul,ul li ol{margin-left:1.25em;margin-bottom:0;font-size:1em}
+ul.square li ul,ul.circle li ul,ul.disc li ul{list-style:inherit}
+ul.square{list-style-type:square}
+ul.circle{list-style-type:circle}
+ul.disc{list-style-type:disc}
+ul.no-bullet{list-style:none}
+ol li ul,ol li ol{margin-left:1.25em;margin-bottom:0}
+dl dt{margin-bottom:.3125em;font-weight:bold}
+dl dd{margin-bottom:1.25em}
+abbr,acronym{text-transform:uppercase;font-size:90%;color:rgba(0,0,0,.8);border-bottom:1px
 dotted #ddd;cursor:help}
+abbr{text-transform:none}
+blockquote{margin:0 0 1.25em;padding:.5625em 1.25em 0 1.1875em;border-left:1px 
solid #ddd}
+blockquote cite{display:block;font-size:.9375em;color:rgba(0,0,0,.6)}
+blockquote cite:before{content:"\2014 \0020"}
+blockquote cite a,blockquote cite a:visited{color:rgba(0,0,0,.6)}
+blockquote,blockquote p{line-height:1.6;color:rgba(0,0,0,.85)}
+@media only screen and 
(min-width:768px){h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2}
+h1{font-size:2.75em}
+h2{font-size:2.3125em}
+h3,#toctitle,.sidebarblock>.content>.title{font-size:1.6875em}
+h4{font-size:1.4375em}}table{background:#fff;margin-bottom:1.25em;border:solid 
1px #dedede}
+table thead,table tfoot{background:#f7f8f7;font-weight:bold}
+table thead tr th,table thead tr td,table tfoot tr th,table tfoot tr 
td{padding:.5em .625em 
.625em;font-size:inherit;color:rgba(0,0,0,.8);text-align:left}
+table tr th,table tr td{padding:.5625em 
.625em;font-size:inherit;color:rgba(0,0,0,.8)}
+table tr.even,table tr.alt,table tr:nth-of-type(even){background:#f8f8f7}
+table thead tr th,table tfoot tr th,table tbody tr td,table tr td,table tfoot 
tr td{display:table-cell;line-height:1.6}
+h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2;word-spacing:-.05em}
+h1 strong,h2 strong,h3 strong,#toctitle strong,.sidebarblock>.content>.title 
strong,h4 strong,h5 strong,h6 strong{font-weight:400}
+.clearfix:before,.clearfix:after,.float-group:before,.float-group:after{content:" ";display:table}
+.clearfix:after,.float-group:after{clear:both}
+*:not(pre)>code{font-size:.9375em;font-style:normal!important;letter-spacing:0;word-spacing:-.15em;background-color:#f7f7f8;-webkit-border-radius:4px;border-radius:4px;line-height:1.45;text-rendering:optimizeSpeed}
+pre,pre>code{line-height:1.45;color:rgba(0,0,0,.9);font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:400;text-rendering:optimizeSpeed}
+.keyseq{color:rgba(51,51,51,.8)}
+kbd{display:inline-block;color:rgba(0,0,0,.8);font-size:.75em;line-height:1.4;background-color:#f7f7f7;border:1px
 solid #ccc;-webkit-border-radius:3px;border-radius:3px;-webkit-box-shadow:0 
1px 0 rgba(0,0,0,.2),0 0 0 .1em white inset;box-shadow:0 1px 0 rgba(0,0,0,.2),0 
0 0 .1em #fff inset;margin:-.15em .15em 0 .15em;padding:.2em .6em .2em 
.5em;vertical-align:middle;white-space:nowrap}
+.keyseq kbd:first-child{margin-left:0}
+.keyseq kbd:last-child{margin-right:0}
+.menuseq,.menu{color:rgba(0,0,0,.8)}
+b.button:before,b.button:after{position:relative;top:-1px;font-weight:400}
+b.button:before{content:"[";padding:0 3px 0 2px}
+b.button:after{content:"]";padding:0 2px 0 3px}
+p a>code:hover{color:rgba(0,0,0,.9)}
+#header,#content,#footnotes,#footer{width:100%;margin-left:auto;margin-right:auto;margin-top:0;margin-bottom:0;max-width:62.5em;*zoom:1;position:relative;padding-left:.9375em;padding-right:.9375em}
+#header:before,#header:after,#content:before,#content:after,#footnotes:before,#footnotes:after,#footer:before,#footer:after{content:" ";display:table}
+#header:after,#content:after,#footnotes:after,#footer:after{clear:both}
+#content{margin-top:1.25em}
+#content:before{content:none}
+#header>h1:first-child{color:rgba(0,0,0,.85);margin-top:2.25rem;margin-bottom:0}
+#header>h1:first-child+#toc{margin-top:8px;border-top:1px solid #ddddd8}
+#header>h1:only-child,body.toc2 #header>h1:nth-last-child(2){border-bottom:1px 
solid #ddddd8;padding-bottom:8px}
+#header .details{border-bottom:1px solid 
#ddddd8;line-height:1.45;padding-top:.25em;padding-bottom:.25em;padding-left:.25em;color:rgba(0,0,0,.6);display:-ms-flexbox;display:-webkit-flex;display:flex;-ms-flex-flow:row
 wrap;-webkit-flex-flow:row wrap;flex-flow:row wrap}
+#header .details span:first-child{margin-left:-.125em}
+#header .details span.email a{color:rgba(0,0,0,.85)}
+#header .details br{display:none}
+#header .details br+span:before{content:"\00a0\2013\00a0"}
+#header .details 
br+span.author:before{content:"\00a0\22c5\00a0";color:rgba(0,0,0,.85)}
+#header .details br+span#revremark:before{content:"\00a0|\00a0"}
+#header #revnumber{text-transform:capitalize}
+#header #revnumber:after{content:"\00a0"}
+#content>h1:first-child:not([class]){color:rgba(0,0,0,.85);border-bottom:1px 
solid 
#ddddd8;padding-bottom:8px;margin-top:0;padding-top:1rem;margin-bottom:1.25rem}
+#toc{border-bottom:1px solid #efefed;padding-bottom:.5em}
+#toc>ul{margin-left:.125em}
+#toc ul.sectlevel0>li>a{font-style:italic}
+#toc ul.sectlevel0 ul.sectlevel1{margin:.5em 0}
+#toc ul{font-family:"Open Sans","DejaVu Sans",sans-serif;list-style-type:none}
+#toc a{text-decoration:none}
+#toc a:active{text-decoration:underline}
+#toctitle{color:#7a2518;font-size:1.2em}
+@media only screen and (min-width:768px){#toctitle{font-size:1.375em}
+body.toc2{padding-left:15em;padding-right:0}
+#toc.toc2{margin-top:0!important;background-color:#f8f8f7;position:fixed;width:15em;left:0;top:0;border-right:1px
 solid 
#efefed;border-top-width:0!important;border-bottom-width:0!important;z-index:1000;padding:1.25em
 1em;height:100%;overflow:auto}
+#toc.toc2 #toctitle{margin-top:0;font-size:1.2em}
+#toc.toc2>ul{font-size:.9em;margin-bottom:0}
+#toc.toc2 ul ul{margin-left:0;padding-left:1em}
+#toc.toc2 ul.sectlevel0 
ul.sectlevel1{padding-left:0;margin-top:.5em;margin-bottom:.5em}
+body.toc2.toc-right{padding-left:0;padding-right:15em}
+body.toc2.toc-right #toc.toc2{border-right-width:0;border-left:1px solid 
#efefed;left:auto;right:0}}@media only screen and 
(min-width:1280px){body.toc2{padding-left:20em;padding-right:0}
+#toc.toc2{width:20em}
+#toc.toc2 #toctitle{font-size:1.375em}
+#toc.toc2>ul{font-size:.95em}
+#toc.toc2 ul ul{padding-left:1.25em}
+body.toc2.toc-right{padding-left:0;padding-right:20em}}#content 
#toc{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px}
+#content #toc>:first-child{margin-top:0}
+#content #toc>:last-child{margin-bottom:0}
+#footer{max-width:100%;background-color:rgba(0,0,0,.8);padding:1.25em}
+#footer-text{color:rgba(255,255,255,.8);line-height:1.44}
+.sect1{padding-bottom:.625em}
+@media only screen and 
(min-width:768px){.sect1{padding-bottom:1.25em}}.sect1+.sect1{border-top:1px 
solid #efefed}
+#content 
h1>a.anchor,h2>a.anchor,h3>a.anchor,#toctitle>a.anchor,.sidebarblock>.content>.title>a.anchor,h4>a.anchor,h5>a.anchor,h6>a.anchor{position:absolute;z-index:1001;width:1.5ex;margin-left:-1.5ex;display:block;text-decoration:none!important;visibility:hidden;text-align:center;font-weight:400}
+#content 
h1>a.anchor:before,h2>a.anchor:before,h3>a.anchor:before,#toctitle>a.anchor:before,.sidebarblock>.content>.title>a.anchor:before,h4>a.anchor:before,h5>a.anchor:before,h6>a.anchor:before{content:"\00A7";font-size:.85em;display:block;padding-top:.1em}
+#content h1:hover>a.anchor,#content 
h1>a.anchor:hover,h2:hover>a.anchor,h2>a.anchor:hover,h3:hover>a.anchor,#toctitle:hover>a.anchor,.sidebarblock>.content>.title:hover>a.anchor,h3>a.anchor:hover,#toctitle>a.anchor:hover,.sidebarblock>.content>.title>a.anchor:hover,h4:hover>a.anchor,h4>a.anchor:hover,h5:hover>a.anchor,h5>a.anchor:hover,h6:hover>a.anchor,h6>a.anchor:hover{visibility:visible}
+#content 
h1>a.link,h2>a.link,h3>a.link,#toctitle>a.link,.sidebarblock>.content>.title>a.link,h4>a.link,h5>a.link,h6>a.link{color:#ba3925;text-decoration:none}
+#content 
h1>a.link:hover,h2>a.link:hover,h3>a.link:hover,#toctitle>a.link:hover,.sidebarblock>.content>.title>a.link:hover,h4>a.link:hover,h5>a.link:hover,h6>a.link:hover{color:#a53221}
+.audioblock,.imageblock,.literalblock,.listingblock,.stemblock,.videoblock{margin-bottom:1.25em}
+.admonitionblock 
td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{text-rendering:optimizeLegibility;text-align:left;font-family:"Noto Serif","DejaVu Serif",serif;font-size:1rem;font-style:italic}
+table.tableblock>caption.title{white-space:nowrap;overflow:visible;max-width:0}
+.paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type 
p{color:rgba(0,0,0,.85)}
+table.tableblock #preamble>.sectionbody>.paragraph:first-of-type 
p{font-size:inherit}
+.admonitionblock>table{border-collapse:separate;border:0;background:none;width:100%}
+.admonitionblock>table td.icon{text-align:center;width:80px}
+.admonitionblock>table td.icon img{max-width:none}
+.admonitionblock>table td.icon .title{font-weight:bold;font-family:"Open Sans","DejaVu Sans",sans-serif;text-transform:uppercase}
+.admonitionblock>table 
td.content{padding-left:1.125em;padding-right:1.25em;border-left:1px solid 
#ddddd8;color:rgba(0,0,0,.6)}
+.admonitionblock>table td.content>:last-child>:last-child{margin-bottom:0}
+.exampleblock>.content{border-style:solid;border-width:1px;border-color:#e6e6e6;margin-bottom:1.25em;padding:1.25em;background:#fff;-webkit-border-radius:4px;border-radius:4px}
+.exampleblock>.content>:first-child{margin-top:0}
+.exampleblock>.content>:last-child{margin-bottom:0}
+.sidebarblock{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px}
+.sidebarblock>:first-child{margin-top:0}
+.sidebarblock>:last-child{margin-bottom:0}
+.sidebarblock>.content>.title{color:#7a2518;margin-top:0;text-align:center}
+.exampleblock>.content>:last-child>:last-child,.exampleblock>.content 
.olist>ol>li:last-child>:last-child,.exampleblock>.content 
.ulist>ul>li:last-child>:last-child,.exampleblock>.content 
.qlist>ol>li:last-child>:last-child,.sidebarblock>.content>:last-child>:last-child,.sidebarblock>.content
 .olist>ol>li:last-child>:last-child,.sidebarblock>.content 
.ulist>ul>li:last-child>:last-child,.sidebarblock>.content 
.qlist>ol>li:last-child>:last-child{margin-bottom:0}
+.literalblock pre,.listingblock pre:not(.highlight),.listingblock 
pre[class="highlight"],.listingblock pre[class^="highlight "],.listingblock 
pre.CodeRay,.listingblock pre.prettyprint{background:#f7f7f8}
+.sidebarblock .literalblock pre,.sidebarblock .listingblock 
pre:not(.highlight),.sidebarblock .listingblock 
pre[class="highlight"],.sidebarblock .listingblock pre[class^="highlight "],.sidebarblock .listingblock pre.CodeRay,.sidebarblock .listingblock 
pre.prettyprint{background:#f2f1f1}
+.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock 
pre[class]{-webkit-border-radius:4px;border-radius:4px;word-wrap:break-word;padding:1em;font-size:.8125em}
+.literalblock pre.nowrap,.literalblock pre[class].nowrap,.listingblock 
pre.nowrap,.listingblock 
pre[class].nowrap{overflow-x:auto;white-space:pre;word-wrap:normal}
+@media only screen and (min-width:768px){.literalblock pre,.literalblock 
pre[class],.listingblock pre,.listingblock 
pre[class]{font-size:.90625em}}@media only screen and 
(min-width:1280px){.literalblock pre,.literalblock pre[class],.listingblock 
pre,.listingblock pre[class]{font-size:1em}}.literalblock.output 
pre{color:#f7f7f8;background-color:rgba(0,0,0,.9)}
+.listingblock pre.highlightjs{padding:0}
+.listingblock 
pre.highlightjs>code{padding:1em;-webkit-border-radius:4px;border-radius:4px}
+.listingblock pre.prettyprint{border-width:0}
+.listingblock>.content{position:relative}
+.listingblock 
code[data-lang]:before{display:none;content:attr(data-lang);position:absolute;font-size:.75em;top:.425rem;right:.5rem;line-height:1;text-transform:uppercase;color:#999}
+.listingblock:hover code[data-lang]:before{display:block}
+.listingblock.terminal pre 
.command:before{content:attr(data-prompt);padding-right:.5em;color:#999}
+.listingblock.terminal pre .command:not([data-prompt]):before{content:"$"}
+table.pyhltable{border-collapse:separate;border:0;margin-bottom:0;background:none}
+table.pyhltable td{vertical-align:top;padding-top:0;padding-bottom:0}
+table.pyhltable td.code{padding-left:.75em;padding-right:0}
+pre.pygments .lineno,table.pyhltable 
td:not(.code){color:#999;padding-left:0;padding-right:.5em;border-right:1px 
solid #ddddd8}
+pre.pygments .lineno{display:inline-block;margin-right:.25em}
+table.pyhltable .linenodiv{background:none!important;padding-right:0!important}
+.quoteblock{margin:0 1em 1.25em 1.5em;display:table}
+.quoteblock>.title{margin-left:-1.5em;margin-bottom:.75em}
+.quoteblock blockquote,.quoteblock blockquote 
p{color:rgba(0,0,0,.85);font-size:1.15rem;line-height:1.75;word-spacing:.1em;letter-spacing:0;font-style:italic;text-align:justify}
+.quoteblock blockquote{margin:0;padding:0;border:0}
+.quoteblock 
blockquote:before{content:"\201c";float:left;font-size:2.75em;font-weight:bold;line-height:.6em;margin-left:-.6em;color:#7a2518;text-shadow:0
 1px 2px rgba(0,0,0,.1)}
+.quoteblock blockquote>.paragraph:last-child p{margin-bottom:0}
+.quoteblock .attribution{margin-top:.5em;margin-right:.5ex;text-align:right}
+.quoteblock .quoteblock{margin-left:0;margin-right:0;padding:.5em 
0;border-left:3px solid rgba(0,0,0,.6)}
+.quoteblock .quoteblock blockquote{padding:0 0 0 .75em}
+.quoteblock .quoteblock blockquote:before{display:none}
+.verseblock{margin:0 1em 1.25em 1em}
+.verseblock pre{font-family:"Open Sans","DejaVu Sans",sans;font-size:1.15rem;color:rgba(0,0,0,.85);font-weight:300;text-rendering:optimizeLegibility}
+.verseblock pre strong{font-weight:400}
+.verseblock .attribution{margin-top:1.25rem;margin-left:.5ex}
+.quoteblock .attribution,.verseblock 
.attribution{font-size:.9375em;line-height:1.45;font-style:italic}
+.quoteblock .attribution br,.verseblock .attribution br{display:none}
+.quoteblock .attribution cite,.verseblock .attribution 
cite{display:block;letter-spacing:-.05em;color:rgba(0,0,0,.6)}
+.quoteblock.abstract{margin:0 0 1.25em 0;display:block}
+.quoteblock.abstract blockquote,.quoteblock.abstract blockquote 
p{text-align:left;word-spacing:0}
+.quoteblock.abstract blockquote:before,.quoteblock.abstract blockquote 
p:first-of-type:before{display:none}
+table.tableblock{max-width:100%;border-collapse:separate}
+table.tableblock td>.paragraph:last-child p>p:last-child,table.tableblock 
th>p:last-child,table.tableblock td>p:last-child{margin-bottom:0}
+table.spread{width:100%}
+table.tableblock,th.tableblock,td.tableblock{border:0 solid #dedede}
+table.grid-all th.tableblock,table.grid-all td.tableblock{border-width:0 1px 
1px 0}
+table.grid-all tfoot>tr>th.tableblock,table.grid-all 
tfoot>tr>td.tableblock{border-width:1px 1px 0 0}
+table.grid-cols th.tableblock,table.grid-cols td.tableblock{border-width:0 1px 
0 0}
+table.grid-all *>tr>.tableblock:last-child,table.grid-cols 
*>tr>.tableblock:last-child{border-right-width:0}
+table.grid-rows th.tableblock,table.grid-rows td.tableblock{border-width:0 0 
1px 0}
+table.grid-all tbody>tr:last-child>th.tableblock,table.grid-all 
tbody>tr:last-child>td.tableblock,table.grid-all 
thead:last-child>tr>th.tableblock,table.grid-rows 
tbody>tr:last-child>th.tableblock,table.grid-rows 
tbody>tr:last-child>td.tableblock,table.grid-rows 
thead:last-child>tr>th.tableblock{border-bottom-width:0}
+table.grid-rows tfoot>tr>th.tableblock,table.grid-rows 
tfoot>tr>td.tableblock{border-width:1px 0 0 0}
+table.frame-all{border-width:1px}
+table.frame-sides{border-width:0 1px}
+table.frame-topbot{border-width:1px 0}
+th.halign-left,td.halign-left{text-align:left}
+th.halign-right,td.halign-right{text-align:right}
+th.halign-center,td.halign-center{text-align:center}
+th.valign-top,td.valign-top{vertical-align:top}
+th.valign-bottom,td.valign-bottom{vertical-align:bottom}
+th.valign-middle,td.valign-middle{vertical-align:middle}
+table thead th,table tfoot th{font-weight:bold}
+tbody tr th{display:table-cell;line-height:1.6;background:#f7f8f7}
+tbody tr th,tbody tr th p,tfoot tr th,tfoot tr th 
p{color:rgba(0,0,0,.8);font-weight:bold}
+p.tableblock>code:only-child{background:none;padding:0}
+p.tableblock{font-size:1em}
+td>div.verse{white-space:pre}
+ol{margin-left:1.75em}
+ul li ol{margin-left:1.5em}
+dl dd{margin-left:1.125em}
+dl dd:last-child,dl dd:last-child>:last-child{margin-bottom:0}
+ol>li p,ul>li p,ul dd,ol dd,.olist .olist,.ulist .ulist,.ulist .olist,.olist 
.ulist{margin-bottom:.625em}
+ul.unstyled,ol.unnumbered,ul.checklist,ul.none{list-style-type:none}
+ul.unstyled,ol.unnumbered,ul.checklist{margin-left:.625em}
+ul.checklist li>p:first-child>.fa-square-o:first-child,ul.checklist 
li>p:first-child>.fa-check-square-o:first-child{width:1em;font-size:.85em}
+ul.checklist 
li>p:first-child>input[type="checkbox"]:first-child{width:1em;position:relative;top:1px}
+ul.inline{margin:0 auto .625em 
auto;margin-left:-1.375em;margin-right:0;padding:0;list-style:none;overflow:hidden}
+ul.inline>li{list-style:none;float:left;margin-left:1.375em;display:block}
+ul.inline>li>*{display:block}
+.unstyled dl dt{font-weight:400;font-style:normal}
+ol.arabic{list-style-type:decimal}
+ol.decimal{list-style-type:decimal-leading-zero}
+ol.loweralpha{list-style-type:lower-alpha}
+ol.upperalpha{list-style-type:upper-alpha}
+ol.lowerroman{list-style-type:lower-roman}
+ol.upperroman{list-style-type:upper-roman}
+ol.lowergreek{list-style-type:lower-greek}
+.hdlist>table,.colist>table{border:0;background:none}
+.hdlist>table>tbody>tr,.colist>table>tbody>tr{background:none}
+td.hdlist1{padding-right:.75em;font-weight:bold}
+td.hdlist1,td.hdlist2{vertical-align:top}
+.literalblock+.colist,.listingblock+.colist{margin-top:-.5em}
+.colist>table tr>td:first-of-type{padding:0 .75em;line-height:1}
+.colist>table tr>td:last-of-type{padding:.25em 0}
+.thumb,.th{line-height:0;display:inline-block;border:solid 4px 
#fff;-webkit-box-shadow:0 0 0 1px #ddd;box-shadow:0 0 0 1px #ddd}
+.imageblock.left,.imageblock[style*="float: left"]{margin:.25em .625em 1.25em 
0}
+.imageblock.right,.imageblock[style*="float: right"]{margin:.25em 0 1.25em 
.625em}
+.imageblock>.title{margin-bottom:0}
+.imageblock.thumb,.imageblock.th{border-width:6px}
+.imageblock.thumb>.title,.imageblock.th>.title{padding:0 .125em}
+.image.left,.image.right{margin-top:.25em;margin-bottom:.25em;display:inline-block;line-height:0}
+.image.left{margin-right:.625em}
+.image.right{margin-left:.625em}
+a.image{text-decoration:none}
+span.footnote,span.footnoteref{vertical-align:super;font-size:.875em}
+span.footnote a,span.footnoteref a{text-decoration:none}
+span.footnote a:active,span.footnoteref a:active{text-decoration:underline}
+#footnotes{padding-top:.75em;padding-bottom:.75em;margin-bottom:.625em}
+#footnotes hr{width:20%;min-width:6.25em;margin:-.25em 0 .75em 
0;border-width:1px 0 0 0}
+#footnotes .footnote{padding:0 
.375em;line-height:1.3;font-size:.875em;margin-left:1.2em;text-indent:-1.2em;margin-bottom:.2em}
+#footnotes .footnote a:first-of-type{font-weight:bold;text-decoration:none}
+#footnotes .footnote:last-of-type{margin-bottom:0}
+#content #footnotes{margin-top:-.625em;margin-bottom:0;padding:.75em 0}
+.gist .file-data>table{border:0;background:#fff;width:100%;margin-bottom:0}
+.gist .file-data>table td.line-data{width:99%}
+div.unbreakable{page-break-inside:avoid}
+.big{font-size:larger}
+.small{font-size:smaller}
+.underline{text-decoration:underline}
+.overline{text-decoration:overline}
+.line-through{text-decoration:line-through}
+.aqua{color:#00bfbf}
+.aqua-background{background-color:#00fafa}
+.black{color:#000}
+.black-background{background-color:#000}
+.blue{color:#0000bf}
+.blue-background{background-color:#0000fa}
+.fuchsia{color:#bf00bf}
+.fuchsia-background{background-color:#fa00fa}
+.gray{color:#606060}
+.gray-background{background-color:#7d7d7d}
+.green{color:#006000}
+.green-background{background-color:#007d00}
+.lime{color:#00bf00}
+.lime-background{background-color:#00fa00}
+.maroon{color:#600000}
+.maroon-background{background-color:#7d0000}
+.navy{color:#000060}
+.navy-background{background-color:#00007d}
+.olive{color:#606000}
+.olive-background{background-color:#7d7d00}
+.purple{color:#600060}
+.purple-background{background-color:#7d007d}
+.red{color:#bf0000}
+.red-background{background-color:#fa0000}
+.silver{color:#909090}
+.silver-background{background-color:#bcbcbc}
+.teal{color:#006060}
+.teal-background{background-color:#007d7d}
+.white{color:#bfbfbf}
+.white-background{background-color:#fafafa}
+.yellow{color:#bfbf00}
+.yellow-background{background-color:#fafa00}
+span.icon>.fa{cursor:default}
+.admonitionblock td.icon [class^="fa icon-"]{font-size:2.5em;text-shadow:1px 
1px 2px rgba(0,0,0,.5);cursor:default}
+.admonitionblock td.icon .icon-note:before{content:"\f05a";color:#19407c}
+.admonitionblock td.icon .icon-tip:before{content:"\f0eb";text-shadow:1px 1px 
2px rgba(155,155,0,.8);color:#111}
+.admonitionblock td.icon .icon-warning:before{content:"\f071";color:#bf6900}
+.admonitionblock td.icon .icon-caution:before{content:"\f06d";color:#bf3400}
+.admonitionblock td.icon .icon-important:before{content:"\f06a";color:#bf0000}
+.conum[data-value]{display:inline-block;color:#fff!important;background-color:rgba(0,0,0,.8);-webkit-border-radius:100px;border-radius:100px;text-align:center;font-size:.75em;width:1.67em;height:1.67em;line-height:1.67em;font-family:"Open Sans","DejaVu Sans",sans-serif;font-style:normal;font-weight:bold}
+.conum[data-value] *{color:#fff!important}
+.conum[data-value]+b{display:none}
+.conum[data-value]:after{content:attr(data-value)}
+pre .conum[data-value]{position:relative;top:-.125em}
+b.conum *{color:inherit!important}
+.conum:not([data-value]):empty{display:none}
+h1,h2{letter-spacing:-.01em}
+dt,th.tableblock,td.content{text-rendering:optimizeLegibility}
+p,td.content{letter-spacing:-.01em}
+p strong,td.content strong{letter-spacing:-.005em}
+p,blockquote,dt,td.content{font-size:1.0625rem}
+p{margin-bottom:1.25rem}
+.sidebarblock p,.sidebarblock dt,.sidebarblock 
td.content,p.tableblock{font-size:1em}
+.exampleblock>.content{background-color:#fffef7;border-color:#e0e0dc;-webkit-box-shadow:0
 1px 4px #e0e0dc;box-shadow:0 1px 4px #e0e0dc}
+.print-only{display:none!important}
+@media print{@page{margin:1.25cm .75cm}
+*{-webkit-box-shadow:none!important;box-shadow:none!important;text-shadow:none!important}
+a{color:inherit!important;text-decoration:underline!important}
+a.bare,a[href^="#"],a[href^="mailto:"]{text-decoration:none!important}
+a[href^="http:"]:not(.bare):after,a[href^="https:"]:not(.bare):after{content:"("
 attr(href) ")";display:inline-block;font-size:.875em;padding-left:.25em}
+abbr[title]:after{content:" (" attr(title) ")"}
+pre,blockquote,tr,img{page-break-inside:avoid}
+thead{display:table-header-group}
+img{max-width:100%!important}
+p,blockquote,dt,td.content{font-size:1em;orphans:3;widows:3}
+h2,h3,#toctitle,.sidebarblock>.content>.title{page-break-after:avoid}
+#toc,.sidebarblock,.exampleblock>.content{background:none!important}
+#toc{border-bottom:1px solid #ddddd8!important;padding-bottom:0!important}
+.sect1{padding-bottom:0!important}
+.sect1+.sect1{border:0!important}
+#header>h1:first-child{margin-top:1.25rem}
+body.book #header{text-align:center}
+body.book #header>h1:first-child{border:0!important;margin:2.5em 0 1em 0}
+body.book #header 
.details{border:0!important;display:block;padding:0!important}
+body.book #header .details span:first-child{margin-left:0!important}
+body.book #header .details br{display:block}
+body.book #header .details br+span:before{content:none!important}
+body.book 
#toc{border:0!important;text-align:left!important;padding:0!important;margin:0!important}
+body.book #toc,body.book #preamble,body.book h1.sect0,body.book 
.sect1>h2{page-break-before:always}
+.listingblock code[data-lang]:before{display:block}
+#footer{background:none!important;padding:0 .9375em}
+#footer-text{color:rgba(0,0,0,.6)!important;font-size:.9em}
+.hide-on-print{display:none!important}
+.print-only{display:block!important}
+.hide-for-print{display:none!important}
+.show-for-print{display:inherit!important}}
+</style>
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.2.0/css/font-awesome.min.css">
+</head>
+<body class="article">
+<div id="header">
+<h1>Apache NiFi In Depth</h1>
+<div class="details">
+<span id="author" class="author">Apache NiFi Team</span><br>
+<span id="email" class="email"><a 
href="mailto:d...@nifi.apache.org">d...@nifi.apache.org</a></span><br>
+</div>
+<div id="toc" class="toc">
+<div id="toctitle">Table of Contents</div>
+<ul class="sectlevel1">
+<li><a href="nifi-in-depth.html#intro">Intro</a></li>
+<li><a href="nifi-in-depth.html#repositories">Repositories</a>
+<ul class="sectlevel2">
+<li><a href="nifi-in-depth.html#flowfile-repository">FlowFile 
Repository</a></li>
+<li><a href="nifi-in-depth.html#content-repository">Content Repository</a></li>
+<li><a href="nifi-in-depth.html#provenance-repository">Provenance 
Repository</a></li>
+<li><a href="nifi-in-depth.html#general-repository-notes">General Repository 
Notes</a></li>
+</ul>
+</li>
+<li><a href="nifi-in-depth.html#life-of-a-flowfile">Life of a FlowFile</a>
+<ul class="sectlevel2">
+<li><a href="nifi-in-depth.html#webcrawler-template">WebCrawler 
Template</a></li>
+<li><a href="nifi-in-depth.html#data-ingress">Data Ingress</a></li>
+<li><a href="nifi-in-depth.html#pass-by-reference">Pass by Reference</a></li>
+<li><a href="nifi-in-depth.html#extended-routing-use-cases">Extended Routing 
Use-cases</a></li>
+<li><a href="nifi-in-depth.html#funnels">Funnels</a></li>
+<li><a href="nifi-in-depth.html#copy-on-write">Copy on Write</a></li>
+<li><a href="nifi-in-depth.html#updating-attributes">Updating 
Attributes</a></li>
+<li><a href="nifi-in-depth.html#data-egress">Data Egress</a></li>
+</ul>
+</li>
+<li><a href="nifi-in-depth.html#closing-remarks">Closing Remarks</a></li>
+</ul>
+</div>
+</div>
+<div id="content">
+<div class="sect1">
+<h2 id="intro"><a class="anchor" href="nifi-in-depth.html#intro"></a>Intro</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This advanced level document is aimed at providing an in-depth look at the 
implementation and design decisions of NiFi. It assumes the reader has read 
enough of the other documentation to know the basics of NiFi.</p>
+</div>
+<div class="paragraph">
+<p>FlowFiles are at the heart of NiFi and its flow-based design. A FlowFile is 
a data record, which consists of a pointer to its content (payload) and 
attributes to support the content, that is associated with one or more 
provenance events. The attributes are key/value pairs that act as the metadata 
for the FlowFile, such as the FlowFile filename. The content is the actual data 
or the payload of the file. Provenance is a record of what has happened to the 
FlowFile. Each one of these parts has its own repository (repo) for storage.</p>
+</div>
+<div class="paragraph">
+<p>One key aspect of the repositories is immutability. The content in the 
Content Repository and data within the FlowFile Repository are immutable. When 
a change occurs to the attributes of a FlowFile, new copies of the attributes 
are created in memory and then persisted on disk. When content is being changed 
for a given FlowFile, its original content is read, streamed through the 
transform, and written to a new stream. Then the FlowFile&#8217;s content 
pointer is updated to the new location on disk. As a result, the default 
approach for FlowFile content storage can be said to be an immutable versioned 
content store. The benefits of this are many, including a substantial reduction 
in the storage space required for typical complex processing graphs, natural 
replay capability, better use of OS caching, fewer random read/write 
performance hits, and a design that is easy to reason about. The previous revisions are kept 
according to the archiving properties set in the <em>nifi.properties</em> file 
and outlined in the <a href="administration-guide.html">NiFi System 
Administrator&#8217;s Guide</a>.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="repositories"><a class="anchor" 
href="nifi-in-depth.html#repositories"></a>Repositories</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>There are three repositories that are utilized by NiFi. Each exists within 
the OS/Host&#8217;s file system and provides specific functionality. In order 
to fully understand FlowFiles and how they are used by the underlying system 
it&#8217;s important to know about these repositories. All three repositories 
are directories on local storage that NiFi uses to persist data.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The FlowFile Repository contains metadata for all the current FlowFiles in 
the flow.</p>
+</li>
+<li>
+<p>The Content Repository holds the content for current and past FlowFiles.</p>
+</li>
+<li>
+<p>The Provenance Repository holds the history of FlowFiles.</p>
+</li>
+</ul>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/zero-master-node.png" alt="NiFi Architecture Diagram">
+</div>
+</div>
+<div class="sect2">
+<h3 id="flowfile-repository"><a class="anchor" 
href="nifi-in-depth.html#flowfile-repository"></a>FlowFile Repository</h3>
+<div class="paragraph">
+<p>FlowFiles that are actively being processed by the system are held in a 
hash map in the JVM memory (more about that in <a 
href="nifi-in-depth.html#DeeperView">Deeper View: FlowFiles in Memory and on 
Disk</a>). This makes it very efficient to process them, but requires a 
secondary mechanism to provide durability of data across process restarts due 
to a number of reasons, such as power loss, kernel panics, system upgrades, and 
maintenance cycles. The FlowFile Repository is a "Write-Ahead Log" (or data 
record) of the metadata of each of the FlowFiles that currently exist in the 
system. This FlowFile metadata includes all the attributes associated with the 
FlowFile, a pointer to the actual content of the FlowFile (which exists in the 
Content Repo) and the state of the FlowFile, such as which Connection/Queue the 
FlowFile belongs in. This Write-Ahead Log provides NiFi the resiliency it needs 
to handle restarts and unexpected system failures.</p>
+</div>
+<div class="paragraph">
+<p>The FlowFile Repository acts as NiFi&#8217;s Write-Ahead Log, so as the 
FlowFiles are flowing through the system, each change is logged in the FlowFile 
Repository before it happens as a transactional unit of work. This allows the 
system to know exactly what step the node is on when processing a piece of 
data. If the node goes down while processing the data, it can easily resume 
from where it left off upon restart (more in-depth in <a 
href="nifi-in-depth.html#EffectSystemFailure">Effect of System Failure on 
Transactions</a>). The format of the FlowFiles in the log is a series of deltas 
(or changes) that happened along the way. NiFi recovers a FlowFile by restoring 
a &#8220;snapshot&#8221; of the FlowFile (created when the Repository is checkpointed) 
and then replaying each of these deltas.</p>
+</div>
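+<div class="paragraph">
+<p>The snapshot-plus-deltas recovery described above can be sketched in a few lines of Java. This is a minimal illustration only, not NiFi&#8217;s implementation: the class <code>WalReplaySketch</code>, the <code>Delta</code> type, and the idea that a delta is a single attribute change are all assumptions made for the example.</p>
+</div>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of snapshot-plus-delta recovery; not NiFi's actual classes.
final class WalReplaySketch {

    /** One logged change: set an attribute, or remove it when value is null. */
    static final class Delta {
        final String flowFileId;
        final String key;
        final String value;

        Delta(String flowFileId, String key, String value) {
            this.flowFileId = flowFileId;
            this.key = key;
            this.value = value;
        }
    }

    /** Rebuilds the in-memory state: copy the snapshot, then replay the deltas in log order. */
    static Map<String, Map<String, String>> recover(
            Map<String, Map<String, String>> snapshot, List<Delta> deltas) {
        Map<String, Map<String, String>> state = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> entry : snapshot.entrySet()) {
            // Copy each record so the snapshot itself is never modified.
            state.put(entry.getKey(), new HashMap<>(entry.getValue()));
        }
        for (Delta delta : deltas) {
            Map<String, String> attributes =
                    state.computeIfAbsent(delta.flowFileId, id -> new HashMap<>());
            if (delta.value == null) {
                attributes.remove(delta.key);
            } else {
                attributes.put(delta.key, delta.value);
            }
        }
        return state;
    }
}
```

+<div class="paragraph">
+<p>Because the deltas are applied in the order they were logged, a later change to the same attribute wins, which is what lets the node resume from the last recorded step.</p>
+</div>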
+<div class="paragraph">
+<p>A snapshot is automatically taken periodically by the system, which creates 
a new snapshot for each FlowFile. The system computes a new base checkpoint by 
serializing each FlowFile in the hash map and writing it to disk with the 
filename ".partial". As the checkpointing proceeds, the new FlowFile baselines 
are written to the ".partial" file. Once the checkpointing is done the old 
"snapshot" file is deleted and the ".partial" file is renamed "snapshot".</p>
+</div>
+<div class="paragraph">
+<p>The period between system checkpoints is configurable in the 
<em>nifi.properties</em> file (documented in the <a 
href="administration-guide.html">NiFi System Administrator&#8217;s Guide</a>). 
The default is a two-minute interval.</p>
+</div>
+<div class="sect3">
+<h4 id="EffectSystemFailure"><a class="anchor" 
href="nifi-in-depth.html#EffectSystemFailure"></a>Effect of System Failure on 
Transactions</h4>
+<div class="paragraph">
+<p>NiFi protects against hardware and system failures by keeping a record of 
what was happening on each node at the time of the failure in that node&#8217;s FlowFile Repo. 
As mentioned above, the FlowFile Repo is NiFi&#8217;s Write-Ahead Log. When the 
node comes back online, it works to restore its state by first checking for the 
"snapshot" and ".partial" files. The node either accepts the "snapshot" and 
deletes the ".partial" (if it exists), or renames the ".partial" file to 
"snapshot" if the "snapshot" file doesn&#8217;t exist.</p>
+</div>
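+<div class="paragraph">
+<p>The restart rule above reduces to a small decision table. The following Java sketch is an illustration under stated assumptions: the class and enum names are hypothetical, and the <code>NOTHING_TO_RESTORE</code> case (neither file present) is not described in this document and is included only to make the function total.</p>
+</div>

```java
// Hypothetical sketch of the restart rule described above; not NiFi's actual code.
final class RecoveryDecisionSketch {

    enum Action {
        USE_SNAPSHOT,                 // only "snapshot" exists
        USE_SNAPSHOT_DELETE_PARTIAL,  // both exist: ".partial" is an unfinished checkpoint
        PROMOTE_PARTIAL,              // only ".partial" exists: rename it to "snapshot"
        NOTHING_TO_RESTORE            // assumption: neither file exists
    }

    /** Chooses what to do with the "snapshot" and ".partial" files on restart. */
    static Action decide(boolean snapshotExists, boolean partialExists) {
        if (snapshotExists) {
            return partialExists ? Action.USE_SNAPSHOT_DELETE_PARTIAL : Action.USE_SNAPSHOT;
        }
        return partialExists ? Action.PROMOTE_PARTIAL : Action.NOTHING_TO_RESTORE;
    }
}
```

+<div class="paragraph">
+<p>Promoting a lone ".partial" file is safe because the old "snapshot" is deleted only once checkpointing is done, so a missing "snapshot" implies the ".partial" file was written completely.</p>
+</div>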
+<div class="paragraph">
+<p>If the Node was in the middle of writing content when it went down, nothing 
is corrupted, thanks to the Copy On Write (mentioned below) and Immutability 
(mentioned above) paradigms. Since FlowFile transactions never modify the 
original content (pointed to by the content pointer), the original is safe. 
When NiFi goes down, the write claim for the change is orphaned and then 
cleaned up by the background garbage collection. This provides a &#8220;rollback&#8221; 
to the last known stable state.</p>
+</div>
+<div class="paragraph">
+<p>The Node then restores its state from the FlowFile Repo. For a more in-depth, 
step-by-step explanation of the process, see this link: <a 
href="https://cwiki.apache.org/confluence/display/NIFI/NiFi%27s+Write-Ahead+Log+Implementation"
 
class="bare">https://cwiki.apache.org/confluence/display/NIFI/NiFi%27s+Write-Ahead+Log+Implementation</a>
 .</p>
+</div>
+<div class="paragraph">
+<p>This setup, in terms of transactional units of work, allows NiFi to be very 
resilient in the face of adversity, ensuring that even if NiFi is suddenly 
killed, it can pick back up without any loss of data.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="DeeperView"><a class="anchor" 
href="nifi-in-depth.html#DeeperView"></a>Deeper View: FlowFiles in Memory and 
on Disk</h4>
+<div class="paragraph">
+<p>The term "FlowFile" is a bit of a misnomer. It would lead one to believe 
that each FlowFile corresponds to a file on disk, but that is not true. The 
FlowFile attributes exist in two main locations: the Write-Ahead Log 
explained above and a hash map in working memory. This hash map has a 
reference to all of the FlowFiles actively being used in the Flow. The object 
referenced by this map is the same one that is used by processors and held in 
connection queues. Since the FlowFile object is held in memory, all that has 
to be done for the Processor to get the FlowFile is to ask the ProcessSession 
to grab it from the queue.</p>
+</div>
+<div class="paragraph">
+<p>When a change occurs to the FlowFile, the delta is written out to the 
Write-Ahead Log and the object in memory is modified accordingly. This allows 
the system to quickly work with FlowFiles while also keeping track of what has 
happened and what will happen when the session is committed. This provides a 
very robust and durable system.</p>
+</div>
+<div class="paragraph">
+<p>There is also the notion of "swapping" FlowFiles. This occurs when the 
number of FlowFiles in a connection queue exceeds the value set in the 
"nifi.queue.swap.threshold" property. The FlowFiles with the lowest priority in 
the connection queue are serialized and written to disk in a "swap file" in 
batches of 10,000. These FlowFiles are then removed from the hash map mentioned 
above and the connection queue is in charge of determining when to swap the 
files back into memory. When the FlowFiles are swapped out, the FlowFile repo 
is notified and it keeps a list of the swap files. When the system is 
checkpointed the snapshot includes a section for swapped out files. When swap 
files are swapped back in, the FlowFiles are added back into the hash map. This 
swapping technique, much like the swapping performed by most Operating Systems, 
allows NiFi to provide very fast access to FlowFiles that are actively being 
processed while still allowing many millions of FlowFiles to exist in the Flow 
without depleting the system&#8217;s memory.</p>
+</div>
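+<div class="paragraph">
+<p>As a rough illustration of swapping, the Java sketch below moves the lowest-priority tail of a queue into fixed-size batches once the queue exceeds a threshold. The class and method names are hypothetical, the queue is assumed to be ordered from highest to lowest priority, and the rule that only full batches are swapped out is an assumption made to keep the example small.</p>
+</div>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of swapping out the tail of a connection queue; not NiFi's actual code.
final class SwapSketch {

    /**
     * Removes full batches from the end of the queue (lowest priority last) while the
     * queue exceeds the threshold by at least one batch. Each returned list stands in
     * for one "swap file". The queue must be a mutable list.
     */
    static <T> List<List<T>> swapOut(List<T> queue, int threshold, int batchSize) {
        List<List<T>> swapFiles = new ArrayList<>();
        while (queue.size() - threshold >= batchSize) {
            List<T> tail = queue.subList(queue.size() - batchSize, queue.size());
            swapFiles.add(new ArrayList<>(tail)); // copy the batch before removing it
            tail.clear();                         // clearing the view removes it from the queue
        }
        return swapFiles;
    }
}
```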
+</div>
+</div>
+<div class="sect2">
+<h3 id="content-repository"><a class="anchor" 
href="nifi-in-depth.html#content-repository"></a>Content Repository</h3>
+<div class="paragraph">
+<p>The Content Repository is simply a place in local storage where the content 
of all FlowFiles exists and it is typically the largest of the three 
Repositories. As mentioned in the introductory section, this repository 
utilizes the immutability and copy-on-write paradigms to maximize speed and 
thread-safety. The core design decision influencing the Content Repo is to hold 
the FlowFile&#8217;s content on disk and only read it into JVM memory when 
it&#8217;s needed. This allows NiFi to handle tiny and massive sized objects 
without requiring producer and consumer processors to hold the full objects in 
memory. As a result, actions like splitting, aggregating, and transforming very 
large objects are quite easy to do without harming memory.</p>
+</div>
+<div class="paragraph">
+<p>In the same way the JVM Heap has a garbage collection process to reclaim 
unreachable objects when space is needed, there exists a dedicated thread in 
NiFi to analyze the Content repo for unused content (more info in the "Deeper 
View: Deletion After Checkpointing" section). After a FlowFile&#8217;s content 
is identified as no longer in use, it will either be deleted or archived. If 
archiving is enabled in <em>nifi.properties</em>, then the FlowFile&#8217;s content 
will exist in the Content Repo either until it is aged off (deleted after a 
certain amount of time) or deleted due to the Content Repo taking up too much 
space. The conditions for archiving and/or deleting are configured in the 
<em>nifi.properties</em> file 
("nifi.content.repository.archive.max.retention.period", 
"nifi.content.repository.archive.max.usage.percentage") and outlined in the <a 
href="administration-guide.html">NiFi System Administrator&#8217;s Guide</a>. 
Refer to the "Data Egress" section for more information 
on the deletion of content.</p>
+</div>
+<div class="sect3">
+<h4 id="deeper-view-content-claim"><a class="anchor" 
href="nifi-in-depth.html#deeper-view-content-claim"></a>Deeper View: Content 
Claim</h4>
+<div class="paragraph">
+<p>In general, when talking about a FlowFile, the reference to its content can 
simply be referred to as a "pointer" to the content. The underlying 
implementation of the FlowFile Content reference, though, has multiple layers of 
complexity. The Content Repository is made up of a collection of files on disk. 
These files are binned into Containers and Sections. A Section is a 
subdirectory of a Container. A Container can be thought of as a &#8220;root 
directory&#8221; for the Content Repository. The Content Repository, though, can be 
made up of many Containers. This is done so that NiFi can take advantage of 
multiple physical partitions in parallel. NiFi is then capable of reading 
from, and writing to, all of these disks in parallel, in order to achieve data 
rates of hundreds of Megabytes or even Gigabytes per second of disk throughput 
on a single node. "Resource Claims" are Java objects that point to specific 
files on disk (this is done by keeping track of the file ID, the section the 
file is in, and the container the section is a part of).</p>
+</div>
+<div class="paragraph">
+<p>To keep track of the FlowFile&#8217;s contents, the FlowFile has a "Content 
Claim" object. This Content Claim has a reference to the Resource Claim that 
contains the content, the offset of the content within the file, and the length 
of the content. To access the content, the Content Repository drills down 
to the specific file on disk using the Resource Claim&#8217;s properties and 
then seeks to the offset specified by the Content Claim before streaming 
content from the file.</p>
+</div>
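+<div class="paragraph">
+<p>The relationship between a Resource Claim and a Content Claim can be sketched as two small value types. The Java below is a simplified model, not NiFi&#8217;s classes: the field names, the <code>container/section/id</code> path layout, and the use of an in-memory byte array as a stand-in for the file on disk are all assumptions for illustration.</p>
+</div>

```java
import java.util.Arrays;

// Hypothetical model of Resource Claims and Content Claims; not NiFi's actual classes.
final class ContentClaimSketch {

    /** Identifies one file on disk by container, section, and file ID. */
    static final class ResourceClaim {
        final String container;
        final String section;
        final String id;

        ResourceClaim(String container, String section, String id) {
            this.container = container;
            this.section = section;
            this.id = id;
        }

        /** Assumed layout: the section is a subdirectory of the container. */
        String relativePath() {
            return container + "/" + section + "/" + id;
        }
    }

    /** Points at one FlowFile's content: which file, where it starts, and how long it is. */
    static final class ContentClaim {
        final ResourceClaim resourceClaim;
        final int offset;
        final int length;

        ContentClaim(ResourceClaim resourceClaim, int offset, int length) {
            this.resourceClaim = resourceClaim;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Reads one claim's bytes out of the shared file (modeled here as a byte array). */
    static byte[] read(byte[] resourceFileBytes, ContentClaim claim) {
        return Arrays.copyOfRange(resourceFileBytes, claim.offset, claim.offset + claim.length);
    }
}
```

+<div class="paragraph">
+<p>Two Content Claims with different offsets can share one Resource Claim, which is how the content of many FlowFiles can live in a single file on disk.</p>
+</div>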
+<div class="paragraph">
+<p>This layer of abstraction (Resource Claim) was done so that there is not a 
file on disk for the content of every FlowFile. The concept of immutability is 
key to this being possible. Since the content is never changed once it is 
written ("copy on write" is used to make changes), there is no fragmentation of 
memory or moving data if the content of a FlowFile changes. By utilizing a 
single file on disk to hold the content of many FlowFiles, NiFi is able to 
provide far better throughput, often approaching the maximum data rates 
provided by the disks.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="provenance-repository"><a class="anchor" 
href="nifi-in-depth.html#provenance-repository"></a>Provenance Repository</h3>
+<div class="paragraph">
+<p>The Provenance Repository is where the history of each FlowFile is stored. 
This history is used to provide the Data Lineage (also known as the Chain of 
Custody) of each piece of data. Each time that an event occurs for a FlowFile 
(FlowFile is created, forked, cloned, modified, etc.) a new provenance event is 
created. This provenance event is a snapshot of the FlowFile as it looked and 
fit in the flow that existed at that point in time. When a provenance event is 
created, it copies all the FlowFile&#8217;s attributes and the pointer to the 
FlowFile&#8217;s content and aggregates that with the FlowFile&#8217;s state 
(such as its relationship with other provenance events) to one location in the 
Provenance Repo. This snapshot will not change, with the exception of the data 
being expired. The Provenance Repository holds all of these provenance events 
for a period of time after completion, as specified in the 
<em>nifi.properties</em> file.</p>
+</div>
+<div class="paragraph">
+<p>Because all of the FlowFile attributes and the pointer to the content are 
kept in the Provenance Repository, a Dataflow Manager is able to not only see 
the lineage, or processing history, of that piece of data, but is also able to 
later view the data itself and even replay the data from any point in the flow. 
A common use-case for this is when a particular down-stream system claims to 
have not received the data. The data lineage can show exactly when the data was 
delivered to the downstream system, what the data looked like, the filename, 
and the URL that the data was sent to &#8211; or can confirm that the data was 
indeed never sent. In either case, the Send event can be replayed with the 
click of a button (or by accessing the appropriate HTTP API endpoint) in order 
to resend the data only to that particular downstream system. Alternatively, if 
the data was not handled properly (perhaps some data manipulation should have 
occurred first), the flow can be fixed and then the data can 
be replayed into the new flow, in order to process the data properly.</p>
+</div>
+<div class="paragraph">
+<p>Keep in mind, though, that since Provenance is not copying the content in 
the Content Repo, and just copying the FlowFile&#8217;s pointer to the content, 
the content could be deleted before the provenance event that references it is 
deleted. This would mean that the user would no longer be able to see the content 
or replay the FlowFile later on. However, users are still able to view the 
FlowFile&#8217;s lineage and understand what happened to the data. For instance, 
even though the data itself will not be accessible, the user is still able to 
see the unique identifier of the data, its filename (if applicable), when it 
was received, where it was received from, how it was manipulated, where it was 
sent, and so on. Additionally, since the FlowFile&#8217;s attributes are made 
available, a Dataflow Manager is able to understand why the data was processed 
in the way that it was, providing a crucial tool for understanding and 
debugging the dataflow.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Since provenance events are snapshots of the FlowFile, as it exists in the 
current flow, changes to the flow may impact the ability to replay provenance 
events later on. For example, if a Connection is deleted from the flow, the 
data cannot be replayed from that point in the flow, since there is now nowhere 
to enqueue the data for processing.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>For a look at the design decisions behind the Provenance Repository check 
out this link: <a 
href="https://cwiki.apache.org/confluence/display/NIFI/Persistent+Provenance+Repository+Design"
 
class="bare">https://cwiki.apache.org/confluence/display/NIFI/Persistent+Provenance+Repository+Design</a></p>
+</div>
+<div class="sect3">
+<h4 id="deeper-view-provenance-log-files"><a class="anchor" 
href="nifi-in-depth.html#deeper-view-provenance-log-files"></a>Deeper View: 
Provenance Log Files</h4>
+<div class="paragraph">
+<p>Each provenance event has two maps, one for the attributes before the event 
and one for the updated attribute values. In general, provenance events 
don&#8217;t store the updated values of the attributes as they existed when the 
event was emitted, but instead, the attribute values when the session is 
committed. The events are cached and saved until the session is committed and 
once the session is committed the events are emitted with the attributes 
associated with the FlowFile when the session is committed. The exception to 
this rule is the "SEND" event, in which case the event contains the attributes 
as they existed when the event was emitted. This is done because if the 
attributes themselves were also sent, it is important to have an accurate 
account of exactly what information was sent.</p>
+</div>
+<div class="paragraph">
+<p>As NiFi is running, there is a rolling group of 16 provenance log files. As 
provenance events are emitted they are written to one of the 16 files (there 
are multiple files to increase throughput). The log files are periodically 
rolled over (the default timeframe is every 30 seconds). This means the newly 
created provenance events start writing to a new group of 16 log files and the 
original ones are processed for long term storage. First the rolled over logs 
are merged into one file. Then the file is optionally compressed (determined by 
the "nifi.provenance.repository.compress.on.rollover" property). Lastly the 
events are indexed using Lucene and made available for querying. This batched 
approach for indexing means provenance events aren&#8217;t available 
immediately for querying but in return this dramatically increases performance 
because committing a transaction and indexing are very expensive tasks.</p>
+</div>
+<div class="paragraph">
+<p>A separate thread handles the deletion of provenance logs. The two 
conditions admins can set to control the deletion of provenance logs are the max 
amount of disk space they can take up and the max retention duration for the 
logs. The thread sorts the repo by the last modified date and deletes the 
oldest file when one of the conditions is exceeded.</p>
+</div>
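+<div class="paragraph">
+<p>The deletion rule for provenance logs can be sketched as a pure function over file metadata. This Java example is an assumption-laden illustration: the <code>LogFile</code> type, the parameter names, and the choice to return the names to delete rather than touching the disk are all hypothetical.</p>
+</div>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of size- and age-based deletion of provenance logs; not NiFi's code.
final class ProvenanceAgeOffSketch {

    static final class LogFile {
        final String name;
        final long lastModifiedMillis;
        final long sizeBytes;

        LogFile(String name, long lastModifiedMillis, long sizeBytes) {
            this.name = name;
            this.lastModifiedMillis = lastModifiedMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Returns the names of the files to delete, oldest first. */
    static List<String> selectForDeletion(
            List<LogFile> files, long maxBytes, long maxAgeMillis, long nowMillis) {
        List<LogFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((LogFile f) -> f.lastModifiedMillis));

        long totalBytes = 0;
        for (LogFile f : sorted) {
            totalBytes += f.sizeBytes;
        }

        List<String> toDelete = new ArrayList<>();
        for (LogFile f : sorted) {
            boolean tooOld = nowMillis - f.lastModifiedMillis > maxAgeMillis;
            boolean tooBig = totalBytes > maxBytes;
            if (!tooOld && !tooBig) {
                break; // the remaining files are newer and the repo fits within the limit
            }
            toDelete.add(f.name);
            totalBytes -= f.sizeBytes;
        }
        return toDelete;
    }
}
```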
+<div class="paragraph">
+<p>The Provenance Repo is a Lucene index that is broken into multiple shards. 
This is done for multiple reasons. First, Lucene uses a 32-bit integer for 
the document identifier, so the maximum number of documents supported by Lucene 
without sharding is limited. Second, if we know the time range for each shard, 
it is easy to search with multiple threads. Third, sharding 
allows for more efficient deletion. NiFi waits until all events in a shard are 
scheduled for deletion before deleting the entire shard from disk. This means 
we do not have to update the Lucene index when we delete.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="general-repository-notes"><a class="anchor" 
href="nifi-in-depth.html#general-repository-notes"></a>General Repository 
Notes</h3>
+<div class="sect3">
+<h4 id="multiple-physical-storage-points"><a class="anchor" 
href="nifi-in-depth.html#multiple-physical-storage-points"></a>Multiple 
Physical Storage Points</h4>
+<div class="paragraph">
+<p>For the Provenance and Content repos, there is the option to stripe the 
information across multiple physical partitions. An admin would do this if they 
wanted to federate reads and writes across multiple disks. The repo (Content or 
Provenance) is still one logical store but writes will be striped across 
multiple volumes/partitions automatically by the system. The directories are 
specified in the <em>nifi.properties</em> file.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="best-practice"><a class="anchor" 
href="nifi-in-depth.html#best-practice"></a>Best Practice</h4>
+<div class="paragraph">
+<p>It is considered a best practice to analyze the contents of a FlowFile as 
few times as possible and instead extract key information from the contents 
into the attributes of the FlowFile; then read/write information from the 
FlowFile attributes. One example of this is the ExtractText processor, which 
extracts text from the FlowFile Content and puts it as an attribute so other 
processors can make use of it. This provides far better performance than 
continually processing the entire content of the FlowFile, as the attributes 
are kept in-memory and updating the FlowFile repository is much faster than 
updating the Content repository, given the amount of data stored in each.</p>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="life-of-a-flowfile"><a class="anchor" 
href="nifi-in-depth.html#life-of-a-flowfile"></a>Life of a FlowFile</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>To better understand how the repos interact with one another, the 
underlying functionality of NiFi, and the life of a FlowFile, this next section 
will include examples of a FlowFile at different points in a real flow. The 
flow is a template called "WebCrawler.xml" and is available here: <a 
href="https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates"
 
class="bare">https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates</a>.</p>
+</div>
+<div class="paragraph">
+<p>At a high level, this template reaches out to a seed URL configured in the 
GetHTTP processor, then analyzes the response using the RouteText processor to 
find instances of a keyword (in this case "nifi"), and potential URLs to hit. 
Then InvokeHTTP executes an HTTP GET request using the URLs found in the 
original seed web page. The response is routed based on the status code 
attribute and only 200-202 status codes are routed back to the original 
RouteText processor for analysis.</p>
+</div>
+<div class="paragraph">
+<p>The flow also detects duplicate URLs and prevents processing them again, 
emails the user when keywords are found, logs all successful HTTP requests, and 
bundles up the successful requests to be compressed and archived on disk.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+To use this flow you need to configure a couple of options. First, a 
DistributedMapCacheServer controller service must be added with default 
properties. At the time of writing there was no way to explicitly add the 
controller service to the template and since no processors reference the 
service it is not included. Also to get emails, the PutEmail processor must be 
configured with your email credentials. Finally, to use HTTPS the 
StandardSSLContextService must be configured with proper key and trust stores. 
Remember that the truststore must be configured with the proper Certificate 
Authorities in order to work for websites. The command below is an example of 
using the "keytool" command to add the default Java 1.8.0_60 CAs to a 
truststore called myTrustStore:
+keytool -importkeystore -srckeystore 
/Library/Java/JavaVirtualMachines/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts
  -destkeystore myTrustStore
+</td>
+</tr>
+</table>
+</div>
+<div class="sect2">
+<h3 id="webcrawler-template"><a class="anchor" 
href="nifi-in-depth.html#webcrawler-template"></a>WebCrawler Template</h3>
+<div class="imageblock">
+<div class="content">
+<img src="images/WebCrawler.png" alt="Web Crawler Flow">
+</div>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+It is not uncommon for bulletins with messages such as "Connection timed out" 
to appear on the InvokeHttp processor due to the random nature of web crawling.
+</td>
+</tr>
+</table>
+</div>
+</div>
+<div class="sect2">
+<h3 id="data-ingress"><a class="anchor" 
href="nifi-in-depth.html#data-ingress"></a>Data Ingress</h3>
+<div class="paragraph">
+<p>A FlowFile is created in the system when a producer processor invokes 
"ProcessSession.create()" followed by an appropriate call to the 
ProvenanceReporter. The "ProcessSession.create()" call creates an empty 
FlowFile with a few core attributes (filename, path and uuid for the standard 
process session) but without any content or lineage to parents (the create 
method is overloaded to allow parameters for parent FlowFiles). The producer 
processor then adds the content and attributes to the FlowFile.</p>
+</div>
+<div class="paragraph">
+<p>ProvenanceReporter is used to emit the Provenance Events for the FlowFile. 
If the file is created by NiFi from data not received from an external entity 
then a "CREATE" event should be emitted. If instead the FlowFile was created from 
data received from an external source then a "RECEIVE" event should be emitted. 
The Provenance Events are made using "ProvenanceReporter.create()" and 
"ProvenanceReporter.receive()" respectively.</p>
+</div>
+<div class="paragraph">
+<p>In our WebCrawler flow, the GetHTTP processor creates the initial FlowFile 
using "ProcessSession.create()" and records the receipt of data using 
"ProvenanceReporter.receive()". This method call also provides the URL from 
which the data was received, how long it took to transfer the data, and any 
FlowFile attributes that were added to the FlowFile. HTTP Headers, for 
instance, can be added as FlowFile attributes.</p>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/DataIngress.png" alt="Data Ingress">
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="pass-by-reference"><a class="anchor" 
href="nifi-in-depth.html#pass-by-reference"></a>Pass by Reference</h3>
+<div class="paragraph">
+<p>An important aspect of flow-based programming is the idea of 
resource-constrained relationships between the black boxes. In NiFi these are 
queues and processors respectively. FlowFiles are routed from one processor to 
another through queues simply by passing a reference to the FlowFile (similar 
to the "Claim Check" pattern in EIP).</p>
+</div>
+<div class="paragraph">
+<p>In the WebCrawler flow, the InvokeHTTP processor reaches out to the URL 
with an HTTP GET request and adds a status code attribute to the FlowFile 
depending on what the response was from the HTTP server.  After updating the 
FlowFile&#8217;s filename  (in the UpdateAttribute processor after InvokeHttp) 
there is a RouteOnAttribute processor that routes FlowFiles with successful 
status code attributes to two different processors. Those that are unmatched 
are "DROPPED" (See the Data Egress section) by the RouteOnAttribute Processor, 
because it is configured to Auto-Terminate any data that does not match any of 
the routing rules.  Coming in to the RouteOnAttribute processor there is a 
FlowFile (F1) that contains the status code attribute and points to the Content 
(C1). There is a provenance event that points to C1 and includes a snapshot of 
F1 but is omitted to better focus on the routing. This information is located 
in the FlowFile, Content and Provenance Repos respectively.</p>
+</div>
+<div class="paragraph">
+<p>After the RouteOnAttribute processor examines the FlowFile&#8217;s status 
code attribute it determines that it should be routed to two different 
locations. The first thing that happens is the processor clones the FlowFile to 
create F2. This copies all of the attributes and the pointer to the content. 
Since it is merely routing and analyzing the attributes, the content does not 
change.  The FlowFiles are then added to the respective connection queue to 
wait for the next processor to retrieve them for processing.</p>
+</div>
+<div class="paragraph">
+<p>The ProvenanceReporter documents the changes that occurred, which include 
a CLONE and two ROUTE events. Each of these events has a pointer to the 
relevant content and contains a copy of the respective FlowFile in the form of 
a snapshot.</p>
+</div>
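+<div class="paragraph">
+<p>The clone step described above can be sketched with a small toy model. 
This is not the NiFi API: the class names, the cloneAs method, and the 
"status.code" attribute name are hypothetical, chosen only to show that a 
clone copies the attributes and the content pointer while the content itself 
is shared.</p>
+</div>

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not the NiFi API.
class PassByReferenceSketch {

    // Stands in for a piece of content held in the Content Repository.
    static final class ContentClaim {
        final byte[] bytes;

        ContentClaim(byte[] bytes) {
            this.bytes = bytes;
        }
    }

    static final class FlowFile {
        final String id;
        final Map<String, String> attributes;
        final ContentClaim claim; // a pointer to the content, never a copy of it

        FlowFile(String id, Map<String, String> attributes, ContentClaim claim) {
            this.id = id;
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
            this.claim = claim;
        }

        // Cloning copies the attributes and the pointer; the content is shared.
        FlowFile cloneAs(String newId) {
            return new FlowFile(newId, attributes, claim);
        }
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("status.code", "200"); // illustrative attribute name
        byte[] page = "<html>...</html>".getBytes(StandardCharsets.UTF_8);

        FlowFile f1 = new FlowFile("F1", attributes, new ContentClaim(page));
        FlowFile f2 = f1.cloneAs("F2");

        System.out.println("same content object: " + (f1.claim == f2.claim));
        System.out.println("same attributes: " + f1.attributes.equals(f2.attributes));
    }
}
```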
+<div class="imageblock">
+<div class="content">
+<img src="images/PassByReference.png" alt="Pass By Reference">
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="extended-routing-use-cases"><a class="anchor" 
href="nifi-in-depth.html#extended-routing-use-cases"></a>Extended Routing 
Use-cases</h3>
+<div class="paragraph">
+<p>In addition to routing FlowFiles based on attributes, some processors also 
route based on content. While this is less efficient than routing on 
attributes, it is sometimes necessary, for example to split the content of a 
FlowFile into multiple FlowFiles.</p>
+</div>
+<div class="paragraph">
+<p>One example is the SplitText processor. This processor analyzes the 
content looking for end-of-line characters and creates new FlowFiles containing 
a configurable number of lines. The Web Crawler flow uses this to split the 
potential URLs into single lines for URL extraction and to act as requests for 
InvokeHTTP. Because SplitText splits the content into contiguous chunks (no 
FlowFile content is disjoint or overlapping), it can do this routing without 
copying any content. All it does is create new FlowFiles, each with a pointer 
to a section of the original FlowFile&#8217;s content. This is made possible by 
the content demarcation and split facilities built into the NiFi API. 
Splitting in this manner is not always feasible, but when it is, the 
performance benefits are considerable.</p>
+</div>
+<div class="paragraph">
+<p>RouteText is a processor that shows why copying content can be needed for 
certain styles of routing. This processor analyzes each line and routes it to 
one or more relationships based on configurable properties. When more than one 
line gets routed to the same relationship (for the same input FlowFile), those 
lines get combined into one FlowFile.  Since the lines could be disjoint (lines 
1 and 100 route to the same relationship) and one pointer cannot describe the 
FlowFile&#8217;s content accurately, the processor must copy the contents to a 
new location. For example, in the Web Crawler flow, the RouteText processor 
routes all lines that contain "nifi" to the "NiFi" relationship. So when there 
is one input FlowFile that has "nifi" multiple times on the web page, only one 
email will be sent (via the subsequent PutEmail processor).</p>
+</div>
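+<div class="paragraph">
+<p>The difference between the two routing styles can be illustrated with a 
toy sketch. It is not how SplitText or RouteText are implemented; the Slice 
class and both methods are hypothetical. Contiguous lines are described by an 
offset and a length into the shared content, while disjoint matching lines 
have to be copied into a new buffer.</p>
+</div>

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the NiFi API.
class SplitVersusCopySketch {

    // A view into shared content: an offset and a length, not copied bytes.
    static final class Slice {
        final byte[] content;
        final int offset;
        final int length;

        Slice(byte[] content, int offset, int length) {
            this.content = content;
            this.offset = offset;
            this.length = length;
        }

        String text() {
            return new String(content, offset, length, StandardCharsets.UTF_8);
        }
    }

    // SplitText-style: each line is one contiguous range, so a pointer suffices.
    static List<Slice> splitLines(byte[] content) {
        List<Slice> slices = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < content.length; i++) {
            if (content[i] == '\n') {
                slices.add(new Slice(content, start, i - start));
                start = i + 1;
            }
        }
        if (start < content.length) {
            slices.add(new Slice(content, start, content.length - start));
        }
        return slices;
    }

    // RouteText-style: matching lines may be disjoint, so they are copied
    // into one new buffer that becomes the content of a single FlowFile.
    static byte[] copyMatching(byte[] content, String needle) {
        StringBuilder merged = new StringBuilder();
        for (Slice line : splitLines(content)) {
            String text = line.text();
            if (text.contains(needle)) {
                merged.append(text).append('\n');
            }
        }
        return merged.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] page = "nifi one\nother\nnifi two\n".getBytes(StandardCharsets.UTF_8);
        for (Slice s : splitLines(page)) {
            System.out.println(s.offset + "+" + s.length + ": " + s.text());
        }
        System.out.print(new String(copyMatching(page, "nifi"), StandardCharsets.UTF_8));
    }
}
```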
+</div>
+<div class="sect2">
+<h3 id="funnels"><a class="anchor" 
href="nifi-in-depth.html#funnels"></a>Funnels</h3>
+<div class="paragraph">
+<p>The funnel is a component that takes input from one or more connections and 
routes the incoming FlowFiles to one or more destinations. Its typical 
use-cases are described in the User Guide. Regardless of use-case, if there is 
only one processor downstream from the funnel, then the funnel emits no 
provenance events and appears to be invisible in the Provenance graph. If 
there are multiple downstream processors, like the one in WebCrawler, then a 
clone event occurs. Referring to the graphic below, you can see that a new 
FlowFile (F2) is cloned from the original FlowFile (F1) and, just like the 
routing above, the new FlowFile just has a pointer to the same content (the 
content is not copied).</p>
+</div>
+<div class="paragraph">
+<p>From a developer point of view, you can view a Funnel as a very simple 
processor. When it is scheduled to run, it simply does a "ProcessSession.get()" 
and then a "ProcessSession.transfer()" to the output connection. If there is 
more than one output connection (like the example below) then a 
"ProcessSession.clone()" is run. Finally a "ProcessSession.commit()" is called, 
completing the transaction.</p>
+</div>
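+<div class="paragraph">
+<p>The get, transfer, clone, and commit sequence described above can be 
sketched roughly as follows. This is a toy model with plain queues rather than 
the real ProcessSession, and the FlowFile class and runOnce method are 
hypothetical. It only shows that a single output produces no event, while each 
additional output produces a clone that shares the content pointer.</p>
+</div>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical toy model, not the NiFi API or the real funnel implementation.
class FunnelSketch {

    static final class FlowFile {
        final String id;
        final Object contentClaim; // pointer to the content; shared by clones

        FlowFile(String id, Object contentClaim) {
            this.id = id;
            this.contentClaim = contentClaim;
        }
    }

    // Moves one FlowFile from the input queue to every output queue and
    // returns the provenance event names this toy model would emit.
    static List<String> runOnce(Queue<FlowFile> input, List<Queue<FlowFile>> outputs) {
        List<String> events = new ArrayList<>();
        if (input.isEmpty() || outputs.isEmpty()) {
            return events; // nothing to do
        }
        FlowFile original = input.poll();   // plays the role of ProcessSession.get()
        outputs.get(0).add(original);       // plays the role of ProcessSession.transfer()
        for (int i = 1; i < outputs.size(); i++) {
            // plays the role of ProcessSession.clone(): new FlowFile, same pointer
            FlowFile copy = new FlowFile(original.id + "-clone" + i, original.contentClaim);
            outputs.get(i).add(copy);
            events.add("CLONE");
        }
        // A real session would commit() here; this sketch has no transaction.
        return events;
    }

    public static void main(String[] args) {
        Queue<FlowFile> input = new ArrayDeque<>();
        input.add(new FlowFile("F1", new Object()));
        List<Queue<FlowFile>> outputs = new ArrayList<>();
        outputs.add(new ArrayDeque<FlowFile>());
        outputs.add(new ArrayDeque<FlowFile>());
        System.out.println(runOnce(input, outputs));
    }
}
```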
+<div class="imageblock">
+<div class="content">
+<img src="images/Funnels.png" alt="Funnel">
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="copy-on-write"><a class="anchor" 
href="nifi-in-depth.html#copy-on-write"></a>Copy on Write</h3>
+<div class="paragraph">
+<p>In the previous example, there was only routing but no changes to the 
content of the FlowFile. This next example focuses on the CompressContent 
processor of the template that compresses the bundle of merged FlowFiles 
containing webpages that were queued to be analyzed.</p>
+</div>
+<div class="paragraph">
+<p>In this example, the content C1 for FlowFile F1 is being compressed in the 
CompressContent processor. Since C1 is immutable and we want a full re-playable 
provenance history, we can&#8217;t just overwrite C1. In order to "modify" C1 we 
do a "copy on write", which we accomplish by modifying the content as it is 
copied to a new location within the content repository. When doing so, FlowFile 
reference F1 is updated to point to the new compressed content C2, and a new 
Provenance Event P2 is created referencing the new FlowFile F1.1. Because the 
FlowFile repo is immutable, instead of modifying the old F1, a new delta (F1.1) 
is created. Previous provenance events still have the pointer to the Content 
C1 and contain the old attributes, but they no longer describe the most 
up-to-date version of the FlowFile.</p>
+</div>
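+<div class="paragraph">
+<p>As a rough illustration of copy on write, the sketch below compresses 
content into a new claim and returns a new FlowFile version that points to it. 
It is a toy model under stated assumptions: the in-memory map standing in for 
the content repository, the FlowFileVersion class, and the "C1"/"C2" claim 
naming are all hypothetical and are not taken from NiFi&#8217;s 
implementation.</p>
+</div>

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

// Hypothetical toy model, not the NiFi API.
class CopyOnWriteSketch {

    // One version of a FlowFile record; a change produces a new version.
    static final class FlowFileVersion {
        final String version;
        final String claimId;

        FlowFileVersion(String version, String claimId) {
            this.version = version;
            this.claimId = claimId;
        }
    }

    // Toy content repository: a claim is written once and never modified.
    final Map<String, byte[]> contentRepo = new HashMap<>();
    private int nextClaim = 1;

    // Stores the bytes under a fresh claim id (C1, C2, ...) and returns the id.
    String write(byte[] bytes) {
        String claimId = "C" + nextClaim++;
        contentRepo.put(claimId, bytes.clone());
        return claimId;
    }

    // Compresses by writing to a new claim; the old claim is left untouched.
    FlowFileVersion compress(FlowFileVersion current, String newVersion) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(contentRepo.get(current.claimId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FlowFileVersion(newVersion, write(out.toByteArray()));
    }

    public static void main(String[] args) {
        CopyOnWriteSketch repo = new CopyOnWriteSketch();
        byte[] original = "merged web pages".getBytes(StandardCharsets.UTF_8);
        FlowFileVersion f1 = new FlowFileVersion("F1", repo.write(original));
        FlowFileVersion f11 = repo.compress(f1, "F1.1");
        System.out.println(f1.version + " -> " + f1.claimId);
        System.out.println(f11.version + " -> " + f11.claimId);
    }
}
```

+<div class="paragraph">
+<p>Because the old claim is never touched in this sketch, anything that still 
holds the old claim id (such as an earlier provenance snapshot) can read the 
original bytes.</p>
+</div>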
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+For the sake of focusing on the Copy on Write event, the FlowFile&#8217;s (F1) 
provenance events leading up to this point are omitted.
+</td>
+</tr>
+</table>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/CopyOnWrite.png" alt="Copy On Write">
+</div>
+</div>
+<div class="sect3">
+<h4 id="extended-copy-on-write-use-case"><a class="anchor" 
href="nifi-in-depth.html#extended-copy-on-write-use-case"></a>Extended Copy on 
Write Use-case</h4>
+<div class="paragraph">
+<p>A special case of Copy on Write is the MergeContent processor. Just about 
every processor acts on only one FlowFile at a time. The MergeContent processor 
is unique in that it takes in multiple FlowFiles and combines them into one. 
Currently, MergeContent has multiple Merge Strategies, but all of them require 
the contents of the input FlowFiles to be copied to a new merged location. 
After MergeContent finishes, it emits a provenance event of type "JOIN" that 
establishes that the given parents were joined together to create a new child 
FlowFile.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="updating-attributes"><a class="anchor" 
href="nifi-in-depth.html#updating-attributes"></a>Updating Attributes</h3>
+<div class="paragraph">
+<p>Working with a FlowFile&#8217;s attributes is a core aspect of NiFi. It is 
assumed that attributes are small enough to be entirely read into local memory 
every time a processor executes on the FlowFile, so it is important that they 
are easy to work with. As attributes are the core way of routing and 
processing a FlowFile, it is very common to have processors that just change a 
FlowFile&#8217;s attributes. One such example is the UpdateAttribute processor. 
All the UpdateAttribute processor does is change the incoming FlowFile&#8217;s 
attributes according to the processor&#8217;s properties.</p>
+</div>
+<div class="paragraph">
+<p>Taking a look at the diagram, before the processor there is the FlowFile 
(F1) that has attributes and a pointer to the content (C1). The processor 
updates the FlowFile&#8217;s attributes by creating a new delta (F1.1) that 
still has a pointer to the content  (C1). An “ATTRIBUTES_MODIFIED” 
provenance event is emitted when this happens.</p>
+</div>
+<div class="paragraph">
+<p>In this example, the previous processor (InvokeHTTP) fetched information 
from a URL and created a new response FlowFile with a filename attribute that 
is the same as the request FlowFile. This does not help describe the response 
FlowFile, so the UpdateAttribute processor modifies the filename attribute to 
something more relevant (URL and transaction ID).</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+For the sake of focusing on the ATTRIBUTES_MODIFIED event, the FlowFile&#8217;s 
(F1) provenance events leading up to this point are omitted.
+</td>
+</tr>
+</table>
+</div>
+<div class="imageblock">
+<div class="content">
+<img src="images/UpdatingAttributes.png" alt="Updating Attributes">
+</div>
+</div>
+<div class="sect3">
+<h4 id="typical-use-case-note"><a class="anchor" 
href="nifi-in-depth.html#typical-use-case-note"></a>Typical Use-case Note</h4>
+<div class="paragraph">
+<p>In addition to adding arbitrary attributes via UpdateAttribute, extracting 
information from the content of a FlowFile into the attributes is a very common 
use-case. One such example in the Web Crawler flow is the ExtractText 
processor. We cannot use the URL when it is embedded within the content of the 
FlowFile, so we must extract the URL from the contents of the FlowFile and 
place it as an attribute. This way we can use the Expression Language to 
reference this attribute in the URL Property of InvokeHTTP.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="data-egress"><a class="anchor" 
href="nifi-in-depth.html#data-egress"></a>Data Egress</h3>
+<div class="paragraph">
+<p>Eventually data in NiFi will reach a point where it has either been loaded 
into another system and we can stop processing it, or we filtered the FlowFile 
out and determined we no longer care about it. Either way, the FlowFile will 
eventually be "DROPPED". "DROP" is a provenance event meaning that we are no 
longer processing the FlowFile in the Flow and it is available for deletion. It 
remains in the FlowFile Repository until the next repository checkpoint. The 
Provenance Repository keeps the Provenance events for an amount of time stated 
in <em>nifi.properties</em> (default is 24 hours). The content in the Content 
Repo is marked for deletion once the FlowFile leaves NiFi and the background 
checkpoint processing of the Write-Ahead Log (to compact/remove) occurs, 
unless another FlowFile references the same content or archiving is enabled 
in <em>nifi.properties</em>. If archiving is enabled, the content exists until 
either the max percentage of disk is reached or the max retention period is 
reached (also set in <em>nifi.properties</em>).</p>
+</div>
+<div class="sect3">
+<h4 id="deeper-view-deletion-after-checkpointing"><a class="anchor" 
href="nifi-in-depth.html#deeper-view-deletion-after-checkpointing"></a>Deeper 
View: Deletion After Checkpointing</h4>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+This section relies heavily on information from the "Deeper View: Content 
Claim" section above.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>Once the “.partial” file is synchronized with the underlying storage 
mechanism and renamed to be the new snapshot (detailed in the FlowFile Repo 
section) there is a callback to the FlowFile Repo to release all the old 
content claims (this is done after checkpointing so that content is not lost if 
something goes wrong). The FlowFile Repo knows which Content Claims can be 
released and notifies the Resource Claim Manager. The Resource Claim Manager 
keeps track of all the content claims that have been released and which 
resource claims are ready to be deleted (a resource claim is ready to be 
deleted when there are no longer any FlowFiles referencing it in the flow).</p>
+</div>
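+<div class="paragraph">
+<p>The bookkeeping described above can be pictured as simple reference 
counting. The sketch below is an assumption-laden toy, not the Resource Claim 
Manager&#8217;s actual code: the class and method names are hypothetical. A 
resource claim becomes eligible for cleanup only when its last referencing 
FlowFile releases it, and a periodic caller drains the eligible claims.</p>
+</div>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not the NiFi Resource Claim Manager.
class ClaimantCountSketch {

    // How many FlowFiles currently reference each resource claim.
    private final Map<String, Integer> claimants = new HashMap<>();
    // Claims with no remaining references, waiting to be cleaned up.
    private final Set<String> destructable = new LinkedHashSet<>();

    // Called when a FlowFile starts referencing the claim.
    void increment(String claim) {
        claimants.merge(claim, 1, Integer::sum);
        destructable.remove(claim);
    }

    // Called when a FlowFile that referenced the claim has been released.
    void release(String claim) {
        Integer count = claimants.get(claim);
        if (count == null) {
            return; // unknown claim, nothing to release
        }
        if (count <= 1) {
            claimants.remove(claim);
            destructable.add(claim); // last reference is gone
        } else {
            claimants.put(claim, count - 1);
        }
    }

    // Called periodically by the owner of the content to learn what to clean up.
    List<String> drainDestructable() {
        List<String> ready = new ArrayList<>(destructable);
        destructable.clear();
        return ready;
    }

    public static void main(String[] args) {
        ClaimantCountSketch manager = new ClaimantCountSketch();
        manager.increment("R1");
        manager.increment("R1"); // a clone shares the same claim
        manager.release("R1");
        System.out.println(manager.drainDestructable());
        manager.release("R1");
        System.out.println(manager.drainDestructable());
    }
}
```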
+<div class="paragraph">
+<p>Periodically, the Content Repo asks the Resource Claim Manager which 
Resource Claims can be cleaned up. The Content Repo then makes the decision 
whether the Resource Claims should be archived or deleted (based on the value 
of the "nifi.content.repository.archive.enabled" property in the 
<em>nifi.properties</em> file). If archiving is disabled, then the file is 
simply deleted from the disk. Otherwise, a background thread runs to see when 
archives should be deleted (based on the conditions above). This background 
thread keeps a list of the 10,000 oldest content claims and deletes them until 
below the necessary threshold. If it runs out of content claims it scans the 
repo for the oldest content to re-populate the list. This provides a model that 
is efficient in terms of both Java heap utilization as well as disk I/O 
utilization.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="associating-disparate-data"><a class="anchor" 
href="nifi-in-depth.html#associating-disparate-data"></a>Associating Disparate 
Data</h4>
+<div class="paragraph">
+<p>One of the features of the Provenance Repository is that it allows 
efficient access to events that occur sequentially. A NiFi Reporting Task could 
then be used to iterate over these events and send them to an external service. 
If other systems are also sending similar types of events to this external 
system, it may be necessary to associate a NiFi FlowFile with another piece of 
information. For instance, if GetSFTP is used to retrieve data, NiFi refers to 
that FlowFile using its own, unique UUID. However, if the system that placed 
the file there referred to the file by filename, NiFi should have a mechanism 
to indicate that these are the same piece of data. This is accomplished by 
calling the ProvenanceReporter.associate() method and providing both the UUID 
of the FlowFile and the alternate name (the filename, in this example). Since 
the determination that two pieces of data are the same may be flow-dependent, 
it is often necessary for the DataFlow Manager to make this association. A 
simple way of doing this is to use the UpdateAttribute processor and configure 
it to set the "alternate.identifier" attribute. This automatically emits the 
"associate" event, using whatever value is added as the "alternate.identifier" 
attribute.</p>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="closing-remarks"><a class="anchor" 
href="nifi-in-depth.html#closing-remarks"></a>Closing Remarks</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Utilizing the copy-on-write, pass-by-reference, and immutability concepts 
in conjunction with the three repositories, NiFi is a fast, efficient, and 
robust enterprise dataflow platform. This document has covered specific 
implementations of pluggable interfaces. These include the Write-Ahead Log 
based implementation of the FlowFile Repository, the File based Provenance 
Repository, and the File based Content Repository. These implementations are 
the NiFi defaults but are pluggable so that, if needed, users can write their 
own to fulfill certain use-cases.</p>
+</div>
+<div class="paragraph">
+<p>Hopefully, this document has given you a better understanding of the 
low-level functionality of NiFi and the decisions behind it. If there is 
something you wish to have explained in more depth or you feel should be 
included, please feel free to send an email to the Apache NiFi Developer mailing 
list (<a href="mailto:d...@nifi.apache.org">d...@nifi.apache.org</a>).</p>
+</div>
+</div>
+</div>
+</div>
+<div id="footer">
+<div id="footer-text">
+Last updated 2017-05-05 21:10:21 -04:00
+</div>
+</div>
+</body>
+</html>
\ No newline at end of file

