This is an automated email from the ASF dual-hosted git repository.

exceptionfactory pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/nifi.git
The following commit(s) were added to refs/heads/main by this push:
     new 4d3fcb6843 NIFI-11000 Add compression example to CreateHadoopSequenceFile documentation
4d3fcb6843 is described below

commit 4d3fcb684395ca1be0bef74e96f73dcdfc105fad
Author: Peter Gyori <peter.gyori....@gmail.com>
AuthorDate: Wed Dec 21 17:55:34 2022 +0100

    NIFI-11000 Add compression example to CreateHadoopSequenceFile documentation

    This closes #6801

    Signed-off-by: David Handermann <exceptionfact...@apache.org>
---
 .../additionalDetails.html | 82 +++++++++++++++++-----
 1 file changed, 64 insertions(+), 18 deletions(-)

diff --git a/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/resources/docs/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html b/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/resources/docs/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html
index b8bf7c2d99..9f754a0724 100644
--- a/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/resources/docs/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html
+++ b/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/resources/docs/org.apache.nifi.processors.hadoop.CreateHadoopSequenceFile/additionalDetails.html
@@ -23,24 +23,70 @@
 <body>
 <!-- Processor Documentation ================================================== -->

-    <h2>Description:</h2>
-    <p>This processor is used to create a Hadoop Sequence File, which essentially is a file of key/value pairs. The key
-        will be a file name and the value will be the flow file content. The processor will take either a merged (a.k.a. packaged) flow
-        file or a singular flow file. Historically, this processor handled the merging by type and size or time prior to creating a
+    <h2>Description</h2>
+    <p>
+        This processor is used to create a Hadoop Sequence File, which essentially is a file of key/value pairs. The key
+        will be a file name and the value will be the flow file content. The processor will take either a merged (a.k.a. packaged) flow
+        file or a singular flow file. Historically, this processor handled the merging by type and size or time prior to creating a
         SequenceFile output; it no longer does this. If creating a SequenceFile that contains multiple files of the same type is desired,
         precede this processor with a <code>RouteOnAttribute</code> processor to segregate files of the same type and follow that with a
-        <code>MergeContent</code> processor to bundle up files. If the type of files is not important, just use the
-        <code>MergeContent</code> processor. When using the <code>MergeContent</code> processor, the following Merge Formats are
+        <code>MergeContent</code> processor to bundle up files. If the type of files is not important, just use the
+        <code>MergeContent</code> processor. When using the <code>MergeContent</code> processor, the following Merge Formats are
         supported by this processor:
-    <ul>
-        <li>TAR</li>
-        <li>ZIP</li>
-        <li>FlowFileStream v3</li>
-    </ul>
-    The created SequenceFile is named the same as the incoming FlowFile with the suffix '.sf'. For incoming FlowFiles that are
-    bundled, the keys in the SequenceFile are the individual file names, the values are the contents of each file.
-    </p>
-    NOTE: The value portion of a key/value pair is loaded into memory. While there is a max size limit of 2GB, this could cause memory
-    issues if there are too many concurrent tasks and the flow file sizes are large.
-</body>
-</html>
+        <ul>
+            <li>TAR</li>
+            <li>ZIP</li>
+            <li>FlowFileStream v3</li>
+        </ul>
+        The created SequenceFile is named the same as the incoming FlowFile with the suffix '.sf'. For incoming FlowFiles that are
+        bundled, the keys in the SequenceFile are the individual file names, the values are the contents of each file.
+    </p>
+    <p>
+        NOTE: The value portion of a key/value pair is loaded into memory. While there is a max size limit of 2GB, this could cause memory
+        issues if there are too many concurrent tasks and the flow file sizes are large.
+    </p>
+
+    <h2>Using Compression</h2>
+    <p>
+        The value of the <code>Compression codec</code> property determines the compression library the processor uses to compress content.
+        Third party libraries are used for compression. These third party libraries can be Java libraries or native libraries.
+        In case of native libraries, the path of the parent folder needs to be in an environment variable called <code>LD_LIBRARY_PATH</code> so that NiFi can find the libraries.
+    </p>
+    <h3>Example: using Snappy compression with native library on CentOS</h3>
+    <p>
+        <ol>
+            <li>
+                Snappy compression needs to be installed on the server running NiFi:
+                <br/>
+                <code>sudo yum install snappy</code>
+                <br/>
+            </li>
+            <li>
+                Suppose that the server running NiFi has the native compression libraries in <code>/opt/lib/hadoop/lib/native</code>.
+                (Native libraries have file extensions like <code>.so</code>, <code>.dll</code>, <code>.lib</code>, etc. depending on the platform.)
+                <br/>
+                We need to make sure that the files can be executed by the NiFi process' user. For this purpose we can make a copy of these files
+                to e.g. <code>/opt/nativelibs</code> and change their owner. If NiFi is executed by <code>nifi</code> user in the <code>nifi</code> group, then:
+                <br/>
+                <code>chown nifi:nifi /opt/nativelibs</code>
+                <br/>
+                <code>chown nifi:nifi /opt/nativelibs/*</code>
+                <br/>
+            </li>
+            <li>
+                The <code>LD_LIBRARY_PATH</code> needs to be set to contain the path to the folder <code>/opt/nativelibs</code>.
+                <br/>
+            </li>
+            <li>
+                NiFi needs to be restarted.
+            </li>
+            <li>
+                <code>Compression codec</code> property can be set to <code>SNAPPY</code> and a <code>Compression type</code> can be selected.
+            </li>
+            <li>
+                The processor can be started.
+            </li>
+        </ol>
+    </p>
+</body>
+</html>
\ No newline at end of file
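The native-library setup that the added documentation walks through can be sketched as a shell session. This is only an illustrative sketch of the mechanics, not part of the commit: the temporary directories below stand in for the real `/opt/lib/hadoop/lib/native` and `/opt/nativelibs` paths so it runs without root, and `libsnappy.so` is an empty placeholder file; a real deployment would first run `sudo yum install snappy`, copy the actual libraries, and `chown` them to the `nifi` user.

```shell
# Sketch of the Snappy native-library setup steps, assuming the paths
# from the documentation example. Temp dirs replace the real locations
# so this runs unprivileged; in production you would also run
# 'sudo yum install snappy' and 'chown nifi:nifi' on the copies.
NATIVE_SRC=$(mktemp -d)   # stands in for /opt/lib/hadoop/lib/native
NATIVE_DIR=$(mktemp -d)   # stands in for /opt/nativelibs
touch "$NATIVE_SRC/libsnappy.so"   # placeholder for the real shared object

# Copy the libraries to a folder the NiFi process user can read.
cp "$NATIVE_SRC"/*.so "$NATIVE_DIR"/

# Prepend the folder to LD_LIBRARY_PATH so the JVM's native loader finds it.
export LD_LIBRARY_PATH="$NATIVE_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "$LD_LIBRARY_PATH"
```

After this, NiFi has to be restarted so its JVM inherits the variable, and then `Compression codec` can be set to `SNAPPY` as the documentation describes.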