Yes, that's exactly what those commands do.  Linux commands like tar can
read directly from /dev/stdin and write directly to /dev/stdout if you want
to, although unzip will probably still need the zip saved to a file first.
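
A minimal sketch of streaming where possible, not tested against a real
NiFi flow. The zip is still written to a temporary file, because the
Info-ZIP unzip man page says archives read from standard input are not
supported, but tar writes straight to stdout with "-f -". The function name
zip_to_tar and the exact unzip and tar flags are assumptions for
illustration; adjust them for your needs.

```shell
#!/bin/bash
# Hypothetical helper: read a zip archive on stdin, write a tar on stdout.
# The body runs in a subshell so the EXIT trap cleans up when it finishes.
zip_to_tar() (
  set -eu
  tmpzip=$(mktemp)
  tmpdir=$(mktemp -d)
  trap 'rm -rf "$tmpzip" "$tmpdir"' EXIT

  # Capture the incoming content (the flowfile) into a regular file.
  cat > "$tmpzip"

  # Extract quietly into the temp dir; send any unzip messages to stderr
  # so they cannot end up in the tar stream on stdout.
  unzip -q "$tmpzip" -d "$tmpdir" >&2

  # Write the tar archive directly to stdout instead of a second temp file.
  tar -cf - -C "$tmpdir" .
)
```

Calling zip_to_tar as the last line of the script lets ExecuteStreamCommand
feed the flowfile to stdin and take the tar from stdout.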

-- Mike


On Fri, Feb 2, 2024 at 9:22 AM James McMahon <[email protected]> wrote:

> Hi Michael. This is a very clever approach: convert from a zip (for which
> UnpackContent does not preserve the extracted files' metadata) to a tar
> (for which UnpackContent does preserve file metadata), then employ
> UnpackContent.
>
> One quick followup question. The ExecuteStreamCommand will be in the nifi
> flow, and so its input will be streaming incoming flowfiles, and its output
> will be streamed as a flowfile. Are these two commands in the script where
> we capture the incoming flowfile
>
> cat /dev/stdin >> $tmpzipfile
>
> ...and where we create the output flowfile from the ExecuteStreamCommand
> processor?
>
> cat $tmptarfile >> /dev/stdout
>
>
> On Thu, Feb 1, 2024 at 10:11 AM Michael Moser <[email protected]> wrote:
>
>> Hi Jim,
>>
>> The ExecuteStreamCommand will only output 1 flowfile, so using it to
>> unzip in this fashion won't yield the results you need.
>>
>> Instead, you might try a workaround with ExecuteStreamCommand to unzip
>> your file and then tar to repackage it.  Then UnpackContent should be able
>> to read the tar file metadata.  I have used ExecuteStreamCommand to execute
>> bash scripts.  An example is shown below, which you can modify for your
>> needs.  The ExecuteStreamCommand properties "Command Path=/bin/bash" and
>> "Command Arguments=/path/to/script.sh" are all you need for this script to
>> work.
>>
>> #!/bin/bash
>> tmpzipfile=$(mktemp)
>> tmptarfile=$(mktemp)
>> #remove the tmptarfile file, we just need a temporary filename, and will
>> #recreate it below
>> rm -f $tmptarfile
>> #create a directory to unzip files to
>> tmpdir=$(mktemp -d)
>>
>> cat /dev/stdin >> $tmpzipfile
>> # here is your unzip command to unzip $tmpzipfile to $tmpdir, preserving
>> # file metadata
>> # here is your tar command to tar $tmpdir to $tmptarfile
>> cat $tmptarfile >> /dev/stdout
>>
>> #cleanup
>> rm -f $tmpzipfile
>> rm -f $tmptarfile
>> rm -rf $tmpdir
>>
>>
>>
>> On Wed, Jan 31, 2024 at 12:55 PM James McMahon <[email protected]>
>> wrote:
>>
>>> If anyone can show me how to get my ExecuteStreamCommand configured
>>> properly as a workaround, I am still interested in that.
>>> Jim
>>>
>>> On Wed, Jan 31, 2024 at 12:39 PM James McMahon <[email protected]>
>>> wrote:
>>>
>>>> I tried to find a Create option for tickets here:
>>>> https://issues.apache.org/jira/projects/NIFI/issues/NIFI-11859?filter=allopenissues
>>>> I did not find one, and suspect I do not have that privilege.
>>>> In any case, thank you for creating that.
>>>> Jim
>>>>
>>>> On Wed, Jan 31, 2024 at 12:37 PM Joe Witt <[email protected]> wrote:
>>>>
>>>>> I went ahead and wrote it up here
>>>>> https://issues.apache.org/jira/browse/NIFI-12709
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Jan 31, 2024 at 10:30 AM James McMahon <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Happy to do that Joe. How do I create and submit a JIRA for
>>>>>> consideration? I have not done one - at least, not for years.
>>>>>> If you get me started, I will do a concise and thorough description
>>>>>> in the ticket.
>>>>>> Sincerely,
>>>>>> Jim
>>>>>>
>>>>>> On Wed, Jan 31, 2024 at 12:12 PM Joe Witt <[email protected]> wrote:
>>>>>>
>>>>>>> James,
>>>>>>>
>>>>>>> Makes sense to create a JIRA to improve UnpackContent to extract
>>>>>>> these attributes in the event of a zip file that happens to present
>>>>>>> them. The concept of lastModifiedDate does appear easily accessible if
>>>>>>> available in the metadata.  Owner/Creator/Creation information looks
>>>>>>> less standard in the case of a Zip but perhaps still capturable as
>>>>>>> extra fields.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wed, Jan 31, 2024 at 10:01 AM James McMahon <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I tried to use UnpackContent to extract the files within a zip file
>>>>>>>> named "ABC DEF (1).zip" (the filename contains spaces).
>>>>>>>>
>>>>>>>> UnpackContent seemed to work, but it did not preserve file
>>>>>>>> attributes from the files in the zip. For example,
>>>>>>>> lastModifiedTime is not available, so downstream I am unable to do
>>>>>>>> this:
>>>>>>>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>>>>>>>
>>>>>>>> I did some digging and found that on the UnpackContent page, it
>>>>>>>> says:
>>>>>>>> file.lastModifiedTime  "The date and time that the unpacked file
>>>>>>>> was last modified (*tar only*)."
>>>>>>>>
>>>>>>>> I need these file attributes for those files I extract from the
>>>>>>>> zip. So as an alternative I tried configuring an
>>>>>>>> ExecuteStreamCommand processor like this:
>>>>>>>> Command Arguments  -c;"unzip -p -q < -"
>>>>>>>> Command Path  /bin/bash
>>>>>>>> Argument Delimiter   ;
>>>>>>>>
>>>>>>>> It throws these errors:
>>>>>>>>
>>>>>>>> 16:41:30 UTC ERROR 13023d28-6154-17fd-b4e8-7a30b35980ca
>>>>>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to
>>>>>>>> write flow file to stdin due to Broken pipe: java.io.IOException: Broken
>>>>>>>> pipe
>>>>>>>>
>>>>>>>> 16:41:30 UTC ERROR 13023d28-6154-17fd-b4e8-7a30b35980ca
>>>>>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca]
>>>>>>>> Transferring flow file FlowFile[filename=ABC DEF (1).zip] to nonzero
>>>>>>>> status. Executable command /bin/bash ended in an error: /bin/bash: -:
>>>>>>>> No such file or directory
>>>>>>>>
>>>>>>>> It does not seem to be applying the unzip to the stdin of the ESC
>>>>>>>> processor. None of the files in the zip archive are output from ESC.
>>>>>>>>
>>>>>>>> What needs to be changed in my ESC configuration?
>>>>>>>>
>>>>>>>> Thank you in advance for any help.
>>>>>>>>
>>>>>>>>
