Yes, that's exactly what those commands do. Your Linux commands like unzip and tar can probably read directly from /dev/stdin and write directly to /dev/stdout, if you want them to.
-- Mike

On Fri, Feb 2, 2024 at 9:22 AM James McMahon wrote:
> Hi Michael. This is a very clever approach: convert from a zip (for which UnpackContent does not preserve file metadata on extracted files) to a tar (for which UnpackContent does preserve file metadata), then apply UnpackContent.
>
> One quick follow-up question. The ExecuteStreamCommand will be in the NiFi flow, so its input will be incoming flowfiles streamed to it, and its output will be streamed out as a flowfile. Are these the two commands in the script where we capture the incoming flowfile
>
> cat /dev/stdin >> $tmpzipfile
>
> ...and where we create the output flowfile of the ExecuteStreamCommand processor?
>
> cat $tmptarfile >> /dev/stdout
>
>
> On Thu, Feb 1, 2024 at 10:11 AM Michael Moser wrote:
>
>> Hi Jim,
>>
>> The ExecuteStreamCommand will only output 1 flowfile, so using it to unzip in this fashion won't yield the results you need.
>>
>> Instead, you might try a workaround with ExecuteStreamCommand to unzip your file and then tar to repackage it. Then UnpackContent should be able to read the tar file metadata. I have used ExecuteStreamCommand to execute bash scripts. An example is shown below, which you can modify for your needs. The ExecuteStreamCommand properties "Command Path=/bin/bash" and "Command Arguments=/path/to/script.sh" are all you need for this script to work.
>>
>> #!/bin/bash
>> tmpzipfile=$(mktemp)
>> tmptarfile=$(mktemp)
>> #remove the tmptarfile file, we just need a temporary filename, and will recreate it below
>> rm -f $tmptarfile
>> #create a directory to unzip files to
>> tmpdir=$(mktemp -d)
>>
>> cat /dev/stdin >> $tmpzipfile
>> # here is your unzip command to unzip $tmpzipfile to $tmpdir, preserving file metadata
>> # here is your tar command to tar $tmpdir to $tmptarfile
>> cat $tmptarfile >> /dev/stdout
>>
>> #cleanup
>> rm -f $tmpzipfile
>> rm -f $tmptarfile
>> rm -rf $tmpdir
>>
>>
>> On Wed, Jan 31, 2024 at 12:55 PM James McMahon wrote:
>>
>>> If anyone can show me how to get my ExecuteStreamCommand configured properly as a workaround, I am still interested in that.
>>> Jim
>>>
>>> On Wed, Jan 31, 2024 at 12:39 PM James McMahon wrote:
>>>
>>>> I tried to find a Create option for tickets here, https://issues.apache.org/jira/projects/NIFI/issues/NIFI-11859?filter=allopenissues .
>>>> I did not find one, and suspect I may not have that privilege.
>>>> In any case, thank you for creating that.
>>>> Jim
>>>>
>>>> On Wed, Jan 31, 2024 at 12:37 PM Joe Witt wrote:
>>>>
>>>>> I went ahead and wrote it up here
>>>>> https://issues.apache.org/jira/browse/NIFI-12709
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Jan 31, 2024 at 10:30 AM James McMahon wrote:
>>>>>
>>>>>> Happy to do that, Joe. How do I create and submit a JIRA for consideration? I have not done one, at least not for years.
>>>>>> If you get me started, I will write a concise and thorough description in the ticket.
>>>>>> Sincerely,
>>>>>> Jim
>>>>>>
>>>>>> On Wed, Jan 31, 2024 at 12:12 PM Joe Witt wrote:
>>>>>>
>>>>>>> James,
>>>>>>>
>>>>>>> Makes sense to create a JIRA to improve UnpackContent to extract these attributes when a zip file happens to present them.
>>>>>>> The concept of lastModifiedDate does appear to be easily accessible, if available in the metadata. Owner/Creator/Creation information looks less standard in the case of a zip, but is perhaps still capturable as extra fields.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wed, Jan 31, 2024 at 10:01 AM James McMahon wrote:
>>>>>>>
>>>>>>>> I tried to use UnpackContent to extract the files within a zip file named ABC DEF (1).zip (the filename has spaces in its name).
>>>>>>>>
>>>>>>>> UnpackContent seemed to work, but it did not preserve file attributes from the files in the zip. For example, the lastModifiedTime is not available, so downstream I am unable to do this:
>>>>>>>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>>>>>>>
>>>>>>>> I did some digging and found that the UnpackContent page says:
>>>>>>>> file.lastModifiedTime "The date and time that the unpacked file was last modified (*tar only*)."
>>>>>>>>
>>>>>>>> I need these file attributes for the files I extract from the zip. So as an alternative I tried configuring an ExecuteStreamCommand processor like this:
>>>>>>>> Command Arguments -c;"unzip -p -q < -"
>>>>>>>> Command Path /bin/bash
>>>>>>>> Argument Delimiter ;
>>>>>>>>
>>>>>>>> It throws these errors:
>>>>>>>>
>>>>>>>> 16:41:30 UTC ERROR 13023d28-6154-17fd-b4e8-7a30b35980ca ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to write flow file to stdin due to Broken pipe: java.io.IOException: Broken pipe
>>>>>>>> 16:41:30 UTC ERROR 13023d28-6154-17fd-b4e8-7a30b35980ca ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable command /bin/bash ended in an error: /bin/bash: -: No such file or directory
>>>>>>>>
>>>>>>>> It does not seem to be applying the unzip to the stdin of the ESC processor. None of the files in the zip archive are output from ESC.
>>>>>>>>
>>>>>>>> What needs to be changed in my ESC configuration?
>>>>>>>>
>>>>>>>> Thank you in advance for any help.
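The errors in the original question can be reproduced outside NiFi. In bash, "< -" is an ordinary input redirection from a file named "-"; bash does not treat "-" as standard input the way many command-line tools do. The redirection fails and bash exits without reading its input, which is consistent with NiFi then reporting a broken pipe while writing the flowfile to the process's stdin. This is a minimal sketch that assumes it runs in an empty scratch directory; the function names dash_redirect and plain_stdin are hypothetical.

```shell
#!/bin/bash
# Sketch: reproduce why Command Arguments -c;"unzip -p -q < -" fails.
# Assumption: run in an empty scratch directory with no file named "-".
cd "$(mktemp -d)" || exit 1

# "< -" makes bash open a file literally named "-"; it is not stdin.
# The redirection fails and bash exits non-zero without reading its input.
dash_redirect() { bash -c 'cat < -'; }

# A command started by bash -c already inherits the process's stdin,
# so reading the flowfile content needs no redirection at all.
plain_stdin() { bash -c 'cat'; }

echo 'flowfile bytes' | plain_stdin   # prints: flowfile bytes
echo 'flowfile bytes' | dash_redirect || echo "dash_redirect failed with status $?"
```

Even with the redirection removed, unzip would still need the archive as a file (its man page says archives read from standard input are not supported), and -p would concatenate every member into one stream, which ExecuteStreamCommand emits as a single flowfile. That is why the replies above spool stdin to a temporary file and repackage the contents as a tar instead.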
