Thank you, Mark. I had no idea there was this file-based dependency for 7z files. Since my workaround appears to be working, I think I may just move forward with that. Steve, Mark: thank you again for replying.

Jim
On Thu, Sep 29, 2022 at 9:15 AM Mark Payne <marka...@hotmail.com> wrote:
> It’s been a while. But if I remember correctly, the reason that NiFi does not natively support the 7-zip format is that with 7-zip, the dictionary is written at the end of the file.
> So when data is compressed, the dictionary is built up during compression and written at the end. This makes sense from a compression standpoint. However, it means that in order to decompress it, you must first jump to the end of the file to access the dictionary, then jump back to the beginning of the file to perform the decompression. NiFi makes use of Input Streams and Output Streams for FlowFile access; it doesn’t provide a file-based approach. And this ability to jump to the end, read the dictionary, and then jump back to the beginning isn’t really possible with Input/Output Streams, at least not without buffering everything into memory.
>
> So it would make sense that there would be a “Not Implemented” error when attempting the same thing with the 7-zip application directly, using input streams & output streams.
> I think that if you’re stuck with 7-zip, your only option will be to do what you’re doing: write the data out as a file, run the 7-zip application against that file, write the output to some directory, and then pick up the files from that directory.
> The alternative, of course, would be to update the source so that it’s creating zip files instead of 7-zip files, if you have sway over the source producer.
>
> Thanks
> -Mark
>
>
> On Sep 29, 2022, at 8:58 AM, stephen.hindmarch.bt.com via users <users@nifi.apache.org> wrote:
>
> James,
>
> E_NOTIMPL means that feature is not implemented. I can see there is discussion about this over at SourceForge, but the detail is blocked by my employer’s firewall.
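Mark's point about jumping to the end of the file can be illustrated with a small Python sketch. It is not NiFi code and it rests on a stand-in: Python's standard library has no 7z reader, so a zip archive is used instead, because zip also keeps its table of contents (the central directory) at the end of the file. The 7z format likewise stores its archive header at the end, which appears to be what Mark calls the dictionary. ForwardOnlyStream is a hypothetical stand-in for a FlowFile InputStream that can only be read front to back; opens_in_place and list_members_via_spool are illustrative names.

```python
import io
import shutil
import tempfile
import zipfile


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a FlowFile InputStream: readable, not seekable."""

    def __init__(self, data: bytes):
        super().__init__()
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # seekable() stays False (inherited), so seek() and tell() raise
        # io.UnsupportedOperation, much like reading from a pipe.
        return self._inner.readinto(buffer)


def opens_in_place(stream) -> bool:
    """Try to read the archive's table of contents straight off the stream."""
    try:
        # zipfile starts by seeking to the end to find the central directory.
        zipfile.ZipFile(stream).close()
        return True
    except (zipfile.BadZipFile, OSError):
        return False


def list_members_via_spool(stream, max_in_memory: int = 8 * 1024 * 1024) -> list[str]:
    """Buffer the whole stream somewhere seekable, then read the index at the end."""
    # Up to max_in_memory bytes stay in memory; beyond that the spool rolls over
    # to a temporary file on disk (the limit here is an arbitrary example).
    with tempfile.SpooledTemporaryFile(max_size=max_in_memory) as spool:
        shutil.copyfileobj(stream, spool)
        spool.seek(0)
        with zipfile.ZipFile(spool) as archive:
            return archive.namelist()
```

Opening the archive directly on the forward-only stream fails because zipfile has to seek to the end first. Copying the stream into a SpooledTemporaryFile makes it seekable, at the price of holding the whole archive in memory or on disk, which is the trade-off Mark describes. Zip differs from 7z in one respect relevant to his suggestion: every zip entry is also preceded by its own local header, which is what lets stream readers such as Java's java.util.zip.ZipInputStream walk a zip archive front to back. Python's zipfile does not take that route.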
>
> p7zip / Discussion / Help: E_NOTIMPL for stdin / stdout pipe
> <https://sourceforge.net/p/p7zip/discussion/383044/thread/8066736d/>
>
> *Steve Hindmarch*
>
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* 29 September 2022 12:12
> *To:* Hindmarch,SJ,Stephen,VIR R <stephen.hindma...@bt.com>
> *Cc:* users@nifi.apache.org
> *Subject:* Re: Can ExecuteStreamCommand do this?
>
> I ran with these Command Arguments in the ExecuteStreamCommand configuration:
> x;-si;-so;-spf;-aou
> ${filename} removed, -si indicating use of STDIN, -so STDOUT.
>
> The same error is thrown by 7z through ExecuteStreamCommand: Executable command /bin/7za ended in an error: ERROR: Can not open the file as an archive E_NOTIMPL
>
> I tried this at the command line, getting the same failure:
> cat testArchive.7z | 7za x -si -so | dd of=stooges.txt
>
>
> On Thu, Sep 29, 2022 at 6:44 AM James McMahon <jsmcmah...@gmail.com> wrote:
>
> Good morning, Steve. Indeed, that second paragraph is *exactly* how I did get this to work. I unpack to disk and then read in the twelve results using a GetFile. So far it is working well. It just feels a little wrong to me to do this, as I have introduced an extra write to and read from disk, which is going to be slower than doing it all in memory within the JVM. While that may not seem like anything significant for a single 7z file, as we work across thousands and thousands it can be significant.
>
> I am about to try what you suggested above: dropping the ${filename} entirely from the STDIN / STDOUT configuration. I realize it is not likely going to give me the twelve output flowfiles I'm seeking in the "output stream" path from ExecuteStreamCommand. I just want to see if it works without throwing that error.
>
> Welcome any other thoughts or comments you may have. Thanks again for your comments so far.
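The disk round trip James describes (write the archive out, extract it into a directory, then pick each extracted file up as its own flowfile) looks roughly like the Python sketch below. Assumptions: unpack_to_records and extract_with_7za are hypothetical helpers. extract_with_7za uses only 7za options already quoted in this thread (x, -o<dir>, -aou) and needs 7za on the PATH, so a runnable check would pass shutil.unpack_archive and a zip archive instead. Each returned (path, bytes) pair stands for one output flowfile, and the temporary directory is removed at the end, much as FetchFile can be told to delete what it has read.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

# An extractor takes (archive path, output directory) and fills the directory.
Extractor = Callable[[Path, Path], None]


def extract_with_7za(archive: Path, out_dir: Path) -> None:
    """Hypothetical 7za call using the flags from this thread; needs 7za on PATH."""
    # -o takes the directory with no space in between; -aou auto-renames files
    # that would clash with existing ones.
    subprocess.run(["7za", "x", str(archive), f"-o{out_dir}", "-aou"], check=True)


def unpack_to_records(content: bytes, suffix: str, extract: Extractor) -> list[tuple[str, bytes]]:
    """Write content to disk, extract it, and return one (path, bytes) pair per file."""
    with tempfile.TemporaryDirectory() as work:
        work_dir = Path(work)
        archive = work_dir / f"incoming{suffix}"
        archive.write_bytes(content)  # the extra write to disk
        out_dir = work_dir / "unpacked"
        out_dir.mkdir()
        extract(archive, out_dir)
        records = [
            (path.relative_to(out_dir).as_posix(), path.read_bytes())  # the extra read
            for path in sorted(out_dir.rglob("*"))
            if path.is_file()
        ]
    # Leaving the with-block deletes the archive and everything extracted from it.
    return records
```

For a real 7z flowfile the call would be unpack_to_records(content, ".7z", extract_with_7za). The two commented lines are the extra write and read James would like to avoid; given Mark's explanation, they seem hard to remove for 7z without holding the whole archive in memory instead.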
>
> Jim
>
> On Thu, Sep 29, 2022 at 5:23 AM <stephen.hindma...@bt.com> wrote:
>
> James,
>
> I have been thinking more about your problem and this may be the wrong approach. If you successfully unpack your files into the flowfile content, you will still have one output flowfile containing the unpacked contents of all of your files. If you need 12 separate files in their own flowfiles, then you will need to find some way of splitting them up. Is there a byte sequence you can use in a SplitContent processor, or a specific file length you can use in SplitText?
>
> Otherwise you may be better off using ExecuteStreamCommand to unpack the files on disk. Run it verbosely and use the output of that step to create a list of the locations of your recently unpacked files. Or create a temporary directory to unpack into, fetch all the files in there, and clean up afterwards. Then you can load the files with FetchFile. FetchFile can be instructed to delete the file it has just read, so it can also clean up after itself.
>
> *Steve Hindmarch*
>
> *From:* stephen.hindmarch.bt.com via users <users@nifi.apache.org>
> *Sent:* 29 September 2022 09:19
> *To:* jsmcmah...@gmail.com; users@nifi.apache.org
> *Subject:* RE: Can ExecuteStreamCommand do this?
>
> James,
>
> Using ${filename} and -si together seems wrong to me. What happens when you try that on the command line?
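Steve's first point in his 5:23 message, that a single output stream carries every extracted file back to back in one flowfile, can be shown in miniature. The Python sketch below is hypothetical: split_on_sequence mirrors the idea of splitting on a byte sequence (as SplitContent does), split_by_sizes cuts at known file lengths, and the file contents are invented. Either approach needs information the merged bytes do not carry by themselves: a delimiter that never occurs inside any file, or the individual file sizes (for example from an archive listing).

```python
def split_on_sequence(content: bytes, delimiter: bytes) -> list[bytes]:
    """Cut at every occurrence of delimiter and drop the delimiter itself."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    # Empty pieces are discarded, so a genuinely empty file would vanish too.
    return [part for part in content.split(delimiter) if part]


def split_by_sizes(content: bytes, sizes: list[int]) -> list[bytes]:
    """Cut back-to-back file contents apart using each file's known length."""
    if sum(sizes) != len(content):
        raise ValueError("sizes do not add up to the content length")
    parts, offset = [], 0
    for size in sizes:
        parts.append(content[offset:offset + size])
        offset += size
    return parts


# Three hypothetical files written back to back, as one STDOUT stream would carry them.
files = [b"larry\n", b"moe\n", b"curly\n"]
merged = b"".join(files)
by_size = split_by_sizes(merged, [len(f) for f in files])
```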
>
> *Steve Hindmarch*
>
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* 28 September 2022 13:49
> *To:* users@nifi.apache.org; Hindmarch,SJ,Stephen,VIR R <stephen.hindma...@bt.com>
> *Subject:* Re: Can ExecuteStreamCommand do this?
>
> Thank you, Steve. I've employed a ListFile/FetchFile to load the 7z files into the flow. When I have my ESC configured as follows, my unpacked files are written to the #{unpacked.destination} directory on disk:
> Command Arguments: x;${filename};-spf;-o#{unpacked.destination};-aou
> Command Path: /bin/7za
> Ignore STDIN: true
> Working Directory: #{unpacked.destination}
> Argument Delimiter: ;
> Output Destination Attribute: No value set
> I get twelve files in my output destination folder.
>
> When I try this one, I get an error and no output:
> Command Arguments: x;${filename};-si;-so;-spf;-aou
> Command Path: /bin/7za
> Ignore STDIN: false
> Working Directory: #{unpacked.destination}
> Argument Delimiter: ;
> Output Destination Attribute: No value set
>
> This yields this error...
> Executable command /bin/7za ended in an error: ERROR: Can not open the file as archive
> E_NOTIMPL
> ...and it yields only one flowfile result in Output Stream, a brief text/plain report of the results of the 7za extraction. This indicates it did indeed find my 7z file and did indeed identify the 12 files in it, yet I still get no output to my outgoing flow path:
> Extracting archive: /parent/subparent/testArchive.7z
> --
> Path = /parentdir/subdir/testArchive.7z
> Type = 7z
> Physical Size = 7204
> Headers Size = 298
> Method = LZMA2:96k
> Solid = +
> Blocks = 1
>
> Everything is Ok
>
> Folders: 1
> Files: 12
> Size: 90238
> Compressed: 7204
>
> ${filename} in both cases is the fully qualified path to the file, like this: /dir/subdir/myTestFile.7z.
>
> I can't seem to get the ESC output stream to be the extracted files. Anything jump out at you?
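Steve's question about combining ${filename} with -si is easier to see once the Command Arguments are expanded and split on the Argument Delimiter into the argument list 7za actually receives. This Python sketch is a simplification built on assumptions: expand only handles plain ${attribute} and #{parameter} references rather than the full NiFi Expression Language, it expands before splitting (treat that ordering as an assumption too), and the /tmp/unpacked value for #{unpacked.destination} is made up. expand and build_argv are hypothetical names.

```python
import re

# Matches ${name} (flowfile attribute) and #{name} (parameter) references.
REFERENCE = re.compile(r"([$#])\{([^}]+)\}")


def expand(text: str, attributes: dict[str, str], parameters: dict[str, str]) -> str:
    """Simplified substitution; real NiFi Expression Language does far more."""
    def lookup(match: re.Match) -> str:
        sigil, name = match.group(1), match.group(2)
        return (attributes if sigil == "$" else parameters)[name]
    return REFERENCE.sub(lookup, text)


def build_argv(command_path: str, command_arguments: str, delimiter: str,
               attributes: dict[str, str], parameters: dict[str, str]) -> list[str]:
    """Expand references, then split on the Argument Delimiter (ordering assumed)."""
    expanded = expand(command_arguments, attributes, parameters)
    return [command_path] + expanded.split(delimiter)


attrs = {"filename": "/dir/subdir/myTestFile.7z"}
params = {"unpacked.destination": "/tmp/unpacked"}  # hypothetical value

to_disk = build_argv("/bin/7za", "x;${filename};-spf;-o#{unpacked.destination};-aou", ";", attrs, params)
piped = build_argv("/bin/7za", "x;${filename};-si;-so;-spf;-aou", ";", attrs, params)
print(to_disk)
print(piped)
```

With the second configuration, 7za is handed both an archive path and -si, that is, two different sources for the same archive, which is the combination Steve flags.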
>
> On Wed, Sep 28, 2022 at 8:06 AM stephen.hindmarch.bt.com via users <users@nifi.apache.org> wrote:
>
> Hi James,
>
> I am not in a position to test this right now, but you have to think of the flowfile content as STDIN and STDOUT. So with 7zip you need to use the “-si” and “-so” flags to ensure there are no files involved. Then, if you can load the content of a file into a flowfile, e.g. with GetFile, you should be able to unpack it with ExecuteStreamCommand. Set “Ignore STDIN” = “false”.
>
> I have written up my own use case on GitHub. This involves having a Redis script as the input, and the results of the script as the output.
>
> my-nifi-cluster/experiment-redis_direct.md at main · hindmasj/my-nifi-cluster · GitHub
> <https://github.com/hindmasj/my-nifi-cluster/blob/main/docs/experiment-redis_direct.md>
>
> The first part of the post shows how to do it with the input commands on the command line, so a bit like you running “7za ${filename} -so”. The second part has the script inside the flowfile, where it is treated as STDIN, a bit like you doing “unzip -si -so”.
>
> See if that helps.
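Steve's model (flowfile content goes in on STDIN, STDOUT comes back as the new content) can be imitated with Python's subprocess module. run_like_esc and CommandResult are hypothetical names, not NiFi's implementation; the sketch only shows the shape of the contract: one byte stream in, one byte stream out, plus an exit code and STDERR text. A stand-in command that upper-cases its input is used so the example runs without 7za.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CommandResult:
    exit_code: int
    stdout: bytes  # would become the content of the single output flowfile
    stderr: str    # diagnostic text, e.g. an error message from the command


def run_like_esc(argv: list[str], content: bytes) -> CommandResult:
    """Feed content to the command's STDIN and capture everything it writes back."""
    completed = subprocess.run(argv, input=content, capture_output=True, check=False)
    return CommandResult(
        exit_code=completed.returncode,
        stdout=completed.stdout,
        stderr=completed.stderr.decode("utf-8", errors="replace"),
    )


# A stand-in command that upper-cases STDIN, so the sketch runs without 7za.
UPPERCASE = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
result = run_like_esc(UPPERCASE, b"one flowfile in")
```

Whatever the command writes to STDOUT comes back as one result, so even a command that extracts twelve files can only return a single combined stream this way. Swapping UPPERCASE for ["7za", "x", "-si", "-so"] and passing the archive bytes should amount to the same experiment James ran at the command line.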
> Fundamentally, if you do “7za -si -so < myfile.7z” on the command line and see the output on the console, ExecuteStreamCommand will behave the same.
>
> *Steve Hindmarch*
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* 28 September 2022 12:02
> *To:* users@nifi.apache.org
> *Subject:* Can ExecuteStreamCommand do this?
>
> I continue to struggle with ExecuteStreamCommand, and am hoping one of you from our user community can help me with the following:
> 1. Can ExecuteStreamCommand be used as I am trying to use it?
> 2. Can you direct me to an example where ExecuteStreamCommand is configured to do something similar to my use case?
>
> My use case:
> The incoming flowfiles in my flow path are 7z zips. Based on what I've researched so far, NiFi's native processors don't handle unpacking of 7z files.
>
> I want to read the 7z files as STDIN to ExecuteStreamCommand.
> I'd like the processor to call out to a 7za app, which will unpack the 7z.
> One incoming flowfile will yield multiple output files. Let's say twelve in this case.
> My goal is to output those twelve as new flowfiles out of ExecuteStreamCommand, to its output stream path.
>
> I can't yet get this to work. Best I've been able to do is configure ExecuteStreamCommand to unpack ${filename} to a temporary output directory on disk. Then I have another path in my flow polling that directory every few minutes looking for new data. Am hoping to eliminate that intermediate write/read to/from disk by keeping this all within the flow and JVM memory.
>
> Thanks very much in advance for any assistance.