Thank you, Mark. I had no idea that 7z files had this file-based
dependency. Since my workaround appears to be working, I think I will just
move forward with that.
Steve, Mark - thank you again for replying.
Jim

On Thu, Sep 29, 2022 at 9:15 AM Mark Payne <marka...@hotmail.com> wrote:

> It’s been a while. But if I remember correctly, the reason that NiFi does
> not natively support 7-zip format is that with 7-zip, the dictionary is
> written at the end of the file.
> So when data is compressed, the dictionary is built up during compression
> and written at the end. This makes sense from a compression standpoint.
> However, what it means, is that in order to decompress it, you must first
> jump to the end of the file in order to access the dictionary. Then jump
> back to the beginning of the file in order to perform the decompression.
> NiFi makes use of Input Streams and Output Streams for FlowFile access -
> it doesn’t provide a File-based approach. And this ability to jump to the
> end, read the dictionary, and then jump back to the beginning isn’t really
> possible with Input/Output Streams - at least, not without buffering
> everything into memory.
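
To make the point above concrete, here is a minimal Python sketch (not NiFi
code) of why a forward-only stream is awkward for any reader that has to
look at the end of the data before it can start. ForwardOnlyStream and
read_tail are hypothetical names invented for this illustration, and the
spooled temporary file is just one assumed way of buffering the whole
stream so that seeking becomes possible.

```python
import io
import shutil
import tempfile


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a stream that can only be read front to back."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False  # no jumping to the end and back

    def readinto(self, buffer) -> int:
        chunk = self._inner.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


def _tail(seekable_stream, n: int) -> bytes:
    """Jump to the end, step back n bytes, and read them."""
    seekable_stream.seek(0, io.SEEK_END)
    size = seekable_stream.tell()
    seekable_stream.seek(max(0, size - n))
    return seekable_stream.read(n)


def read_tail(stream, n: int, max_in_memory: int = 1024 * 1024) -> bytes:
    """Return the last n bytes, buffering the whole stream first if it cannot seek."""
    if stream.seekable():
        return _tail(stream, n)
    # Forward-only input: every byte has to be copied somewhere seekable
    # (memory, spilling to disk past max_in_memory) before the end is reachable.
    with tempfile.SpooledTemporaryFile(max_size=max_in_memory) as spooled:
        shutil.copyfileobj(stream, spooled)
        return _tail(spooled, n)
```

Calling read_tail on a ForwardOnlyStream only works because the entire
content is copied first, which is the cost described above.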
>
> So it makes sense that the 7-zip application itself reports a “Not
> Implemented” error when it is asked to read from an input stream and write
> to an output stream.
> I think that if you’re stuck with 7-zip, your only option will be to do
> what you’re doing - write the data out as a file, run the 7-zip application
> against that file, writing the output to some directory, and then picking
> up the files from that directory.
> The alternative, of course, would be to update the source so that it’s
> creating zip files instead of 7-zip files, if you have sway over the source
> producer.
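
For anyone who wants to script the file-based workaround described above
outside of NiFi, a rough Python sketch follows. It assumes a 7za executable
is on the PATH; build_extract_command, extract_via_disk and the file names
incoming.7z and unpacked are hypothetical names chosen for illustration.
The switches used (x, -o<dir>, -aou) are the ones already shown in this
thread; -spf is left out here so that output stays under the target
directory.

```python
import subprocess
from pathlib import Path


def build_extract_command(archive: Path, out_dir: Path, seven_zip: str = "7za") -> list[str]:
    """Build a file-based 7za extraction command (no -si / -so)."""
    # x = extract with full paths, -o<dir> = output directory (no space after -o),
    # -aou = auto-rename the extracted file if the name already exists.
    return [seven_zip, "x", str(archive), f"-o{out_dir}", "-aou"]


def extract_via_disk(archive_bytes: bytes, work_dir: Path, seven_zip: str = "7za") -> list[Path]:
    """Write the archive to disk, run 7za against the file, return the extracted files."""
    archive = work_dir / "incoming.7z"   # hypothetical file name
    out_dir = work_dir / "unpacked"      # hypothetical directory name
    out_dir.mkdir(parents=True, exist_ok=True)
    archive.write_bytes(archive_bytes)
    # check=True raises CalledProcessError if 7za exits with an error.
    subprocess.run(build_extract_command(archive, out_dir, seven_zip), check=True)
    return sorted(p for p in out_dir.rglob("*") if p.is_file())
```

The extra write and read on disk is still there, of course; the sketch
only shows the shape of the workaround, not a way around it.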
>
> Thanks
> -Mark
>
>
> On Sep 29, 2022, at 8:58 AM, stephen.hindmarch.bt.com via users <
> users@nifi.apache.org> wrote:
>
> James,
>
> E_NOTIMPL means that feature is not implemented. I can see there is
> discussion about this down at sourceforge but the detail is blocked by my
> employer’s firewall.
>
> p7zip / Discussion / Help: E_NOTIMPL for stdin / stdout pipe
> <https://sourceforge.net/p/p7zip/discussion/383044/thread/8066736d/>
>
> *Steve Hindmarch*
>
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* 29 September 2022 12:12
> *To:* Hindmarch,SJ,Stephen,VIR R <stephen.hindma...@bt.com>
> *Cc:* users@nifi.apache.org
> *Subject:* Re: Can ExecuteStreamCommand do this?
>
> I ran with these Command Arguments in the ExecuteStreamCommand
> configuration:
> x;-si;-so;-spf;-aou
> ${filename} removed, -si indicating use of STDIN, -so STDOUT.
>
> The same error is thrown by 7z through ExecuteStreamCommand: Executable
> command /bin/7za ended in an error: ERROR: Can not open the file as an
> archive  E_NOTIMPL
>
> I tried this at the command line, getting the same failure:
> cat testArchive.7z | 7za x -si -so | dd of=stooges.txt
>
>
> On Thu, Sep 29, 2022 at 6:44 AM James McMahon <jsmcmah...@gmail.com>
> wrote:
>
> Good morning, Steve. Indeed, that second paragraph is *exactly* how I did
> get this to work. I unpack to disk and then read in the twelve results
> using a GetFile. So far it is working well. It just feels a little wrong to
> me to do this, as I have introduced an extra write to and read from disk,
> which is going to be slower than doing it all in memory within the JVM.
> While that may not seem like anything significant for a single 7z file, as
> we work across thousands and thousands it can be significant.
>
> I am about to try what you suggested above: dropping the ${filename}
> entirely from the STDIN / STDOUT configuration. I realize it is not likely
> going to give me the twelve output flowfiles I'm seeking in the "output
> stream" path from ExecuteStreamCommand. I just want to see if it works
> without throwing that error.
>
> Welcome any other thoughts or comments you may have. Thanks again for your
> comments so far.
>
> Jim
>
> On Thu, Sep 29, 2022 at 5:23 AM <stephen.hindma...@bt.com> wrote:
>
> James,
>
> I have been thinking more about your problem and this may be the wrong
> approach. If you successfully unpack your files into the flow file content,
> you will still have one output flow file containing the unpacked contents
> of all of your files. If you need 12 separate files in their own flowfiles
> then you will need to find some way of splitting them up. Is there a byte
> sequence you can use in a SplitContent process, or a specific file length
> you can use in SplitText?
>
> Otherwise you may be better off using ExecuteStreamCommand to unpack the
> files on disk. Run it verbosely and use the output of that step to create a
> list of the locations where your recently unpacked files are. Or create a
> temporary directory to unpack in and fetch all the files in there, cleaning
> up afterwards. Then you can load the files with FetchFile. FetchFile can
> be instructed to delete the file it has just read so can also clean up
> after itself.
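
The "unpack into a temporary directory, fetch everything, clean up" idea
above can be sketched in plain Python as follows. This is only an analogy
for what the NiFi flow would do, not NiFi code; collect_and_clean is a
hypothetical helper name, and reading every file fully into memory is an
assumption that suits small archives only.

```python
import shutil
import tempfile
from pathlib import Path


def collect_and_clean(unpack_dir: Path) -> dict[str, bytes]:
    """Read every file under unpack_dir, then delete the directory."""
    try:
        return {
            # Key each file by its path relative to the unpack directory.
            p.relative_to(unpack_dir).as_posix(): p.read_bytes()
            for p in sorted(unpack_dir.rglob("*"))
            if p.is_file()
        }
    finally:
        # Clean up even if reading one of the files failed.
        shutil.rmtree(unpack_dir, ignore_errors=True)
```

Using a fresh directory per archive (for example one created with
tempfile.mkdtemp) keeps the results of different archives from mixing.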
>
> *Steve Hindmarch*
>
> *From:* stephen.hindmarch.bt.com via users <users@nifi.apache.org>
> *Sent:* 29 September 2022 09:19
> *To:* jsmcmah...@gmail.com; users@nifi.apache.org
> *Subject:* RE: Can ExecuteStreamCommand do this?
>
> James,
>
> Using ${filename} and -si together seems wrong to me. What happens when
> you try that on the command line?
>
> *Steve Hindmarch*
>
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* 28 September 2022 13:49
> *To:* users@nifi.apache.org; Hindmarch,SJ,Stephen,VIR R <
> stephen.hindma...@bt.com>
> *Subject:* Re: Can ExecuteStreamCommand do this?
>
> Thank you, Steve. I've employed a ListFile/FetchFile to load the 7z files
> into the flow. When I have my ESC configured as follows, I get my
> unpacked files written to the #{unpacked.destination} directory on disk:
> Command Arguments             x;${filename};-spf;-o#{unpacked.destination};-aou
> Command Path                  /bin/7za
> Ignore STDIN                  true
> Working Directory             #{unpacked.destination}
> Argument Delimiter            ;
> Output Destination Attribute  No value set
> I get twelve files in my output destination folder.
>
> When I try this one, I get an error and no output:
> Command Arguments             x;${filename};-si;-so;-spf;-aou
> Command Path                  /bin/7za
> Ignore STDIN                  false
> Working Directory             #{unpacked.destination}
> Argument Delimiter            ;
> Output Destination Attribute  No value set
>
> This yields this error...
> Executable command /bin/7za ended in an error: ERROR: Can not open the
> file as archive
> E_NOTIMPL
> ...and it yields only one flowfile in the Output Stream relationship: a
> brief text/plain report of the 7za run, shown below.
>
> The report indicates that 7za did find my 7z file and did identify the 12
> files in it, yet I still get no extracted files on my outgoing flow path:
> Extracting archive: /parent/subparent/testArchive.7z
> - -
> Path = /parentdir/subdir/testArchive.7z
> Type = 7z
> Physical Size = 7204
> Headers Size = 298
> Method = LZMA2:96k
> Solid = +
> Blocks = 1
>
> Everything is Ok
>
> Folders: 1
> Files: 12
> Size: 90238
> Compressed: 7204
>
> ${filename} in both cases is a fully qualified name to the file, like
> this: /dir/subdir/myTestFile.7z.
>
> I can't seem to get the ESC output stream to be the extracted files.
> Anything jump out at you?
>
> On Wed, Sep 28, 2022 at 8:06 AM stephen.hindmarch.bt.com via users
> <users@nifi.apache.org> wrote:
>
> Hi James,
>
> I am not in a position to test this right now, but you have to think of
> the flowfile content as STDIN and STDOUT. So with 7zip you need to use the
> “-si” and “-so” flags to ensure there are no files involved. Then if you
> can load the content of a file into a flowfile, e.g. with GetFile, then you
> should be able to unpack it with ExecuteStreamCommand. Set “Ignore STDIN” =
> “false”.
>
> I have written up my own use case on github. This involves having a Redis
> script as the input, and results of the script as the output.
>
> my-nifi-cluster/experiment-redis_direct.md at main ·
> hindmasj/my-nifi-cluster · GitHub
> <https://github.com/hindmasj/my-nifi-cluster/blob/main/docs/experiment-redis_direct.md>
>
> The first part of the post shows how to do it with the input commands on
> the command line, so a bit like you running “7za ${filename} -so”. The
> second part has the script inside the flowfile and is treated as STDIN, a
> bit like you doing “unzip -si -so”.
>
> See if that helps. Fundamentally, if you do “7za -si -so < myfile.7z” on
> the command line and see the output on the console, ExecuteStreamCommand
> will behave the same.
>
> *Steve Hindmarch*
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* 28 September 2022 12:02
> *To:* users@nifi.apache.org
> *Subject:* Can ExecuteStreamCommand do this?
>
> I continue to struggle with ExecuteStreamCommand, and am hoping one of you
> from our user community can help me with the following:
> 1. Can ExecuteStreamCommand be used as I am trying to use it?
> 2. Can you direct me to an example where ExecuteStreamCommand is
> configured to do something similar to my use case?
>
> My use case:
> The incoming flowfiles in my flow path are 7z zips. Based on what I've
> researched so far, NiFi's native processors don't handle unpacking of 7z
> files.
>
> I want to read the 7z files as STDIN to ExecuteStreamCommand.
> I'd like the processor to call out to a 7za app, which will unpack the 7z.
> One incoming flowfile will yield multiple output files. Let's say twelve
> in this case.
> My goal is to output those twelve as new flowfiles out of
> ExecuteStreamCommand, to its output stream path.
>
> I can't yet get this to work. Best I've been able to do is configure
> ExecuteStreamCommand to unpack ${filename} to a temporary output directory
> on disk. Then I have another path in my flow polling that directory every
> few minutes looking for new data. Am hoping to eliminate that intermediate
> write/read to/from disk by keeping this all within the flow and JVM memory.
>
> Thanks very much in advance for any assistance.
>
>
>
