Here is the challenge I am trying to work around. In NiFi, a processor
called UnpackContent can be used to extract from a range of compressed
formats - tars and zips among them.

I need to access the file metadata of the extracted files. If the
compressed parent file is a tar, UnpackContent exposes the file metadata of
the extracted files. But if the file is a zip, UnpackContent does not.

As a workaround I want a Groovy script that extracts the files from a zip
preserving file metadata, placing each extracted file in the output stream
of the ExecuteGroovyScript processor with the metadata as attributes.
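
The extraction step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's java.util.zip so that it runs on its own, the attribute names (zip.entry.name, zip.entry.lastModified, zip.entry.size) and the Extracted holder are hypothetical, and the NiFi side (turning each result into a FlowFile with those attributes) is left out. For an archive that java.util.zip rejects with "invalid compression method", the same loop would presumably have to run over Commons Compress's ZipArchiveInputStream instead, which is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipEntryAttributes {

    /** One extracted file: its bytes plus per-entry metadata as string attributes. */
    static final class Extracted {
        final byte[] content;
        final Map<String, String> attributes;

        Extracted(byte[] content, Map<String, String> attributes) {
            this.content = content;
            this.attributes = attributes;
        }
    }

    /** Reads every file entry from the stream, keeping name, modification time and size. */
    static List<Extracted> extract(InputStream in) throws IOException {
        List<Extracted> result = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = zin.read(buffer)) != -1) { // -1 marks the end of this entry
                    content.write(buffer, 0, n);
                }
                // Attribute names are illustrative, not NiFi's own.
                Map<String, String> attrs = new LinkedHashMap<>();
                attrs.put("zip.entry.name", entry.getName());
                attrs.put("zip.entry.lastModified", Instant.ofEpochMilli(entry.getTime()).toString());
                attrs.put("zip.entry.size", String.valueOf(content.size()));
                result.add(new Extracted(content.toByteArray(), attrs));
            }
        }
        return result;
    }

    /** Builds a one-entry zip in memory so the sketch can be run without a file. */
    static byte[] sampleZip(String name, String text, long mtimeMillis) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(mtimeMillis);
            zout.putNextEntry(entry);
            zout.write(text.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip("notes.txt", "hello", 1_700_000_000_000L);
        for (Extracted e : extract(new ByteArrayInputStream(zip))) {
            System.out.println(e.attributes + " -> " + e.content.length + " bytes");
        }
    }
}
```

Zip timestamps are stored with two-second resolution in local time, so the recovered modification time can differ slightly from the one that was set.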

I cannot get the Groovy script to extract from the zip successfully. I will
keep trying today. My first change will be to switch my import to the
correct Apache lib. MG, thank you for that recommendation.
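
Before switching libraries, it may help to confirm what the failing file actually is, since the replies quoted below raise both gzip files posing as zips and DEFLATE64. A minimal sketch, assuming the first ten bytes of the file are available: it checks the magic bytes and, for a zip, reads the compression method of the first entry from the local file header (method 0 is stored, 8 is deflate, 9 is Deflate64). The class and method names are made up for illustration, and it inspects only the first entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipMethodProbe {

    /** Classifies the leading bytes of a file; pass at least the first 10 bytes. */
    static String describe(byte[] head) {
        // gzip streams start with 0x1f 0x8b.
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip stream, not a zip archive";
        }
        // A zip local file header starts with "PK" followed by 0x03 0x04.
        boolean localHeader = head.length >= 10
                && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
        if (!localHeader) {
            return "no zip local file header at offset 0";
        }
        // Compression method: 2 bytes, little-endian, at offset 8 of the local header.
        int method = (head[8] & 0xff) | ((head[9] & 0xff) << 8);
        switch (method) {
            case 0:  return "zip, method 0 (stored)";
            case 8:  return "zip, method 8 (deflate)";
            case 9:  return "zip, method 9 (deflate64)";
            default: return "zip, method " + method;
        }
    }

    /** Builds a small deflate-compressed zip in memory for trying the probe. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("notes.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(sampleZip()));
    }
}
```

If the probe reports method 9 for the real file, that would be consistent with the "invalid compression method" error from java.util.zip.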

Is there a better way to do this? I would welcome any help.

Does this explain the what and the why a little more clearly?

Jim

On Sat, Feb 17, 2024 at 5:10 AM Bob Brown <b...@transentia.com.au> wrote:

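> Not entirely sure that is what James is looking for… I THINK he’s more
> interested in reading than creating.
>
> Commons Compress has some example code at
> https://commons.apache.org/proper/commons-compress/examples.html:
>
> ===
> import java.io.BufferedInputStream;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import org.apache.commons.compress.compressors.deflate64.Deflate64CompressorInputStream;
>
> // Decodes a raw Deflate64 stream; this is not a zip-container reader.
> final int buffersize = 8192; // not defined in the snippet as posted; any reasonable size works
> InputStream fin = Files.newInputStream(Paths.get("some-file"));
> BufferedInputStream in = new BufferedInputStream(fin);
> OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
> Deflate64CompressorInputStream defIn = new Deflate64CompressorInputStream(in);
> final byte[] buffer = new byte[buffersize];
> int n = 0;
> while (-1 != (n = defIn.read(buffer))) {
>     out.write(buffer, 0, n);
> }
> out.close();
> defIn.close();
> ===
>
> BOB
>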
> *From:* MG <mg...@arscreat.com>
> *Sent:* Saturday, February 17, 2024 10:37 AM
> *To:* users@groovy.apache.org; Bob Brown <b...@transentia.com.au>
> *Subject:* Re: Cannot process zip file with Groovy
>
>
>
> I agree, would also recommend using Apache libs, we use e.g. the ZIP
> classes that come with the ant lib in the Groovy distribution
> (org.apache.tools.zip.*):
>
> Here is a quickly sanitized version of our code (disclaimer: Not
> compiled/tested; Zip64Mode.Always is important if you expect larger files):
>
> import org.apache.tools.zip.*
>
> // Returns a stream over the first entry of the zip file.
> InputStream zipInputStream(String compressedFilename) {
>     final zipFile = new ZipFile(new File(compressedFilename))
>     final entries = zipFile.entries
>     // nextElement() throws on an empty archive rather than returning null
>     if (!entries.hasMoreElements()) {
>         throw new Exception("${compressedFilename} has no entries")
>     }
>     final zipEntry = (ZipEntry) entries.nextElement()
>     return zipFile.getInputStream(zipEntry)
> }
>
> // Opens <filename>.<extension> and positions the stream at a single new entry.
> OutputStream zipOutputStream(String filename, String compressedFileExtension = "zip") {
>     final fos = new FileOutputStream(filename + '.' + compressedFileExtension)
>     final zos = new ZipOutputStream(fos)
>     // To avoid org.apache.tools.zip.Zip64RequiredException: ... exceeds the limit of 4GByte.
>     zos.useZip64 = Zip64Mode.Always
>     final zipFileName = org.apache.commons.io.FilenameUtils.getName(filename)
>     final zipEntry = new ZipEntry(zipFileName)
>     zos.putNextEntry(zipEntry)
>     return zos
> }
>
> Cheers,
> mg
>
>
> On 17/02/2024 00:52, Bob Brown wrote:
>
> My first thought was “are you SURE it is a kosher Zip file?”
>
> Sometimes one gets ‘odd’ gzip files masquerading as plain zip files.
>
> Also, apparently “java.util.Zip does not support DEFLATE64 compression
> method.”:
> https://www.ibm.com/support/pages/zip-file-fails-route-invalid-compression-method-error
>
> If this is the case, you may need to use:
> https://commons.apache.org/proper/commons-compress/zip.html
>
> (maybe worth looking at the “Known Interoperability Problems” section of
> the above doc)
>
> May be helpful: https://stackoverflow.com/a/76321625
>
> HTH
>
> BOB
>
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* Saturday, February 17, 2024 4:20 AM
> *To:* users@groovy.apache.org
> *Subject:* Re: Cannot process zip file with Groovy
>
>
>
> Hello Paul, and thanks again for taking a moment to look at this. I tried
> as you suggested:
>
> - - - - - - - - - -
>
> import java.util.zip.ZipInputStream
> import org.apache.nifi.processor.io.StreamCallback
>
> def ff = session.get()
> if (!ff) return
>
> try {
>     ff = session.write(ff, { inputStream, outputStream ->
>         def zipInputStream = new ZipInputStream(inputStream)
>         def entry = zipInputStream.getNextEntry()
>         while (entry != null) {
>             entry = zipInputStream.getNextEntry()
>         }
>         // Added per Paul's suggestion. This only rebinds the closure's
>         // local parameter; no bytes are written to the FlowFile.
>         outputStream = inputStream
>     } as StreamCallback)
>
>     session.transfer(ff, REL_SUCCESS)
> } catch (Exception e) {
>     log.error('Error occurred processing FlowFile', e)
>     session.transfer(ff, REL_FAILURE)
> }
>
> - - - - - - - - - -
>
> Once again it threw this error and failed:
>
> ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086] Error occurred
> processing FlowFile: org.apache.nifi.processor.exception.ProcessException:
> IOException thrown from
> ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086]:
> java.util.zip.ZipException: invalid compression method
>
> - Caused by: java.util.zip.ZipException: invalid compression method
>
> It bears repeating: I am able to list and unzip the file at the Linux command
> line, but cannot get it to work from the script.
>
>
>
> What is interesting (and a little frustrating) is that the NiFi UnpackContent
> *will* successfully unzip the zip file. However, the reason I am trying to do
> it in Groovy is that UnpackContent exposes the file metadata for each file in
> a tar archive - lastModifiedDate, for example - but it does *not* do so for
> files extracted from zips. And I need that metadata. So here I be.
>
>
>
> Can I explicitly set my (de)compression in the Groovy script? Where would I 
> do that, and what values does one typically encounter for zip compression?
>
>
>
> Jim
>
>
>
> On Thu, Feb 15, 2024 at 9:26 PM Paul King <pa...@asert.com.au> wrote:
>
> What you are doing to read the zip looks okay.
>
> Just a guess, but it could be that because you haven't written to the
> output stream, it is essentially a corrupt data stream as far as NiFi
> processing is concerned. What happens if you set "outputStream =
> inputStream" as the last line of your callback?
>
> Paul.
>
> On Fri, Feb 16, 2024 at 8:48 AM James McMahon <jsmcmah...@gmail.com>
> wrote:
> >
> > I am struggling to build a Groovy script I can run from a NiFi
> > ExecuteScript processor to extract from a zip file and stream to a tar
> > archive.
> >
> > I tried to tackle it all at once and made little progress.
> > I am now just trying to read the zip file, and am getting this error:
> >
> > ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086] Error occurred
> > processing FlowFile: org.apache.nifi.processor.exception.ProcessException:
> > IOException thrown from
> > ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086]:
> > java.util.zip.ZipException: invalid compression method
> > - Caused by: java.util.zip.ZipException: invalid compression method
> >
> >
> > This is my simplified code:
> >
> >
> > import java.util.zip.ZipInputStream
> >
> > def ff = session.get()
> > if (!ff) return
> >
> > try {
> >     ff = session.write(ff, { inputStream, outputStream ->
> >         def zipInputStream = new ZipInputStream(inputStream)
> >         def entry = zipInputStream.getNextEntry()
> >         while (entry != null) {
> >             entry = zipInputStream.getNextEntry()
> >         }
> >     } as StreamCallback)
> >
> >     session.transfer(ff, REL_SUCCESS)
> > } catch (Exception e) {
> >     log.error('Error occurred processing FlowFile', e)
> >     session.transfer(ff, REL_FAILURE)
> > }
> >
> >
> > I am able to list and unzip the file at the Linux command line, but
> > cannot get it to work from the script.
> >
> >
> > Has anyone had success doing this? Can anyone help me get past this
> > error?
> >
> >
> > Thanks in advance.
> >
> > Jim
> >
> >