Here is the challenge I am trying to work around. In NiFi, a processor called UnpackContent can be used to extract from a range of compressed formats - tars and zips among them.
I need to access the file metadata of the extracted files. If the compressed parent file is a tar, UnpackContent exposes the file metadata of the extracted files. But if the file is a zip, UnpackContent does not. As a workaround I want a Groovy script that extracts the files from a zip while preserving file metadata, placing each extracted file in the output stream of the ExecuteGroovyScript processor with the metadata as attributes.

I cannot get the Groovy script to successfully extract from the zip. Today I will continue to try. My first change will be to switch my import to the correct Apache lib. MG, thank you for that recommendation.

Is there a better way to do this? I would welcome any help. Does this explain a little more clearly the what and why?

Jim

On Sat, Feb 17, 2024 at 5:10 AM Bob Brown <b...@transentia.com.au> wrote:
> Not entirely sure that is what James is looking for… I THINK he's more
> interested in reading than creating.
>
> Commons Compress has some example code at
> https://commons.apache.org/proper/commons-compress/examples.html:
>
> ===
> InputStream fin = Files.newInputStream(Paths.get("some-file"));
> BufferedInputStream in = new BufferedInputStream(fin);
> OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
> Deflate64CompressorInputStream defIn = new Deflate64CompressorInputStream(in);
> final byte[] buffer = new byte[buffersize];
> int n = 0;
> while (-1 != (n = defIn.read(buffer))) {
>     out.write(buffer, 0, n);
> }
> out.close();
> defIn.close();
> ===
>
> BOB
>
> *From:* MG <mg...@arscreat.com>
> *Sent:* Saturday, February 17, 2024 10:37 AM
> *To:* users@groovy.apache.org; Bob Brown <b...@transentia.com.au>
> *Subject:* Re: Cannot process zip file with Groovy
>
> I agree, would also recommend using Apache libs, we use e.g.
> the ZIP classes that come with the Ant lib in the Groovy distribution
> (org.apache.tools.zip.*):
>
> Here is a quickly sanitized version of our code (disclaimer: Not
> compiled/tested; Zip64Mode.Always is important if you expect larger files):
>
> InputStream zipInputStream(String compressedFilename) {
>     final zipFile = new ZipFile(new File(compressedFilename))
>     final zipEntry = (ZipEntry) zipFile.entries.nextElement()
>     if(zipEntry === null) { throw new Exception("${zipFile.name} has no entries") }
>     final zis = zipFile.getInputStream(zipEntry)
>     return zis
> }
>
> OutputStream zipOutputStream(String filename, String compressedFileExtension = "zip") {
>     final fos = new FileOutputStream(filename + '.' + compressedFileExtension)
>     final zos = new ZipOutputStream(fos)
>     zos.useZip64 = Zip64Mode.Always // To avoid org.apache.tools.zip.Zip64RequiredException: ... exceeds the limit of 4GByte.
>     final zipFileName = org.apache.commons.io.FilenameUtils.getName(filename)
>     final zipEntry = new ZipEntry(zipFileName)
>     zos.putNextEntry(zipEntry)
>     return zos
> }
>
> Cheers,
> mg
>
> On 17/02/2024 00:52, Bob Brown wrote:
>
> MY first thought was “are you SURE it is a kosher Zip file?”
>
> Sometimes one gets ‘odd’ gzip files masquerading as plain zip files.
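Bob's "kosher Zip file" question can be checked from the first few bytes of the content. A zip archive normally starts with the local-file-header signature `PK\x03\x04`, or with `PK\x05\x06` if the archive is empty. Gzip data starts with `0x1F 0x8B`. The following is a minimal Java sketch of that check. The class, enum and method names are hypothetical, and Groovy 3+ should accept the same syntax.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: classifies data by its leading "magic" bytes.
final class ArchiveSniffer {
    enum Kind { ZIP, GZIP, UNKNOWN }

    static Kind sniff(byte[] head) {
        if (head.length >= 4 && head[0] == 'P' && head[1] == 'K'
                && ((head[2] == 0x03 && head[3] == 0x04)       // local file header
                    || (head[2] == 0x05 && head[3] == 0x06))) { // empty archive: end-of-central-directory only
            return Kind.ZIP;
        }
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Kind.GZIP; // gzip member header
        }
        return Kind.UNKNOWN;
    }

    // Reads at most four bytes from the stream and classifies them.
    static Kind sniff(InputStream in) throws IOException {
        return sniff(in.readNBytes(4));
    }
}
```

Inside ExecuteScript this check could run in a `session.read(ff, ...)` callback before any extraction is attempted. At the Linux prompt, `file some.zip` answers the same question.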
>
> Also, apparently “java.util.Zip does not support DEFLATE64 compression method.”:
> https://www.ibm.com/support/pages/zip-file-fails-route-invalid-compression-method-error
>
> IF this is the case, you may need to use:
> https://commons.apache.org/proper/commons-compress/zip.html
>
> (maybe worth looking at the “Known Interoperability Problems” section of
> the above doc)
>
> May be helpful: https://stackoverflow.com/a/76321625
>
> HTH
>
> BOB
>
> *From:* James McMahon <jsmcmah...@gmail.com>
> *Sent:* Saturday, February 17, 2024 4:20 AM
> *To:* users@groovy.apache.org
> *Subject:* Re: Cannot process zip file with Groovy
>
> Hello Paul, and thanks again for taking a moment to look at this. I tried
> as you suggested:
>
> - - - - - - - - - -
>
> import java.util.zip.ZipInputStream
>
> def ff = session.get()
> if (!ff) return
>
> try {
>     ff = session.write(ff, { inputStream, outputStream ->
>         def zipInputStream = new ZipInputStream(inputStream)
>         def entry = zipInputStream.getNextEntry()
>         while (entry != null) {
>             entry = zipInputStream.getNextEntry()
>         }
>         outputStream = inputStream   // the line added as suggested
>     } as StreamCallback)
>
>     session.transfer(ff, REL_SUCCESS)
> } catch (Exception e) {
>     log.error('Error occurred processing FlowFile', e)
>     session.transfer(ff, REL_FAILURE)
> }
>
> - - - - - - - - - -
>
> Once again it threw this error and failed:
>
> ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086] Error occurred
> processing FlowFile: org.apache.nifi.processor.exception.ProcessException:
> IOException thrown from
> ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086]:
> java.util.zip.ZipException: invalid compression method
>
> - Caused by: java.util.zip.ZipException: invalid compression method
>
> It bears repeating: I am able to list and unzip the file at the Linux command
> line, but cannot get it to work from the script.
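The `invalid compression method` error means that some entry uses a method other than the two that java.util.zip implements: 0 (STORED) and 8 (DEFLATED). The method is recorded per entry inside the archive, so a script cannot choose it. Other values defined in PKWARE's APPNOTE include 9 (Deflate64), 12 (bzip2) and 14 (LZMA).

The central directory at the end of the file also stores each entry's name and DOS last-modified time. Those can therefore be listed without decompressing anything. The following is a minimal Java sketch that does this. It assumes a single-disk, non-Zip64 archive and decodes names as UTF-8, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads entry metadata straight from the zip central directory.
final class CentralDirectoryLister {
    static final class EntryInfo {
        final String name;
        final int method;                 // raw APPNOTE compression method code
        final LocalDateTime lastModified; // null if the DOS date/time is invalid

        EntryInfo(String name, int method, LocalDateTime lastModified) {
            this.name = name;
            this.method = method;
            this.lastModified = lastModified;
        }
    }

    private static final int EOCD_SIG = 0x06054b50; // end-of-central-directory record
    private static final int CEN_SIG = 0x02014b50;  // central directory file header

    static List<EntryInfo> list(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // The EOCD record is 22 bytes plus an optional comment: search backwards for it.
        int eocd = -1;
        for (int i = zip.length - 22; i >= 0; i--) {
            if (buf.getInt(i) == EOCD_SIG) { eocd = i; break; }
        }
        if (eocd < 0) throw new IllegalArgumentException("no end-of-central-directory record");
        int count = buf.getShort(eocd + 10) & 0xffff; // total number of entries
        int pos = buf.getInt(eocd + 16);               // offset of the first central header
        List<EntryInfo> result = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            if (buf.getInt(pos) != CEN_SIG) throw new IllegalArgumentException("bad central header at " + pos);
            int method = buf.getShort(pos + 10) & 0xffff;
            int time = buf.getShort(pos + 12) & 0xffff;
            int date = buf.getShort(pos + 14) & 0xffff;
            int nameLen = buf.getShort(pos + 28) & 0xffff;
            int extraLen = buf.getShort(pos + 30) & 0xffff;
            int commentLen = buf.getShort(pos + 32) & 0xffff;
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);
            LocalDateTime modified;
            try { // DOS format: year-1980/month/day and hour/minute/(second/2) bit fields
                modified = LocalDateTime.of(1980 + (date >> 9), (date >> 5) & 0x0f, date & 0x1f,
                        time >> 11, (time >> 5) & 0x3f, (time & 0x1f) * 2);
            } catch (DateTimeException e) {
                modified = null; // zeroed or corrupt timestamp
            }
            result.add(new EntryInfo(name, method, modified));
            pos += 46 + nameLen + extraLen + commentLen;
        }
        return result;
    }

    static String describe(int method) {
        switch (method) {
            case 0:  return "STORED";
            case 8:  return "DEFLATED";
            case 9:  return "DEFLATE64";
            case 12: return "BZIP2";
            case 14: return "LZMA";
            default: return "method " + method;
        }
    }
}
```

If any entry reports `DEFLATE64`, that would explain why `unzip` and UnpackContent succeed while `java.util.zip.ZipInputStream` fails.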
>
> What is interesting (and a little frustrating) is that the NiFi UnpackContent
> *will* successfully unzip the zip file. However, the reason I am trying to do
> it in Groovy is that UnpackContent exposes the file metadata for each file in
> a tar archive - lastModifiedDate, for example - but it does *not* do so for
> files extracted from zips. And I need that metadata. So here I be.
>
> Can I explicitly set my (de)compression in the Groovy script? Where would I
> do that, and what values does one typically encounter for zip compression?
>
> Jim
>
> On Thu, Feb 15, 2024 at 9:26 PM Paul King <pa...@asert.com.au> wrote:
>
> What you are doing to read the zip looks okay.
>
> Just a guess, but it could be that because you haven't written to the
> output stream, it is essentially a corrupt data stream as far as NiFi
> processing is concerned. What happens if you set "outputStream =
> inputStream" as the last line of your callback?
>
> Paul.
>
> On Fri, Feb 16, 2024 at 8:48 AM James McMahon <jsmcmah...@gmail.com>
> wrote:
> >
> > I am struggling to build a Groovy script I can run from a NiFi
> > ExecuteScript processor to extract from a zip file and stream to a tar
> > archive.
> >
> > I tried to tackle it all at once and made little progress.
> >
> > I am now just trying to read the zip file, and am getting this error:
> >
> > ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086] Error occurred
> > processing FlowFile: org.apache.nifi.processor.exception.ProcessException:
> > IOException thrown from
> > ExecuteScript[id=ae3e5de5-018d-1000-ff81-b0c807b75086]:
> > java.util.zip.ZipException: invalid compression method
> >
> > - Caused by: java.util.zip.ZipException: invalid compression method
> >
> > This is my simplified code:
> >
> > import java.util.zip.ZipInputStream
> >
> > def ff = session.get()
> > if (!ff) return
> >
> > try {
> >     ff = session.write(ff, { inputStream, outputStream ->
> >         def zipInputStream = new ZipInputStream(inputStream)
> >         def entry = zipInputStream.getNextEntry()
> >         while (entry != null) {
> >             entry = zipInputStream.getNextEntry()
> >         }
> >     } as StreamCallback)
> >
> >     session.transfer(ff, REL_SUCCESS)
> > } catch (Exception e) {
> >     log.error('Error occurred processing FlowFile', e)
> >     session.transfer(ff, REL_FAILURE)
> > }
> >
> > I am able to list and unzip the file at the Linux command line, but
> > cannot get it to work from the script.
> >
> > Has anyone had success doing this? Can anyone help me get past this
> > error?
> >
> > Thanks in advance.
> >
> > Jim
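The goal stated at the top of the thread is one output per zip entry, with that entry's metadata as attributes. The extraction loop for that is short. The following is a minimal Java sketch of it. It uses only java.util.zip, so it will still throw on DEFLATE64 entries. Apart from NiFi's standard `filename` and `path` keys, the attribute names and the class name are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: one extracted file plus the metadata to copy into attributes.
final class ZipEntryExtractor {
    static final class Extracted {
        final Map<String, String> attributes;
        final byte[] content;

        Extracted(Map<String, String> attributes, byte[] content) {
            this.attributes = attributes;
            this.content = content;
        }
    }

    // java.util.zip decodes only STORED and DEFLATED entries; other methods throw ZipException.
    static List<Extracted> extract(InputStream source) throws IOException {
        List<Extracted> results = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no content
                }
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                zis.transferTo(data); // read() returns -1 at the end of the current entry
                String name = entry.getName();
                int slash = name.lastIndexOf('/');
                Map<String, String> attrs = new LinkedHashMap<>();
                attrs.put("filename", name.substring(slash + 1));
                attrs.put("path", slash >= 0 ? name.substring(0, slash + 1) : "");
                attrs.put("zip.entry.size", Long.toString(data.size()));              // hypothetical key
                attrs.put("zip.entry.lastModified", entry.getTimeLocal().toString()); // hypothetical key
                results.add(new Extracted(attrs, data.toByteArray()));
            }
        }
        return results;
    }
}
```

For DEFLATE64 archives, the usual route is Commons Compress's `org.apache.commons.compress.archivers.zip.ZipArchiveInputStream` and its `getNextZipEntry()`. Its `ZipArchiveEntry` extends `java.util.zip.ZipEntry` and also offers `getLastModifiedDate()`, so the attribute code should carry over with small changes. That is an assumption worth testing against your Commons Compress version.

In an ExecuteScript or ExecuteGroovyScript body, each item would typically become a child FlowFile via `session.create(parent)`, `session.write(...)` and `session.putAllAttributes(...)`. The parent would then be dropped with `session.remove(parent)`.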