[ https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated TIKA-346:
-------------------------------
Attachment: TIKA-346.patch
The attached patch fixes this problem after recent Commons Compress changes
related to COMPRESS-93. We can apply the patch once Commons Compress 1.1 is
available.
> ZipParser throws "invalid compression method" error for some archives
> ---------------------------------------------------------------------
>
> Key: TIKA-346
> URL: https://issues.apache.org/jira/browse/TIKA-346
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.5
> Environment: Windows XP, JVM 1.6.16
> Reporter: Robert Trickey
> Attachments: moby.zip, TIKA-346.patch
>
>
> This could be a bug in the underlying apache-commons code. When trying to
> parse the attached file to extract text content, an error is thrown with the
> following stacktrace:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.zippar...@1b963c4
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>     at my.code.wherever.....
> Caused by: java.lang.IllegalArgumentException: invalid compression method
>     at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
>     at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
>     at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
>     at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
>     at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>     ... 25 more
> I extracted the contents of the zip and ran the autodetect parser against
> all of the content files without problems, so the problem is definitely in
> the zip itself.
> The attached zip is from Project Gutenberg and hence public domain.
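
For reference, the innermost frame of the stack trace above is
java.util.zip.ZipEntry.setMethod, which accepts only the STORED and DEFLATED
compression methods and rejects anything else with an
IllegalArgumentException. The following is a minimal sketch of that behaviour
using only the JDK. The class name InvalidMethodDemo, the helper trySetMethod
and the entry name are hypothetical, and the value 6 is an arbitrary stand-in
for an unsupported method; the report does not say which method moby.zip
actually uses.

```java
import java.util.zip.ZipEntry;

public class InvalidMethodDemo {

    // Hypothetical helper: returns the exception message if setMethod
    // rejects the given compression method, or null if it is accepted.
    static String trySetMethod(int method) {
        ZipEntry entry = new ZipEntry("example.txt"); // illustrative entry name
        try {
            entry.setMethod(method);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // STORED (0) and DEFLATED (8) are the only methods ZipEntry accepts.
        System.out.println("STORED:   " + trySetMethod(ZipEntry.STORED));
        System.out.println("DEFLATED: " + trySetMethod(ZipEntry.DEFLATED));
        // Any other value is rejected; 6 is an arbitrary example.
        System.out.println("method 6: " + trySetMethod(6));
    }
}
```

This only illustrates why an entry header carrying some other method number
can surface as the exception in the trace; it does not reproduce the attached
patch or the Commons Compress changes.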