[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Andreas Beeker (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307472#comment-17307472 ] Andreas Beeker commented on TIKA-3164: -- I'm now recursing through .xlsx and .docx in

[jira] [Commented] (TIKA-3335) "manifest is not bound" new exception in some open office files

2021-03-23 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307450#comment-17307450 ] Hudson commented on TIKA-3335: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk

[jira] [Commented] (TIKA-3336) New zip bombs detected

2021-03-23 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307451#comment-17307451 ] Hudson commented on TIKA-3336: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk

[jira] [Commented] (TIKA-3334) Threadsafety bug in OpenDocumentParser

2021-03-23 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307449#comment-17307449 ] Hudson commented on TIKA-3334: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk

[jira] [Updated] (TIKA-3334) Threadsafety bug in OpenDocumentParser

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3334: -- Priority: Blocker (was: Major) > Threadsafety bug in OpenDocumentParser > -

[jira] [Commented] (TIKA-3334) Threadsafety bug in OpenDocumentParser

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307429#comment-17307429 ] Tim Allison commented on TIKA-3334: --- Fixed now in branch_1x. I'll cherrypick this and t

[jira] [Commented] (TIKA-3329) RTG Translator with many-to-eng translation

2021-03-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307428#comment-17307428 ] ASF GitHub Bot commented on TIKA-3329: -- thammegowda commented on pull request #419: U

[GitHub] [tika] thammegowda commented on pull request #419: fix for TIKA-3329 contributed by Thamme Gowda

2021-03-23 Thread GitBox
thammegowda commented on pull request #419: URL: https://github.com/apache/tika/pull/419#issuecomment-805283880 @lewismc Thanks for the suggestion. I will make a page on the wiki for this feature. -- This is an automated message from the Apache Git Service. To respond to the message, pl

[jira] [Commented] (TIKA-3334) Threadsafety bug in OpenDocumentParser

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307404#comment-17307404 ] Tim Allison commented on TIKA-3334: --- The EmbeddedDocumentUtil is not threadsafe. We nee

[jira] [Updated] (TIKA-3334) Threadsafety bug in OpenDocumentParser

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3334: -- Summary: Threadsafety bug in OpenDocumentParser (was: ODS content leaking into thumbnail entry in rmeta

Re: 1.26?

2021-03-23 Thread Tim Allison
The govdocs file has 1290 MACRO (javascript) "attachments" with Tika 1.26-SNAPSHOT and 930 with Tika 1.25. I have no idea why there are more macros in the more recent version of Tika, but there are "attachments" broadly speaking. I'll look into the NPEs. If those are a Java bug, I don't think th

Re: 1.26?

2021-03-23 Thread Tim Allison
Will take a look. Thank you! On Tue, Mar 23, 2021 at 2:58 PM Tilman Hausherr wrote: > > On 23.03.2021 at 17:31, Tim Allison wrote: > > Reports are available here: > > https://corpora.tika.apache.org/base/reports/1_25_v_1_26.tgz > > > govdocs1/966/966679.pdf > > claims to have 360 attachments mo

Re: 1.26?

2021-03-23 Thread Tilman Hausherr
On 23.03.2021 at 17:31, Tim Allison wrote: Reports are available here: https://corpora.tika.apache.org/base/reports/1_25_v_1_26.tgz govdocs1/966/966679.pdf claims to have 360 attachments more than last time. I don't see a single attachment, and when I run tika-app with "--extract" I get not

[jira] [Commented] (TIKA-3334) ODS content leaking into thumbnail entry in rmeta

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307352#comment-17307352 ] Tim Allison commented on TIKA-3334: --- LIBRE_OFFICE-132160-0.odt was fixed by the commit f

[jira] [Created] (TIKA-3336) New zip bombs detected

2021-03-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-3336: - Summary: New zip bombs detected Key: TIKA-3336 URL: https://issues.apache.org/jira/browse/TIKA-3336 Project: Tika Issue Type: Task Reporter: Tim Alliso

[jira] [Created] (TIKA-3335) "manifest is not bound" new exception in some open office files

2021-03-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-3335: - Summary: "manifest is not bound" new exception in some open office files Key: TIKA-3335 URL: https://issues.apache.org/jira/browse/TIKA-3335 Project: Tika Issue T

[jira] [Commented] (TIKA-3334) ODS content leaking into thumbnail entry in rmeta

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307228#comment-17307228 ] Tim Allison commented on TIKA-3334: --- Possibly related. There are a bunch of open office

[jira] [Created] (TIKA-3334) ODS content leaking into thumbnail entry in rmeta

2021-03-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-3334: - Summary: ODS content leaking into thumbnail entry in rmeta Key: TIKA-3334 URL: https://issues.apache.org/jira/browse/TIKA-3334 Project: Tika Issue Type: Task

Re: 1.26?

2021-03-23 Thread Tim Allison
Reports are available here: https://corpora.tika.apache.org/base/reports/1_25_v_1_26.tgz I haven't looked carefully yet, but it looks like we need a tweak to TIKA-3325...there are a couple of handfuls of new "potential zip bomb" exceptions. Will look deeper... On Mon, Mar 22, 2021 at 2:19 PM Tim

[jira] [Comment Edited] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307139#comment-17307139 ] Tim Allison edited comment on TIKA-3164 at 3/23/21, 2:42 PM: -

[jira] [Comment Edited] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307139#comment-17307139 ] Tim Allison edited comment on TIKA-3164 at 3/23/21, 2:41 PM: -

[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307139#comment-17307139 ] Tim Allison commented on TIKA-3164: --- Yep, that's exactly what's going on. I found that

[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread PJ Fanning (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307135#comment-17307135 ] PJ Fanning commented on TIKA-3164: -- [~tallison] I don't know for definite but we have 2 j

[jira] [Comment Edited] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307099#comment-17307099 ] Tim Allison edited comment on TIKA-3164 at 3/23/21, 2:07 PM: -

[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-03-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307099#comment-17307099 ] Tim Allison commented on TIKA-3164: --- [~fanningpj], many thanks for your help on this. I

[jira] [Created] (TIKA-3333) Healthcheck for TIKA server in docker container

2021-03-23 Thread Paul Vogel (Jira)
Paul Vogel created TIKA-3333: Summary: Healthcheck for TIKA server in docker container Key: TIKA-3333 URL: https://issues.apache.org/jira/browse/TIKA-3333 Project: Tika Issue Type: Wish