[jira] [Updated] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4218: -- Fix Version/s: 2.9.2 > Run regression tests to support 2.9.2 release >

[jira] [Resolved] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4171. --- Resolution: Fixed > Tika server only returns last value for PDFs that have multiple of the

[jira] [Resolved] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4238. --- Resolution: Fixed > replace some deprecated code > > >

[jira] [Created] (TIKA-4239) Update to 2.9.3

2024-04-06 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4239: - Summary: Update to 2.9.3 Key: TIKA-4239 URL: https://issues.apache.org/jira/browse/TIKA-4239 Project: Tika Issue Type: Task Components: build

[jira] [Updated] (TIKA-4239) Update to 2.9.3

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4239: -- Affects Version/s: 2.9.2 > Update to 2.9.3 > --- > > Key: TIKA-4239

[jira] [Resolved] (TIKA-4162) Update to 2.9.2

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4162. --- Assignee: Tilman Hausherr Resolution: Fixed > Update to 2.9.2 > --- > >

[jira] [Created] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4238: - Summary: replace some deprecated code Key: TIKA-4238 URL: https://issues.apache.org/jira/browse/TIKA-4238 Project: Tika Issue Type: Task Affects

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: 2.9.3 > tika-parser-nlp-module has an unnecessary Guava dependency >

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: (was: 2.9.2) > tika-parser-nlp-module has an unnecessary Guava dependency >

[jira] [Resolved] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4236. --- Assignee: Tilman Hausherr Resolution: Fixed > tika-parser-nlp-module has an unnecessary

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: 2.9.2 3.0.0 > tika-parser-nlp-module has an unnecessary Guava

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834385#comment-17834385 ] Tilman Hausherr commented on TIKA-4236: --- I found only a test dependency mentioned directly. It's

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834282#comment-17834282 ] Tilman Hausherr commented on TIKA-4236: --- https://tika.apache.org/ "The Apache Tika PMC has set

[jira] [Comment Edited] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834277#comment-17834277 ] Tilman Hausherr edited comment on TIKA-4236 at 4/5/24 12:21 PM: Is this

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834277#comment-17834277 ] Tilman Hausherr commented on TIKA-4236: --- Is this what you had in mind?

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-04-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833807#comment-17833807 ] Tilman Hausherr commented on TIKA-4231: --- Yes it is text, but the PDF is using a feature that we

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-04-02 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833385#comment-17833385 ] Tilman Hausherr commented on TIKA-4231: --- No this is not being worked on. You'll have to use OCR. >

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832291#comment-17832291 ] Tilman Hausherr commented on TIKA-4231: --- I have attached an extraction with pdfbox 2.0.31:

[jira] [Updated] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4231: -- Attachment: arabic-pdfbox.txt > Parsing Arabic PDF is returning bad data >

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832284#comment-17832284 ] Tilman Hausherr commented on TIKA-4231: --- This doesn't change my argument. The latest version is

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832258#comment-17832258 ] Tilman Hausherr commented on TIKA-4231: --- The current tika version is 2.9.1, soon to be 2.9.2. There

[jira] [Updated] (TIKA-4228) Tika parser crashes JVM when it gets metadata and embedded objects from pdf

2024-03-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4228: -- Affects Version/s: 2.9.0 > Tika parser crashes JVM when it gets metadata and embedded objects

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830954#comment-17830954 ] Tilman Hausherr commented on TIKA-4218: --- 6FOMNUPGPA6IG66Z4NIUEQIVOR5ON46Q (an MP4 file) has a loss

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830604#comment-17830604 ] Tilman Hausherr commented on TIKA-4218: --- To be honest I didn't look further, because these problems

[jira] [Comment Edited] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830110#comment-17830110 ] Tilman Hausherr edited comment on TIKA-4171 at 3/23/24 5:50 PM: We have a

[jira] [Updated] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4171: -- Attachment: testPDF_XFA_govdocs1_258578.pdf.html > Tika server only returns last value for PDFs

[jira] [Commented] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830113#comment-17830113 ] Tilman Hausherr commented on TIKA-4171: --- Proposed change: add these 3 lines before the last one in

[jira] [Commented] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830110#comment-17830110 ] Tilman Hausherr commented on TIKA-4171: --- We have a regression with the file [^876503.pdf] in the

[jira] [Updated] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4171: -- Attachment: 876503.pdf > Tika server only returns last value for PDFs that have multiple of the

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830105#comment-17830105 ] Tilman Hausherr commented on TIKA-4218: --- Follow up in TIKA-4171 > Run regression tests to support

[jira] [Reopened] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened TIKA-4171: --- > Tika server only returns last value for PDFs that have multiple of the same > key >

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830097#comment-17830097 ] Tilman Hausherr commented on TIKA-4218: --- Confirmed, I reverted just that change and then the text

[jira] [Comment Edited] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830094#comment-17830094 ] Tilman Hausherr edited comment on TIKA-4218 at 3/23/24 3:59 PM: Oops, or

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830094#comment-17830094 ] Tilman Hausherr commented on TIKA-4218: --- Oops, or it's part of XFA, I just found it too. > Run

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830093#comment-17830093 ] Tilman Hausherr commented on TIKA-4218: --- I found one difference: "Enter the full name of the

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830079#comment-17830079 ] Tilman Hausherr commented on TIKA-4218: --- The word "party" appears 36 times in the json file, 18

[jira] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218 ] Tilman Hausherr deleted comment on TIKA-4218: --- was (Author: tilman): There are also improvements not in my own test results, e.g. the "FOP" pdf file. Either something went wrong with my

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830071#comment-17830071 ] Tilman Hausherr commented on TIKA-4218: --- There are also improvements not in my own test results,

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830069#comment-17830069 ] Tilman Hausherr commented on TIKA-4218: --- Weird indeed, 876503.pdf didn't appear in the PDFBox

[jira] [Updated] (TIKA-4206) Variation on Zip Bomb

2024-03-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4206: -- Description: I see TIKA-216 which aims to prevent Zip bombs, but I'm seeing what looks like a

[jira] [Closed] (TIKA-4214) Update apache compress in tika to 1.26+ for CVE-2024-26308.

2024-03-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4214. - Resolution: Duplicate Duplicate of TIKA-4199. > Update apache compress in tika to 1.26+ for

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826996#comment-17826996 ] Tilman Hausherr commented on TIKA-4199: --- The original error you reported wasn't really a bug in

[jira] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166 ] Tilman Hausherr deleted comment on TIKA-4166: --- was (Author: tilman): I've reverted it and will investigate / fix this later. Seems to be a problem with angus-activation. > dependency

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824953#comment-17824953 ] Tilman Hausherr commented on TIKA-4166: --- I've reverted it and will investigate / fix this later.

[jira] [Resolved] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4199. --- Resolution: Fixed Commons-Compress has been updated to 1.26.1, I have reverted the workaround

[jira] [Assigned] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned TIKA-4199: - Assignee: Tilman Hausherr > commons-compress 1.26.0 breaks Apache Tika 2.9.1 >

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Fix Version/s: 3.0.0 > Add @deprecated annotation where needed >

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Affects Version/s: 3.0.0 > Add @deprecated annotation where needed >

[jira] [Resolved] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4203. --- Resolution: Fixed > Add @deprecated annotation where needed >

[jira] [Created] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4203: - Summary: Add @deprecated annotation where needed Key: TIKA-4203 URL: https://issues.apache.org/jira/browse/TIKA-4203 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4199: -- Fix Version/s: 2.9.2 3.0.0 > commons-compress 1.26.0 breaks Apache Tika

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818937#comment-17818937 ] Tilman Hausherr commented on TIKA-4199: --- I tried another solution {code:java} if

[jira] [Commented] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818873#comment-17818873 ] Tilman Hausherr commented on TIKA-4201: --- Yeah, makes sense. > Add hard limit to stream reading in

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 3:37 PM: {quote}I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr commented on TIKA-4199: --- {quote}I'm not declaring this a problem with

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818823#comment-17818823 ] Tilman Hausherr commented on TIKA-4199: --- After merging I discovered that the SevenZWrapper class is

[jira] [Closed] (TIKA-4200) Fix broken build after upgrade to commons-compress

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4200. - Resolution: Duplicate Our CI is failing because of the CVE :-( Duplicate of TIKA-4199. I'm still

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 11:57 AM: - I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr commented on TIKA-4199: --- I'm working on it

[jira] [Updated] (TIKA-3841) An exception occurred when parsing some word documents using tika, tika_exception

2024-02-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3841: -- Summary: An exception occurred when parsing some word documents using tika, tika_exception

[jira] [Updated] (TIKA-3841) An exception occurred when parsing some word documents using tikatika_exception

2024-02-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3841: -- Summary: An exception occurred when parsing some word documents using tikatika_exception (was:

[jira] [Closed] (TIKA-4183) Update jackson-databind jar to 2.16.0 or higher (CVE-2023-35116)

2024-01-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4183. - Resolution: Duplicate duplicate of TIKA-4162, it was done there on 17.11.2023 in

[jira] [Updated] (TIKA-4162) Update to 2.9.2

2024-01-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4162: -- Fix Version/s: 2.9.2 > Update to 2.9.2 > --- > > Key: TIKA-4162 >

[jira] [Updated] (TIKA-4162) Update to 2.9.2

2023-12-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4162: -- Affects Version/s: 2.9.1 > Update to 2.9.2 > --- > > Key: TIKA-4162

[jira] [Closed] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-12-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4172. - Resolution: Not A Bug > Apple binary file incorrectly identified as text/x-sql due to filename >

[jira] [Commented] (TIKA-4173) Fix dev version in main branch

2023-12-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796450#comment-17796450 ] Tilman Hausherr commented on TIKA-4173: --- It wasn't really a problem locally, I only had to change

[jira] [Commented] (TIKA-4173) Fix dev version in main branch

2023-12-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796431#comment-17796431 ] Tilman Hausherr commented on TIKA-4173: --- I noticed that it didn't have the correct version, but I

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789647#comment-17789647 ] Tilman Hausherr commented on TIKA-4172: --- Your file starts with 00 14 64 30.

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789542#comment-17789542 ] Tilman Hausherr commented on TIKA-4172: --- application/octet-stream is defined as the default by the

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789318#comment-17789318 ] Tilman Hausherr commented on TIKA-4172: --- https://tika.apache.org/2.1.0/detection.html "Where the

[jira] [Comment Edited] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788982#comment-17788982 ] Tilman Hausherr edited comment on TIKA-4172 at 11/23/23 5:05 AM: - Which

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788982#comment-17788982 ] Tilman Hausherr commented on TIKA-4172: --- Which tika call are you using? Have you tried detecting

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2023-11-04 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782915#comment-17782915 ] Tilman Hausherr commented on TIKA-4166: --- The zookeeper update worked locally, but not on the CI :-(

[jira] [Created] (TIKA-4166) dependency updates for Tika 3.0

2023-11-03 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4166: - Summary: dependency updates for Tika 3.0 Key: TIKA-4166 URL: https://issues.apache.org/jira/browse/TIKA-4166 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4162) Update to 2.9.2

2023-10-21 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4162: - Summary: Update to 2.9.2 Key: TIKA-4162 URL: https://issues.apache.org/jira/browse/TIKA-4162 Project: Tika Issue Type: Task Components: build

[jira] [Commented] (TIKA-4135) Remove xerces from Tika 3.x/main branch?

2023-09-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770571#comment-17770571 ] Tilman Hausherr commented on TIKA-4135: --- Yes, but how to make sure it happens only in the test? I

[jira] [Commented] (TIKA-4135) Remove xerces from Tika 3.x/main branch?

2023-09-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770559#comment-17770559 ] Tilman Hausherr commented on TIKA-4135: --- There must be some way to run THIS test in a US locale, but

[jira] [Commented] (TIKA-4135) Remove xerces from Tika 3.x/main branch?

2023-09-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770550#comment-17770550 ] Tilman Hausherr commented on TIKA-4135: --- The build fails in Germany: Running

[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2023-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768361#comment-17768361 ] Tilman Hausherr commented on TIKA-4137: --- I've modified the jdk18 build on the ci to a jdk21 build

[jira] [Comment Edited] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2023-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768361#comment-17768361 ] Tilman Hausherr edited comment on TIKA-4137 at 9/24/23 9:05 AM: I've

[jira] [Closed] (TIKA-4136) Upgrade Commons compress to 1.24.x

2023-09-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4136. - Fix Version/s: (was: 2.9.1) Resolution: Duplicate Thanks, but this was done in

[jira] [Commented] (TIKA-4123) Update to 2.9.1

2023-09-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765755#comment-17765755 ] Tilman Hausherr commented on TIKA-4123: --- Yes that's fine. > Update to 2.9.1 > --- > >

[jira] [Comment Edited] (TIKA-4123) Update to 2.9.1

2023-09-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765743#comment-17765743 ] Tilman Hausherr edited comment on TIKA-4123 at 9/15/23 5:48 PM: Yes... I

[jira] [Commented] (TIKA-4123) Update to 2.9.1

2023-09-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765743#comment-17765743 ] Tilman Hausherr commented on TIKA-4123: --- Yes... I just set up a clone and had a look and it seems to

[jira] [Commented] (TIKA-3347) Upgrade to PDFBox 3.x when available

2023-09-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764461#comment-17764461 ] Tilman Hausherr commented on TIKA-3347: --- I ran PDFBox extractText on the file... the extractions are

[jira] [Commented] (TIKA-3347) Upgrade to PDFBox 3.x when available

2023-09-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764275#comment-17764275 ] Tilman Hausherr commented on TIKA-3347: --- 3 files: bug_trackers/poppler/poppler-58785-0.zip-7.pdf

[jira] [Updated] (TIKA-4123) Update to 2.9.1

2023-09-02 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4123: -- Component/s: build > Update to 2.9.1 > --- > > Key: TIKA-4123 >

[jira] [Created] (TIKA-4123) Update to 2.9.1

2023-09-02 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4123: - Summary: Update to 2.9.1 Key: TIKA-4123 URL: https://issues.apache.org/jira/browse/TIKA-4123 Project: Tika Issue Type: Task Reporter: Tilman

[jira] [Updated] (TIKA-4123) Update to 2.9.1

2023-09-02 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4123: -- Affects Version/s: 2.9.0 > Update to 2.9.1 > --- > > Key: TIKA-4123

[jira] [Closed] (TIKA-1203) Some metadata not extracted from PDF files when NonSequentialPDFParser is used

2023-08-30 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-1203. - Resolution: Not A Problem This issue is moot because the sequential parser no longer exists.

[jira] [Commented] (TIKA-3347) Upgrade to PDFBox 3.x when available

2023-08-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760037#comment-17760037 ] Tilman Hausherr commented on TIKA-3347: --- The IllegalArgumentException was fixed in PDFBOX-5652. So

[jira] [Commented] (TIKA-3347) Upgrade to PDFBox 3.x when available

2023-08-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760014#comment-17760014 ] Tilman Hausherr commented on TIKA-3347: --- I did:

[jira] [Updated] (TIKA-4114) Facilitate migration to PDFBox 3.0

2023-08-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4114: -- Parent: TIKA-3347 Issue Type: Sub-task (was: Task) > Facilitate migration to PDFBox

[jira] [Resolved] (TIKA-4114) Facilitate migration to PDFBox 3.0

2023-08-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4114. --- Assignee: Tilman Hausherr Resolution: Fixed > Facilitate migration to PDFBox 3.0 >

[jira] [Updated] (TIKA-4114) Facilitate migration to PDFBox 3.0

2023-08-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4114: -- Fix Version/s: 2.8.1 > Facilitate migration to PDFBox 3.0 > --

[jira] [Updated] (TIKA-4114) Facilitate migration to PDFBox 3.0

2023-08-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4114: -- Affects Version/s: 2.8.0 > Facilitate migration to PDFBox 3.0 >

[jira] [Resolved] (TIKA-3314) Treat soft hyphens like hyphens

2023-08-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3314. --- Resolution: Fixed > Treat soft hyphens like hyphens > --- > >

[jira] [Created] (TIKA-4114) Facilitate migration to PDFBox 3.0

2023-08-12 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4114: - Summary: Facilitate migration to PDFBox 3.0 Key: TIKA-4114 URL: https://issues.apache.org/jira/browse/TIKA-4114 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4105) Add autofilter tika-eval reports

2023-07-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745686#comment-17745686 ] Tilman Hausherr commented on TIKA-4105: --- All columns of the first row. What you get by clicking on

[jira] [Comment Edited] (TIKA-4105) Add autofilter to xls and xlsx extraction

2023-07-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745676#comment-17745676 ] Tilman Hausherr edited comment on TIKA-4105 at 7/21/23 5:23 PM: Just in