[jira] [Updated] (TIKA-4267) Not getting correct mime type for a few file extensions. example: csv

2024-06-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4267: -- Summary: Not getting correct mime type for a few file extensions. example: csv

[jira] [Updated] (TIKA-4267) Not getting correct mimet type for few file extensions. example :csv

2024-06-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4267: -- Affects Version/s: 1.28.4 > Not getting correct mimet type for few file extensions. exam

[jira] [Comment Edited] (TIKA-4267) Not getting correct mimet type for few file extensions. example :csv

2024-06-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851598#comment-17851598 ] Tilman Hausherr edited comment on TIKA-4267 at 6/3/24 12:06 PM

[jira] [Comment Edited] (TIKA-4267) Not getting correct mime type for a few file extensions. example: csv

2024-06-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851598#comment-17851598 ] Tilman Hausherr edited comment on TIKA-4267 at 6/3/24 12:07 PM

[jira] [Commented] (TIKA-4267) Not getting correct mimet type for few file extensions. example :csv

2024-06-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851598#comment-17851598 ] Tilman Hausherr commented on TIKA-4267: --- The current version is 2.9.2, please retry with that one

[jira] [Updated] (TIKA-1907) Big Pdf parsing to text - Out of memory

2024-05-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1907: -- Fix Version/s: 3.0.0 > Big Pdf parsing to text - Out of mem

[jira] [Comment Edited] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845590#comment-17845590 ] Tilman Hausherr edited comment on TIKA-4254 at 5/12/24 9:40 AM: THausherr

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845566#comment-17845566 ] Tilman Hausherr commented on TIKA-4254: --- Why would we ever run the test twice in the same

Re: Bump dependabot to weekly?

2024-04-29 Thread Tilman Hausherr
, Apr 29, 2024 at 10:47 AM Tilman Hausherr wrote: The positive side is that there are fewer interruptions. One negative side is that there seems to be a maximum. Today it didn't report the AWS update, which was detected in the past. Tilman

Re: Bump dependabot to weekly?

2024-04-29 Thread Tilman Hausherr
changing quickly, then that might be an argument for daily. On Apr 10, 2024, at 12:53 PM, Tilman Hausherr wrote: I'm fine with daily because this way we can learn ASAP if there are troubles with new dependency versions, although I'm now too busy. Tilman -- Original Message -- From: Tim Allison

[jira] [Commented] (TIKA-4245) Tika does not get html content properly

2024-04-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840922#comment-17840922 ] Tilman Hausherr commented on TIKA-4245: --- The file claims to be utf-16 but it isn't. If I change

[jira] [Commented] (TIKA-4245) Tika does not get html content properly

2024-04-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840908#comment-17840908 ] Tilman Hausherr commented on TIKA-4245: --- Happens also with the tika app GUI. > Tika does not

[jira] [Updated] (TIKA-4245) Tika does not get html content properly

2024-04-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4245: -- Description: We use org.apache.tika.parser.AutoDetectParser to get the content of html files

[jira] [Comment Edited] (TIKA-4166) dependency updates for Tika 3.0

2024-04-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839745#comment-17839745 ] Tilman Hausherr edited comment on TIKA-4166 at 4/22/24 3:27 PM: It turned

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-04-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839745#comment-17839745 ] Tilman Hausherr commented on TIKA-4166: --- It turned out to be something different than the missing

Re: How to proceed when you are getting OSS index errors?

2024-04-22 Thread Tilman Hausherr
Hi, We look at what the CVE is about. Some CVEs are irrelevant (see the recent rant from Tim), and we can add an exclusion in the OSS section. Sometimes all that is needed is to update a dependency, add it to the management section, or exclude it (on the assumption that the tests cover everything).

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-04-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839652#comment-17839652 ] Tilman Hausherr commented on TIKA-4166: --- The latest Apache parent update means a javadoc update

[jira] [Commented] (TIKA-4240) Change dependabot to weekly

2024-04-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836236#comment-17836236 ] Tilman Hausherr commented on TIKA-4240: --- I prefer daily but if more people feel pressured or annoyed

[jira] [Updated] (TIKA-4240) Change dependabot to weekly

2024-04-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4240: -- Component/s: build > Change dependabot to wee

[jira] [Commented] (TIKA-4240) Change dependabot to weekly

2024-04-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836224#comment-17836224 ] Tilman Hausherr commented on TIKA-4240: --- Not a burden (that was Eric, sort-of), I just don't have

AW: Bump dependabot to weekly?

2024-04-10 Thread Tilman Hausherr
I'm fine with daily because this way we can learn ASAP if there are troubles with new dependency versions, although I'm now too busy. Tilman -- Original Message -- From: Tim Allison Subject: Bump dependabot to weekly? Date: 10.04.2024, 18:08 To: All, Tilman has been doing heroic

[jira] [Commented] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834529#comment-17834529 ] Tilman Hausherr commented on TIKA-4238: --- This was a low-hanging fruit. I could also have done

[jira] [Comment Edited] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834529#comment-17834529 ] Tilman Hausherr edited comment on TIKA-4238 at 4/6/24 2:12 PM

[jira] [Updated] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4218: -- Affects Version/s: 2.9.1 > Run regression tests to support 2.9.2 rele

[jira] [Resolved] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4218. --- Assignee: Tim Allison Resolution: Fixed > Run regression tests to support 2.9.2 rele

[jira] [Assigned] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned TIKA-4171: - Assignee: Tim Allison > Tika server only returns last value for PDFs that have multi

[jira] [Updated] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4218: -- Fix Version/s: 2.9.2 > Run regression tests to support 2.9.2 rele

[jira] [Resolved] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4171. --- Resolution: Fixed > Tika server only returns last value for PDFs that have multi

[jira] [Resolved] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4238. --- Resolution: Fixed > replace some deprecated c

[jira] [Created] (TIKA-4239) Update to 2.9.3

2024-04-06 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4239: - Summary: Update to 2.9.3 Key: TIKA-4239 URL: https://issues.apache.org/jira/browse/TIKA-4239 Project: Tika Issue Type: Task Components: build

[jira] [Updated] (TIKA-4239) Update to 2.9.3

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4239: -- Affects Version/s: 2.9.2 > Update to 2.9.3 > --- > > Ke

[jira] [Resolved] (TIKA-4162) Update to 2.9.2

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4162. --- Assignee: Tilman Hausherr Resolution: Fixed > Update to 2.

[jira] [Created] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4238: - Summary: replace some deprecated code Key: TIKA-4238 URL: https://issues.apache.org/jira/browse/TIKA-4238 Project: Tika Issue Type: Task Affects

2.9.2 / 2.9.3 admin

2024-04-05 Thread Tilman Hausherr
I've created the 2.9.3 version in JIRA administration. Someone (Tim?), please set the 2.9.2 version to released or whatever (I didn't want to touch that part). Tilman

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: 2.9.3 > tika-parser-nlp-module has an unnecessary Guava depende

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: (was: 2.9.2) > tika-parser-nlp-module has an unnecessary Guava depende

[jira] [Resolved] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4236. --- Assignee: Tilman Hausherr Resolution: Fixed > tika-parser-nlp-module has an unnecess

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: 2.9.2 3.0.0 > tika-parser-nlp-module has an unnecessary Gu

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834385#comment-17834385 ] Tilman Hausherr commented on TIKA-4236: --- I found only a test dependency mentioned directly. It's

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834282#comment-17834282 ] Tilman Hausherr commented on TIKA-4236: --- https://tika.apache.org/ "The Apache Tika PMC ha

[jira] [Comment Edited] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834277#comment-17834277 ] Tilman Hausherr edited comment on TIKA-4236 at 4/5/24 12:21 PM

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834277#comment-17834277 ] Tilman Hausherr commented on TIKA-4236: --- Is this what you had in mind? https://github.com/apache

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-04-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833807#comment-17833807 ] Tilman Hausherr commented on TIKA-4231: --- Yes it is text, but the PDF is using a feature that we

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-04-02 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833385#comment-17833385 ] Tilman Hausherr commented on TIKA-4231: --- No this is not being worked on. You'll have to use OCR

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832291#comment-17832291 ] Tilman Hausherr commented on TIKA-4231: --- I have attached an extraction with pdfbox 2.0.31: [^arabic

[jira] [Updated] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4231: -- Attachment: arabic-pdfbox.txt > Parsing Arabic PDF is returning bad d

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832284#comment-17832284 ] Tilman Hausherr commented on TIKA-4231: --- This doesn't change my argument. The latest version

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832258#comment-17832258 ] Tilman Hausherr commented on TIKA-4231: --- The current tika version is 2.9.1, soon to be 2.9.2

[jira] [Updated] (TIKA-4228) Tika parser crashes JVM when it gets metadata and embedded objects from pdf

2024-03-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4228: -- Affects Version/s: 2.9.0 > Tika parser crashes JVM when it gets metadata and embedded obje

Re: [VOTE] Release Apache Tika 2.9.2 Candidate #2

2024-03-26 Thread Tilman Hausherr
+1 successful build on Windows 10, oracle jdk 1.8.0_391 Tilman On 26.03.2024 16:52, Tim Allison wrote: A candidate for the Tika 2.9.2 release is available at: https://dist.apache.org/repos/dist/dev/tika/2.9.2 The release candidate is a zip archive of the sources in:

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830954#comment-17830954 ] Tilman Hausherr commented on TIKA-4218: --- 6FOMNUPGPA6IG66Z4NIUEQIVOR5ON46Q (an MP4 file) has a loss

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830604#comment-17830604 ] Tilman Hausherr commented on TIKA-4218: --- To be honest I didn't look further, because these problems

[jira] [Comment Edited] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830110#comment-17830110 ] Tilman Hausherr edited comment on TIKA-4171 at 3/23/24 5:50 PM: We have

[jira] [Updated] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4171: -- Attachment: testPDF_XFA_govdocs1_258578.pdf.html > Tika server only returns last value for P

[jira] [Commented] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830113#comment-17830113 ] Tilman Hausherr commented on TIKA-4171: --- Proposed change: add these 3 lines before the last one

[jira] [Commented] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830110#comment-17830110 ] Tilman Hausherr commented on TIKA-4171: --- We have a regression with the file [^876503.pdf

[jira] [Updated] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4171: -- Attachment: 876503.pdf > Tika server only returns last value for PDFs that have multi

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830105#comment-17830105 ] Tilman Hausherr commented on TIKA-4218: --- Follow up in TIKA-4171 > Run regression tests to supp

[jira] [Reopened] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened TIKA-4171: --- > Tika server only returns last value for PDFs that have multiple of the same

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830097#comment-17830097 ] Tilman Hausherr commented on TIKA-4218: --- Confirmed, I reverted just that change and then the text

[jira] [Comment Edited] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830094#comment-17830094 ] Tilman Hausherr edited comment on TIKA-4218 at 3/23/24 3:59 PM: Oops

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830094#comment-17830094 ] Tilman Hausherr commented on TIKA-4218: --- Oops, or it's part of XFA, I just found it too. >

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830093#comment-17830093 ] Tilman Hausherr commented on TIKA-4218: --- I found one difference: "Enter the full

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830079#comment-17830079 ] Tilman Hausherr commented on TIKA-4218: --- The word "party" appears 36 times in the jso

[jira] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218 ] Tilman Hausherr deleted comment on TIKA-4218: --- was (Author: tilman): There are also improvements not in my own test results, e.g. the "FOP" pdf file. Either something went wro

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830071#comment-17830071 ] Tilman Hausherr commented on TIKA-4218: --- There are also improvements not in my own test results, e.g

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830069#comment-17830069 ] Tilman Hausherr commented on TIKA-4218: --- Weird indeed, 876503.pdf didn't appear in the PDFBox

[jira] [Updated] (TIKA-4206) Variation on Zip Bomb

2024-03-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4206: -- Description: I see TIKA-216 which aims to prevent Zip bombs, but I'm seeing what looks like

[jira] [Closed] (TIKA-4214) Update apache compress in tika to 1.26+ for CVE-2024-26308.

2024-03-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4214. - Resolution: Duplicate Duplicate of TIKA-4199. > Update apache compress in tika to 1.26+ for

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826996#comment-17826996 ] Tilman Hausherr commented on TIKA-4199: --- The original error you reported wasn't really a bug

[jira] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166 ] Tilman Hausherr deleted comment on TIKA-4166: --- was (Author: tilman): I've reverted it and will investigate / fix this later. Seems to be a problem with angus-activation. > depende

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824953#comment-17824953 ] Tilman Hausherr commented on TIKA-4166: --- I've reverted it and will investigate / fix this later

[jira] [Resolved] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4199. --- Resolution: Fixed Commons-Compress has been updated to 1.26.1, I have reverted the workaround

[jira] [Assigned] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned TIKA-4199: - Assignee: Tilman Hausherr > commons-compress 1.26.0 breaks Apache Tika 2.

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Fix Version/s: 3.0.0 > Add @deprecated annotation where nee

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Affects Version/s: 3.0.0 > Add @deprecated annotation where nee

[jira] [Resolved] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4203. --- Resolution: Fixed > Add @deprecated annotation where nee

[jira] [Created] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4203: - Summary: Add @deprecated annotation where needed Key: TIKA-4203 URL: https://issues.apache.org/jira/browse/TIKA-4203 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4199: -- Fix Version/s: 2.9.2 3.0.0 > commons-compress 1.26.0 breaks Apache T

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818937#comment-17818937 ] Tilman Hausherr commented on TIKA-4199: --- I tried another solution {code:java

[jira] [Commented] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818873#comment-17818873 ] Tilman Hausherr commented on TIKA-4201: --- Yeah, makes sense. > Add hard limit to stream read

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 3:37 PM: {quote}I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr commented on TIKA-4199: --- {quote}I'm not declaring this a problem with commons

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818823#comment-17818823 ] Tilman Hausherr commented on TIKA-4199: --- After merging I discovered that the SevenZWrapper class

[jira] [Closed] (TIKA-4200) Fix broken build after upgrade to commons-compress

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4200. - Resolution: Duplicate Our CI is failing because of the CVE :-( Duplicate of TIKA-4199. I'm still

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 11:57 AM: - I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr commented on TIKA-4199: --- I'm working on it https://github.com/apache/pdfbox/pull

[jira] [Updated] (TIKA-3841) An exception occurred when parsing some word documents using tika, tika_exception

2024-02-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3841: -- Summary: An exception occurred when parsing some word documents using tika, tika_exception

[jira] [Updated] (TIKA-3841) An exception occurred when parsing some word documents using tikatika_exception

2024-02-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3841: -- Summary: An exception occurred when parsing some word documents using tikatika_exception

[jira] [Closed] (TIKA-4183) Update jackson-databind jar to 2.16.0 or higher (CVE-2023-35116)

2024-01-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4183. - Resolution: Duplicate duplicate of TIKA-4162, it was done there on 17.11.2023

[jira] [Updated] (TIKA-4162) Update to 2.9.2

2024-01-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4162: -- Fix Version/s: 2.9.2 > Update to 2.9.2 > --- > > Ke

Re: Personal feedback on your last VOTE thread 3.0.0-BETA

2024-01-16 Thread Tilman Hausherr
mention that there are already 2 votes and you're still missing one. However ponymail is not showing me the vote of Tilman Hausherr. Do you know what happened there? Chris PS: I'm not subscribed to this list, so please keep me in CC

[jira] [Updated] (TIKA-4162) Update to 2.9.2

2023-12-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4162: -- Affects Version/s: 2.9.1 > Update to 2.9.2 > --- > > Ke

[jira] [Closed] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-12-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4172. - Resolution: Not A Bug > Apple binary file incorrectly identified as text/x-sql due to filen

[jira] [Commented] (TIKA-4173) Fix dev version in main branch

2023-12-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796450#comment-17796450 ] Tilman Hausherr commented on TIKA-4173: --- It wasn't really a problem locally, I only had to change

[jira] [Commented] (TIKA-4173) Fix dev version in main branch

2023-12-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796431#comment-17796431 ] Tilman Hausherr commented on TIKA-4173: --- I noticed that it didn't have the correct version, but I

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789647#comment-17789647 ] Tilman Hausherr commented on TIKA-4172: --- Your file starts with 00 14 64 30

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789542#comment-17789542 ] Tilman Hausherr commented on TIKA-4172: --- application/octet-stream is defined as the default

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789318#comment-17789318 ] Tilman Hausherr commented on TIKA-4172: --- https://tika.apache.org/2.1.0/detection.html "

[jira] [Comment Edited] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788982#comment-17788982 ] Tilman Hausherr edited comment on TIKA-4172 at 11/23/23 5:05 AM: - Which
