[GitHub] [tika] THausherr merged pull request #699: Bump aws.version from 1.12.303 to 1.12.304

2022-09-15, GitBox
THausherr merged PR #699: URL: https://github.com/apache/tika/pull/699 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

[GitHub] [tika] THausherr merged pull request #698: Bump jetty.version from 9.4.48.v20220622 to 9.4.49.v20220914

2022-09-15, GitBox
THausherr merged PR #698: URL: https://github.com/apache/tika/pull/698

[GitHub] [tika] THausherr merged pull request #700: Bump spring-context from 5.3.22 to 5.3.23

2022-09-15, GitBox
THausherr merged PR #700: URL: https://github.com/apache/tika/pull/700

[GitHub] [tika] THausherr merged pull request #701: Bump google-cloud-storage from 2.11.3 to 2.12.0

2022-09-15, GitBox
THausherr merged PR #701: URL: https://github.com/apache/tika/pull/701

[GitHub] [tika] dependabot[bot] opened a new pull request, #701: Bump google-cloud-storage from 2.11.3 to 2.12.0

2022-09-15, GitBox
dependabot[bot] opened a new pull request, #701: URL: https://github.com/apache/tika/pull/701 Bumps [google-cloud-storage](https://github.com/googleapis/java-storage) from 2.11.3 to 2.12.0. Release notes Sourced from https://github.com/googleapis/java-storage/releases google-clou

[GitHub] [tika] dependabot[bot] opened a new pull request, #700: Bump spring-context from 5.3.22 to 5.3.23

2022-09-15, GitBox
dependabot[bot] opened a new pull request, #700: URL: https://github.com/apache/tika/pull/700 Bumps [spring-context](https://github.com/spring-projects/spring-framework) from 5.3.22 to 5.3.23. Release notes Sourced from https://github.com/spring-projects/spring-framework/releases

[GitHub] [tika] dependabot[bot] opened a new pull request, #699: Bump aws.version from 1.12.303 to 1.12.304

2022-09-15, GitBox
dependabot[bot] opened a new pull request, #699: URL: https://github.com/apache/tika/pull/699 Bumps `aws.version` from 1.12.303 to 1.12.304. Updates `aws-java-sdk-transcribe` from 1.12.303 to 1.12.304 Changelog Sourced from https://github.com/aws/aws-sdk-java/blob/master/CHANGELO

[GitHub] [tika] dependabot[bot] opened a new pull request, #698: Bump jetty.version from 9.4.48.v20220622 to 9.4.49.v20220914

2022-09-15, GitBox
dependabot[bot] opened a new pull request, #698: URL: https://github.com/apache/tika/pull/698 Bumps `jetty.version` from 9.4.48.v20220622 to 9.4.49.v20220914. Updates `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914 Release notes Sourced from https://github.com/eclipse/jett

[jira] [Closed] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-3858. Resolution: Duplicate

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605596#comment-17605596 ] Tilman Hausherr commented on TIKA-3858: No, except OCR. There will always be files

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605580#comment-17605580 ] tom hill commented on TIKA-3858: Ok, thanks. Is there anything I can do as a Tika user to

[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858 ] Tilman Hausherr deleted comment on TIKA-3858: was (Author: tilman): Please attach the problematic file, and compare to what you get with Adobe Reader.

[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858 ] Tilman Hausherr deleted comment on TIKA-3858: was (Author: JIRAUSER295805): Apologies, I was still editing the cloned issue. You are responding to the old text. I will update.

[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858 ] Tilman Hausherr deleted comment on TIKA-3858: was (Author: JIRAUSER295805): Ok, the description has been updated.

[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858 ] Tilman Hausherr deleted comment on TIKA-3858: was (Author: tilman): The current PDFBox version (2.0.26) doesn't use it. It's used in PDFBox 1.8.17 which has many drawbacks. The latest tika

[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3858: Labels: ActualText (was: )

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605576#comment-17605576 ] Tilman Hausherr commented on TIKA-3858: The font has an incorrect /ToUnicode stream

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605518#comment-17605518 ] tom hill commented on TIKA-3858: When I open TikaChromeInboxLigature.pdf in Adobe reader,

[jira] [Commented] (TIKA-3856) Upgrade to jempbox 1.8.17

2022-09-15, Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605517#comment-17605517 ] Hudson commented on TIKA-3856: SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #8

[jira] [Comment Edited] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605512#comment-17605512 ] tom hill edited comment on TIKA-3858 at 9/15/22 8:14 PM: For the

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605512#comment-17605512 ] tom hill commented on TIKA-3858: For the attachment TikaChromeInboxLigature.pdf % java

[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom hill updated TIKA-3858: Attachment: TikaChromeInboxLigature.pdf

[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3858: Affects Version/s: 2.4.1 (was: 1.5)

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605486#comment-17605486 ] Tilman Hausherr commented on TIKA-3858: Please attach the problematic file, and com

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605485#comment-17605485 ] tom hill commented on TIKA-3858: Ok, the description has been updated.

[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom hill updated TIKA-3858: Description: It appears that the issue in TIKA-1289 is still present. Ligatures get replaced by a question ma

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605482#comment-17605482 ] tom hill commented on TIKA-3858: Apologies, I was still editing the cloned issue. You are

[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom hill updated TIKA-3858: Description: It appears that the issue in TIKA-1289 is still present. Ligatures get replaced by a question mar

[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605481#comment-17605481 ] Tilman Hausherr commented on TIKA-3858: The current PDFBox version (2.0.26) doesn't

[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3858: Fix Version/s: (was: 1.7)

[jira] [Created] (TIKA-3858) Ligatures convert on text extraction

2022-09-15, tom hill (Jira)
tom hill created TIKA-3858: Summary: Ligatures convert on text extraction Key: TIKA-3858 URL: https://issues.apache.org/jira/browse/TIKA-3858 Project: Tika Issue Type: Bug Components: pa

[jira] [Commented] (TIKA-3855) Implement upsert for OpenSearch emitter

2022-09-15, Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605474#comment-17605474 ] Hudson commented on TIKA-3855: SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #7

[jira] [Resolved] (TIKA-3856) Upgrade to jempbox 1.8.17

2022-09-15, Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3856. Fix Version/s: 2.5.0 Resolution: Fixed

[jira] [Created] (TIKA-3857) Upgrade to POI 5.2.3

2022-09-15, Tim Allison (Jira)
Tim Allison created TIKA-3857: Summary: Upgrade to POI 5.2.3 Key: TIKA-3857 URL: https://issues.apache.org/jira/browse/TIKA-3857 Project: Tika Issue Type: Task Reporter: Tim Allison

[jira] [Created] (TIKA-3856) Upgrade to jempbox 1.8.17

2022-09-15, Tim Allison (Jira)
Tim Allison created TIKA-3856: Summary: Upgrade to jempbox 1.8.17 Key: TIKA-3856 URL: https://issues.apache.org/jira/browse/TIKA-3856 Project: Tika Issue Type: Task Reporter: Tim All

[jira] [Resolved] (TIKA-3855) Implement upsert for OpenSearch emitter

2022-09-15, Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3855. Fix Version/s: 2.5.0 Resolution: Fixed

[jira] [Created] (TIKA-3855) Implement upsert for OpenSearch emitter

2022-09-15, Tim Allison (Jira)
Tim Allison created TIKA-3855: Summary: Implement upsert for OpenSearch emitter Key: TIKA-3855 URL: https://issues.apache.org/jira/browse/TIKA-3855 Project: Tika Issue Type: Task Rep

RE: Issue related to file mime type detection

2022-09-15, Nick Burch
On Thu, 15 Sep 2022, Sindhu Mahadevappa wrote: "We have been looking for the latest Tika 2.4.1 jar file, looks like it is not available anywhere." You can get the Tika App and Tika Server jars for 2.4.1 from https://tika.apache.org/download.html. For the core and parser jars, manually downloading

RE: Issue related to file mime type detection

2022-09-15, Sindhu Mahadevappa
Hi Team, Thanks for the quick response. We have been looking for the latest Tika 2.4.1 jar file, looks like it is not available anywhere. Can you please share the link where we can get the latest 2.4.1 jar file, it will be very helpful. Thanks & Regards Sindhu Mahadevappa