THausherr merged PR #699:
URL: https://github.com/apache/tika/pull/699
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org
THausherr merged PR #698:
URL: https://github.com/apache/tika/pull/698
THausherr merged PR #700:
URL: https://github.com/apache/tika/pull/700
THausherr merged PR #701:
URL: https://github.com/apache/tika/pull/701
dependabot[bot] opened a new pull request, #701:
URL: https://github.com/apache/tika/pull/701
Bumps [google-cloud-storage](https://github.com/googleapis/java-storage)
from 2.11.3 to 2.12.0.
Release notes
Sourced from https://github.com/googleapis/java-storage/releases google-clou
dependabot[bot] opened a new pull request, #700:
URL: https://github.com/apache/tika/pull/700
Bumps [spring-context](https://github.com/spring-projects/spring-framework)
from 5.3.22 to 5.3.23.
Release notes
Sourced from https://github.com/spring-projects/spring-framework/releases
dependabot[bot] opened a new pull request, #699:
URL: https://github.com/apache/tika/pull/699
Bumps `aws.version` from 1.12.303 to 1.12.304.
Updates `aws-java-sdk-transcribe` from 1.12.303 to 1.12.304
Changelog
Sourced from https://github.com/aws/aws-sdk-java/blob/master/CHANGELO
dependabot[bot] opened a new pull request, #698:
URL: https://github.com/apache/tika/pull/698
Bumps `jetty.version` from 9.4.48.v20220622 to 9.4.49.v20220914.
Updates `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914
Release notes
Sourced from https://github.com/eclipse/jett
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr closed TIKA-3858.
-
Resolution: Duplicate
> Ligatures convert on text extraction
> --
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605596#comment-17605596
]
Tilman Hausherr commented on TIKA-3858:
---
No, except OCR. There will always be files
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605580#comment-17605580
]
tom hill commented on TIKA-3858:
Ok, thanks.
Is there anything I can do as a Tika user to
[ https://issues.apache.org/jira/browse/TIKA-3858 ]
Tilman Hausherr deleted comment on TIKA-3858:
---
was (Author: tilman):
Please attach the problematic file, and compare to what you get with Adobe
Reader.
> Ligatures convert on text extracti
[ https://issues.apache.org/jira/browse/TIKA-3858 ]
Tilman Hausherr deleted comment on TIKA-3858:
---
was (Author: JIRAUSER295805):
Apologies, I was still editing the cloned issue. You are responding to the old
text. I will update.
> Lig
[ https://issues.apache.org/jira/browse/TIKA-3858 ]
Tilman Hausherr deleted comment on TIKA-3858:
---
was (Author: JIRAUSER295805):
Ok, the description has been updated.
> Ligatures convert on text extraction
>
[ https://issues.apache.org/jira/browse/TIKA-3858 ]
Tilman Hausherr deleted comment on TIKA-3858:
---
was (Author: tilman):
The current PDFBox version (2.0.26) doesn't use it. It's used in PDFBox 1.8.17
which has many drawbacks. The latest tika
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-3858:
--
Labels: ActualText (was: )
> Ligatures convert on text extraction
> --
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605576#comment-17605576
]
Tilman Hausherr commented on TIKA-3858:
---
The font has an incorrect /ToUnicode stream
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605518#comment-17605518
]
tom hill commented on TIKA-3858:
When I open TikaChromeInboxLigature.pdf in Adobe reader,
[
https://issues.apache.org/jira/browse/TIKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605517#comment-17605517
]
Hudson commented on TIKA-3856:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #8
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605512#comment-17605512
]
tom hill edited comment on TIKA-3858 at 9/15/22 8:14 PM:
-
For the
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605512#comment-17605512
]
tom hill commented on TIKA-3858:
For the attachment TikaChromeInboxLigature.pdf
% java
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
tom hill updated TIKA-3858:
---
Attachment: TikaChromeInboxLigature.pdf
> Ligatures convert on text extraction
>
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-3858:
--
Affects Version/s: 2.4.1
(was: 1.5)
> Ligatures convert on text extraction
>
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605486#comment-17605486
]
Tilman Hausherr commented on TIKA-3858:
---
Please attach the problematic file, and com
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605485#comment-17605485
]
tom hill commented on TIKA-3858:
Ok, the description has been updated.
> Ligatures conv
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
tom hill updated TIKA-3858:
---
Description:
It appears that the issue in TIKA-1289 is still present. Ligatures get replaced
by a question ma
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605482#comment-17605482
]
tom hill commented on TIKA-3858:
Apologies, I was still editing the cloned issue. You are
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
tom hill updated TIKA-3858:
---
Description: It appears that the issue in TIKA-1289 is still present.
Ligatures get replaced by a question mar
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605481#comment-17605481
]
Tilman Hausherr commented on TIKA-3858:
---
The current PDFBox version (2.0.26) doesn't
[
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-3858:
--
Fix Version/s: (was: 1.7)
> Ligatures convert on text extraction
>
tom hill created TIKA-3858:
--
Summary: Ligatures convert on text extraction
Key: TIKA-3858
URL: https://issues.apache.org/jira/browse/TIKA-3858
Project: Tika
Issue Type: Bug
Components: pa
[
https://issues.apache.org/jira/browse/TIKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605474#comment-17605474
]
Hudson commented on TIKA-3855:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #7
[
https://issues.apache.org/jira/browse/TIKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-3856.
---
Fix Version/s: 2.5.0
Resolution: Fixed
> Upgrade to jempbox 1.8.17
> -
Tim Allison created TIKA-3857:
-
Summary: Upgrade to POI 5.2.3
Key: TIKA-3857
URL: https://issues.apache.org/jira/browse/TIKA-3857
Project: Tika
Issue Type: Task
Reporter: Tim Allison
Tim Allison created TIKA-3856:
-
Summary: Upgrade to jempbox 1.8.17
Key: TIKA-3856
URL: https://issues.apache.org/jira/browse/TIKA-3856
Project: Tika
Issue Type: Task
Reporter: Tim All
[
https://issues.apache.org/jira/browse/TIKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-3855.
---
Fix Version/s: 2.5.0
Resolution: Fixed
> Implement upsert for OpenSearch emitter
>
Tim Allison created TIKA-3855:
-
Summary: Implement upsert for OpenSearch emitter
Key: TIKA-3855
URL: https://issues.apache.org/jira/browse/TIKA-3855
Project: Tika
Issue Type: Task
Rep
On Thu, 15 Sep 2022, Sindhu Mahadevappa wrote:
> We have been looking for the latest Tika 2.4.1 jar file, looks like it
> is not available anywhere.
You can get the Tika App and Tika Server jars for 2.4.1 from
https://tika.apache.org/download.html
For the core and parser jars, manually downloading
Hi Team,
Thanks for the quick response.
We have been looking for the latest Tika 2.4.1 jar file, looks like it is not
available anywhere.
Can you please share the link where we can get the latest 2.4.1 jar file, it
will be very helpful.
Thanks & Regards
Sindhu Mahadevappa
> -Original Mess