[
https://issues.apache.org/jira/browse/TIKA-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304958#comment-16304958
]
Ewan Mellor commented on TIKA-2518:
---
It looks like this is being fixed under TIKA-2490.
Ewan Mellor created TIKA-2581:
-
Summary: testOCROutputsHOCR fails with Tesseract 4.0
Key: TIKA-2581
URL: https://issues.apache.org/jira/browse/TIKA-2581
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ewan Mellor updated TIKA-2581:
--
Description:
TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.
With 3.x, the output is
Ewan Mellor created TIKA-2582:
-
Summary: Tesseract 4.0 includes a FF character by default,
breaking parsers
Key: TIKA-2582
URL: https://issues.apache.org/jira/browse/TIKA-2582
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ewan Mellor updated TIKA-2582:
--
Description:
Tesseract 4.0 includes a change to use form feed characters to separate pages
by default in
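TIKA-2582 is about Tesseract 4.0 using form feed characters (U+000C) to separate pages in its output. The Java sketch below splits that kind of text into per-page strings. Output with no form feed, as from 3.x, comes back as a single page. It assumes a form feed follows each page's text. The class and method names are hypothetical and are not part of Tika. Note also that U+000C is not a legal XML 1.0 character, which is one plausible way a stray FF breaks XML-based consumers downstream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: splits Tesseract plain-text output
// into pages, assuming a form feed (U+000C) follows each page's text.
final class TesseractPages {
    private static final char FORM_FEED = '\f';

    static List<String> splitPages(String ocrText) {
        List<String> pages = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < ocrText.length(); i++) {
            if (ocrText.charAt(i) == FORM_FEED) {
                // Everything since the previous separator is one page.
                pages.add(ocrText.substring(start, i));
                start = i + 1;
            }
        }
        // Keep trailing text that has no separator after it (e.g. 3.x output);
        // an empty tail just means the output ended with a form feed.
        String tail = ocrText.substring(start);
        if (!tail.isEmpty()) {
            pages.add(tail);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> pages = splitPages("page one\n\fpage two\n\f");
        System.out.println(pages.size() + " pages"); // prints "2 pages"
    }
}
```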
Ewan Mellor created TIKA-2580:
-
Summary: SafeContentHandler documentation is incorrect about
replacement character
Key: TIKA-2580
URL: https://issues.apache.org/jira/browse/TIKA-2580
Project: Tika
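TIKA-2580 concerns which replacement character the SafeContentHandler documentation names. As background, XML 1.0 permits only a fixed set of characters (its `Char` production). The hypothetical sketch below is not Tika's SafeContentHandler. It replaces every other code point with a caller-supplied string. It makes no claim about which replacement Tika actually uses, so the replacement is a parameter; the usage passes U+FFFD.

```java
// Hypothetical sketch, not Tika's SafeContentHandler: replaces each code point
// that XML 1.0 does not allow with a caller-chosen replacement string.
final class XmlCharSanitizer {
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String sanitize(String text, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        // Walk code points so valid surrogate pairs are kept whole; an unpaired
        // surrogate is reported as its own code unit (0xD800-0xDFFF) and replaced.
        text.codePoints().forEach(cp -> {
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // A form feed (0xC) is outside the Char production, so it is replaced.
        String cleaned = sanitize("page 1\fpage 2", "\uFFFD");
        System.out.println(cleaned.indexOf('\f') < 0); // prints "true"
    }
}
```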
Ewan Mellor created TIKA-2583:
-
Summary: Tika readme should mention builds.apache.org
Key: TIKA-2583
URL: https://issues.apache.org/jira/browse/TIKA-2583
Project: Tika
Issue Type: Bug
Ewan Mellor created TIKA-2584:
-
Summary: Tika should have a way to pass arbitrary Tesseract options
Key: TIKA-2584
URL: https://issues.apache.org/jira/browse/TIKA-2584
Project: Tika
Issue Type: I
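TIKA-2584 asks for a way to pass arbitrary options through to Tesseract. The `tesseract` command line already accepts `-c VARNAME=VALUE` to set a config variable. One approach is therefore to turn a map of variables into repeated `-c` arguments, as in the sketch below. The class name is hypothetical and is not Tika's API. `preserve_interword_spaces` is only an example variable; `tesseract --print-parameters` lists what a given build supports.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Tika's API: builds a tesseract command line that
// forwards arbitrary config variables as "-c name=value" arguments.
final class TesseractArgs {
    static List<String> build(String image, String outputBase, Map<String, String> vars) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        for (Map.Entry<String, String> e : vars.entrySet()) {
            // Only accept identifier-like names, so a key cannot contain '=',
            // whitespace, or anything else that would make "name=value" ambiguous.
            if (!e.getKey().matches("[A-Za-z0-9_]+")) {
                throw new IllegalArgumentException("invalid variable name: " + e.getKey());
            }
            cmd.add("-c");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>(); // keeps insertion order
        vars.put("preserve_interword_spaces", "1");       // example variable only
        // The list can be handed to ProcessBuilder, which passes each element
        // as a separate argument without going through a shell.
        System.out.println(String.join(" ", build("page.png", "out", vars)));
    }
}
```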
[
https://issues.apache.org/jira/browse/TIKA-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372074#comment-16372074
]
Ewan Mellor commented on TIKA-2583:
---
I wasn't trying to tell users where to find builds,
Ewan Mellor created TIKA-2586:
-
Summary: PDFParser documentation has incorrect DPI default
Key: TIKA-2586
URL: https://issues.apache.org/jira/browse/TIKA-2586
Project: Tika
Issue Type: Improvement
Ewan Mellor created TIKA-2613:
-
Summary: Tesseract 4.0 has removed -psm, so Tika must update
Key: TIKA-2613
URL: https://issues.apache.org/jira/browse/TIKA-2613
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ewan Mellor updated TIKA-2613:
--
Description:
Tesseract 4.0 (currently in beta-1) has removed the {{\-psm}} flag, in favor of
{{\-\-psm}}
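3.x takes `-psm` and 4.0 takes `--psm`, so a caller that must support both can choose the spelling from the installed version. The sketch below assumes the caller has already captured the first line of `tesseract --version` and that it looks like `tesseract 4.0.0-beta.1`. Some builds prefix the number with `v`. Both the parsing and the class name are hypothetical and are not Tika's implementation.

```java
import java.util.List;

// Hypothetical sketch, not Tika's implementation: chooses the spelling of the
// page-segmentation-mode flag from the installed Tesseract major version.
final class PsmFlag {
    static List<String> psmArgs(int majorVersion, int psm) {
        // 3.x takes "-psm N"; 4.0 removed that in favor of "--psm N".
        String flag = majorVersion >= 4 ? "--psm" : "-psm";
        return List.of(flag, Integer.toString(psm));
    }

    // Parses the major version from a line such as "tesseract 4.0.0-beta.1"
    // (assumed format; some builds write "v5.3.0" instead of "5.3.0").
    static int parseMajorVersion(String firstLine) {
        String[] parts = firstLine.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("unexpected version line: " + firstLine);
        }
        String version = parts[1].startsWith("v") ? parts[1].substring(1) : parts[1];
        int end = 0;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("no version number in: " + firstLine);
        }
        return Integer.parseInt(version.substring(0, end));
    }

    public static void main(String[] args) {
        int major = parseMajorVersion("tesseract 4.0.0-beta.1");
        System.out.println(psmArgs(major, 1)); // prints [--psm, 1]
    }
}
```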
[
https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419501#comment-16419501
]
Ewan Mellor commented on TIKA-2620:
---
[https://bugs.openjdk.java.net/browse/JDK-8041125]
[
https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419752#comment-16419752
]
Ewan Mellor commented on TIKA-2613:
---
Build failures were not this change; they were from
[
https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419798#comment-16419798
]
Ewan Mellor commented on TIKA-2582:
---
Build failures were not this change; they were from
Ewan Mellor created TIKA-2624:
-
Summary: Rendering PDFs for OCR with Tesseract uses different DPI
than claimed
Key: TIKA-2624
URL: https://issues.apache.org/jira/browse/TIKA-2624
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ewan Mellor updated TIKA-2624:
--
Description:
Tika has two properties in {{PDFParser.properties}} that control what happens
in AbstractPD
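The DPI question in TIKA-2624 rests on simple arithmetic. PDF page sizes are expressed in points of 1/72 inch, so rendering at a given DPI means scaling by dpi / 72. The sketch below computes the expected pixel size of a render. It can also work backward from an image that has already been rendered to the effective DPI, which is one way to check whether a render matches the DPI it claims. It is a standalone illustration, not Tika's or PDFBox's code. PDFBox's `PDFRenderer` offers both `renderImageWithDPI`, which takes a DPI, and `renderImage`, which takes a scale factor. Passing one where the other is expected is one way such a mismatch could arise, though the truncated description does not say whether that is the cause here.

```java
// Hypothetical sketch, not Tika's or PDFBox's code: PDF page sizes are in
// points (1/72 inch), so rendering at a given DPI scales by dpi / 72.
final class PdfDpi {
    static final float POINTS_PER_INCH = 72f;

    static float scaleForDpi(float dpi) {
        return dpi / POINTS_PER_INCH;
    }

    // Expected pixel length of a page dimension rendered at the given DPI.
    static int pixels(float points, float dpi) {
        return Math.round(points * scaleForDpi(dpi));
    }

    // Effective DPI of an existing render, from its pixel width and the
    // page width in points.
    static float effectiveDpi(int pixelWidth, float pointWidth) {
        return pixelWidth * POINTS_PER_INCH / pointWidth;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        System.out.println(pixels(612, 300) + " x " + pixels(792, 300)); // prints 2550 x 3300
    }
}
```

Comparing `effectiveDpi` of an actual render against the configured value shows directly whether the two agree.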
[
https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422984#comment-16422984
]
Ewan Mellor commented on TIKA-2620:
---
See TIKA-2624. I think that the statement re 300 DP
[
https://issues.apache.org/jira/browse/TIKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423040#comment-16423040
]
Ewan Mellor commented on TIKA-2624:
---
There were definitely changes between 1.8 and 2.0, e
[
https://issues.apache.org/jira/browse/TIKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423108#comment-16423108
]
Ewan Mellor commented on TIKA-2624:
---
[~talli...@mitre.org] I don't know what your release
Ewan Mellor created TIKA-2651:
-
Summary: tika-translate jar contains duplicate classes from
tika-core jar
Key: TIKA-2651
URL: https://issues.apache.org/jira/browse/TIKA-2651
Project: Tika
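TIKA-2651 reports that classes from the tika-core jar are also packaged inside the tika-translate jar. A quick way to confirm such overlap is to list the `.class` entries of both jars and intersect them. The sketch below does this with `java.util.zip.ZipFile`. It is a hypothetical diagnostic, not part of Tika's build, and the jar paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic, not part of Tika: prints .class entries that
// appear in both of two jar files.
final class DuplicateClasses {
    // Collects the names of all .class entries in a jar (jars are zip files).
    static Set<String> classEntries(String jarPath) throws IOException {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    // Entries present in both collections, sorted for stable output.
    static Set<String> duplicates(Collection<String> a, Collection<String> b) {
        Set<String> common = new TreeSet<>(a);
        common.retainAll(b);
        return common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DuplicateClasses <first.jar> <second.jar>");
            return;
        }
        for (String name : duplicates(classEntries(args[0]), classEntries(args[1]))) {
            System.out.println(name);
        }
    }
}
```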