[
https://issues.apache.org/jira/browse/TIKA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov closed TIKA-2347.
---
> Underlined text is not decorated as such when extracting from word documents
>
[
https://issues.apache.org/jira/browse/TIKA-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov resolved TIKA-2601.
---
Resolution: Duplicate
I'm marking it as a duplicate of TIKA-2555, which I'm currently looking
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810288#comment-16810288
]
Tim Allison commented on TIKA-2847:
---
Last hope:
{noformat}
PDFParserConfig pdfParserConfig = new
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810256#comment-16810256
]
Tim Allison commented on TIKA-2749:
---
[~rossj], this is very helpful...any recs on how to detect "not a
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810172#comment-16810172
]
Ross Johnson commented on TIKA-2749:
OCRing the inlined images directly can be tricky, in my
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810165#comment-16810165
]
Ashish Tiwari commented on TIKA-2847:
---
Yes, TikaInputStream.get(infile) gave me the same error.
>
[
https://issues.apache.org/jira/browse/TIKA-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Gribov reassigned TIKA-2555:
---
Assignee: Konstantin Gribov
> Text with [underline] + [another format] in word
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison edited comment on TIKA-2749 at 4/4/19 5:47 PM:
---
There are several
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810150#comment-16810150
]
Tim Allison commented on TIKA-2847:
---
Sorry, I meant {{TikaInputStream.get(infile)}}...
>
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810146#comment-16810146
]
Ashish Tiwari commented on TIKA-2847:
---
Thanks, Tim. Setting "setUseSAXDocxExtractor" to true worked for
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810131#comment-16810131
]
Tim Allison commented on TIKA-2847:
---
Try opening the InputStream with
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison edited comment on TIKA-2749 at 4/4/19 5:13 PM:
---
There are several
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison edited comment on TIKA-2749 at 4/4/19 5:12 PM:
---
There are several
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison edited comment on TIKA-2749 at 4/4/19 5:08 PM:
---
There are several
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810071#comment-16810071
]
Tim Allison commented on TIKA-2749:
---
Thank you, [~tilman]. Fixed.
> OCR on PDFs should "just work" out
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison edited comment on TIKA-2749 at 4/4/19 4:49 PM:
---
There are several
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810059#comment-16810059
]
Tilman Hausherr commented on TIKA-2749:
---
You probably mean "vector graphics".
> OCR on PDFs should
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809992#comment-16809992
]
Ashish Tiwari commented on TIKA-2847:
---
Please find the code snippet below; it is used for
[
https://issues.apache.org/jira/browse/TIKA-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809949#comment-16809949
]
chandra commented on TIKA-2840:
---
Hi Tim,
It looks like simple batch files that start with an upper-case @ECHO
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809928#comment-16809928
]
Tim Allison commented on TIKA-2847:
---
How are you loading the PDF? Can you attach it/share it? You may
[
https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809904#comment-16809904
]
Ashish Tiwari commented on TIKA-2847:
---
Thanks, Tim. I will check by setting the SAX docx parser, but what in
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison edited comment on TIKA-2749 at 4/4/19 1:44 PM:
---
There are several
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison commented on TIKA-2749:
---
There are several reasons why one might want to run OCR on a PDF
24 matches