[jira] [Resolved] (TIKA-3256) Update maven and maven min version

2020-12-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3256. --- Resolution: Fixed > Update maven and maven min version > -- >

[jira] [Created] (TIKA-3256) Update maven and maven min version

2020-12-27 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-3256: - Summary: Update maven and maven min version Key: TIKA-3256 URL: https://issues.apache.org/jira/browse/TIKA-3256 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-3253) improve "attachments" tika-eval report directory

2020-12-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3253: -- Fix Version/s: 1.26, 2.0.0 > improve "attachments" tika-eval report directory

[jira] [Resolved] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3246. --- Fix Version/s: 1.26, 2.0.0 Assignee: Tilman Hausherr

[jira] [Commented] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252160#comment-17252160 ] Tilman Hausherr commented on TIKA-3246: --- Now waiting for the release of PDFBox 2.0.22, which is

[jira] [Commented] (TIKA-3253) improve "attachments" tika-eval report directory

2020-12-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252112#comment-17252112 ] Tilman Hausherr commented on TIKA-3253: --- It works, thanks! (And I'm amazed that this only needed a

[jira] [Commented] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251831#comment-17251831 ] Tilman Hausherr commented on TIKA-3246: --- I ran another test and this time it didn't happen. I'm

[jira] [Updated] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3246: -- Attachment: TIKA-3246.patch > IllegalArgumentException when generation of appearances fails >

[jira] [Commented] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251734#comment-17251734 ] Tilman Hausherr commented on TIKA-3246: --- I don't know... that file is slow for

[jira] [Commented] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251720#comment-17251720 ] Tilman Hausherr commented on TIKA-3246: --- That worked, there's one left,

[jira] [Comment Edited] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251244#comment-17251244 ] Tilman Hausherr edited comment on TIKA-3246 at 12/17/20, 6:47 PM: -- The

[jira] [Commented] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251244#comment-17251244 ] Tilman Hausherr commented on TIKA-3246: --- The cause of the NPE is a PDFBox bug (), but I realize

[jira] [Commented] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251240#comment-17251240 ] Tilman Hausherr commented on TIKA-3246: --- The missing files mentioned in TIKA-3253 are gone, however

[jira] [Commented] (TIKA-3253) improve "attachments" tika-eval report directory

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251052#comment-17251052 ] Tilman Hausherr commented on TIKA-3253: --- don't know, but {code} SELECT * FROM "PUBLIC".CONTAINERS

[jira] [Commented] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251034#comment-17251034 ] Tilman Hausherr commented on TIKA-3246: --- calling {{getAcroform()}} with {{new

[jira] [Updated] (TIKA-3253) improve "attachments" tika-eval report directory

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3253: -- Description: While doing regression testing for PDFBox I found

[jira] [Updated] (TIKA-3253) improve "attachments" tika-eval report directory

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3253: -- Description: While doing regression testing for PDFBox I found

[jira] [Updated] (TIKA-3253) improve "attachments" tika-eval report directory

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3253: -- Summary: improve "attachments" tika-eval report directory (was: improve "attachments" report

[jira] [Updated] (TIKA-3253) improve "attachments" report directory

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3253: -- Priority: Minor (was: Major) > improve "attachments" report directory >

[jira] [Created] (TIKA-3253) improve "attachments" report directory

2020-12-17 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-3253: - Summary: improve "attachments" report directory Key: TIKA-3253 URL: https://issues.apache.org/jira/browse/TIKA-3253 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-3252) Remove (or change) "rackspace" from wiki

2020-12-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3252: -- Summary: Remove (or change) "rackspace" from wiki (was: Remove "rackspace" from wiki) >

[jira] [Created] (TIKA-3252) Remove "rackspace" from wiki

2020-12-17 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-3252: - Summary: Remove "rackspace" from wiki Key: TIKA-3252 URL: https://issues.apache.org/jira/browse/TIKA-3252 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-3244) General upgrades for 1.26

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3244: -- Fix Version/s: 1.26, 2.0.0 > General upgrades for 1.26 >

[jira] [Updated] (TIKA-3244) General upgrades for 1.26

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3244: -- Affects Version/s: 1.25 > General upgrades for 1.26 > - > >

[jira] [Updated] (TIKA-3248) ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3248: -- Fix Version/s: 1.26 (was: 1.25) > ClassCastException: class

[jira] [Updated] (TIKA-3248) ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3248: -- Fix Version/s: 1.25 > ClassCastException: class PDSimpleFileSpecification cannot be cast to >

[jira] [Updated] (TIKA-3248) ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3248: -- Fix Version/s: 2.0.0 > ClassCastException: class PDSimpleFileSpecification cannot be cast to >

[jira] [Resolved] (TIKA-3248) ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3248. --- Resolution: Fixed > ClassCastException: class PDSimpleFileSpecification cannot be cast to >

[jira] [Commented] (TIKA-3104) Detection of memgraph files exported from Xcode

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248830#comment-17248830 ] Tilman Hausherr commented on TIKA-3104: --- Was this one fixed? It is targeted for 1.25 which has been

[jira] [Resolved] (TIKA-3112) NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-app GUI

2020-12-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-3112. --- Assignee: Tim Allison Resolution: Fixed Setting to resolved, this was fixed some time

[jira] [Created] (TIKA-3248) ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification

2020-12-13 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-3248: - Summary: ClassCastException: class PDSimpleFileSpecification cannot be cast to PDComplexFileSpecification Key: TIKA-3248 URL: https://issues.apache.org/jira/browse/TIKA-3248

[jira] [Updated] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3246: -- Description: {noformat} java.lang.IllegalArgumentException: No glyph for U+0041 (A) in font

[jira] [Updated] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3246: -- Attachment: (was: REDHAT-1301016-0.pdf) > IllegalArgumentException when generation of

[jira] [Updated] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3246: -- Attachment: REDHAT-1301016-0.pdf > IllegalArgumentException when generation of appearances

[jira] [Created] (TIKA-3246) IllegalArgumentException when generation of appearances fails

2020-12-11 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-3246: - Summary: IllegalArgumentException when generation of appearances fails Key: TIKA-3246 URL: https://issues.apache.org/jira/browse/TIKA-3246 Project: Tika

[jira] [Closed] (TIKA-3245) update jaxb and remove javax.activation

2020-12-10 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-3245. - Resolution: Won't Do Sadly this was a mistake because I had tested on jdk8 only. It failed on

[jira] [Updated] (TIKA-3245) update jaxb and remove javax.activation

2020-12-10 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3245: -- Summary: update jaxb and remove javax.activation (was: update jaxb) > update jaxb and remove

[jira] [Created] (TIKA-3245) update jaxb

2020-12-10 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-3245: - Summary: update jaxb Key: TIKA-3245 URL: https://issues.apache.org/jira/browse/TIKA-3245 Project: Tika Issue Type: Task Reporter: Tilman

[jira] [Updated] (TIKA-3244) General upgrades for 1.26

2020-12-08 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3244: -- Summary: General upgrades for 1.26 (was: CLONE - General upgrades for 1.26) > General upgrades

[jira] [Updated] (TIKA-3244) CLONE - General upgrades for 1.26

2020-12-08 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3244: -- Reporter: Tilman Hausherr (was: Tim Allison) > CLONE - General upgrades for 1.26 >

[jira] [Created] (TIKA-3244) CLONE - General upgrades for 1.26

2020-12-08 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-3244: - Summary: CLONE - General upgrades for 1.26 Key: TIKA-3244 URL: https://issues.apache.org/jira/browse/TIKA-3244 Project: Tika Issue Type: Task

[jira] [Comment Edited] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-08 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228005#comment-17228005 ] Tilman Hausherr edited comment on TIKA-3224 at 11/8/20, 2:18 PM: - related

[jira] [Commented] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-08 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228005#comment-17228005 ] Tilman Hausherr commented on TIKA-3224: --- related issue is resolved > Stackoverflow with Embedded

[jira] [Updated] (TIKA-3175) Upgrade version of TPS: commons-io

2020-08-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3175: -- External issue URL: (was: https://issues.apache.org/jira/browse/IO-559) > Upgrade version of

[jira] [Commented] (TIKA-3175) Upgrade version of TPS: commons-io

2020-08-21 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181788#comment-17181788 ] Tilman Hausherr commented on TIKA-3175: --- This has already been done in the current repository as

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180244#comment-17180244 ] Tilman Hausherr commented on TIKA-3172: --- Valid field names with the change: ocrStrategy,

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180025#comment-17180025 ] Tilman Hausherr commented on TIKA-3172: --- AnnotationUtils.assignFieldParams() has a list of "allowed"

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179978#comment-17179978 ] Tilman Hausherr commented on TIKA-3172: --- I'm researching this a bit... what I found out is that the

[jira] [Issue Comment Deleted] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3172: -- Comment: was deleted (was: Please check whether setting and changing "sortByPosition" has any effect.

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179353#comment-17179353 ] Tilman Hausherr commented on TIKA-3172: --- Please check whether setting and changing "sortByPosition" has any

[jira] [Commented] (TIKA-3170) PDF extraction space issue

2020-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179338#comment-17179338 ] Tilman Hausherr commented on TIKA-3170: --- Please insert this in the PDF segment of your config.xml

[jira] [Commented] (TIKA-3170) PDF extraction space issue

2020-08-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179139#comment-17179139 ] Tilman Hausherr commented on TIKA-3170: --- This is because the glyphs are so far apart. You and me,

[jira] [Updated] (TIKA-3131) PDFParserConfig default values were accidentally swapped

2020-07-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3131: -- Fix Version/s: 1.25 > PDFParserConfig default values were accidentally swapped >

[jira] [Updated] (TIKA-3112) NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-app GUI

2020-07-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3112: -- Fix Version/s: 1.25 (was: 1.23) > NullPointerException at

[jira] [Updated] (TIKA-3131) PDFParserConfig default values were accidentally swapped

2020-07-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3131: -- Component/s: parser config > PDFParserConfig default values were accidentally

[jira] [Updated] (TIKA-3112) NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-app GUI

2020-06-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3112: -- Summary: NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-app GUI

[jira] [Updated] (TIKA-3112) NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-gui

2020-06-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3112: -- Summary: NullPointerException at AbstractPDF2XHTML.extractXMPXFA() when using tika-gui (was:

[jira] [Updated] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar

2020-06-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3112: -- Labels: regression (was: ) > New bugs introduced in Tika-app-1.24.1.jar >

[jira] [Commented] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135203#comment-17135203 ] Tilman Hausherr commented on TIKA-3111: --- Now it works > Upgrade to PDFBox 2.0.20 >

[jira] [Comment Edited] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134768#comment-17134768 ] Tilman Hausherr edited comment on TIKA-3111 at 6/13/20, 11:34 AM: -- I did

[jira] [Commented] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134768#comment-17134768 ] Tilman Hausherr commented on TIKA-3111: --- I did (after reverting my change in Tika), and it doesn't

[jira] [Commented] (TIKA-3114) Error reading transcript from document

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134638#comment-17134638 ] Tilman Hausherr commented on TIKA-3114: --- [~dbalasub] Your stack trace does not contain anything from

[jira] [Commented] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134303#comment-17134303 ] Tilman Hausherr commented on TIKA-3111: --- No, I got it to work with several changes in

[jira] [Comment Edited] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134264#comment-17134264 ] Tilman Hausherr edited comment on TIKA-3111 at 6/12/20, 3:09 PM: - Ignore

[jira] [Commented] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134275#comment-17134275 ] Tilman Hausherr commented on TIKA-3111: --- Got it. PDFStreamEngine calls the (new) 4 parameter

[jira] [Commented] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134264#comment-17134264 ] Tilman Hausherr commented on TIKA-3111: --- Ignore my comment, it isn't helpful here, I was just

[jira] [Comment Edited] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133978#comment-17133978 ] Tilman Hausherr edited comment on TIKA-3111 at 6/12/20, 6:42 AM: - It's

[jira] [Comment Edited] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133978#comment-17133978 ] Tilman Hausherr edited comment on TIKA-3111 at 6/12/20, 6:41 AM: - tail of

[jira] [Comment Edited] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133978#comment-17133978 ] Tilman Hausherr edited comment on TIKA-3111 at 6/12/20, 6:39 AM: - tail of

[jira] [Commented] (TIKA-3111) Upgrade to PDFBox 2.0.20

2020-06-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133978#comment-17133978 ] Tilman Hausherr commented on TIKA-3111: --- tail of debug log for 2.0.19: {quote} Warning

[jira] [Comment Edited] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar

2020-06-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133605#comment-17133605 ] Tilman Hausherr edited comment on TIKA-3112 at 6/11/20, 7:19 PM: - I get

[jira] [Comment Edited] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar

2020-06-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133605#comment-17133605 ] Tilman Hausherr edited comment on TIKA-3112 at 6/11/20, 7:18 PM: - I get

[jira] [Commented] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar

2020-06-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133605#comment-17133605 ] Tilman Hausherr commented on TIKA-3112: --- I get this too... the part at the bottom of the stack trace

[jira] [Updated] (TIKA-3112) New bugs introduced in Tika-app-1.24.1.jar

2020-06-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3112: -- Description: After starting tika-app-1.24.1 using: java -jar tika-app-1.24.1.jar -g from a Linux

[jira] [Commented] (TIKA-3102) Unmappable chars for PDF

2020-05-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106520#comment-17106520 ] Tilman Hausherr commented on TIKA-3102: --- Yes, you should use the OCR feature in Tika to extract this

[jira] [Commented] (TIKA-3080) CharsetMatch.getString can get stuck in infinite loop

2020-04-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17086864#comment-17086864 ] Tilman Hausherr commented on TIKA-3080: --- [~tallison] can this one be resolved? It looks good to me.

[jira] [Commented] (TIKA-3091) java.lang.NullPointerException when calling hashCode after instantiating PDFParserConfig

2020-04-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17086862#comment-17086862 ] Tilman Hausherr commented on TIKA-3091: --- [~tallison] can this one be resolved? It looks good to me.

[jira] [Commented] (TIKA-3091) java.lang.NullPointerException when calling hashCode after instantiating PDFParserConfig

2020-04-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082835#comment-17082835 ] Tilman Hausherr commented on TIKA-3091: --- Ouch. Two ways to solve this: - check for null in equals()

[jira] [Reopened] (TIKA-3075) Add an HTTP parser

2020-03-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened TIKA-3075: --- > Add an HTTP parser > -- > > Key: TIKA-3075 >

[jira] [Closed] (TIKA-3075) Add an HTTP parser

2020-03-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-3075. - Resolution: Invalid > Add an HTTP parser > -- > > Key: TIKA-3075

[jira] [Commented] (TIKA-3065) Not able to parse the document with inline image

2020-03-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056729#comment-17056729 ] Tilman Hausherr commented on TIKA-3065: --- [~suchendra] ignore that one. That was a mistake in the

[jira] [Commented] (TIKA-3065) Not able to parse the document with inline image

2020-03-10 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056108#comment-17056108 ] Tilman Hausherr commented on TIKA-3065: --- I suggest you save whatever you get in that inputstream

[jira] [Commented] (TIKA-3067) Different numbers of embedded inline images with PDF inline image extraction code

2020-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055402#comment-17055402 ] Tilman Hausherr commented on TIKA-3067: --- I think my test was with an old version. I'm not fully up

[jira] [Commented] (TIKA-3067) Different numbers of embedded inline images with PDF inline image extraction code

2020-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055367#comment-17055367 ] Tilman Hausherr commented on TIKA-3067: --- PDFBox ExtractImages produces 4 images. The image masks are

[jira] [Comment Edited] (TIKA-3067) Different numbers of embedded inline images with PDF inline image extraction code

2020-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055367#comment-17055367 ] Tilman Hausherr edited comment on TIKA-3067 at 3/9/20, 8:53 PM: PDFBox

[jira] [Commented] (TIKA-3065) Not able to parse the document with inline image

2020-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054863#comment-17054863 ] Tilman Hausherr commented on TIKA-3065: --- works for me: {code} public class TIKA3065 { public

[jira] [Commented] (TIKA-3059) New NPE in ImageGraphicsEngine

2020-03-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052514#comment-17052514 ] Tilman Hausherr commented on TIKA-3059: --- Fixed in PDFBox. > New NPE in ImageGraphicsEngine >

[jira] [Comment Edited] (TIKA-3059) New NPE in ImageGraphicsEngine

2020-03-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052332#comment-17052332 ] Tilman Hausherr edited comment on TIKA-3059 at 3/5/20, 5:03 PM: And yes

[jira] [Comment Edited] (TIKA-3059) New NPE in ImageGraphicsEngine

2020-03-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052332#comment-17052332 ] Tilman Hausherr edited comment on TIKA-3059 at 3/5/20, 4:57 PM: And yes

[jira] [Commented] (TIKA-3059) New NPE in ImageGraphicsEngine

2020-03-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052332#comment-17052332 ] Tilman Hausherr commented on TIKA-3059: --- And yes it's a feature, getExtGStateNames() returns the

[jira] [Commented] (TIKA-3059) New NPE in ImageGraphicsEngine

2020-03-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052329#comment-17052329 ] Tilman Hausherr commented on TIKA-3059: --- I found a place in PDFBox where we catch this, and another

[jira] [Issue Comment Deleted] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3035: -- Comment: was deleted (was: I have no position on this. [~sorend] did not bring any further

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043805#comment-17043805 ] Tilman Hausherr commented on TIKA-3035: --- I have no position on this. [~sorend] did not bring any

[jira] [Comment Edited] (TIKA-2650) Soft-hyphen is not extracted properly

2020-02-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040308#comment-17040308 ] Tilman Hausherr edited comment on TIKA-2650 at 2/19/20 6:25 PM: I wrote

[jira] [Commented] (TIKA-2650) Soft-hyphen is not extracted properly

2020-02-19 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040308#comment-17040308 ] Tilman Hausherr commented on TIKA-2650: --- I wrote that it depends. There is no perfect solution. In

[jira] [Commented] (TIKA-2650) Soft-hyphen is not extracted properly

2020-02-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039701#comment-17039701 ] Tilman Hausherr commented on TIKA-2650: --- Text extraction of multi column pages partly depends on

[jira] [Commented] (TIKA-3040) PDF inline OCR: Exception while processing certain image (others in same PDF work)

2020-02-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035685#comment-17035685 ] Tilman Hausherr commented on TIKA-3040: --- I expected that it works with the "render page" strategy,

[jira] [Commented] (TIKA-3040) PDF inline OCR: Exception while processing certain image (others in same PDF work)

2020-02-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035683#comment-17035683 ] Tilman Hausherr commented on TIKA-3040: --- I have not managed to reproduce the effect, I ran this:

[jira] [Commented] (TIKA-3041) ExtractInlineImages missing images from PDFBOX-52

2020-02-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035571#comment-17035571 ] Tilman Hausherr commented on TIKA-3041: --- Thanks, I've also added it to PDFBOX-4771 because I need to

[jira] [Comment Edited] (TIKA-3040) PDF inline OCR: Exception while processing certain image (others in same PDF work)

2020-02-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035049#comment-17035049 ] Tilman Hausherr edited comment on TIKA-3040 at 2/12/20 4:59 AM: Unrelated
