[jira] [Created] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
Andreas Baumann created TIKA-2241: - Summary: DumpTikaConfigExample generates strange tika-config.xml Key: TIKA-2241 URL: https://issues.apache.org/jira/browse/TIKA-2241 Project: Tika Issue T

[jira] [Updated] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Baumann updated TIKA-2241: -- Description: {code:none|borderStyle=solid} mvn exec:java -Dexec.mainClass="org.apache.tika.exampl

[jira] [Updated] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Baumann updated TIKA-2241: -- Description: {code:none|borderStyle=solid} mvn exec:java -Dexec.mainClass="org.apache.tika.exampl

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825700#comment-15825700 ] Nick Burch commented on TIKA-2241: -- The example is no longer the recommended way to genera

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825720#comment-15825720 ] Andreas Baumann commented on TIKA-2241: --- The double classes stem from maven (tika-par

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825729#comment-15825729 ] Nick Burch commented on TIKA-2241: -- You only need to specify a mimetype for a parser if yo

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825740#comment-15825740 ] Andreas Baumann commented on TIKA-2241: --- I thought it would dump all the MIME types i

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825786#comment-15825786 ] Nick Burch commented on TIKA-2241: -- To get the list of mime types listed as supported by e

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825801#comment-15825801 ] Andreas Baumann commented on TIKA-2241: --- {code}--list-parser-details{code} prints exa

[jira] [Created] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Jan Van Raemdonck (JIRA)
Jan Van Raemdonck created TIKA-2242: --- Summary: opendocument parsing produces malformed xml Key: TIKA-2242 URL: https://issues.apache.org/jira/browse/TIKA-2242 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Jan Van Raemdonck (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Van Raemdonck updated TIKA-2242: Attachment: (was: 2017-01-02-16B833-16B833VANCAUTEREN.odt) > opendocument parsing produce

[jira] [Updated] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Jan Van Raemdonck (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Van Raemdonck updated TIKA-2242: Attachment: 2017-01-02-16B833-16B833VANCAUTEREN.odt > opendocument parsing produces malformed

[jira] [Updated] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Jan Van Raemdonck (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Van Raemdonck updated TIKA-2242: Attachment: 2017-01-02-16B833-16B833VANCAUTEREN.odt > opendocument parsing produces malformed

[jira] [Updated] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Baumann updated TIKA-2241: -- Priority: Trivial (was: Major) > DumpTikaConfigExample generates strange tika-config.xml >

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826253#comment-15826253 ] Andreas Baumann commented on TIKA-2241: --- A use case could be a very restrictive envir

[jira] [Comment Edited] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826251#comment-15826251 ] Andreas Baumann edited comment on TIKA-2241 at 1/17/17 3:35 PM: -

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826251#comment-15826251 ] Andreas Baumann commented on TIKA-2241: --- Commenting out the PDF converter: {code|xml

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826325#comment-15826325 ] Nick Burch commented on TIKA-2241: -- Can you please open a fresh bug for the grobid issue?

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826453#comment-15826453 ] Nick Burch commented on TIKA-2241: -- Support added in git in {{320a1f1ede36cf1f62f6f2b8cab4

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826478#comment-15826478 ] Hudson commented on TIKA-2241: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1180 (See [h

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826595#comment-15826595 ] Hudson commented on TIKA-2241: -- SUCCESS: Integrated in Jenkins build tika-2.x #199 (See [http

[jira] [Commented] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826609#comment-15826609 ] Tim Allison commented on TIKA-2242: --- Bad nesting of markup or something else? {noformat}

[jira] [Commented] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Jan Van Raemdonck (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826650#comment-15826650 ] Jan Van Raemdonck commented on TIKA-2242: - Yes, the bad nesting is the issue I'm ha

[jira] [Commented] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826839#comment-15826839 ] Tim Allison commented on TIKA-2242: --- Thank you. The attached test file also shows that w

[GitHub] tika pull request #147: TIKA-2231: Improved param validation of TesseractOCR...

2017-01-17 Thread ham1
GitHub user ham1 opened a pull request: https://github.com/apache/tika/pull/147 TIKA-2231: Improved param validation of TesseractOCRConfig.setLanguage() I also improved and added more test cases. You can merge this pull request into a Git repository by running: $ git pull https

[jira] [Commented] (TIKA-2231) Invalid language code exception

2017-01-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826902#comment-15826902 ] ASF GitHub Bot commented on TIKA-2231: -- GitHub user ham1 opened a pull request: h

[jira] [Commented] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827321#comment-15827321 ] Tim Allison commented on TIKA-2242: --- And we have to handle {{style:text-underline-style="

[jira] [Resolved] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2242. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Thank you for opening this and subm

[GitHub] tika pull request #147: TIKA-2231: Improved param validation of TesseractOCR...

2017-01-17 Thread asfgit
GitHub user asfgit closed the pull request at: https://github.com/apache/tika/pull/147

[jira] [Commented] (TIKA-2231) Invalid language code exception

2017-01-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827335#comment-15827335 ] ASF GitHub Bot commented on TIKA-2231: -- GitHub user asfgit closed the pull request at:

[jira] [Resolved] (TIKA-2231) Invalid language code exception

2017-01-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2231. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Thank you for opening this issue.

tika-2.x - Build # 200 - Failure

2017-01-17 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x (build #200) Status: Failure Check console output at https://builds.apache.org/job/tika-2.x/200/ to view the results.

[jira] [Commented] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827341#comment-15827341 ] Hudson commented on TIKA-2242: -- FAILURE: Integrated in Jenkins build tika-2.x #200 (See [http

[jira] [Commented] (TIKA-2242) opendocument parsing produces malformed xml

2017-01-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827344#comment-15827344 ] Hudson commented on TIKA-2242: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1181 (See [h

[jira] [Commented] (TIKA-2231) Invalid language code exception

2017-01-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827381#comment-15827381 ] Hudson commented on TIKA-2231: -- FAILURE: Integrated in Jenkins build tika-2.x #201 (See [http

tika-2.x - Build # 201 - Still Failing

2017-01-17 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x (build #201) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x/201/ to view the results.

[jira] [Commented] (TIKA-2231) Invalid language code exception

2017-01-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827387#comment-15827387 ] Hudson commented on TIKA-2231: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1182 (See [h

[jira] [Created] (TIKA-2243) GrobidRESTParser executes when no parser matches to MIME-type

2017-01-17 Thread Andreas Baumann (JIRA)
Andreas Baumann created TIKA-2243: - Summary: GrobidRESTParser executes when no parser matches to MIME-type Key: TIKA-2243 URL: https://issues.apache.org/jira/browse/TIKA-2243 Project: Tika I

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827518#comment-15827518 ] Andreas Baumann commented on TIKA-2241: --- Opened new bug TIKA-2243 for the GrobidRESTP

[jira] [Commented] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827525#comment-15827525 ] Andreas Baumann commented on TIKA-2241: --- Tested the --dump-static-full-config, looks

[jira] [Comment Edited] (TIKA-2241) DumpTikaConfigExample generates strange tika-config.xml

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827525#comment-15827525 ] Andreas Baumann edited comment on TIKA-2241 at 1/18/17 7:16 AM: -

[jira] [Commented] (TIKA-2243) GrobidRESTParser executes when no parser matches to MIME-type

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827532#comment-15827532 ] Andreas Baumann commented on TIKA-2243: --- Using the new --dump-static-full-config opti

[jira] [Comment Edited] (TIKA-2243) GrobidRESTParser executes when no parser matches to MIME-type

2017-01-17 Thread Andreas Baumann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827532#comment-15827532 ] Andreas Baumann edited comment on TIKA-2243 at 1/18/17 7:21 AM: -