Re: ISO 19115 as a metadata model for Tika?

2015-10-13 Thread Martin Desruisseaux
On 12/10/15 14:22, Nick Burch wrote: > Currently, it's very easy for a new user of Tika to get the metadata they want out, they can just fetch a simple string value to get started with. You can, when you learn more, start getting more richly typed values out, but the quickstart is simple.

[GitHub] tika pull request: lower priority on magic for application/xhtml+x...

2015-10-13 Thread jeremybmerrill
GitHub user jeremybmerrill opened a pull request: https://github.com/apache/tika/pull/58 lower priority on magic for application/xhtml+xml to avoid misdetecting xhtml-containing emails as XHTML docs. Emails I have (happy to share if you want) contain XHTML, as one part of a

[jira] [Updated] (TIKA-1768) Document headers and footers in metadata

2015-10-13 Thread Aeham Abushwashi (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aeham Abushwashi updated TIKA-1768: Priority: Critical (was: Major)

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-13 Thread Ben McCann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955248#comment-14955248 ] Ben McCann commented on TIKA-1285: I didn't really do any load or memory testing. My test

[jira] [Updated] (TIKA-1768) Document headers and footers in metadata

2015-10-13 Thread Aeham Abushwashi (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aeham Abushwashi updated TIKA-1768: Attachment: HeaderAndFooterTestFiles.zip Attached data files separately because they cannot be

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-13 Thread Timo Boehme (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954946#comment-14954946 ] Timo Boehme commented on TIKA-1285: Did you try using the new memory settings possibilit

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954947#comment-14954947 ] Tim Allison commented on TIKA-1285: Y, that's the first thing on my todo list on our wra

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954860#comment-14954860 ] Tim Allison commented on TIKA-1285: Thank you for testing the dev wrapper and PDFBox 2.0

Re: Issue with tika-core & tika-parsers

2015-10-13 Thread Nick Burch
On 13/10/15 06:55, Ravi Kishan Telu wrote: I have an issue with the *tika-core* & *tika-parsers* jars. With the *tika-app* jar, *pdf, doc, docx,* etc. files are parsed, but my requirements do not allow me to use the *tika-app* jar, as I am getting a *NO SUCH FIELD : INSTANCE* error due to the conf