[jira] [Created] (TIKA-1358) Add support for newer iWork file formats

2014-06-26 Thread Jelle Kastelein (JIRA)
Jelle Kastelein created TIKA-1358: - Summary: Add support for newer iWork file formats Key: TIKA-1358 URL: https://issues.apache.org/jira/browse/TIKA-1358 Project: Tika Issue Type: Wish

[jira] [Commented] (TIKA-1358) Add support for newer iWork file formats

2014-06-26 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044468#comment-14044468 ] Nick Burch commented on TIKA-1358: -- First thing we'd probably want is to re-create the

[jira] [Commented] (TIKA-1288) Epub's content extracted partially

2014-06-26 Thread Jelle Kastelein (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044591#comment-14044591 ] Jelle Kastelein commented on TIKA-1288: --- Quite possibly the same issue: I'm not

RE: Question re installing Tika

2014-06-26 Thread Richard
Thanks very much Chris ... it's all working now. Have you by any chance programmatically looped through a directory full of PDFs and used Tika to extract the contents of each into separate text or XML files? If so, what do you recommend to do the extraction? Kind regards
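The directory loop asked about above could be sketched roughly as follows. This is a minimal sketch, not an answer taken from the thread: the class name `BatchExtract`, the `Extractor` hook, and the `NAME.pdf.txt` output naming are all hypothetical choices made for illustration. The extraction step is left pluggable so the loop runs without Tika on the classpath; with Tika available, the facade call `new org.apache.tika.Tika().parseToString(File)` is one way to fill it in.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BatchExtract {

    /** Hypothetical hook (not a Tika type): turns one document into plain text. */
    @FunctionalInterface
    public interface Extractor {
        String extract(Path file) throws Exception;
    }

    /**
     * Runs the extractor over every *.pdf directly inside inDir (no recursion)
     * and writes the result to outDir as NAME.pdf.txt.
     * Returns the number of text files written.
     */
    public static int extractAll(Path inDir, Path outDir, Extractor extractor)
            throws IOException {
        Files.createDirectories(outDir);
        int written = 0;
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(inDir, "*.pdf")) {
            for (Path pdf : pdfs) {
                try {
                    String text = extractor.extract(pdf);
                    Path out = outDir.resolve(pdf.getFileName().toString() + ".txt");
                    Files.write(out, text.getBytes(StandardCharsets.UTF_8));
                    written++;
                } catch (Exception e) {
                    // One unreadable file should not stop the batch; report and continue.
                    System.err.println("Skipping " + pdf + ": " + e);
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BatchExtract <inputDir> <outputDir>");
            return;
        }
        // Stand-in extractor so the sketch runs without Tika. With Tika on the
        // classpath, an assumed replacement would be:
        //   org.apache.tika.Tika tika = new org.apache.tika.Tika();
        //   Extractor extractor = p -> tika.parseToString(p.toFile());
        Extractor extractor = p -> "placeholder text for " + p.getFileName();
        int n = extractAll(Paths.get(args[0]), Paths.get(args[1]), extractor);
        System.out.println("Wrote " + n + " text file(s)");
    }
}
```

Catching exceptions per file is a deliberate choice here: a single malformed PDF is reported and skipped rather than aborting the whole run.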

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-06-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044668#comment-14044668 ] Tim Allison commented on TIKA-1302: --- Agreed. If there's a grad student with some time on

[jira] [Commented] (TIKA-1332) Create eval code

2014-06-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044682#comment-14044682 ] Tim Allison commented on TIKA-1332: --- To my mind, there are three families of things that

Julia wrapper around Tika

2014-06-26 Thread Mattmann, Chris A (3980)
Hey Guys, The Julia programming language folks at MIT have created a Julia wrapper around Tika called Taro.jl: https://github.com/aviks/Taro.jl Woot. Tika is now available in the Julia programming language! Cheers, Chris ++ Chris

[jira] [Updated] (TIKA-1300) Switch default PDFBox parser to NonSequentialParser

2014-06-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1300: -- Attachment: tika_1_6_ClassicsVsNonSeq.zip The attached shows the results of running Tika 1.6 trunk with

[jira] [Commented] (TIKA-1233) PDFBox can throw StringIndexOutOfBoundsException on some dates

2014-06-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044881#comment-14044881 ] Tim Allison commented on TIKA-1233: --- Hindsight and current eval methodology turn out to

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-06-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044891#comment-14044891 ] Lewis John McGibbney commented on TIKA-1302: I would love to work with

[jira] [Commented] (TIKA-1300) Switch default PDFBox parser to NonSequentialParser

2014-06-26 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045119#comment-14045119 ] Tilman Hausherr commented on TIKA-1300: --- My impression was that the NSP had better