[jira] [Created] (TIKA-1428) Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character

2014-09-25 Thread JIRA
Theodor Sjöstedt created TIKA-1428: -- Summary: Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character Key: TIKA-1428 URL: https://issues.apache.org/jira/browse/TIKA-1428

[jira] [Updated] (TIKA-1428) Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character

2014-09-25 Thread JIRA
[ https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Theodor Sjöstedt updated TIKA-1428: --- Attachment: TIKA-doc-footnotes-issue.png Original document to the left. TIKA 1.4 in Center

[jira] [Commented] (TIKA-1428) Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character

2014-09-25 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147880#comment-14147880 ] Hong-Thai Nguyen commented on TIKA-1428: Thanks [~theoettheo], any chance to have a

[jira] [Updated] (TIKA-1330) Add robust tika-batch code

2014-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1330: -- Attachment: TIKA-1330v1-patch.zip This is the first version of tika-batch. Much cleanup remains. This

[jira] [Comment Edited] (TIKA-1330) Add robust tika-batch code

2014-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121454#comment-14121454 ] Tim Allison edited comment on TIKA-1330 at 9/25/14 4:18 PM:

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2014-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147922#comment-14147922 ] Tim Allison commented on TIKA-1330: --- [~tilman], I leave it as an exercise to implement a

Re: Tika at ApacheCon Europe - 2 months time!

2014-09-25 Thread David Meikle
Hey Nick, On 22 Sep 2014, at 23:21, Nick Burch n...@apache.org wrote: It's only 2 months to go until ApacheCon Europe in Budapest. I'm simultaneously excited by all the great Tika stuff going on, and worried by how many talks I need to finish writing... As usual for an ApacheCon, we've

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-25 Thread Vineet Ghatge (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148398#comment-14148398 ] Vineet Ghatge commented on TIKA-1423: - Pulling up the data and JAR file and trying to

[jira] [Commented] (TIKA-1415) PowerPoint2003 embedded with word. The embedded file can not be detected.

2014-09-25 Thread sunxingzhe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148528#comment-14148528 ] sunxingzhe commented on TIKA-1415: -- Attachment is the correction results, please

[jira] [Comment Edited] (TIKA-1415) PowerPoint2003 embedded with word. The embedded file can not be detected.

2014-09-25 Thread sunxingzhe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148528#comment-14148528 ] sunxingzhe edited comment on TIKA-1415 at 9/26/14 2:44 AM: ---

Apache Tika - JSON?

2014-09-25 Thread Vineet Ghatge Hemantkumar
Hello all, I was wondering if there is any built-in parser to help with conversion from XHTML to JSON. My research showed that there is one named org.apache.io.json, which has just one method implemented. Also, I tried the GJSON library to do this, but it does not seem to work with Tika. Any suggestions