[jira] [Updated] (TIKA-713) Tika can not parse all of the persian pdf files

2011-09-13 Thread Ahmad Ajiloo (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmad Ajiloo updated TIKA-713: -- Attachment: ebrat.pdf This is a Persian PDF file that Tika can't parse. Tika can not parse all of

[jira] [Commented] (TIKA-713) Tika can not parse all of the persian pdf files

2011-09-13 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103394#comment-13103394 ] Robert Muir commented on TIKA-713: -- Thanks Ahmad... I took a look at this PDF and I suspect

[jira] [Updated] (TIKA-708) NPE Parsing MS Word 12.0.0

2011-09-13 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy updated TIKA-708: -- Comment: was deleted (was: This bug required additional commit to Tika, r1169702. ) NPE

[jira] [Commented] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.

2011-09-13 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103469#comment-13103469 ] Nick Burch commented on TIKA-431: - Any chance someone could work up a failing unit test for

[jira] [Commented] (TIKA-712) Master slide text isn't extracted

2011-09-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103679#comment-13103679 ] Michael McCandless commented on TIKA-712: - OK I opened

[jira] [Updated] (TIKA-712) Master slide text isn't extracted

2011-09-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-712: Attachment: testPPT_masterFooter2.pptx testPPT_masterFooter2.ppt Corrected

[jira] [Created] (TIKA-714) Word art isn't extracted for various doc types

2011-09-13 Thread Michael McCandless (JIRA)
Word art isn't extracted for various doc types -- Key: TIKA-714 URL: https://issues.apache.org/jira/browse/TIKA-714 Project: Tika Issue Type: Bug Reporter: Michael McCandless