[jira] [Commented] (TIKA-948) Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc

2012-07-08 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409094#comment-13409094 ] Michael McCandless commented on TIKA-948: bq. However, it doesn't look like it stor

2012-07-08 Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409024#comment-13409024 ] Nick Burch commented on TIKA-948: If someone feels keen, we could add CompObj decoding. Whe
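CompObj decoding means reading the `\1CompObj` stream that sits next to an embedded OLE object and pulling out its type strings. This is a minimal Python sketch that assumes the CompObjStream layout from Microsoft's [MS-OLEDS] specification: a 28-byte header, then a length-prefixed ANSI user-type string whose length counts the terminating NUL. The helper names are hypothetical, and this is not Tika or POI code. The ANSI code page is unknown here, so Latin-1 stands in as a lossless decoder.

```python
import struct
from typing import Tuple

# Assumption: the CompObjHeader (Reserved1, Version, Reserved2) is 28 bytes,
# as described in MS-OLEDS; this sketch does not validate its contents.
COMPOBJ_HEADER_SIZE = 28


def read_length_prefixed_ansi(data: bytes, offset: int) -> Tuple[str, int]:
    """Read a little-endian uint32 length, then that many ANSI bytes.

    Returns the decoded text (trailing NULs stripped) and the offset just
    past the string. The length is assumed to include the terminating NUL.
    """
    if offset + 4 > len(data):
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    end = start + length
    if end > len(data):
        raise ValueError("truncated string")
    # The real ANSI code page is unknown here; Latin-1 decodes any byte.
    text = data[start:end].rstrip(b"\x00").decode("latin-1")
    return text, end


def compobj_user_type(data: bytes) -> str:
    """Return the AnsiUserType string from a raw \\1CompObj stream (hypothetical helper)."""
    text, _ = read_length_prefixed_ansi(data, COMPOBJ_HEADER_SIZE)
    return text
```

With a synthetic stream made of a zeroed header plus one string, `compobj_user_type(bytes(28) + struct.pack("<I", 17) + b"Example Document\x00")` returns `"Example Document"`. A fuller decoder would go on to read the fields that follow, starting with the clipboard format.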

2012-07-08 Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409002#comment-13409002 ] Michael McCandless commented on TIKA-948: Thanks for taking this, Nick! Can you add

2012-07-06 Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408436#comment-13408436 ] Nick Burch commented on TIKA-948: I think r1358467 should fix the file extension problem in

2012-07-06 Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408300#comment-13408300 ] Nick Burch commented on TIKA-948: I've had a go at fixing this in r1358404, using a slightl

2012-07-06 Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408180#comment-13408180 ] Alex Ott commented on TIKA-948: Maybe you could also reuse information from the prop stream near C

2012-07-06 Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408142#comment-13408142 ] Nick Burch commented on TIKA-948: Should we not just pass the bytes to a Detector if we hav
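Passing the raw bytes to a detector means trusting the payload's own signature over the type the container declares, such as the MS Works type in the issue title. This is a minimal Python sketch of that idea. It assumes only three well-known leading signatures: PDF's `%PDF-`, the OLE2 compound-file header, and the ZIP local-file header. It is not Tika's `Detector` interface, and `sniff` and `resolve_type` are hypothetical names.

```python
from typing import List, Tuple

# Well-known leading "magic" bytes; a real detector checks many more.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    # OLE2 compound file header, mapped to Tika's generic OLE2 type name.
    (bytes.fromhex("D0CF11E0A1B11AE1"), "application/x-tika-msoffice"),
    (b"PK\x03\x04", "application/zip"),
]

OCTET_STREAM = "application/octet-stream"


def sniff(data: bytes) -> str:
    """Return a MIME type from the leading bytes, or octet-stream if unknown."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return OCTET_STREAM


def resolve_type(declared: str, data: bytes) -> str:
    """Prefer the sniffed type; fall back to the container's declared type."""
    sniffed = sniff(data)
    return declared if sniffed == OCTET_STREAM else sniffed
```

For example, `resolve_type("application/vnd.ms-works", b"%PDF-1.4 ...")` returns `"application/pdf"`, while bytes with no recognised signature keep the declared type. Sniffing overrides the declared type only when it positively recognises the payload, so an unrecognised payload is handled no worse than before.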