[jira] [Created] (TIKA-710) Make the Tika facade implement the Parser and Detector interfaces

2011-09-09 Thread Jukka Zitting (JIRA)
Make the Tika facade implement the Parser and Detector interfaces - Key: TIKA-710 URL: https://issues.apache.org/jira/browse/TIKA-710 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-710) Expose the Parser and Detector instances within the Tika facade

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-710: --- Summary: Expose the Parser and Detector instances within the Tika facade (was: Make the Tika facade im

[jira] [Commented] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101077#comment-13101077 ] Jukka Zitting commented on TIKA-704: Thanks! I added the test cases in revision 1167052.

[jira] [Resolved] (TIKA-710) Expose the Parser and Detector instances within the Tika facade

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-710. Resolution: Fixed Done in revision 1167051. > Expose the Parser and Detector instances within the Ti

[jira] [Commented] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101087#comment-13101087 ] Jukka Zitting commented on TIKA-704: Hmm, there was still a hidden copy of the Yamaha ma

[jira] [Created] (TIKA-711) Word parser doesn't extract optional hyphen correctly

2011-09-09 Thread Michael McCandless (JIRA)
Word parser doesn't extract optional hyphen correctly - Key: TIKA-711 URL: https://issues.apache.org/jira/browse/TIKA-711 Project: Tika Issue Type: Bug Components: parser

[jira] [Updated] (TIKA-711) Word parser doesn't extract optional hyphen correctly

2011-09-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-711: Attachment: testOptionalHyphen.rtf testOptionalHyphen.pptx tes

[jira] [Commented] (TIKA-711) Word parser doesn't extract optional hyphen correctly

2011-09-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101323#comment-13101323 ] Michael McCandless commented on TIKA-711: - The WordExtractor seems to receive ASCII

Indexing video and image formats with Nutch 1.3?

2011-09-09 Thread hadi
When I try to index a video file with Nutch 1.3, I get the following error: *Error parsing: file:///D:/film.avi: failed(2,0): Can't retrieve Tika parser for mime-type video/x-msvideo* (I get the same error for image files), and in the Hadoop log the detailed error is: *parse.ParserFactory - Par