RTF parser fails to extract the body
------------------------------------

                 Key: TIKA-748
                 URL: https://issues.apache.org/jira/browse/TIKA-748
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.10
            Reporter: Andrzej Bialecki

Using tika-app I'm getting the following result of parsing the attached document:

{noformat}
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="subject" content="tests"/>
<meta name="Content-Length" content="2235"/>
<meta name="comment" content="StarWriter"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.rtf.RTFParser"/>
<meta name="Content-Type" content="application/rtf"/>
<meta name="resourceName" content="test.rtf"/>
<title>test rft document</title>
</head>
<body/></html>
{noformat}

The expected result would be a non-empty body containing the text "The quick brown fox jumps over the lazy dog ".
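
For anyone who wants to turn the symptom above into an automated check, here is a minimal sketch that decides whether an XHTML result like the one quoted has an empty body. It uses only the JDK's XML parser and does not call any Tika API; the class name `XhtmlBodyCheck` and the method `bodyText` are hypothetical names chosen for this illustration, and the sample strings are shortened from the output quoted in this report.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: inspects XHTML output such as
// the result produced by tika-app and reports the text inside <body>.
public class XhtmlBodyCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Returns the trimmed text content of the first XHTML <body> element,
    // or an empty string if there is no body or the body has no text.
    static String bodyText(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the output declares the XHTML namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        NodeList bodies = doc.getElementsByTagNameNS(XHTML_NS, "body");
        if (bodies.getLength() == 0) {
            return "";
        }
        return bodies.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Shortened copy of the output quoted in this report (metadata omitted).
        String reported =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>test rft document</title></head>"
            + "<body/></html>";
        String text = bodyText(reported);
        System.out.println(text.isEmpty()
            ? "body is empty (matches the reported behaviour)"
            : "body text: " + text);
    }
}
```

A regression test for this issue could feed the parser output for the attached document through such a check and require that the returned text contains "The quick brown fox jumps over the lazy dog".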