Reinhard Schwab created TIKA-1500:
-------------------------------------

             Summary: FeedParser extracts XML markup with BodyContentHandler
                 Key: TIKA-1500
                 URL: https://issues.apache.org/jira/browse/TIKA-1500
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.6
            Reporter: Reinhard Schwab
            Priority: Minor
             Fix For: 1.8

I am using FeedParser to extract text and links from feeds and have discovered that the extracted text contains XML markup. FeedParser normally strips markup from text when generating SAX events, but one line in the parser is missing that step. The fix is trivial; I will provide a patch.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
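
For readers unfamiliar with the problem, the markup stripping mentioned in the report can be illustrated with a minimal sketch. This is not the code from Tika or from the patch: the class name MarkupStripSketch, the helper stripTags, and the sample description string are all hypothetical, and the regex-based approach is only an assumption about what "stripping markup" could look like.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Tika's actual code: shows the difference between
// emitting a feed field as-is and emitting it with markup removed.
class MarkupStripSketch {

    // Matches anything that looks like a tag, e.g. <p>, </b>, <a href="...">.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes tag-like spans and trims the result; null becomes "".
    // Note: character entities such as &amp; are left untouched.
    static String stripTags(String value) {
        if (value == null) {
            return "";
        }
        return TAG.matcher(value).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Made-up feed entry description containing inline markup.
        String description = "<p>Tika <b>1.8</b> released</p>";

        // What a handler would receive if the stripping step is applied.
        System.out.println(stripTags(description));

        // What it would receive if one code path forgets the stripping step.
        System.out.println(description);
    }
}
```

If every text field passes through a helper like this before being written as character events, a text-only handler sees plain text. A single field that bypasses the helper would show up with its tags intact, which matches the symptom described above.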