XHTMLContentHandler won't emit newline when HTML element matches ENDLINE set
----------------------------------------------------------------------------

                 Key: TIKA-889
                 URL: https://issues.apache.org/jira/browse/TIKA-889
             Project: Tika
          Issue Type: Bug
            Reporter: John Conwell


XHTMLContentHandler.endElement checks whether the element is in the ENDLINE 
set to decide whether to emit a newline.  The HTML element names in ENDLINE 
are all lower case, but the HtmlParser class uses XHTMLDowngradeHandler, which 
upper-cases all HTML element names.  As a result, none of the HTML elements in 
a web page match the entries in the ENDLINE set, and no newline is emitted.

The INDENT set has the same problem.
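
A minimal sketch of the mismatch, not taken from the Tika source. The class 
name, the helper methods and the contents of the set below are hypothetical 
and only illustrate a case-sensitive lookup against lower-case names; 
normalizing the name before the lookup is one possible fix, not necessarily 
the one that should be applied.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class EndlineCaseDemo {
    // Illustrative subset only (assumption); the real ENDLINE set in
    // XHTMLContentHandler is not reproduced here.
    static final Set<String> ENDLINE = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("p", "div", "br", "li", "tr")));

    // Mirrors the reported behaviour: a case-sensitive lookup of the
    // element name exactly as it was passed in.
    static boolean matchesAsIs(String name) {
        return ENDLINE.contains(name);
    }

    // Hypothetical fix: lower-case the name before the lookup.
    static boolean matchesNormalized(String name) {
        return ENDLINE.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // What an upper-casing handler would pass on for a <p> element.
        String name = "p".toUpperCase(Locale.ROOT);
        System.out.println(matchesAsIs(name));       // prints "false"
        System.out.println(matchesNormalized(name)); // prints "true"
    }
}
```

Locale.ROOT is used so that the case conversion does not depend on the 
default locale of the JVM.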


        
