[ https://issues.apache.org/jira/browse/SOLR-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654437#action_12654437 ]
Dawid Weiss commented on SOLR-882:
----------------------------------

Argh, good catch, Grant. The entire patch is fine, with the exception of the main method. What you saw in there was a dump of entities that I had to make in order to test which entities are recognized in uppercase mode and which were not. Apologies that this slipped through somehow.

Do you want me to remove this from the patch or can you simply disregard the fragment that applies to the main method?

> HTMLStripReader improvement - padding corrected for hexadecimal entities, option not to emit padding at all added
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-882
>                 URL: https://issues.apache.org/jira/browse/SOLR-882
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Grant Ingersoll
>            Priority: Trivial
>         Attachments: patch
>
>
> Improvements to HTMLStripHighlighter:
> - fix padding of hexadecimal entities (currently off by 1)
> - add an option not to emit padding at all. In certain applications padding emitted after entities such as &oacute; may split words that are in fact single terms.
> - add entities that are recognized when written all in uppercase and recognized by browsers.
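
For readers unfamiliar with the padding behaviour the issue describes, here is a minimal, self-contained sketch in Java. It is not the code from the attached patch and does not use Solr's HTMLStripReader API; the class name EntityPaddingSketch, the decode method, the pad flag, and the tiny entity table are all hypothetical and exist only for illustration. The idea shown: when an entity is replaced by the single character it stands for, spaces can be appended so that the output keeps the length of the input (which keeps character offsets aligned), or the padding can be skipped so that a word containing an entity stays a single term.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: NOT the SOLR-882 patch and NOT Solr's API.
final class EntityPaddingSketch {
    // Hypothetical, deliberately tiny table of named entities.
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("oacute", '\u00F3');
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
    }

    // Matches &name;, decimal &#243; and hexadecimal &#xF3; / &#XF3; forms.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    /** Decodes entities; if pad is true, appends spaces so offsets are preserved. */
    static String decode(String in, boolean pad) {
        StringBuilder out = new StringBuilder(in.length());
        Matcher m = ENTITY.matcher(in);
        int last = 0;
        while (m.find()) {
            String decoded = decodeBody(m.group(1));
            if (decoded == null) {
                continue; // unknown entity: left untouched in the output
            }
            out.append(in, last, m.start()).append(decoded);
            if (pad) {
                // Pad up to the full width of the entity, including '&', '#', 'x' and ';'.
                int width = m.end() - m.start();
                for (int i = decoded.length(); i < width; i++) {
                    out.append(' ');
                }
            }
            last = m.end();
        }
        out.append(in, last, in.length());
        return out.toString();
    }

    private static String decodeBody(String body) {
        if (body.startsWith("#x") || body.startsWith("#X")) {
            return fromCodePoint(body.substring(2), 16);
        }
        if (body.startsWith("#")) {
            return fromCodePoint(body.substring(1), 10);
        }
        Character c = NAMED.get(body);
        return c == null ? null : String.valueOf(c);
    }

    private static String fromCodePoint(String digits, int radix) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
        } catch (NumberFormatException e) {
            return null; // too many digits to be a code point
        }
    }
}
```

With pad set to true, the output always has the same length as the input, so an off-by-one in the padding width would show up as a length mismatch. With pad set to false, a word written with an entity in the middle comes out as one unbroken token.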