[ 
https://issues.apache.org/jira/browse/SOLR-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654437#action_12654437
 ] 

Dawid Weiss commented on SOLR-882:
----------------------------------

Argh, good catch, Grant. The entire patch is fine, with the exception of the 
main method. What you saw in there was a dump of entities that I had to make in 
order to test which entities are recognized in uppercase mode and which are 
not. Apologies that this slipped through. Do you want me to remove this from 
the patch, or can you simply disregard the fragment that applies to the 
main method? 

> HTMLStripReader improvement - padding corrected for hexadecimal entities, 
> option not to emit padding at all added
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-882
>                 URL: https://issues.apache.org/jira/browse/SOLR-882
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Grant Ingersoll
>            Priority: Trivial
>         Attachments: patch
>
>
> Improvements to HTMLStripReader:
> - fix padding of hexadecimal entities (currently off by 1)
> - add an option not to emit padding at all. In certain applications padding 
> emitted after entities such as ó may split words that are in fact 
> single terms.
> - add entities that browsers also recognize when written entirely in 
> uppercase.
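
To illustrate the padding issue described above, here is a minimal sketch. It
is not the HTMLStripReader code from the patch: the class name
EntityPaddingSketch, the decode method and the emitPadding flag are
hypothetical, and only numeric entities are handled. It assumes that padding
means filling the output with spaces up to the length of the original entity,
so that character offsets stay aligned with the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Solr.
final class EntityPaddingSketch {
    // Matches hexadecimal (&#xF3;) and decimal (&#243;) character references.
    private static final Pattern NUMERIC_ENTITY =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Decodes numeric entities. With emitPadding set, each decoded character
    // is followed by spaces so the output is as long as the input.
    static String decode(String input, boolean emitPadding) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            last = m.end();
            int codePoint = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2), 10);
            if (!Character.isValidCodePoint(codePoint)) {
                out.append(m.group()); // leave unknown references untouched
                continue;
            }
            String decoded = new String(Character.toChars(codePoint));
            out.append(decoded);
            if (emitPadding) {
                // Entity length includes '&', '#', the optional 'x' and ';'.
                int pad = (m.end() - m.start()) - decoded.length();
                for (int i = 0; i < pad; i++) {
                    out.append(' ');
                }
            }
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "Krak&#xF3;w";
        System.out.println("[" + decode(input, true) + "]");  // padded
        System.out.println("[" + decode(input, false) + "]"); // no padding
    }
}
```

With padding enabled, the spaces land in the middle of the word, so a
whitespace tokenizer would see two terms; with padding disabled the word stays
intact, but offsets no longer match the original input.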

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
