Yonik Seeley wrote:
> 
> HTMLStripWhitespaceTokenizerFactory works in two phases...
> HTMLStripReader removes the HTML and passes the result to
> WhitespaceTokenizer... at that point, Tokens are generated, but the
> offsets will correspond to the text after HTML removal, not before.
> 
> I did it this way so that HTMLStripReader could go before any
> tokenizer (like StandardTokenizer).
> 
> Can you open a JIRA bug for this?  The fix would be a special version
> of HTMLStripReader integrated with a WhitespaceTokenizer to keep
> offsets correct.
> 
> -Yonik
> 
> 
Is there a fix for this problem?

My copy of Solr is dated 12/17/2006. Highlighting with
HTMLStripWhitespaceTokenizerFactory still doesn't work:
all the wrong items are highlighted.
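
For anyone reading the archive, here is a minimal Python sketch of the offset mismatch described in the quoted message. It is not Solr or Lucene code. The regex-based tag stripper, the sample string, and the helper names (strip_html, whitespace_tokens, offset_map) are all assumptions made for illustration, and the real HTMLStripReader may treat markup differently. The sketch shows that token offsets computed on the stripped text select the wrong characters in the original, and that recording each kept character's original position is one possible way to correct them.

```python
import re

# Naive tag pattern, for illustration only (assumption: no '>' inside attributes).
TAG = re.compile(r"<[^>]*>")


def strip_html(text):
    """Return (stripped_text, offset_map).

    offset_map[i] is the index in the original text of stripped_text[i].
    """
    kept_chars = []
    offset_map = []
    pos = 0
    for match in TAG.finditer(text):
        # Keep everything between the previous tag and this one.
        for i in range(pos, match.start()):
            kept_chars.append(text[i])
            offset_map.append(i)
        pos = match.end()
    # Keep whatever follows the last tag.
    for i in range(pos, len(text)):
        kept_chars.append(text[i])
        offset_map.append(i)
    return "".join(kept_chars), offset_map


def whitespace_tokens(text):
    """Split on whitespace, returning (term, start, end) with end exclusive."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


original = "<b>Solr</b> highlights <i>terms</i>"
stripped, offset_map = strip_html(original)
tokens = whitespace_tokens(stripped)

term, start, end = tokens[2]  # the token "terms"

# Phase-two offsets applied directly to the original text hit the wrong span.
wrong_span = original[start:end]

# Mapping the offsets back to the original text recovers the right span.
fixed_start = offset_map[start]
fixed_end = offset_map[end - 1] + 1
right_span = original[fixed_start:fixed_end]

print(repr(wrong_span), repr(right_span))
```

In this sample the unmapped offsets land inside the word "highlights" rather than on "terms", which is the same kind of misplaced highlight reported above.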
-- 
View this message in context: 
http://www.nabble.com/Highlighting-problems-with-HTML-tagged-fields-tf2017260.html#a9343253
Sent from the Solr - User mailing list archive at Nabble.com.
