Yonik Seeley wrote:
>
> HTMLStripWhitespaceTokenizerFactory works in two phases...
> HTMLStripReader removes the HTML and passes the result to
> WhitespaceTokenizer... at that point, Tokens are generated, but the
> offsets will correspond to the text after HTML removal, not before.
>
> I did it this way so that HTMLStripReader could go before any
> tokenizer (like StandardTokenizer).
>
> Can you open a JIRA bug for this? The fix would be a special version
> of HTMLStripReader integrated with a WhitespaceTokenizer to keep
> offsets correct.
>
> -Yonik
>

Is there a fix for this problem?
My copy of Solr is dated 12/17/2006, and HTMLStripWhitespaceTokenizerFactory with highlighting still doesn't work. All the wrong items are highlighted.
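
For anyone following the thread, here is a minimal sketch in Python of the offset mismatch described in the quoted message. It is not Solr or Lucene code: the regex-based strip_html is only a stand-in for HTMLStripReader, and all function names are made up for illustration. The last function shows one possible way to keep offsets pointing into the original text (carrying a position map through the stripping step), which is in the spirit of the "integrated" fix suggested above but is an assumption on my part, not the actual fix.

```python
import re

# Illustrative stand-in for HTML stripping; real HTML needs a real parser.
TAG = re.compile(r"<[^>]*>")
WORD = re.compile(r"\S+")


def strip_html(text):
    """Phase 1: remove tags and return only the remaining text."""
    return TAG.sub("", text)


def whitespace_tokens(text):
    """Phase 2: split on whitespace, returning (term, start, end) tuples."""
    return [(m.group(), m.start(), m.end()) for m in WORD.finditer(text)]


def strip_html_with_map(text):
    """Remove tags, also recording each kept character's original index."""
    kept, positions = [], []
    last = 0
    for m in TAG.finditer(text):
        for i in range(last, m.start()):
            kept.append(text[i])
            positions.append(i)
        last = m.end()
    for i in range(last, len(text)):
        kept.append(text[i])
        positions.append(i)
    return "".join(kept), positions


def tokens_with_original_offsets(text):
    """Tokenize the stripped text, then map offsets back to the original."""
    stripped, positions = strip_html_with_map(text)
    return [
        (term, positions[start], positions[end - 1] + 1)
        for term, start, end in whitespace_tokens(stripped)
    ]


if __name__ == "__main__":
    doc = "<b>Solr</b> rocks"
    # Two-phase offsets refer to the stripped text "Solr rocks".
    for term, start, end in whitespace_tokens(strip_html(doc)):
        print(term, (start, end), repr(doc[start:end]))
    # Mapped offsets refer to the original text, tags included.
    for term, start, end in tokens_with_original_offsets(doc):
        print(term, (start, end), repr(doc[start:end]))
```

With the two-phase version, a highlighter that applies the offsets to the stored original field will mark the wrong characters, which matches the symptom reported here. The mapped version trades a little memory (one integer per kept character) for offsets that line up with the stored text.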