Hi Erick! Thanks for the reply. The goal is to get two-character terms like 1a, 1b, 2a, 2b, 3a, etc. highlighted in the documents. Additional testing shows that any alpha-numeric combo returns a blank highlight, regardless of length. For example, "pr0blem" will not highlight because of the zero in the middle of the term.
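For anyone who wants to reproduce this, the kind of request involved can be sketched roughly as follows. This is only a minimal sketch under assumptions: the host, port, and core name ("mycore") are placeholders, the helper name build_highlight_query is hypothetical, and it only builds the URL rather than sending it. The parameters themselves (hl, hl.fl, hl.maxAnalyzedChars) are standard Solr highlighting parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(term, base_url="http://localhost:8983/solr/mycore/select",
                          max_chars=None):
    """Build a Solr select URL that asks for highlighting on the 'text' field.

    base_url is a placeholder (assumed host, port, and core name).
    max_chars, when given, is passed as hl.maxAnalyzedChars.
    """
    params = {
        "q": f"text:{term}",  # search the copyField destination
        "hl": "true",         # turn highlighting on
        "hl.fl": "text",      # highlight the same field that is searched
        "wt": "xml",          # XML response format
    }
    if max_chars is not None:
        params["hl.maxAnalyzedChars"] = str(max_chars)
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # One term that highlights fine and two that come back blank.
    for term in ("ms", "1a", "pr0blem"):
        print(build_highlight_query(term))
```

Running the same query for "ms" and for "1a" and comparing the highlighting sections of the two responses is enough to show the difference.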

I came across a ServerFault article where it was suggested that the fieldType must be tokenized in order for highlighting to work correctly. Setting the field type to text_general was suggested as a solution. In my case the data is stored as a string fieldType, which is then copied using copyField to a field that has a fieldType of text_general, but I'm still not getting a good highlight on terms like "1a". Highlighting works for any other non-alpha-numeric term though. Other articles pointed to termVectors and termOffsets, but none of these seemed to help. Here's my config:

<field name="contents" type="string" indexed="true" stored="true" termPositions="true" termVectors="true" termOffsets="true" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
<copyField source="contents" dest="text"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
  </analyzer>
</fieldType>

In the solrconfig file highlighting is set to use the text field:
<str name="hl.fl">text</str>

Thoughts? Appreciate the help! Thanks!

-Teague

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, February 1, 2017 2:49 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

How far into the text field are these tokens? The highlighter defaults to the first 10K characters under control of hl.maxAnalyzedChars. It's vaguely possible that the values happen to be farther along in the text than that. Not likely, mind you, but possible.

Best,
Erick

On Wed, Feb 1, 2017 at 8:24 AM, Teague James <teag...@insystechinc.com> wrote:
> Hello everyone! I'm still stuck on this issue and could really use
> some help. I have a Solr 6.0.0 instance that is storing documents
> peppered with text like "1a", "2e", "4c", etc. If I search the
> documents for a word, "ms", "in", "the", etc., I get the correct
> number of hits and the results are highlighted correctly in the
> highlighting section. But when I search for "1a" or "2e" I get hits,
> but the highlights are blank. Further testing revealed that the
> highlighter fails to highlight any alpha-numeric two-character
> value, such as n0, b1, 1z, etc.:
>
> <result name="response" numFound="1" start="0"> ...
> <lst name="highlighting">
>   <lst name="8667"/>
>
> Where "8667" is the document ID of the record that had the hit, but no
> highlight. Other searches, "ms" for example, return:
>
> <result name="response" numFound="1" start="0"> ...
> <lst name="highlighting">
>   <lst name="8667">
>     <arr name="text">
>       <str>
>         <em>MS</em>
>       </str>
>     </arr>
>   </lst>
> </lst>
>
> Why does highlighting fail for "1a" type searches? Any help is appreciated!
> Thanks!
>
> -Teague James