To select the whole string, I think you want hl.fragmenter=regex and to create a regex pattern for your entire strings:
http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighter+multi-valued

This will let you select the entire string field. But I don't know how to avoid the non-matching prefixes. That's a really interesting quirk of highlighting.

On Tue, Feb 9, 2010 at 6:18 AM, gwk <g...@eyefi.nl> wrote:
> On 2/9/2010 2:57 PM, Ahmet Arslan wrote:
>>>
>>> I'm trying to improve the search box on our website by adding an
>>> autosuggest field. The dataset is a set of properties in the world
>>> (mostly Europe) and the searchbox is intended to be filled with a
>>> country-, region- or city name. To do this I've created a separate,
>>> simple core with one document per geographic location, for example the
>>> document for the country "France" contains several fields including the
>>> number of properties (so we can show the approximate amount of results
>>> in the autosuggest box) and the name of the country France in several
>>> languages and some other bookkeeping information. The name of the
>>> property is stored in two fields: "name" which simply contains the
>>> canonical name of the country, region or city and "names" which is a
>>> multivalued field containing the name in several different languages.
>>> Both fields use an EdgeNGramFilter during analysis so the query "Fr" can
>>> match "France".
>>>
>>> This all seems to work, the autosuggest box gives appropriate suggestions.
>>> But when I turn on highlighting the results are less than desirable, for
>>> example the query "rho" using dismax (and hl.snippets=5) returns the
>>> following:
>>>
>>> <lst name="5119">
>>>   <arr name="names">
>>>     <str><em>Rég</em>ion Rhône-Alpes</str>
>>>     <str><em>Rhô</em>ne-Alpes</str>
>>>     <str><em>Rhô</em>ne-Alpes</str>
>>>     <str><em>Rhô</em>ne-Alpes</str>
>>>     <str><em>Rhô</em>ne-Alpes</str>
>>>   </arr>
>>>   <arr name="name">
>>>     <str><em>Rég</em>ion Rhône-Alpes</str>
>>>   </arr>
>>> </lst>
>>> <lst name="5440">
>>>   <arr name="names">
>>>     <str><em>Dép</em>artement du Rhône</str>
>>>     <str><em>Dép</em>artement du Rhône</str>
>>>     <str><em>Rhô</em>ne</str>
>>>     <str><em>Dép</em>artement du Rhône</str>
>>>     <str><em>Rhô</em>ne</str>
>>>   </arr>
>>>   <arr name="name">
>>>     <str><em>Dép</em>artement du Rhône</str>
>>>   </arr>
>>> </lst>
>>>
>>> As you can see, no matter where the match is, the first 3 characters are
>>> highlighted. Obviously not correct for many of the fields. Is this because
>>> of the NGramFilterFactory or am I doing something wrong?
>>>
>> I used https://issues.apache.org/jira/browse/SOLR-357 for this sometime
>> ago. It was giving correct highlights.
>>
> I just ran a test with the NGramFilter removed (and reindexing) which did
> give correct highlighting results but I had to query using the whole word.
> I'll try the PrefixingFilterFactory next although according to the comments
> it's nothing but a subset of the EdgeNGramFilterFactory so unless I'm
> configuring it wrong it should yield the same results...
>
>> However we are now using
>> http://www.ajaxupdates.com/mootools-autocomplete-ajax-script/ It
>> automatically makes bold matching characters without using solr
>> highlighting.
>>
> Using a pure javascript based solution isn't really an option for us as that
> wouldn't work for the diacritical marks without a lot of transliteration
> brouhaha.
>
> Regards,
>
> gwk
>

--
Lance Norskog
goks...@gmail.com
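
For anyone who wants to try the hl.fragmenter=regex suggestion from the top of this message, a request could be put together roughly as below. This is only a sketch: the host and core name are made up, the hl.regex.pattern value is a guess at "match the whole field value" and has not been tested against the data in this thread, and it assumes the dismax handler already has its query fields configured server-side. Only the "name" and "names" fields, the "rho" query and hl.snippets=5 come from the thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical Solr core URL; substitute your own host and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/autosuggest/select"


def build_highlight_url(query, fields=("name", "names"), snippets=5):
    """Build a select URL that asks Solr for regex-fragmented highlights."""
    params = {
        "q": query,
        # Assumes the dismax query fields (qf) are set in solrconfig.xml.
        "defType": "dismax",
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.snippets": str(snippets),
        "hl.fragmenter": "regex",
        # Assumption: a pattern intended to cover an entire field value.
        # Adjust it to the shape of your own strings.
        "hl.regex.pattern": ".+",
    }
    # urlencode percent-escapes the pattern and the comma-separated field list.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_url("rho"))
```

This only changes how fragments are cut; it is not expected to fix the misplaced <em> tags on the EdgeNGram fields, which is the part still unanswered above.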