To select the whole string, I think you want hl.fragmenter=regex, plus
a regex pattern (hl.regex.pattern) that matches your entire strings:

http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighter+multi-valued

This will let you select the entire string field. But I don't know how
to avoid highlighting the non-matching prefixes. That's a really
interesting quirk of highlighting.
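
For what it's worth, here is a rough sketch of the request I have in
mind, written as a small Python helper that only builds the URL. The
host, the core name "geo", the ".+" pattern and the fragsize value are
all assumptions of mine and untested against your setup; the hl.*
names are the standard highlighting parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint: host and core name ("geo") are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/geo/select"


def build_highlight_url(query, fields=("name", "names")):
    """Build a dismax query URL that asks for regex-based fragments."""
    params = {
        "q": query,
        "defType": "dismax",
        "qf": " ".join(fields),
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.snippets": 5,
        # Use the regex fragmenter instead of the default gap fragmenter.
        "hl.fragmenter": "regex",
        # Assumed pattern: try to make one fragment span the whole value.
        "hl.regex.pattern": ".+",
        # How far a fragment may stray from hl.fragsize to fit the regex.
        "hl.regex.slop": 1.0,
        # Assumed to be larger than the longest place name.
        "hl.fragsize": 200,
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_url("rho"))
```

This only changes how fragments are cut; it does nothing about which
characters end up inside the <em> tags.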

On Tue, Feb 9, 2010 at 6:18 AM, gwk <g...@eyefi.nl> wrote:
> On 2/9/2010 2:57 PM, Ahmet Arslan wrote:
>>>
>>> I'm trying to improve the search box on our website by
>>> adding an autosuggest field. The dataset is a set of
>>> properties in the world (mostly Europe) and the searchbox is
>>> intended to be filled with a country-, region- or city name.
>>> To do this I've created a separate, simple core with one
>>> document per geographic location, for example the document
>>> for the country "France" contains several fields including
>>> the number of properties (so we can show the approximate
>>> amount of results in the autosuggest box) and the name of
>>> the country France in several languages and some other
>>> bookkeeping information. The name of the property is stored
>>> in two fields: "name" which simply contains the canonical
>>> name of the country, region or city and "names" which is a
>>> multivalued field containing the name in several different
>>> languages. Both fields use an EdgeNGramFilter during
>>> analysis so the query "Fr" can match "France".
>>>
>>> This all seems to work, the autosuggest box gives
>>> appropriate suggestions. But when I turn on highlighting the
>>> results are less than desirable, for example the query "rho"
>>> using dismax (and hl.snippets=5) returns the
>>> following:
>>>
>>> <lst name="5119">
>>> <arr name="names">
>>> <str><em>Rég</em>ion Rhône-Alpes</str>
>>> <str><em>Rhô</em>ne-Alpes</str>
>>> <str><em>Rhô</em>ne-Alpes</str>
>>> <str><em>Rhô</em>ne-Alpes</str>
>>> <str><em>Rhô</em>ne-Alpes</str>
>>> </arr>
>>> <arr name="name">
>>> <str><em>Rég</em>ion Rhône-Alpes</str>
>>> </arr>
>>> </lst>
>>> <lst name="5440">
>>> <arr name="names">
>>> <str><em>Dép</em>artement du Rhône</str>
>>> <str><em>Dép</em>artement du Rhône</str>
>>> <str><em>Rhô</em>ne</str>
>>> <str><em>Dép</em>artement du Rhône</str>
>>> <str><em>Rhô</em>ne</str>
>>> </arr>
>>> <arr name="name">
>>> <str><em>Dép</em>artement du Rhône</str>
>>> </arr>
>>> </lst>
>>>
>>> As you can see, no matter where the match is, the first 3
>>> characters are highlighted. Obviously not correct for many
>>> of the fields. Is this because of the NGramFilterFactory or
>>> am I doing something wrong?
>>>
>> I used https://issues.apache.org/jira/browse/SOLR-357 for this some time
>> ago. It was giving correct highlights.
>>
> I just ran a test with the NGramFilter removed (after reindexing), which
> did give correct highlighting results, but I had to query using the whole
> word. I'll try the PrefixingFilterFactory next, although according to the
> comments it's nothing but a subset of the EdgeNGramFilterFactory, so unless
> I'm configuring it wrong it should yield the same results...
>
>> However we are now using
>> http://www.ajaxupdates.com/mootools-autocomplete-ajax-script/ It
>> automatically bolds the matching characters without using Solr
>> highlighting.
>>
> Using a pure JavaScript-based solution isn't really an option for us, as that
> wouldn't work for the diacritical marks without a lot of transliteration
> brouhaha.
>
> Regards,
>
> gwk
>



-- 
Lance Norskog
goks...@gmail.com
