Arturas:

Try to field-qualify your hl.q parameter. That looks like:

hl.q=trans:Kundigung
or
hl.q=trans:Kündigung

I saw the exact behavior you describe when I did _not_ specify the
field in the hl.q parameter, i.e.

hl.q=Kundigung
or
hl.q=Kündigung

didn't show all highlights.

But when I did specify the field, it worked.
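Here is a minimal Python sketch of building such a request. It assumes the core name "trans" and the localhost URL from the original report further down, and the helper build_select_url is hypothetical. It only builds the URL (nothing is sent), and it lets urlencode percent-encode both the colon and the umlaut, so ü becomes %C3%BC.

```python
from urllib.parse import urlencode

# Core name and host taken from the original report; adjust as needed.
SOLR_SELECT = "http://localhost:8983/solr/trans/select"

def build_select_url(q: str, hl_field: str, hl_term: str) -> str:
    """Build a select URL whose hl.q is qualified with the highlighted field."""
    params = [
        ("q", q),
        ("hl", "true"),
        ("hl.fl", hl_field),
        # Qualifying hl.q makes Solr analyze the term with hl_field's chain
        # instead of the chain of the default search field (df).
        ("hl.q", f"{hl_field}:{hl_term}"),
        ("hl.snippets", "3"),
        ("wt", "xml"),
        ("rows", "1"),
    ]
    return f"{SOLR_SELECT}?{urlencode(params)}"

if __name__ == "__main__":
    print(build_select_url("trans:Zeit", "trans", "Kündigung"))
```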

Here's what I think is happening: Solr uses the default search
field when parsing an un-field-qualified query. I.e.

q=something

is parsed as

q=default_search_field:something.

The default field is controlled in solrconfig.xml with the "df"
parameter; you'll see entries like:
<str name="df">my_field</str>

Also, when I changed the "df" parameter to the field I was highlighting
on, I didn't need to specify the field in the hl.q parameter at all:

hl.q=Kundigung
or
hl.q=Kündigung
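If you'd rather not edit solrconfig.xml, df can probably be passed on the request instead, since the "defaults" of a request handler are overridden by request parameters. The sketch below rests on that assumption, so verify it on your version. Note that df then also applies to any unqualified terms in q. The base URL and the helper name are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8983/solr/trans/select"  # hypothetical; adjust to your core

def build_url_with_df(q: str, field: str, hl_term: str) -> str:
    """Override df per request so an unqualified hl.q is parsed against `field`."""
    params = {
        "q": q,
        "df": field,      # default field for unqualified terms, in q as well as hl.q
        "hl": "true",
        "hl.fl": field,
        "hl.q": hl_term,  # deliberately left unqualified
    }
    return f"{BASE}?{urlencode(params)}"

if __name__ == "__main__":
    url = build_url_with_df("trans:Zeit", "trans", "Kündigung")
    print(url)
    # Decode the query string again to see what Solr will receive.
    print(parse_qs(urlsplit(url).query))
```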

The default field is usually "text", whose analysis chain knows nothing
about the German-specific filters you've applied (unless you changed it).

So in the absence of a field qualification on the hl.q parameter, Solr
was parsing the query with the analysis chain specified for your
default field, which probably passed ü through without transforming
it. Since the indexing analysis chain for your field folded ü to plain
u, the term wasn't found or highlighted.
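To make the mismatch concrete, here is a deliberately simplified Python approximation of the two analysis chains. It is not Lucene. default_chain stands in for a plain tokenize-and-lowercase field like "text". german_chain only mimics the umlaut and ß folding of GermanNormalizationFilterFactory. Stop words, the ae/oe/ue rules, and GermanLightStemFilterFactory are all omitted.

```python
import re
from typing import List

# Simplified folding in the spirit of GermanNormalizationFilterFactory
# (the real filter also rewrites ae/oe/ue in some contexts; omitted here).
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def default_chain(text: str) -> List[str]:
    """Stand-in for a plain field like "text": split on word chars, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def german_chain(text: str) -> List[str]:
    """Stand-in for text_de: default chain plus folding (no stop words, no stemming)."""
    return [tok.translate(GERMAN_FOLD) for tok in default_chain(text)]

# Terms indexed for a sample document in the text_de field "trans".
indexed = set(german_chain("Die Kündigung kam zu spät"))

if __name__ == "__main__":
    # Unqualified hl.q=Kündigung: analyzed with the default field's chain.
    print(default_chain("Kündigung"), default_chain("Kündigung")[0] in indexed)  # -> ['kündigung'] False
    # Unqualified hl.q=Kundigung (no umlaut) happens to match anyway.
    print(default_chain("Kundigung")[0] in indexed)  # -> True
    # hl.q=trans:Kündigung: analyzed with the text_de chain.
    print(german_chain("Kündigung"), german_chain("Kündigung")[0] in indexed)  # -> ['kundigung'] True
```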

On the surface this does seem like something that should be changed;
I'll go ahead and ping the dev list.

NOTE: I was trying this on Solr 7.1

Best,
Erick

On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika <maze...@gmail.com> wrote:
> Hi Erick,
>
> Thanks for the update and the info. Your post shed quite a bit of light
> on the picture, and now I understand much better what you are saying.
> Your explanation makes sense and can be quite useful in certain
> scenarios.
>
> What struck me in your description is that the analyzer chain needs
> to be applied to the highlighting queries as well. The tragedy is that I
> am not able to get this working for a German collection: if only the
> query is set (no explicit highlighting query), the highlighting is
> correct. It is also correct if I replace the umlauts with the
> corresponding Latin characters. Getting the analyzer chain applied to
> the highlighting terms remains the challenge.
>
> Could you have a look at the following Stack Overflow question? Maybe
> something comes to mind...
>
> https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>
> Cheers,
> Arturas
> On Fri, Mar 23, 2018, 17:43 Erick Erickson <erickerick...@gmail.com> wrote:
>
>> bq: this is not a typical case that one searches for a keyword but
>> highlights something else
>>
>> This isn't really an unusual case; apparently I misled you.
>>
>> What I was trying to convey is that the analysis chain used is firmly
>> attached to a particular _field_. There's no way to say "use one
>> analysis chain for the query and another for highlighting on the
>> _same_ field".
>>
>> You can use two different fields with different analysis chains, one
>> for each purpose. So something like
>>
>> q=f1:something&hl.fl=f2,f3&hl.q=other
>>
>> is certainly reasonable. It'll search for "something" in f1 and
>> highlight "other" in f2 and f3.
>>
>> Each field processes its input with the analysis chain defined in the
>> schema.
>>
>> The rest about stored="true" can be ignored, it's just me wandering
>> off into the weeds about an optimization that only stores the data
>> once rather than redundantly in multiple fields.
>>
>> Best,
>> Erick
>>
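Here is a sketch of the two-field pattern as a Python URL builder. It field-qualifies hl.q once per highlight field, following the advice at the top of this thread, so that each f:term clause is analyzed with that field's own chain. The field names f1, f2 and f3 are the placeholders from the message above. The base URL and the helper name are hypothetical, and only the URL is built.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8983/solr/mycollection/select"  # hypothetical collection

def two_field_url(search_field, search_term, hl_fields, hl_term):
    """Search one field and highlight others, each analyzed with its own chain."""
    # Qualify the highlight term per highlight field, e.g. "f2:other OR f3:other".
    hl_query = " OR ".join(f"{f}:{hl_term}" for f in hl_fields)
    params = [
        ("q", f"{search_field}:{search_term}"),
        ("hl", "true"),
        ("hl.fl", ",".join(hl_fields)),
        ("hl.q", hl_query),
    ]
    return f"{BASE}?{urlencode(params)}"

if __name__ == "__main__":
    url = two_field_url("f1", "something", ["f2", "f3"], "other")
    print(parse_qs(urlsplit(url).query))
```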
>> On Fri, Mar 23, 2018 at 4:37 AM, Arturas Mazeika <maze...@gmail.com>
>> wrote:
>> > Hi Matheis (Stefan),
>> >
>> > Thanks for the questions. They made me look at the problem from a
>> > distance and re-frame the situation. Good questions indeed.
>> >
>> > Trying another angle: consider a user who describes herself as a BMW
>> > fan, convinced (for the sake of argument) that every BMW needs to be
>> > the blackest color possible, who would like to search and later browse
>> > the entries in a discussion forum (not everything, of course, only BMWs
>> > of the blackest color). What interests her are the snippets containing
>> > keywords such as "understood", "craziest" or the like (because she is
>> > looking for a dozen discussions that she saw before).
>> >
>> > What I have not been able to achieve so far is (i) combining query terms
>> > for filtering and highlighting, and (ii) using the field's analyzer chain
>> > to rewrite the highlight query (or defining one in the search).
>> >
>> > The CTRL+F technique is a very powerful one indeed, and it works most
>> > of the time. Its difficulties are query rewriting, enrichment, etc.
>> >
>> > Cheers,
>> > Arturas
>> >
>> > On Fri, Mar 23, 2018 at 11:29 AM, Stefan Matheis <matheis.ste...@gmail.com>
>> > wrote:
>> >
>> >> Perhaps we try it the other way round: what's your use case for this?
>> >> I'm trying to think of a situation where I'd need this as a user.
>> >>
>> >> The only reason I can see myself doing this is CTRL+F on a page when
>> >> the search result is not immediately visible to me ;)
>> >>
>> >> On Mar 23, 2018 9:41 AM, "Arturas Mazeika" <maze...@gmail.com> wrote:
>> >>
>> >> > Hi Erick et al,
>> >> >
>> >> > From your answer I understand that it is not a typical case to
>> >> > search for one keyword but highlight something else. Since there are
>> >> > two parameters (q vs hl.q), I thought they were freely combinable. From
>> >> > your answer I understand that this is not really the case. My current
>> >> > understanding came from [1], which says:
>> >> >
>> >> > hl.q
>> >> >
>> >> > A query to use for highlighting. This parameter allows you to highlight
>> >> > different terms than those being used to retrieve documents.
>> >> >
>> >> > What I hear from you is something different, i.e., that it is not
>> >> > enough just to combine q with hl.q, and that there are caveats to
>> >> > achieving the task (multiple fields, FastVectorHighlighter).
>> >> >
>> >> > Your info is very helpful.
>> >> >
>> >> > Cheers,
>> >> > Arturas
>> >> >
>> >> > [1]  https://lucene.apache.org/solr/guide/7_2/highlighting.html
>> >> >
>> >> > On Thu, Mar 22, 2018 at 4:07 PM, Erick Erickson <erickerick...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Basically you need to use a copyField, but in several variants:
>> >> > >
>> >> > > If you use the field _exclusively_ for highlighting, then store the
>> >> > > raw content there and have the field use whatever analyzer you want.
>> >> > > You do _not_ need to have indexed="true" set for the field if you're
>> >> > > highlighting on the fly. So you're searching against field1 (which
>> >> > > has indexed="true" stored="false" set) but highlighting against
>> >> > > field2 (which has indexed="false" stored="true" set). Of course, any
>> >> > > time you want to return the contents in a doc, your fl needs to
>> >> > > specify field2...
>> >> > >
>> >> > > The above does not bloat your index at all since the cost of
>> >> > > stored="true" indexed="true" is the same as if you use two fields,
>> >> > > each with only one option turned on.
>> >> > >
>> >> > > The second approach, if you want to use FastVectorHighlighter or the
>> >> > > like, is simply to index both fields.
>> >> > >
>> >> > > Best,
>> >> > > Erick
>> >> > >
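One way to express the copyField variant is as a Schema API request body (POSTed to /solr/<collection>/schema), sketched here in Python. The assumptions are: the field names field1 and field2 from the message above; the text_de type from the managed-schema further down, used for both fields only for illustration (field2 can use whatever analyzer you want); and that add-field accepts an array of definitions. Nothing is sent. For the FastVectorHighlighter variant, field2 is also indexed with term vectors, positions and offsets, which that highlighter requires.

```python
import json

def copyfield_commands(fast_vector: bool = False) -> str:
    """Schema API commands: search on field1, highlight on field2 (a copy of field1)."""
    field1 = {"name": "field1", "type": "text_de", "indexed": True, "stored": False}
    # Highlight-only field: stores the raw text, re-analyzed at highlight time.
    field2 = {"name": "field2", "type": "text_de", "indexed": False, "stored": True}
    if fast_vector:
        # FastVectorHighlighter needs the field indexed with full term vectors.
        field2.update(indexed=True, termVectors=True, termPositions=True, termOffsets=True)
    commands = {
        "add-field": [field1, field2],
        # copyField copies the raw input sent to field1, before any analysis.
        "add-copy-field": {"source": "field1", "dest": "field2"},
    }
    return json.dumps(commands, indent=2)

if __name__ == "__main__":
    print(copyfield_commands())
```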
>> >> > > On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <maze...@gmail.com>
>> >> > > wrote:
>> >> > > > Hi Solr-Users,
>> >> > > >
>> >> > > > I've been playing with a German collection of documents, where I
>> >> > > > tried to search for one word (q=Tag) and highlight another
>> >> > > > (hl.q=Kundigung). Is this a "legal" use case? My key question is:
>> >> > > > how can I tell Solr which query analyzer to use for highlighting?
>> >> > > > Strictly speaking, I should use hl.q=Kündigung to conceptually look
>> >> > > > for relevant information, but in this case no highlighting is
>> >> > > > returned (as all umlauts are left out in the index).
>> >> > > > Additional infos:
>> >> > > >
>> >> > > > solr version: 7.2
>> >> > > > urls to query:
>> >> > > >
>> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=Kundigung&hl.snippets=3&wt=xml&rows=1
>> >> > > >
>> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=K%C3%BCndigung&hl.snippets=3&wt=xml&rows=1
>> >> > > >
>> >> > > > Managed-schema:
>> >> > > >
>> >> > > >   <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>> >> > > >     <analyzer>
>> >> > > >       <tokenizer class="solr.StandardTokenizerFactory"/>
>> >> > > >       <filter class="solr.LowerCaseFilterFactory"/>
>> >> > > >       <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
>> >> > > >       <filter class="solr.GermanNormalizationFilterFactory"/>
>> >> > > >       <filter class="solr.GermanLightStemFilterFactory"/>
>> >> > > >     </analyzer>
>> >> > > >   </fieldType>
>> >> > > >
>> >> > > >
>> >> > > > Other additional infos:
>> >> > > > https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>> >> > > >
>> >> > > > Cheers,
>> >> > > > Arturas
>> >> > >
>> >> >
>> >>
>>
