Arturas: Try to field-qualify your hl.q parameter. That looks like:
hl.q=trans:Kundigung or hl.q=trans:Kündigung

I saw the exact behavior you describe when I did _not_ specify the field in the hl.q parameter, i.e. hl.q=Kundigung or hl.q=Kündigung didn't show all highlights. But when I did specify the field, it worked.

Here's what I think is happening: Solr uses the default search field when parsing an un-field-qualified query. I.e. q=something is parsed as q=default_search_field:something.

The default field is controlled in solrconfig.xml with the "df" parameter, you'll see entries like:

<str name="df">my_field</str>

Also, when I changed the "df" parameter to the field I was highlighting on, I didn't need to specify the field on the hl.q parameter:

hl.q=Kundigung or hl.q=Kündigung

The default field is usually "text", which knows nothing about the German-specific filters you've applied unless you changed it.

So in the absence of a field-qualification for the hl.q parameter, Solr was parsing the query according to the analysis chain specified for your default field, and probably passed ü through without transforming it. Since the indexing analysis chain for the field you highlight on folded ü to just plain u, the term wasn't found or highlighted.

On the surface, this does seem like something that should be changed, I'll go ahead and ping the dev list.

NOTE: I was trying this on Solr 7.1

Best,
Erick

On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika <maze...@gmail.com> wrote:

> Hi Erick,
>
> Thanks for the update and the infos. Your post brought quite a bit of light
> into the picture and now I understand quite a bit more about what you are
> saying. Your explanation makes sense and can be quite useful in certain
> scenarios.
>
> What struck me from your description is that you are saying that the
> analyzer-chain needs to be applied for the highlighting queries as well.
> The tragedy is that I am not able to get this for a German collection: if
> the query is set (no explicit highlighting query), the highlighting is
> correct.
> It is also correct if I replace the umlauts with the
> corresponding Latin chars. Getting the analyzer chain for the highlighting
> terms remains the challenge.
>
> Could you have a look at the following Stack Overflow link? Maybe
> something comes to your mind...
>
> https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>
> Cheers,
>
> Arturas
>
> On Fri, Mar 23, 2018, 17:43 Erick Erickson <erickerick...@gmail.com> wrote:
>
>> bq: this is not a typical case that one searches for a keyword but
>> highlights something else
>>
>> This isn't really an unusual case, apparently I misled you.
>>
>> What I was trying to convey is that the analysis chain used is firmly
>> attached to a particular _field_. There's no way to say "use one
>> analysis chain for the query and another for highlighting on the
>> _same_ field".
>>
>> You can use two different fields with different analysis chains, one
>> for each purpose. So something like
>>
>> q=f1:something&hl.fl=f2,f3&hl.q=other
>>
>> is certainly reasonable. It'll search for "something" in f1, and
>> highlight "other" in f2 and f3
>>
>> Each field processes its input with the analysis chain defined in the
>> schema.
>>
>> The rest about stored="true" can be ignored, it's just me wandering
>> off into the weeds about an optimization that only stores the data
>> once rather than redundantly in multiple fields.
>>
>> Best,
>> Erick
>>
>> On Fri, Mar 23, 2018 at 4:37 AM, Arturas Mazeika <maze...@gmail.com>
>> wrote:
>> > Hi Matheis (Stefan),
>> >
>> > Thanks for the questions. This made me look at the problem from a
>> > distance and re-frame the situation. Good questions indeed.
>> >
>> > Trying to go around: consider a user who describes herself as being a BMW
>> > fan, being convinced that all BMWs need to be the blackest color possible
>> > (for the sake of argument), who would like to search and later browse the
>> > entries in the discussion forum (of course not everything, but BMWs of the
>> > blackest color), and what interests her are the snippets that have
>> > "understood", "craziest" as keywords or the like (because she is looking
>> > for a dozen discussions that she saw before).
>> >
>> > What I was not able to achieve so far is: (i) combine query term for
>> > filtering and highlighting, (ii) using the analyzer-chain from the
>> > attribute to rewrite the highlight query (or define one in the search)
>> >
>> > The CTRL+F technique is a very powerful one, indeed. Works most of the
>> > time. The difficulties with it are query rewriting, enriching, etc.
>> >
>> > Cheers,
>> > Arturas
>> >
>> > On Fri, Mar 23, 2018 at 11:29 AM, Stefan Matheis <matheis.ste...@gmail.com>
>> > wrote:
>> >
>> >> Perhaps we try it the other way round .. what's your use case for this?
>> >> I'm trying to think of a situation where I'd need this as a user?
>> >>
>> >> The only reason I see myself doing this is CTRL+F in a page when the
>> >> search result is not immediately visible for me ;)
>> >>
>> >> On Mar 23, 2018 9:41 AM, "Arturas Mazeika" <maze...@gmail.com> wrote:
>> >>
>> >> > Hi Erick et al,
>> >> >
>> >> > From your answer I understand that this is not a typical case that one
>> >> > searches for a keyword but highlights something else. Since we have
>> >> > two parameters (q vs hl.q) I thought they are freely combinable. From
>> >> > your answer I understand that this is not really the case. My current
>> >> > understanding came from [1] that says:
>> >> >
>> >> > hl.q
>> >> >
>> >> > A query to use for highlighting. This parameter allows you to
>> >> > highlight different terms than those being used to retrieve documents.
>> >> >
>> >> > What I hear from you is something different: i.e., that it is not
>> >> > enough just to combine q with hl.q, and that there are caveats to
>> >> > achieving the task (multiple fields, FastVectorHighlighter).
>> >> >
>> >> > Your infos are very helpful.
>> >> >
>> >> > Cheers,
>> >> > Arturas
>> >> >
>> >> > [1] https://lucene.apache.org/solr/guide/7_2/highlighting.html
>> >> >
>> >> > On Thu, Mar 22, 2018 at 4:07 PM, Erick Erickson <erickerick...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Basically you need to use a copyField, but in several variants:
>> >> > >
>> >> > > If you use the field _exclusively_ for highlighting then store the
>> >> > > raw content there and have the field use whatever analyzer you want.
>> >> > > You do _not_ need to have indexed="true" set for the field if you're
>> >> > > highlighting on the fly. So you're searching against field1 (which
>> >> > > has indexed="true" stored="false" set) but highlighting against field2
>> >> > > (which has indexed="false" stored="true" set). Of course any time
>> >> > > you want to return the contents in a doc your fl needs to specify
>> >> > > field2...
>> >> > >
>> >> > > The above does not bloat your index at all since the cost of
>> >> > > stored="true" indexed="true" is the same as if you use two fields,
>> >> > > each with only one option turned on.
>> >> > >
>> >> > > The second approach if you want to use FastVectorHighlighter or the
>> >> > > like is simply to index both fields.
>> >> > >
>> >> > > Best,
>> >> > > Erick
>> >> > >
>> >> > > On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <maze...@gmail.com>
>> >> > > wrote:
>> >> > > > Hi Solr-Users,
>> >> > > >
>> >> > > > I've been playing with a German collection of documents, where I
>> >> > > > tried to search for one word (q=Tag) and highlighted another
>> >> > > > (hl.q=Kundigung). Is this a "legal" use case?
>> >> > > > My key question is how can I tell Solr which query
>> >> > > > analyzer to use for highlighting? Strictly speaking, I should use
>> >> > > > hl.q=Kündigung to conceptually look for relevant information, but
>> >> > > > in this case, no highlighting is returned (as all umlauts are left
>> >> > > > out in the index).
>> >> > > >
>> >> > > > Additional infos:
>> >> > > >
>> >> > > > Solr version: 7.2
>> >> > > > URLs to query:
>> >> > > >
>> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=Kundigung&hl.snippets=3&wt=xml&rows=1
>> >> > > >
>> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=K%C3%BCndigung&hl.snippets=3&wt=xml&rows=1
>> >> > > >
>> >> > > > Managed-schema:
>> >> > > >
>> >> > > > <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>> >> > > >   <analyzer>
>> >> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >> > > >     <filter class="solr.LowerCaseFilterFactory"/>
>> >> > > >     <filter class="solr.StopFilterFactory" format="snowball"
>> >> > > >             words="lang/stopwords_de.txt" ignoreCase="true"/>
>> >> > > >     <filter class="solr.GermanNormalizationFilterFactory"/>
>> >> > > >     <filter class="solr.GermanLightStemFilterFactory"/>
>> >> > > >   </analyzer>
>> >> > > > </fieldType>
>> >> > > >
>> >> > > > Other additional infos:
>> >> > > > https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>> >> > > >
>> >> > > > Cheers,
>> >> > > > Arturas
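
For reference, the field-qualified hl.q request suggested in the top reply can be sketched in Python. This is only an illustration: it assumes the host, collection and field name from the original post (localhost:8983, trans), and build_highlight_url is a hypothetical helper, not part of Solr or of any client library. It only builds the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, hl_field, hl_term, snippets=3, rows=1):
    """Build a Solr select URL whose hl.q is qualified with the highlight field.

    Hypothetical helper for illustration only.
    """
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": hl_field,
        # Field-qualify hl.q so the term is parsed with hl_field's analysis
        # chain instead of the default field's ("df") chain.
        "hl.q": f"{hl_field}:{hl_term}",
        "hl.snippets": snippets,
        "wt": "xml",
        "rows": rows,
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 (ü -> %C3%BC).
    return f"{base_url}?{urlencode(params)}"


# Assumed values, taken from the URLs in the original post.
url = build_highlight_url(
    "http://localhost:8983/solr/trans/select",
    query="trans:Zeit",
    hl_field="trans",
    hl_term="Kündigung",
)
print(url)
```

The only difference from the URLs in the original post is that hl.q carries the field prefix, so the umlaut term should go through the same German analysis chain that was used at index time.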