Thanks Erick, that was indeed my problem, and you helped me understand how the hl 
component works. But I still can't understand how I can avoid storing all of the 
field's variations. For example, if I need to support morphological search, I 
have 2 fields:
<field name="doc_text" type="text_general" indexed="true" stored="true"/>
<field name="doc_text_morph" type="custom_morphological_field" indexed="true" stored="false"/>
<copyField source="doc_text" dest="doc_text_morph"/>
Say we indexed the following doc:
{
    "doc_text": "walking dead"
}
Following queries should match:
q = walking
q = walk
I am issuing an edismax query with qf="doc_text^2 doc_text_morph" (boosts are 
currently missing) and adding highlight params. 'walk' is matched on 
doc_text_morph, but it is only highlighted if doc_text_morph is stored (there is 
no match on the stored field doc_text...). Is there any way to get it highlighted 
without also storing the doc_text_morph field?
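
For reference, the request looks roughly like the sketch below. The base URL (host and collection name) is a made-up placeholder, and hl.fl=doc_text is an assumption about the highlight params; only the qf value is taken from the query above.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: host and collection name are not from this thread.
base_url = "http://localhost:8983/solr/my_collection/select"

params = {
    "defType": "edismax",
    "q": "walk",
    "qf": "doc_text^2 doc_text_morph",  # fields searched, as in the query above
    "hl": "true",
    "hl.fl": "doc_text",                # assumption: highlight the stored field only
}

# urlencode percent-encodes the caret and turns the space in qf into "+".
url = base_url + "?" + urlencode(params)
print(url)
```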
Thanks again...

Sent: Monday, June 08, 2020 at 3:39 PM
From: "Erick Erickson" <erickerick...@gmail.com>
To: solr-user@lucene.apache.org
Subject: Re: Highlighting values of non stored fields
When highlighting, the stored data for the field is re-analyzed against the 
query, based on the field you’re highlighting. My bet is that if you query just 
“q=doc_text:mosh” you will not get a hit. Check your text_ws fieldType; it’s 
probably case sensitive. So if you changed the doc_text type to text_general 
(the same as your dynamic field), I think you’d be fine. Re-index your data, of 
course.
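
To make the re-analysis point concrete, here is a toy Python model. It is not Solr's highlighter code, and the two analyzer functions are simplified stand-ins invented for this example (the real text_general chain does more than lowercase). It only shows why a lowercased query term finds nothing to highlight when the stored value is re-analyzed by a case-sensitive whitespace analyzer:

```python
def analyze_ws(text):
    """Stand-in for text_ws: split on whitespace, keep case as-is."""
    return text.split()


def analyze_general(text):
    """Rough stand-in for text_general: split on whitespace and lowercase."""
    return [tok.lower() for tok in text.split()]


def highlight(stored_value, query_term, analyzer):
    """Re-analyze the stored value and wrap tokens that equal the query term."""
    out = []
    for raw, tok in zip(stored_value.split(), analyzer(stored_value)):
        out.append("<em>" + raw + "</em>" if tok == query_term else raw)
    return " ".join(out)


stored = "MOSH"
print(highlight(stored, "mosh", analyze_ws))       # MOSH
print(highlight(stored, "mosh", analyze_general))  # <em>MOSH</em>
```

With the case-sensitive stand-in the stored token "MOSH" never equals the query term "mosh", so the fragment comes back without any markup.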

I’ll add by the by that text_ws is fairly restricted and is rarely useful 
for searching on anything humans have to key in. It’ll include punctuation, for 
instance: input like “dog dog.” will produce two tokens, one with a period 
in the token and one without. It’s most useful for heavily preprocessed data 
where the app normalizes the input, or for machine-generated input.
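
A quick way to see the punctuation effect, using plain Python whitespace splitting as an approximation of what a whitespace-only tokenizer does (this is an illustration, not Solr code):

```python
# Splitting on whitespace alone leaves the trailing period attached.
tokens = "dog dog.".split()
print(tokens)  # ['dog', 'dog.']

# The two tokens are different strings, so a search for "dog" matches only one.
matches = [tok for tok in tokens if tok == "dog"]
print(len(matches))  # 1
```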

There’s no reason, BTW, to index your doc_text for highlighting purposes since 
the stored data is what counts. Unless, of course, you want to search on that 
field specifically.

Best,
Erick

> On Jun 7, 2020, at 11:32 PM, mosh bla <moshe...@mail.com> wrote:
>
>
> Thanks Erick for the reply. Your answer is exactly what I was expecting from 
> the highlight component, but it seems like I am getting different behaviour.
> I'll try to give a simple example and I hope you can explain where my 
> mistake is.
> Say I have the following fields configuration:
> <field name="doc_text" type="text_ws" indexed="true" stored="true"/>
> <dynamicField name="*_lw" type="text_general" indexed="true" stored="false"/>
> <copyField source="doc_text" dest="doc_text_lw"/>
>
> And I indexed the following document:
> {
> "doc_text": "MOSH"
> }
>
> When executing the following query 
> "http://.../select?q=doc_text_lw:mosh&hl=true&hl.fl=doc_text" - the document 
> is matched and returned in the response, but the highlighted fragment is empty.
> I also tried changing the 'hl.method' param to 'unified' and 'fastVector', but 
> no luck either. My conclusion was that the 'hl.fl' param should be set to 
> 'doc_text_lw' and that the field must also be stored...
>
>
>
>
> Sent: Tuesday, June 02, 2020 at 3:15 PM
> From: "Erick Erickson" <erickerick...@gmail.com>
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting values of non stored fields
> Why do you think even variants need to be stored/highlighted? Usually
> when you store variants for ranking purposes those extra copies are
> invisible to the user. So most often people store exactly one copy
> of a particular field and highlight _that_ field in the return.
>
> So say my field is f1 and I have indexed f1_1, f1_2, f1_3. I just store
> f1_1 and return the highlighted text from that one.
>
> You could even just store the data only once in a field that’s never
> indexed and return/highlight that if you wanted.
>
> Best,
> Erick
>
>> On Jun 2, 2020, at 3:24 AM, mosheB <moshe...@mail.com> wrote:
>>
>> Our use case is as follows:
>> We are indexing free text documents. Each document contains metadata fields
>> (such as author, creation date...) which are kinda small, and one "big"
>> field that holds the document's text itself.
>>
>> For ranking purposes, each field is indexed in more than one "variation" and
>> the query is executed with the edismax query parser. Things are working
>> alright, but now a new feature is requested by the customer - highlighting.
>> To enable highlighting every field must be stored, including all variations
>> of the big text field. This pushes our storage to the limit (and probably
>> the document cache...) and feels a bit redundant, as the stored value is
>> duplicated n times... Is there any way to “reference” stored value from one
>> field to another?
>> For example:
>> Say we have the following config:
>> <dynamicField name="*_bigrams" type="bigrams" indexed="true" stored="false" />
>> <dynamicField name="*_phrases" type="phrases" indexed="true" stored="false" />
>>
>> <field name="doc_text" type="text_general" indexed="true" stored="true" />
>> <copyField source="doc_text" dest="doc_text_bigrams" />
>> <copyField source="doc_text" dest="doc_text_phrases" />
>>
>> And we execute the following query:
>> http://.../select?defType=edismax&q=desired_terms&qf=doc_text^2
>> doc_text_bigrams^3
>> doc_text_phrases^4&hl=on&hl.fl=doc_text,doc_text_bigrams,doc_text_phrases
>>
>> Highlight fragments in the response will be blank if the match occurred on the
>> non-stored fields (doc_text_bigrams or doc_text_phrases). Is it possible to
>> pass an extra parameter to the highlight component, to point it to the stored
>> data of the “original” doc_text field? A kind of “stored value reference
>> field”?
>>
>> Thanks in advance.
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
 
