Re: Exact match

Ere Maijala Wed, 04 Dec 2019 05:04:42 -0800

Hi,

Here's our example of exact match fields:


https://github.com/NatLibFi/finna-solr/blob/master/vufind/biblio/conf/schema.xml#L48

textProper_l requires the query to match the field value from the beginning
(a left-anchored partial match). textProper_lr requires a full match. I'm not
sure if this works for you, but at least it shows a creative use of
PathHierarchyTokenizerFactory to allow the left-anchored search.
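
To illustrate the left-anchored idea, here is a rough Python sketch. It does
not reproduce the linked schema; it only assumes that the index side emits
cumulative prefixes of the value (the kind of tokens a path-hierarchy
tokenizer produces when the delimiter is a space) and that the query is kept
as a single lowercased token. The function names are made up for this example.

```python
def prefix_tokens(text, delimiter=" "):
    """Emit cumulative prefixes, e.g. 'a b c' -> ['a', 'a b', 'a b c']."""
    parts = text.lower().split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def left_anchored_match(field_value, query):
    """True if the whole query equals one of the indexed prefixes."""
    return query.lower() in prefix_tokens(field_value)


def full_match(field_value, query):
    """True only if the query equals the complete value (case-insensitive)."""
    return query.lower() == field_value.lower()


if __name__ == "__main__":
    title = "United States of America"
    print(prefix_tokens(title))
    print(left_anchored_match(title, "united states"))  # starts the value
    print(left_anchored_match(title, "states of"))      # not at the start
    print(full_match(title, "united states"))           # not the whole value
```

In this sketch a query only matches when it lines up with the start of the
value on a word boundary, which is the behaviour described for textProper_l;
full_match corresponds to the stricter textProper_lr behaviour.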

HTH,
Ere

Paras Lehana wrote on 3.12.2019 at 13:49:
> Hi Omer,
> 
> If you mean exact match with the same number of words (Emir's), you can also
> add an identifier at the beginning and end of some other field, such as
> title_exact. This can be done in your indexing script or using Pattern
> Replace. At query time, you can use the same identifier. For example,
> index "united states" as "exactStart united states exactEnd" and
> query with the same string. Obviously, you can have scoring issues here, so
> only use it if you want to debug or retrieve docs.
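
A small Python sketch of why the marker trick above behaves like a whole-field
match, assuming the wrapped query is matched as a contiguous phrase against
the wrapped field. The marker strings and function names are hypothetical and
only mirror the "exactStart ... exactEnd" example.

```python
# Hypothetical marker tokens, lowercased as an analyzer would leave them.
START, END = "exactstart", "exactend"


def with_markers(text):
    """Lowercase, split on whitespace and wrap the tokens in the markers."""
    return [START] + text.lower().split() + [END]


def contains_phrase(field_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run inside field_tokens."""
    n = len(query_tokens)
    return any(field_tokens[i:i + n] == query_tokens
               for i in range(len(field_tokens) - n + 1))


def marker_exact_match(indexed_text, query):
    """Phrase match on marker-wrapped tokens; both ends are anchored."""
    return contains_phrase(with_markers(indexed_text), with_markers(query))


if __name__ == "__main__":
    print(marker_exact_match("united states", "United States"))
    print(marker_exact_match("united states of america", "united states"))
```

Because the markers sit at both ends, a phrase can only line up when the query
and the field have exactly the same words in the same order.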
> 
> Just adding to the list of possible ways. *Anyway, I like the Keyword method.*
> 
> On Tue, 3 Dec 2019 at 03:59, Erick Erickson <erickerick...@gmail.com> wrote:
> 
>> There are two different interpretations of “exact match” going on here,
>> don’t be confused!
>>
>> Emir’s version is “the text has to match the _entire_ input”. So a field
>> with “a b c d” will NOT match “a b” or “a b c” or “b c”, but only “a b c d”.
>>
>> David’s version is “The text has to contain some sequence of words that
>> exactly matches my query”, so a field with “a b c d” _would_ match “a b”,
>> “a b c”, “a b c d”, “b c”, “c d”, etc.
>>
>> Both are entirely valid use cases, depending on what you mean by “exact
>> match”.
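
The two interpretations above can be sketched in plain Python. This is not
Solr code; it only assumes whitespace tokenization and lowercasing, and the
function names are invented for the example.

```python
def tokens(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def matches_entire_field(field, query):
    """Emir's sense: the query must equal the whole field value."""
    return tokens(field) == tokens(query)


def matches_contiguous_phrase(field, query):
    """David's sense: the query words appear in the field as one unbroken run."""
    f, q = tokens(field), tokens(query)
    if not q:
        return False
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


if __name__ == "__main__":
    field = "a b c d"
    for query in ["a b", "b c", "a c", "a b c d"]:
        print(query,
              matches_entire_field(field, query),
              matches_contiguous_phrase(field, query))
```

Only "a b c d" satisfies the first function, while "a b", "b c" and "a b c d"
all satisfy the second; "a c" satisfies neither because the words are not
adjacent in the field.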
>>
>> Best,
>> Erick
>>
>>> On Dec 2, 2019, at 4:38 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>>
>>> Hi Omer,
>>> From a performance perspective, it is best to index the title as a
>>> single token: KeywordTokenizer + LowerCaseFilter.
>>>
>>> If you need to query that field in some other way, you can index it
>>> differently in another field using copyField.
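
As a rough illustration of the single-token approach, the Python sketch below
mimics what a keyword tokenizer followed by a lowercase filter does: the whole
value becomes one lowercased token, so only an identical query (ignoring case)
can match. It is a simulation with made-up function names, not Solr code.

```python
def keyword_lowercase(text):
    """One token for the entire value, lowercased; no splitting, no stemming."""
    return [text.lower()]


def exact_match(indexed_value, query):
    """Match only when both sides analyze to the same single token."""
    return keyword_lowercase(indexed_value) == keyword_lowercase(query)


if __name__ == "__main__":
    title = "United States of America"
    print(exact_match(title, "united states of america"))
    print(exact_match(title, "united states"))
    print(exact_match(title, "america"))
```

Note that nothing is trimmed or normalized beyond case in this sketch, so
extra whitespace or punctuation in the query would prevent a match.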
>>>
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>
>>>
>>>
>>>> On 2 Dec 2019, at 21:43, OTH <omer.t....@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> What would be the best way to get exact matches (if any) to a query?
>>>>
>>>> E.g.:  Let's say the document text is:  "united states of america".
>>>> Currently, any query containing one or more of the three words "united",
>>>> "states", or "america" will match the above document.  I would like a
>>>> way for the document to match if and only if the query is also
>>>> "united states of america" (case-insensitive).
>>>>
>>>> Document field type:  TextField
>>>> Index Analyzer: TokenizerChain
>>>> Index Tokenizer: StandardTokenizerFactory
>>>> Index Token Filters: StopFilterFactory, LowerCaseFilterFactory,
>>>> SnowballPorterFilterFactory
>>>> The Query Analyzer / Tokenizer / Token Filters are the same as the Index
>>>> ones above.
>>>>
>>>> FYI I'm a relative novice at Solr / Lucene / Search.
>>>>
>>>> Much appreciated
>>>> Omer
>>>
>>
>>
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland
