For a simple illustration of Charlie's point and a side bonus on the 78 reasons 
to use the ICUFoldingFilter if you happen to be processing Arabic script 
languages, see slides 31-33:

https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf
 

-----Original Message-----
From: Charlie Hull [mailto:char...@flax.co.uk] 
Sent: Thursday, March 29, 2018 9:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Query redg : diacritics in keyword search

On 29/03/2018 14:12, Peter Lancaster wrote:
> Hi,
> 
> You don't say whether the AsciiFolding filter is at index time or query time. 
> In any case you can easily look at what's happening using the admin analysis 
> tool which helpfully will even highlight where the analysed query and index 
> token match.
> 
> That said I'd expect what you want to work if you simply use
> <filter class="solr.ASCIIFoldingFilterFactory"/> on both index and query.

Simply put:

You use the filter at indexing time to collapse any variants of a term into a 
single variant, which is then stored in your index.

You use the filter at query time to collapse any variants of a term that users 
type into a single variant, and if this exists in your index you get a match.

If you don't use the same filter at both ends you won't get a match.
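
A minimal Python sketch of the point above (not Solr code). The helpers `fold`, `build_index` and `search` are hypothetical names made up for illustration, and `fold` uses Unicode decomposition as a rough stand-in for accent folding; the real ASCIIFoldingFilter handles characters this approach does not.

```python
import unicodedata


def fold(token: str) -> str:
    """Rough stand-in for accent folding: decompose, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def build_index(docs, analyze):
    """Map each analyzed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


def search(index, query, analyze):
    """Analyze the query term and return the sorted ids of matching documents."""
    return sorted(index.get(analyze(query), set()))


docs = {1: "Carré", 2: "Carre"}

# Same folding at index and query time: either spelling finds both documents.
folded_index = build_index(docs, fold)
both_sides_accented = search(folded_index, "Carré", fold)
both_sides_plain = search(folded_index, "Carre", fold)

# Folding at query time only: the index still holds "carré" unfolded,
# so the folded query "carre" misses document 1.
raw_index = build_index(docs, str.lower)
query_only = search(raw_index, "Carré", fold)
```

With the same analysis on both sides, "Carre" and "Carré" each return documents 1 and 2; with folding on the query side only, the accented document drops out.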

Cheers

Charlie

> 
> Cheers,
> Peter.
> 
> -----Original Message-----
> From: Paul, Lulu [mailto:lulu.p...@bl.uk]
> Sent: 29 March 2018 12:03
> To: solr-user@lucene.apache.org
> Subject: Query redg : diacritics in keyword search
> 
> Hi,
> 
> The keyword search Carré returns values Carré and Carre (this works
> well as I added the filter
> <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
> in the schema config to enable returning of both sets of values).
> 
> Now it looks like we want Carre to return both Carré and Carre (and this
> doesn't work; Solr only returns Carre). Any ideas on how this scenario can
> be achieved?
> 
> Thanks & Best Regards,
> Lulu Paul
> 
> 
> 


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
