Hi Markus,

Thanks for your prompt reply to my issue.  I will try your suggestions and 
report back.

Thanks,
-Miguel

-----Original Message-----
From: Markus Jelsma <[email protected]>
Sent: Tuesday, September 27, 2022 6:36 AM
To: [email protected]
Subject: Re: Solr Search - Mixed Case Issue

Hello Miguel,

The problem lies with the different index-time and query-time 
WordDelimiterFilter configurations.

> In addition, it's strange that we get search results on some mixed case
> email addresses

Yes, precisely!

See the splitOnCaseChange attributes; that is where the problem is: it is set
to 1 at index time but 0 at query time. In your case you should be able to
copy the index-time configuration to the query-time analyzer and get rid of
the problem without reindexing. It 'should' solve the problem. If not, try
enabling catenateAll on both sides, but that requires a reindex.
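
As a rough sketch of that first step (untested against your index, and
assuming the attribute values from the index-time analyzer in your quoted
fieldType), the query-time filter would become:

```xml
<!-- Query-time analyzer: WordDelimiterFilterFactory attributes copied
     from the index-time analyzer of the quoted text_phonetic fieldType -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1" splitOnNumerics="0"/>
```

The only values that change on the query side are catenateWords,
catenateNumbers and splitOnCaseChange (each from 0 to 1). The Analysis screen
in the Solr admin UI is a convenient place to compare the index-time and
query-time tokens for one of the failing addresses before and after the
change.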

Ideally you should probably also get rid of the StopFilterFactory. Unless it
is very well configured (which I do not suspect), it will cause additional
weird problems. This does require reindexing.
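
For illustration only, the index-time analyzer from your quoted fieldType
with the stop filter dropped might look like the sketch below. Everything
else is left as you posted it; removing the same filter from the query
analyzer as well is my assumption here, so that both sides stay in step.

```xml
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- StopFilterFactory removed here (assumption: also remove it from
       the query analyzer) -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1" splitOnNumerics="0"/>
  <filter class="solr.PhoneticFilterFactory"
          encoder="Caverphone" inject="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt"/>
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
```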

Regards,
Markus

On Tue, 27 Sep 2022 at 11:55, Miguel Joy <[email protected]> wrote:

> Hi all,
>
> I'm new to Solr and recently inherited a Solr application (version
> 5.4) from a previous developer with very little documentation.  At any
> rate, my problem is this:
>
> I have some email addresses that are stored as mixed case.
>
> [email protected] = Success [querying for this email address and
> passing in the full email address in any case [upper or lower] returns
> the correct result]
>
> [email protected] = Fail [querying for this email address and
> passing in the full email address in any case [upper or lower] returns
> zero results]
>
> And here's the fieldType definition that's used for email addresses:
>
> <fieldType name="text_phonetic" class="solr.TextField"
>            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             splitOnCaseChange="1" splitOnNumerics="0"/>
>     <filter class="solr.PhoneticFilterFactory"
>             encoder="Caverphone" inject="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             splitOnCaseChange="0" splitOnNumerics="0"/>
>     <filter class="solr.PhoneticFilterFactory"
>             encoder="Caverphone" inject="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I've spent a couple of days researching this issue, and my best guess at
> a fix would be to re-index this data using the LowerCaseFilterFactory
> so that all email addresses are stored in lower case, but that would
> be a significant change as I have over 10 million docs indexed.  In
> addition, it's strange that we get search results on some mixed case
> email addresses, but not all, so I'm hoping that maybe all we need is
> to tweak the query analyzer?  Thanks in advance for your help with
> this question.  Please let me know if you need any additional details.
>
> -Miguel


