There's a long blog on wildcards here:
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

The gist is that if the analysis chain splits a token into more than
one part, wildcards are impossible to get right. So any
"MultiTermAware" filter will barf if it ends up emitting more than one
token during a wildcard search. Filters that are _not_
"MultiTermAware" are just skipped in the query analysis chain for
those searches.

That leaves the question of why your query chain seems to emit two
tokens for MöllerGruppen but not MollerGruppen. I think it's because
you have preserveOriginal set to true in the query analysis chain
here:
 <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

So this entry emits both
MöllerGruppen and MollerGruppen
for the input
MöllerGruppen
but not for
MollerGruppen
since MollerGruppen doesn't need any folding. Emitting two tokens
violates the constraint imposed by ASCIIFoldingFilterFactory being
"MultiTermAware", which is why you get the "too many terms" error.

You do not need to set preserveOriginal="true" in your _query_
chain, since your indexing chain already puts both the folded and
unfolded versions in the index at the same position.

So I think if you set preserveOriginal to false (again, in the _query_
analysis chain; leave it true in the index analysis chain) you'll be
OK. Your queries will also be somewhat faster.
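
For illustration, here is roughly what the query analyzer from your
fieldType would look like with that one change. This is only a sketch
based on the config you posted: nothing else is changed, and I have
not run it against your setup.

    <analyzer type="query">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- changed: was preserveOriginal="true"; now emits only the folded token -->
      <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              splitOnCaseChange="0" catenateWords="1" splitOnNumerics="0"
              stemEnglishPossessive="0" preserveOriginal="1"/>
    </analyzer>

The index analyzer stays exactly as it is. Since only the query side
changes, reloading the core should be enough; you shouldn't need to
reindex.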

Best,
Erick

On Wed, Jun 28, 2017 at 6:25 AM, Preeti Bhat <preeti.b...@shoregrp.com> wrote:
> Hi All,
>
> I have a requirement where the user can give a Unicode or ASCII character as
> input but expects the same result.
>
> For example: MöllerGruppen AS vs MollerGruppen AS should give the same result.
>
> I am able to get this done using <filter
> class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>, but for
> some reason when I try to search for MöllerGruppen* I get the message below.
>
> ""metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"analyzer returned too many terms for multiTerm term: MöllerGruppen",
>     "code":400}}
> "
>
> It works for MollerGruppen* though.
>
> Could someone please advise on this.
>
> Below is the fieldtype of this field.
>
> <fieldType name="string_ci" class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>             ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             splitOnCaseChange="0" catenateWords="1" splitOnNumerics="0"
>             stemEnglishPossessive="0" preserveOriginal="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"
>             ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             splitOnCaseChange="0" catenateWords="1" splitOnNumerics="0"
>             stemEnglishPossessive="0" preserveOriginal="1"/>
>   </analyzer>
> </fieldType>
>
>
>
> Thanks and Regards,
> Preeti