Re: Solr 3.4 problem with words separated by comma without space

elisabeth benoit Thu, 08 Dec 2011 01:32:02 -0800

Same problem with Solr 4.0.

2011/12/8 elisabeth benoit <elisaelisael...@gmail.com>


>
>
> Hello,
>
> I'm using Solr 3.4, and I'm having a problem with a request returning
> different results depending on whether or not there is a space after a comma.
>
> The request "name, number rue taine paris" returns results with 4 words
> out of 5 matching ("name", "number", "rue", "paris")
>
> The request "name,number rue taine paris" (no space between the comma and
> "number") returns no results unless I set mm=3, and then the matching words
> are "rue", "taine", "paris".
>
> If I check in the solr.admin.analyzer, I get the same analysis for the two
> different requests. But it seems, in fact, that the missing space after the
> comma prevents "name" and "number" from matching.
>
>
> My field type is
>
>
>       <analyzer type="query">
>         <!-- standard tokenization -->
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <!-- normalize accents, cedillas, the "oe" ligature, ... -->
>         <charFilter class="solr.MappingCharFilterFactory"
>                     mapping="mapping-ISOLatin1Accent.txt"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <!-- remove dots (I.B.M. => IBM) -->
>         <filter class="solr.StandardFilterFactory"/>
>         <!-- lowercase -->
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <!-- remove punctuation -->
>         <filter class="solr.PatternReplaceFilterFactory"
>                 pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>         <!-- drop empty tokens and overly long words -->
>         <filter class="solr.LengthFilterFactory" min="1" max="100" />
>         <!-- split compound words -->
>         <filter class="solr.WordDelimiterFilterFactory"
>                 splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"
>                 generateWordParts="1"
>                 generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>                 catenateAll="0" preserveOriginal="1"/>
>         <!-- remove elisions (l', qu', ...) -->
>         <filter class="solr.ElisionFilterFactory"
>                 articles="elisionwords.txt"/>
>         <!-- remove stop words -->
>         <filter class="solr.StopFilterFactory" ignoreCase="1"
>                 words="stopwords.txt" enablePositionIncrements="true"/>
>         <!-- stemming (plurals, ...) -->
>         <filter class="solr.SnowballPorterFilterFactory" language="French"
>                 protected="protwords.txt"/>
>         <!-- remove any duplicates -->
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>
> Does anyone have a clue?
>
> Thanks,
> Elisabeth
>
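For anyone reading the quoted config, here is a rough sketch of what the PatternReplaceFilterFactory pattern does to a single token. It is plain Python, not Solr code, and it only mimics that one filter, not the tokenizer or the query parser. Assumptions: strip_punct is a hypothetical helper name, and since Python's re module has no \p{Punct}, the character class is built from string.punctuation, which I am assuming covers the same ASCII punctuation as Java's \p{Punct}.

```python
import re
import string

# Assumption: string.punctuation stands in for Java's ASCII \p{Punct} class.
PUNCT = re.escape(string.punctuation)

# Same shape as the pattern in the quoted config:
#   ^(\p{Punct}*)(.*?)(\p{Punct}*)$   with replacement "$2"
PATTERN = re.compile(rf"^([{PUNCT}]*)(.*?)([{PUNCT}]*)$")


def strip_punct(token: str) -> str:
    """Hypothetical helper: keep only group 2, i.e. drop leading and
    trailing punctuation of one token. Inner punctuation is untouched."""
    return PATTERN.sub(r"\g<2>", token)


if __name__ == "__main__":
    for token in ["name,", "name,number", "(paris)", "...", "I.B.M."]:
        print(repr(token), "->", repr(strip_punct(token)))
```

The pattern is anchored at both ends, so it only trims punctuation at the edges of a token: a comma in the middle of "name,number" is left alone if that string reaches the filter as a single token. A token made only of punctuation becomes empty, which is presumably why the LengthFilterFactory with min="1" follows it in the chain.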
