Re: Synonyms problem

Plamen Mihaylov Fri, 29 Mar 2013 11:14:59 -0700

Thanks a lot, Walter. I removed most of the filters, and now both searches
return the same number of results. The field type now looks like this:


        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>

May I ask another question? I run Magento with Solr and need to build a
Magento admin module that lets me add and remove synonyms dynamically. Is
this possible? I searched Google, but it does not seem to be.
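
One possible approach, sketched here under several assumptions, is to have the admin module rewrite synonyms.txt itself. The helper names below are hypothetical, only the simple comma-separated form (as in "NY, New York") is handled, and comment lines are dropped when the file is rewritten. Solr reads the file when the core is loaded, so a change would still need a core reload, and because the synonyms are applied at index time, the affected documents would also need to be reindexed.

```python
from pathlib import Path


def _key(terms):
    """Case-insensitive, order-insensitive identity for a synonym group."""
    return frozenset(t.strip().lower() for t in terms if t.strip())


def read_groups(path):
    """Parse comma-separated synonym lines, skipping blanks and # comments."""
    groups = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            groups.append([t.strip() for t in line.split(",") if t.strip()])
    return groups


def write_groups(path, groups):
    """Write one group per line; comments from the old file are not kept."""
    text = "".join(", ".join(g) + "\n" for g in groups)
    Path(path).write_text(text, encoding="utf-8")


def add_group(path, terms):
    """Append a synonym group; return False if an equal group exists."""
    groups = read_groups(path)
    if _key(terms) in {_key(g) for g in groups}:
        return False
    groups.append([t.strip() for t in terms if t.strip()])
    write_groups(path, groups)
    return True


def remove_group(path, terms):
    """Remove the group equal to `terms`; return False if none matched."""
    groups = read_groups(path)
    kept = [g for g in groups if _key(g) != _key(terms)]
    if len(kept) == len(groups):
        return False
    write_groups(path, kept)
    return True
```

A call such as add_group("conf/synonyms.txt", ["LA", "Los Angeles"]) would append one line to the file; the path here is an assumption and depends on where the core keeps its configuration. Reloading the core and reindexing are left out of the sketch on purpose.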

Regards
Plamen

2013/3/29 Walter Underwood <wun...@wunderwood.org>

> There are several problems with this config.
>
> Indexing uses the phonetic filter, but query does not. This almost
> guarantees that nothing will match. Numbers could match, if the filter
> passes them.
>
> Query time has two stopword filters with different lists. Indexing only
> has one. This isn't fatal, but it is pretty weird. Is letterstops.txt
> trying to do the same thing as the length filter? If so, use the length
> filter in both places, or not at all. Deleting all single characters is
> a bad idea. You'll never find "Vitamin C".
>
> The same synonyms are used at index and query time, which is unnecessary.
> Only use synonyms at index time unless you really know what you are doing
> and have a special need.
>
> wunder
>
> On Mar 29, 2013, at 9:53 AM, Plamen Mihaylov wrote:
>
> > Guys,
> >
> > This is a commented-out line where expand is false. I moved the synonym
> > filter after the tokenizer, but the result is the same.
> >
> > Actual configuration:
> >
> >        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >            <analyzer type="index">
> >                <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> >                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >                    catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
> >                <filter class="solr.LowerCaseFilterFactory" />
> >                <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true" />
> >                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >                <filter class="solr.LengthFilterFactory" min="2" max="100" />
> >                <!-- <filter class="solr.SnowballPorterFilterFactory" language="English" /> -->
> >            </analyzer>
> >            <analyzer type="query">
> >                <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
> >                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >                    catenateNumbers="0" catenateAll="0" />
> >                <filter class="solr.LowerCaseFilterFactory" />
> >                <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
> >                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >                <filter class="solr.StopFilterFactory" ignoreCase="true" words="letterstops.txt" enablePositionIncrements="true" />
> >            </analyzer>
> >        </fieldType>
> >
> > 2013/3/29 Walter Underwood <wun...@wunderwood.org>
> >
> >> Also, all the filters need to be after the tokenizer. There are two
> >> synonym filters specified, one before the tokenizer and one after.
> >>
> >> I'm surprised that works at all. Shouldn't that be a fatal error when
> >> loading the config?
> >>
> >> wunder
> >>
> >> On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote:
> >>
> >>> Hi Plamen
> >>>
> >>> You should set expand to true for index-time analysis:
> >>>
> >>> <analyzer type="index">
> >>> ....
> >>> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> >>>             ignoreCase="true" expand="true"/>
> >>>
> >>>
> >>> ...
> >>>
> >>> Greetings,
> >>>
> >>> Thomas
> >>>
> >>> On 29.03.2013 17:16, Plamen Mihaylov wrote:
> >>>> Hey guys,
> >>>>
> >>>> I have the following problem: I have a website about sports players and
> >>>> use Solr to index their data. I have defined synonyms such as: NY, New York.
> >>>> When I search for New York, 145 results are found, but when I search
> >>>> for NY, 142 results are found. Why is there a difference, and how can I
> >>>> fix this?
> >>>>
> >>>> Configuration snippets:
> >>>>
> >>>> synonyms.txt
> >>>>
> >>>> ...
> >>>> NY, New York
> >>>> ...
> >>>>
> >>>> ------
> >>>> schema.xml
> >>>>
> >>>> ...
> >>>>        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>>>           <analyzer type="index">
> >>>>               <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>                   ignoreCase="true" expand="true"/>
> >>>>               <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>               <!-- we will only use synonyms at query time <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
> >>>>
> >>>>               <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> >>>>               <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>>                   catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
> >>>>               <filter class="solr.LowerCaseFilterFactory" />
> >>>>               <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true" />
> >>>>               <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>>>               <filter class="solr.LengthFilterFactory" min="2" max="100" />
> >>>>               <!-- <filter class="solr.SnowballPorterFilterFactory" language="English" /> -->
> >>>>           </analyzer>
> >>>>           <analyzer type="query">
> >>>>               <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
> >>>>               <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>
> >>>>               <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >>>>               <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>>>                   catenateNumbers="0" catenateAll="0" />
> >>>>               <filter class="solr.LowerCaseFilterFactory" />
> >>>>               <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
> >>>>               <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>>>               <filter class="solr.StopFilterFactory" ignoreCase="true" words="letterstops.txt" enablePositionIncrements="true" />
> >>>>           </analyzer>
> >>>>        </fieldType>
> >>>>
> >>>>
> >>>> Thanks in advance.
> >>>> Plamen
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> ontopica GmbH
> >>> Prinz-Albert-Str. 2b
> >>> 53113 Bonn
> >>> Germany
> >>> fon: +49-228-227229-22
> >>> fax: +49-228-227229-77
> >>> web: http://www.ontopica.de
> >>> ontopica GmbH
> >>> Registered office: Bonn
> >>>
> >>> Managing directors: Thomas Krämer, Christoph Okpue
> >>> Commercial register: Amtsgericht Bonn, HRB 17852
> >>>
> >>>


-- 
Regards
Пламен Михайлов
