Re: Priority in search and synonyms

Ahmet Arslan Thu, 11 Dec 2014 03:58:10 -0800

Hi Antoine,

By saying "The problem I have now is that ebc_libelle synonyms reported for the 
field are not shown", do you mean that you have a synonym entry for the word Castorama, 
and documents containing those synonyms do not show up in the first 100 
documents?


If so, experiment with the boost values (5 versus 75) and tweak them until you get 
a satisfactorily diverse result set.

By the way, I think filling the first/initial result set (whenever possible) with 
exact matches is a good thing.
I believe the user types her query for a reason. If there are too few 
exact-match documents, other techniques (stemming, synonyms, etc.) should kick in. 
Please note that this approach makes sense for search applications where 
precision is more valuable than recall.
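A minimal schema.xml sketch of the "exact matches first" idea, assuming case-insensitive matching is acceptable: a field that is only tokenized and lowercased (no stop words, elisions, synonyms or word splitting), filled by copyField, and given the high boost in qf. The name libelle_exact is hypothetical. It is close to the tmp_libelle type quoted below, except that StandardTokenizerFactory on its own does not lowercase, so there "castorama" and "Castorama" would not match each other.

```xml
<!-- Hypothetical exact-match type: tokenize and lowercase only, no synonyms or stemming -->
<fieldType name="libelle_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- a single <analyzer> is used at both index and query time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- search-only copy of ebc_libelle; the original value is still stored in ebc_libelle -->
<field name="libelle_exact" type="libelle_exact" indexed="true" stored="false"/>
<copyField source="ebc_libelle" dest="libelle_exact"/>
```

It would then take the heavy weight in the query, e.g. qf=libelle_exact^75 ebc_libelle^5, while the synonym-expanded field keeps the lower boost.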

Ahmet





On Thursday, December 11, 2014 12:20 PM, Antoine REBOUL 
<antoine.reb...@plebicom.com> wrote:
Hello,

First of all, thank you for your answers!

In my schema.xml file:
- I created this field type:
    <fieldType name="tmp_libelle" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
        </analyzer>
    </fieldType>
- I defined this field, which is filled from ebc_libelle by a copyField:
    <field name="tmp_libelle" type="tmp_libelle" indexed="true" stored="true" required="false"/>
    <copyField source="ebc_libelle" dest="tmp_libelle"/>

I wonder whether the following statement is required:
<defaultSearchField>ebc_libelle</defaultSearchField>

I test my results with the following settings:
http://IP:8983/solr/select/?qf=tmp_libelle^75%20ebc_libelle^5&pf=ebc_libelle&q=Castorama&start=0&rows=100&indent=on&defType=edismax&sort=score%20asc
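A sketch of the same parameters kept as request handler defaults in solrconfig.xml, so the qf boosts (75 versus 5) can be tuned in one place; the handler name /libelle is hypothetical. One detail worth checking: the test URL uses sort=score asc, which lists the lowest-scoring documents first, whereas Solr's default ordering is by score descending, so with asc the strongest (e.g. exact) matches end up at the bottom of the 100 rows.

```xml
<!-- Hypothetical handler; parameter values copied from the test URL -->
<requestHandler name="/libelle" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- field boosts: matches in tmp_libelle weigh far more than matches in the synonym-expanded ebc_libelle -->
    <str name="qf">tmp_libelle^75 ebc_libelle^5</str>
    <str name="pf">ebc_libelle</str>
    <str name="rows">100</str>
    <!-- highest scores first; "score asc" would put the weakest matches on top -->
    <str name="sort">score desc</str>
  </lst>
</requestHandler>
```

Values in a "defaults" list can still be overridden per request, so different boosts can be tried from the URL without editing the file.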

The problem I have now is that the ebc_libelle synonyms reported for the field
are not shown.


The field ebc_libelle is analyzed/indexed as follows:
   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.ISOLatin1AccentFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms2.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1"
                        generateNumberParts="1"
                        catenateWords="1"
                        catenateNumbers="1"
                        catenateAll="1"
                        splitOnCaseChange="1"
                        splitOnNumerics="1"
                        preserveOriginal="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.ISOLatin1AccentFilterFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1"
                        generateNumberParts="1"
                        catenateWords="1"
                        catenateNumbers="0"
                        catenateAll="1"
                        splitOnCaseChange="1"
                        preserveOriginal="1"/>
                <filter class="solr.StopFilterFactory"
                        ignoreCase="true"
                        words="stopwords.txt"
                        enablePositionIncrements="true"/>
                <filter class="solr.ElisionFilterFactory" articles="elisions.txt"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms2.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>



Best Regards.

*Antoine Reboul*
Responsable Comparateurs / Plateforme emailing
Plebicom -  eBuyClub - Cashstore - Checkdeal

PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
Tel  : 04 72 85 81 49
Fax : 04 78 83 39 74


2014-12-10 16:40 GMT+01:00 Alexandre Rafalovitch <arafa...@gmail.com>:

> This might be written just for you:
>
> http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/
>
> A merchant name is the same case as a title: short text.
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 10 December 2014 at 10:00, Antoine REBOUL
> <antoine.reb...@plebicom.com> wrote:
> > Hello,
> >
> > I have a question; I do not know if there is a solution...
> >
> > I index and search a field named "Libel".
> > I use a synonyms file.
> >
> > For example, I have the following line in my synonyms file:
> > ipad => Apple, Priceminister, Amazon
> >
> > A search for iPad returns Apple, Amazon and Priceminister (the expected
> > result).
> > But when I search for "Apple", I want the merchant Apple to be returned
> > first.
> > This is not the case; in fact, Amazon gets first place.
> >
> > Sorry for my poor English, I'm using a translator.
> >
> > Best Regards.
> >
>
