I think you can use pf2 and pf3 in your requestHandler.
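
A minimal sketch of what that could look like in the handler defaults, assuming pf2 and pf3 point at the kw_stopped field from your schema. The field choice and boost values are only illustrative and untested; your other defaults would stay as they are:

```xml
<!-- Sketch only: field choice and boosts are assumptions, not tested values -->
<requestHandler name="edismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="mm">1</str>
    <str name="qf">kw_stopped^1.0</str>
    <!-- pf2: phrase queries built from each pair of adjacent query words -->
    <str name="pf2">kw_stopped^20.0</str>
    <!-- pf3: phrase queries built from each triple of adjacent query words -->
    <str name="pf3">kw_stopped^50.0</str>
  </lst>
</requestHandler>
```

Unlike pf, which builds one phrase from the whole query, pf2 and pf3 boost documents that match any two or three consecutive query words, which is closer to what you want when the query is much longer than the document.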

Best regards,
Elisabeth

2011/10/16 Vijay Ramachandran <vijay...@gmail.com>

> Hello. I have an application where I try to match longer queries (sentences)
> to short documents (search phrases). Typically, the documents are 3-5 terms
> in length. I am facing a problem where phrase match in the indicated phrase
> fields via "pf" doesn't seem to match in most cases, and I am stumped.
> Please help!
>
> For instance, when my query is "should I buy a house now while the rates are
> low. We filed BR 2 yrs ago. Rent now, w/ some sch loan debt"
>
> I expect the document "buy a house" to score much higher than "house loan
> rates". However, the latter is the document which always scores higher.
>
>
> I tried to do this the following way (solr 3.1):
> 1. Score phrase matches high
> 2. Score single word matches lower
> 3. Use dismax with a "mm" of 1, and very high boost for exact phrase match.
>
> I used the "text" definition in the schema for the single words, and the
> following for the phrase:
>
>    <fieldType name="shingle" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1"
>                catenateWords="1" catenateNumbers="1" catenateAll="0"
>                splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
>                outputUnigrams="false"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                ignoreCase="true" expand="true"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1"
>                catenateWords="0" catenateNumbers="0" catenateAll="0"
>                splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
>                outputUnigrams="false"/>
>      </analyzer>
>    </fieldType>
>
> and my schema fields look like this:
>
>   <field name="kw_stopped" type="text_en" indexed="true" omitNorms="True" />
>
>   <!-- keywords almost as is - to provide truer match for full phrases -->
>   <field name="kw_phrases" type="shingle" indexed="true" omitNorms="True" />
>
> This is my search handler config:
>
>  <requestHandler name="edismax" class="solr.SearchHandler" default="true">
>    <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.1</float>
>     <str name="fl">
>       kpid,advid,campaign,keywords
>     </str>
>     <str name="mm">1</str>
>     <str name="qf">
>       kw_stopped^1.0
>     </str>
>     <str name="pf">
>       kw_phrases^50.0
>     </str>
>     <int name="ps">3</int>
>     <int name="qs">3</int>
>     <str name="q.alt">*:*</str>
>     <!-- example highlighter config, enable per-query with hl=true -->
>     <str name="hl.fl">keywords</str>
>     <!-- for this field, we want no fragmenting, just highlighting -->
>     <str name="f.name.hl.fragsize">0</str>
>     <!-- instructs Solr to return the field itself if no query terms are
>          found -->
>     <str name="f.name.hl.alternateField">title</str>
>     <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
>    </lst>
>  </requestHandler>
>
> These are the match score debugQuery explanations:
>
> 8.480054E-4 = (MATCH) sum of:
>  8.480054E-4 = (MATCH) product of:
>    0.0031093531 = (MATCH) sum of:
>      0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
>        2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
>          5.514656 = idf(docFreq=25, maxDocs=2375)
>          5.1152787E-5 = queryNorm
>        5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
>          1.0 = tf(termFreq(kw_stopped:hous)=1)
>          5.514656 = idf(docFreq=25, maxDocs=2375)
>          1.0 = fieldNorm(field=kw_stopped, doc=1812)
>      8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
>        2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
>          4.002068 = idf(docFreq=117, maxDocs=2375)
>          5.1152787E-5 = queryNorm
>        4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
>          1.0 = tf(termFreq(kw_stopped:rate)=1)
>          4.002068 = idf(docFreq=117, maxDocs=2375)
>          1.0 = fieldNorm(field=kw_stopped, doc=1812)
>      7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
>        1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
>          3.7891462 = idf(docFreq=145, maxDocs=2375)
>          5.1152787E-5 = queryNorm
>        3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product of:
>          1.0 = tf(termFreq(kw_stopped:loan)=1)
>          3.7891462 = idf(docFreq=145, maxDocs=2375)
>          1.0 = fieldNorm(field=kw_stopped, doc=1812)
>    0.27272728 = coord(3/11)
>
> for "house loan rates" vs
>
> 8.480054E-4 = (MATCH) sum of:
>  8.480054E-4 = (MATCH) product of:
>    0.0031093531 = (MATCH) sum of:
>      0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
>        2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
>          5.514656 = idf(docFreq=25, maxDocs=2375)
>          5.1152787E-5 = queryNorm
>        5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
>          1.0 = tf(termFreq(kw_stopped:hous)=1)
>          5.514656 = idf(docFreq=25, maxDocs=2375)
>          1.0 = fieldNorm(field=kw_stopped, doc=1812)
>      8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
>        2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
>          4.002068 = idf(docFreq=117, maxDocs=2375)
>          5.1152787E-5 = queryNorm
>        4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
>          1.0 = tf(termFreq(kw_stopped:rate)=1)
>          4.002068 = idf(docFreq=117, maxDocs=2375)
>          1.0 = fieldNorm(field=kw_stopped, doc=1812)
>      7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
>        1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
>          3.7891462 = idf(docFreq=145, maxDocs=2375)
>          5.1152787E-5 = queryNorm
>        3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product of:
>          1.0 = tf(termFreq(kw_stopped:loan)=1)
>          3.7891462 = idf(docFreq=145, maxDocs=2375)
>          1.0 = fieldNorm(field=kw_stopped, doc=1812)
>    0.27272728 = coord(3/11)
>
> for "buy a house".
>
> Unless I use the exact phrase "buy a house" as the query, the kw_phrases
> field never shows up in the explanation.
>
> What am I doing wrong? Please help!
>
> thanks,
> Vijay
>
