I think you can use pf2 and pf3 in your requestHandler.

Best regards,
Elisabeth
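In edismax, pf2 and pf3 build phrase queries over every pair and every triple of adjacent query terms, so a long query can still earn a phrase boost against short documents even when the full query never occurs as a single phrase. A sketch of what that could look like among the handler defaults (the boost values are illustrative, not a tested configuration):

```xml
<!-- Illustrative sketch only: boost adjacent word pairs and triples of the
     query. pf2/pf3 should point at a field whose analysis yields single
     terms (e.g. the plain text field), not a shingle-only field. -->
<str name="pf2">kw_stopped^20.0</str>
<str name="pf3">kw_stopped^40.0</str>
```

With these, a query containing "buy a house" produces the sub-phrases "buy a" and "a house" (pf2) and "buy a house" (pf3), so a 3-term document can be matched as a phrase directly.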
2011/10/16 Vijay Ramachandran <vijay...@gmail.com>

> Hello. I have an application where I try to match longer queries
> (sentences) to short documents (search phrases). Typically, the
> documents are 3-5 terms in length. I am facing a problem where phrase
> match in the indicated phrase fields via "pf" doesn't seem to match in
> most cases, and I am stumped. Please help!
>
> For instance, when my query is "should I buy a house now while the
> rates are low. We filed BR 2 yrs ago. Rent now, w/ some sch loan debt",
> I expect the document "buy a house" to match much higher than
> "house loan rates". However, the latter is the document which always
> matches higher.
>
> I tried to do this the following way (Solr 3.1):
> 1. Score phrase matches high
> 2. Score single-word matches lower
> 3. Use dismax with an "mm" of 1, and a very high boost for exact phrase
>    match.
>
> I used the "text" definition in the schema for the single words, and
> the following for the phrase:
>
> <fieldType name="shingle" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
>             outputUnigrams="false"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
>             outputUnigrams="false"/>
>   </analyzer>
> </fieldType>
>
> and my schema fields look like this:
>
> <field name="kw_stopped" type="text_en" indexed="true" omitNorms="true"/>
>
> <!-- keywords almost as is - to provide truer match for full phrases -->
> <field name="kw_phrases" type="shingle" indexed="true" omitNorms="true"/>
>
> This is my search handler config:
>
> <requestHandler name="edismax" class="solr.SearchHandler" default="true">
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.1</float>
>     <str name="fl">kpid,advid,campaign,keywords</str>
>     <str name="mm">1</str>
>     <str name="qf">kw_stopped^1.0</str>
>     <str name="pf">kw_phrases^50.0</str>
>     <int name="ps">3</int>
>     <int name="qs">3</int>
>     <str name="q.alt">*:*</str>
>     <!-- example highlighter config, enable per-query with hl=true -->
>     <str name="hl.fl">keywords</str>
>     <!-- for this field, we want no fragmenting, just highlighting -->
>     <str name="f.name.hl.fragsize">0</str>
>     <!-- instructs Solr to return the field itself if no query terms
>          are found -->
>     <str name="f.name.hl.alternateField">title</str>
>     <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
>   </lst>
> </requestHandler>
>
> These are the match score debugQuery explanations:
>
> 8.480054E-4 = (MATCH) sum of:
>   8.480054E-4 = (MATCH) product of:
>     0.0031093531 = (MATCH) sum of:
>       0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
>         2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
>           5.514656 = idf(docFreq=25, maxDocs=2375)
>           5.1152787E-5 = queryNorm
>         5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
>           1.0 = tf(termFreq(kw_stopped:hous)=1)
>           5.514656 = idf(docFreq=25, maxDocs=2375)
>           1.0 = fieldNorm(field=kw_stopped, doc=1812)
>       8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
>         2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
>           4.002068 = idf(docFreq=117, maxDocs=2375)
>           5.1152787E-5 = queryNorm
>         4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
>           1.0 = tf(termFreq(kw_stopped:rate)=1)
>           4.002068 = idf(docFreq=117, maxDocs=2375)
>           1.0 = fieldNorm(field=kw_stopped, doc=1812)
>       7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
>         1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
>           3.7891462 = idf(docFreq=145, maxDocs=2375)
>           5.1152787E-5 = queryNorm
>         3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product of:
>           1.0 = tf(termFreq(kw_stopped:loan)=1)
>           3.7891462 = idf(docFreq=145, maxDocs=2375)
>           1.0 = fieldNorm(field=kw_stopped, doc=1812)
>     0.27272728 = coord(3/11)
>
> for "house loan rates" vs
>
> 8.480054E-4 = (MATCH) sum of:
>   8.480054E-4 = (MATCH) product of:
>     0.0031093531 = (MATCH) sum of:
>       0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
>         2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
>           5.514656 = idf(docFreq=25, maxDocs=2375)
>           5.1152787E-5 = queryNorm
>         5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
>           1.0 = tf(termFreq(kw_stopped:hous)=1)
>           5.514656 = idf(docFreq=25, maxDocs=2375)
>           1.0 = fieldNorm(field=kw_stopped, doc=1812)
>       8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
>         2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
>           4.002068 = idf(docFreq=117, maxDocs=2375)
>           5.1152787E-5 = queryNorm
>         4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
>           1.0 = tf(termFreq(kw_stopped:rate)=1)
>           4.002068 = idf(docFreq=117, maxDocs=2375)
>           1.0 = fieldNorm(field=kw_stopped, doc=1812)
>       7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
>         1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
>           3.7891462 = idf(docFreq=145, maxDocs=2375)
>           5.1152787E-5 = queryNorm
>         3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product of:
>           1.0 = tf(termFreq(kw_stopped:loan)=1)
>           3.7891462 = idf(docFreq=145, maxDocs=2375)
>           1.0 = fieldNorm(field=kw_stopped, doc=1812)
>     0.27272728 = coord(3/11)
>
> for "buy a house".
>
> Unless I try an exact phrase "buy a house" as the query, the kw_phrases
> field never shows up in the explanation.
>
> What am I doing wrong? Please help!
>
> thanks,
> Vijay