Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

shamik Fri, 12 Dec 2014 13:11:18 -0800

Ted,

  Here's the query I'm using, along with the debug info. It still returns all 5
results, as if it were simply matching either of the terms with q.op set
to OR (the default).


http://localhost:8983/solr/autophrase?q=text:seat+cushions&wt=xml&debugQuery=true
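
For anyone reproducing this, here is a minimal Python sketch that builds the same request URL. The helper name build_url is hypothetical, and the q.op=AND variant is only an assumption added for comparison, not something tried above. Note that urlencode percent-encodes the colon, which Solr decodes back to text:seat cushions.

```python
from urllib.parse import urlencode

# Base select URL taken from the request above.
BASE = "http://localhost:8983/solr/autophrase"

def build_url(query, **extra):
    """Return the request URL for `query` (hypothetical helper)."""
    params = {"q": query, "wt": "xml", "debugQuery": "true"}
    params.update(extra)
    return BASE + "?" + urlencode(params)

# Same request as above: spaces become '+', ':' becomes '%3A'.
default_url = build_url("text:seat cushions")

# Assumed variant for comparison: require every term to match.
and_url = build_url("text:seat cushions", **{"q.op": "AND"})

print(default_url)
print(and_url)
```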

Debug
====
<lst name="debug">
<str name="rawquerystring">text:seat cushions</str>
<str name="querystring">text:seat cushions</str>
<str name="parsedquery">text:seat text:cushion</str>
<str name="parsedquery_toString">text:seat text:cushion</str>
<lst name="explain">
<str name="2">
0.430151 = (MATCH) sum of:
  0.11124363 = (MATCH) weight(text:seat in 1) [DefaultSimilarity], result of:
    0.11124363 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      0.5085423 = queryWeight, product of:
        1.0 = idf(docFreq=5, maxDocs=6)
        0.5085423 = queryNorm
      0.21875 = fieldWeight in 1, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=5, maxDocs=6)
        0.21875 = fieldNorm(doc=1)
  0.31890735 = (MATCH) weight(text:cushion in 1) [DefaultSimilarity], result of:
    0.31890735 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      0.86103696 = queryWeight, product of:
        1.6931472 = idf(docFreq=2, maxDocs=6)
        0.5085423 = queryNorm
      0.37037593 = fieldWeight in 1, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.6931472 = idf(docFreq=2, maxDocs=6)
        0.21875 = fieldNorm(doc=1)
</str>
<str name="6">
0.430151 = (MATCH) sum of:
  0.11124363 = (MATCH) weight(text:seat in 5) [DefaultSimilarity], result of:
    0.11124363 = score(doc=5,freq=1.0 = termFreq=1.0
), product of:
      0.5085423 = queryWeight, product of:
        1.0 = idf(docFreq=5, maxDocs=6)
        0.5085423 = queryNorm
      0.21875 = fieldWeight in 5, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=5, maxDocs=6)
        0.21875 = fieldNorm(doc=5)
  0.31890735 = (MATCH) weight(text:cushion in 5) [DefaultSimilarity], result of:
    0.31890735 = score(doc=5,freq=1.0 = termFreq=1.0
), product of:
      0.86103696 = queryWeight, product of:
        1.6931472 = idf(docFreq=2, maxDocs=6)
        0.5085423 = queryNorm
      0.37037593 = fieldWeight in 5, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.6931472 = idf(docFreq=2, maxDocs=6)
        0.21875 = fieldNorm(doc=5)
</str>
<str name="1">
0.06356779 = (MATCH) product of:
  0.12713557 = (MATCH) sum of:
    0.12713557 = (MATCH) weight(text:seat in 0) [DefaultSimilarity], result of:
      0.12713557 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
        0.5085423 = queryWeight, product of:
          1.0 = idf(docFreq=5, maxDocs=6)
          0.5085423 = queryNorm
        0.25 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=5, maxDocs=6)
          0.25 = fieldNorm(doc=0)
  0.5 = coord(1/2)
</str>
<str name="3">
0.06356779 = (MATCH) product of:
  0.12713557 = (MATCH) sum of:
    0.12713557 = (MATCH) weight(text:seat in 2) [DefaultSimilarity], result of:
      0.12713557 = score(doc=2,freq=1.0 = termFreq=1.0
), product of:
        0.5085423 = queryWeight, product of:
          1.0 = idf(docFreq=5, maxDocs=6)
          0.5085423 = queryNorm
        0.25 = fieldWeight in 2, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=5, maxDocs=6)
          0.25 = fieldNorm(doc=2)
  0.5 = coord(1/2)
</str>
<str name="5">
0.055621814 = (MATCH) product of:
  0.11124363 = (MATCH) sum of:
    0.11124363 = (MATCH) weight(text:seat in 4) [DefaultSimilarity], result of:
      0.11124363 = score(doc=4,freq=1.0 = termFreq=1.0
), product of:
        0.5085423 = queryWeight, product of:
          1.0 = idf(docFreq=5, maxDocs=6)
          0.5085423 = queryNorm
        0.21875 = fieldWeight in 4, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=5, maxDocs=6)
          0.21875 = fieldNorm(doc=4)
  0.5 = coord(1/2)
</str>
</lst>
<str name="QParser">LuceneQParser</str>
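
The arithmetic in the explain output above can be reproduced with a short sketch. It assumes the classic TF-IDF formulas used by DefaultSimilarity (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryNorm = 1 / sqrt(sum of squared idf values)); the fieldNorm values are copied from the output rather than computed, and the function names are illustrative only.

```python
import math

MAX_DOCS = 6
DOC_FREQ = {"seat": 5, "cushion": 2}  # docFreq values from the explain output

def idf(doc_freq, max_docs=MAX_DOCS):
    # Classic TF-IDF inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# queryNorm = 1 / sqrt(sum of squared term weights); boosts are all 1 here.
QUERY_NORM = 1.0 / math.sqrt(sum(idf(df) ** 2 for df in DOC_FREQ.values()))

def term_score(term, field_norm, freq=1.0):
    term_idf = idf(DOC_FREQ[term])
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def doc_score(matched_terms, field_norm):
    # Sum of the matching clauses, scaled by coord = matched / total clauses.
    total = sum(term_score(t, field_norm) for t in matched_terms)
    coord = len(matched_terms) / len(DOC_FREQ)
    return total * coord

# fieldNorm values below are taken from the explain output, not derived.
both_terms = doc_score(["seat", "cushion"], 0.21875)  # docs "2" and "6"
seat_only = doc_score(["seat"], 0.25)                 # docs "1" and "3"
print(both_terms, seat_only)
```

The two values should land close to the 0.430151 and 0.06356779 reported above (Lucene uses single-precision floats, so the last digits can differ). Each document is scored as a plain sum of independent term matches, which is what an OR of text:seat and text:cushion would produce.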


Sample data
========
 <add>
  <doc>
    <field name="id">1</field>
    <field name="name">Doc 1</field>
    <field name="text">This has a rear window defroster and really cool
bucket seats.</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">Doc 2</field>
    <field name="text">This one has rear seat cushions and air conditioning
– what a ride!</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name">Doc 3</field>
    <field name="text">This one has gold seat belts front and rear.</field>
  </doc>
  <doc>
    <field name="id">4</field>
    <field name="name">Doc 4</field>
    <field name="text">This one has front and side air bags and a heated
seat.The fan belt never breaks.</field>
  </doc>
    <doc>
    <field name="id">5</field>
    <field name="name">Doc 5</field>
    <field name="text">This one has big rear wheels and a seat cushion.It
doesn't have a timing belt.</field>
  </doc>
   <doc>
    <field name="id">6</field>
    <field name="name">Doc 6</field>
    <field name="text">This one has rear seat with cushions and air
conditioning – what a ride!</field>
  </doc>
</add>

I also tried including AutoPhrasingTokenFilterFactory in the query analyzer,
but it didn't make any difference.

Let me know if I'm missing something.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4174094.html
Sent from the Solr - User mailing list archive at Nabble.com.
