Sorry for the delay,

I did not have access to the server and could not query anything.

This is my query:
http://server:port/solr/core/select?q=keyword1+keyword2&wt=xml&indent=true&hl.fragsize=120&f.file_URI_tokenized.hl.fragsize=1000&spellcheck=true&f.file_content.hl.alternateField=spell&hl.simple.pre=%3Cb%3E&hl.fl=file_URI_tokenized,xmp_title,file_content&hl=true&rows=10&fl=file_URI,file_URI_tokenized,file_name,file_lastModification,file_lastModification_raw,xmp_creation_date,xmp_title,xmp_content_type,score,file_URI,host,xmp_manual_summary&hl.snippets=1&hl.useFastVectorHighlighter=true&hl.maxAlternateFieldLength=120&start=0&q=itdz+berlin&hl.simple.post=%3C/b%3E&fq=file_readright:%22wiki-access%22&debugQuery=true&defType=edismax&qf=file_URI_tokenized^10.0+file_content^10.0+xmp_title^5.0+spell^0.001&pf=file_URI_tokenized~2^1.0+file_content~100^2.0+xmp_title~2^1.0

Newly extended testing showed that the normal QTime without a search on the
spell field is about 713 ms, while it comes out at 70503 ms with the stemmed
spell field included in qf as in the URL above. So it is roughly 100x slower
at the moment.
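For reproducibility, the comparison above can be scripted as a simple A/B check. This is only a sketch: the host, port, and core name are the placeholders from the query URL, and the curl call is shown rather than executed.

```shell
# Hypothetical A/B check: run the same edismax query with and without the
# stemmed spell field in qf, then compare the QTime in the responses.
BASE="http://server:port/solr/core/select"
QF_FAST="file_URI_tokenized^10.0+file_content^10.0+xmp_title^5.0"
QF_SLOW="${QF_FAST}+spell^0.001"

for qf in "$QF_FAST" "$QF_SLOW"; do
  url="${BASE}?q=keyword1+keyword2&defType=edismax&rows=0&qf=${qf}"
  echo "qf=${qf}"
  echo "would run: curl -s \"${url}\""
  # QTime is in the response header, e.g.: <int name="QTime">713</int>
done
```

With rows=0 there is no highlighting work at all, which separates pure query cost from the highlight component seen in the timing section.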

Here is the debug output:

<lst name="debug">
<str name="rawquerystring">keyword1 keyword2</str>
<str name="querystring">keyword1 keyword2</str>
<str
name="parsedquery">(+((DisjunctionMaxQuery((file_URI_tokenized:keyword1^10.0
| xmp_title:keyword1^5.0 | spell:keyword1^0.0010 |
file_content:keyword1^10.0))
DisjunctionMaxQuery((file_URI_tokenized:keyword2^10.0 |
xmp_title:keyword2^5.0 | spell:keyword2^0.0010 |
file_content:keyword2^10.0)))~2)
DisjunctionMaxQuery((file_URI_tokenized:"keyword1 keyword2"~2))
DisjunctionMaxQuery((file_content:"keyword1 keyword2"~100^2.0))
DisjunctionMaxQuery((xmp_title:"keyword1 keyword2"~2)))/no_coord</str>
<str name="parsedquery_toString">+(((file_URI_tokenized:keyword1^10.0 |
xmp_title:keyword1^5.0 | spell:keyword1^0.0010 |
file_content:keyword1^10.0) (file_URI_tokenized:keyword2^10.0 |
xmp_title:keyword2^5.0 | spell:keyword2^0.0010 |
file_content:keyword2^10.0))~2) (file_URI_tokenized:"keyword1 keyword2"~2)
(file_content:"keyword1 keyword2"~100^2.0) (xmp_title:"keyword1
keyword2"~2)</str>
<lst name="explain">
<str name="..."></str>
<str name="..."></str>
<str name="..."></str>
<str name="..."></str>
<str name="..."></str>
<str name="..."></str>
<str name="..."></str>
<str name="..."></str>
<str name="..."></str>
<str name="...">
0.035045296 = (MATCH) sum of:
  0.035045296 = (MATCH) sum of:
    0.0318122 = (MATCH) max of:
      8.29798E-4 = (MATCH) weight(spell:keyword1^0.0010 in 71660)
[DefaultSimilarity], result of:
        8.29798E-4 = score(doc=71660,freq=2.0 = termFreq=2.0
), product of:
          6.7839865E-5 = queryWeight, product of:
            0.0010 = boost
            8.64913 = idf(docFreq=618, maxDocs=1299169)
            0.0078435475 = queryNorm
          12.231716 = fieldWeight in 71660, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.64913 = idf(docFreq=618, maxDocs=1299169)
            1.0 = fieldNorm(doc=71660)
      0.0318122 = (MATCH) weight(file_content:keyword1^10.0 in 71660)
[DefaultSimilarity], result of:
        0.0318122 = score(doc=71660,freq=2.0 = termFreq=2.0
), product of:
          0.6720717 = queryWeight, product of:
            10.0 = boost
            8.568466 = idf(docFreq=670, maxDocs=1299169)
            0.0078435475 = queryNorm
          0.047334533 = fieldWeight in 71660, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.568466 = idf(docFreq=670, maxDocs=1299169)
            0.00390625 = fieldNorm(doc=71660)
    0.003233097 = (MATCH) max of:
      0.003233097 = (MATCH) weight(file_content:keyword2^10.0 in 71660)
[DefaultSimilarity], result of:
        0.003233097 = score(doc=71660,freq=1.0 = termFreq=1.0
), product of:
          0.25479192 = queryWeight, product of:
            10.0 = boost
            3.2484267 = idf(docFreq=137146, maxDocs=1299169)
            0.0078435475 = queryNorm
          0.012689167 = fieldWeight in 71660, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            3.2484267 = idf(docFreq=137146, maxDocs=1299169)
            0.00390625 = fieldNorm(doc=71660)
</str>
</lst>
<str name="QParser">ExtendedDismaxQParser</str>
<null name="altquerystring"/>
<null name="boost_queries"/>
<arr name="parsed_boost_queries"/>
<null name="boostfuncs"/>
<arr name="filter_queries">
<str>file_readright:"wiki-access"</str></arr>
<arr
name="parsed_filter_queries"><str>file_readright:wiki-access</str></arr>
<lst name="timing">
<double name="time">66359.0</double>
<lst name="prepare"></lst>
<lst name="process">
<double name="time">66357.0</double>
<lst name="query">
<double name="time">80.0</double></lst>
<lst name="facet">
<double name="time">0.0</double></lst>
<lst name="mlt">
<double name="time">0.0</double></lst>
<lst name="highlight">
<double name="time">65981.0</double></lst>
<lst name="stats">
<double name="time">0.0</double></lst>
<lst name="spellcheck">
<double name="time">38.0</double></lst>
<lst name="debug">
<double name="time">258.0</double></lst>
</lst>
</lst>

Why does the highlighting take up this much time? Is it a problem with my
parameter overload, or does highlighting on the spell field actually take
place?
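One thing that may be worth ruling out: with the default hl.requireFieldMatch=false, the highlighter can try to highlight terms from every query clause, including the spell clause, against the hl.fl fields. A hedged sketch of the extra parameter (host, port, and core are the placeholders from the query above, and whether this explains the 65981 ms is exactly the open question):

```shell
# Sketch: restrict highlighting to terms that matched in the highlighted
# field itself, so clauses against the spell field are ignored by the
# highlighter even though spell stays in qf for recall.
BASE="http://server:port/solr/core/select"
PARAMS="q=keyword1+keyword2&defType=edismax"
PARAMS="${PARAMS}&qf=file_URI_tokenized^10.0+file_content^10.0+xmp_title^5.0+spell^0.001"
PARAMS="${PARAMS}&hl=true&hl.fl=file_URI_tokenized,xmp_title,file_content"
PARAMS="${PARAMS}&hl.requireFieldMatch=true"
echo "would run: curl -s \"${BASE}?${PARAMS}\""
```

If the highlight time drops back to normal with hl.requireFieldMatch=true, that would confirm the spell clauses were being fed to the highlighter.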

I noticed a 13 MB file popping up in the results only when the search is
extended via the spell field. However, highlighting this doc with a query
that returns only this doc does not take anywhere near this long.

Thanks for your comments and time.

Best,
Jens


2014-02-24 17:32 GMT+01:00 Jack Krupansky <j...@basetechnology.com>:

> Maybe some heap/GC issue from using more of this 20 GB index. Maybe it was
> running at the edge and just one more field was too much for the heap.
>
> The "timing" section of the debug query response should shed a little
> light.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Erick Erickson
> Sent: Monday, February 24, 2014 11:25 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Slow query time on stemmed fields
>
>
> This is really strange. You should have _fewer_ tokens in your stemmed
> field.
> Plus, the up-front processing to stem the field in the query shouldn't be
> noticeable.
>
> Let's see the query and results from &debug=all being added to the URL
> because something is completely strange here.
>
> Best,
> Erick
>
>
> On Mon, Feb 24, 2014 at 7:18 AM, Jens Meiners <snej.sren...@gmail.com
> >wrote:
>
>  Hi,
>>
>> we've built an index (Solr 4.3), which contains approx. 1 million docs and
>> is around 20 GB in size (optimized).
>>
>> In our index we have one field which contains the tokenized words of
>> indexed documents and a second field with the stemmed contents
>> (SnowballFilter, German2).
>>
>> During our tests we've found out that some keywords are just taking too
>> long to process. When we exclude the stemmed field from our edismax
>> configuration (qf) the query time was surprisingly quick (10 000x faster).
>>
>> Has any of you had the same experience?
>>
>> We are using the stemmed field only to increase the number of returned
>> documents, not for highlighting. We know that applying highlighting to
>> stemmed values is not good for query speed.
>>
>> Best Regards,
>> Jens Meiners
>>
>>
>
