DocValues is the new black
http://wiki.apache.org/solr/DocValues

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
SOLR Performance Monitoring -- http://sematext.com/spm



On Fri, Oct 18, 2013 at 12:30 PM, Lemke, Michael  SZ/HZA-ZSW
<lemke...@schaeffler.com> wrote:
> Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote:
>>Lemke, Michael  SZ/HZA-ZSW [lemke...@schaeffler.com] wrote:
>>> 1. 
>>> q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
>>> 2. 
>>> q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
>>
>>> The only difference is an empty facet.prefix in the first query.
>>
>>> The first query returns after some 20 seconds (QTime 20000 in the result) 
>>> while
>>> the second one takes only 80 msec (QTime 80). Why is this?
>>
>>If your index was just opened when you issued your queries, the first request 
>>will be notably slower than the second as the facet values might not be in
>>the disk cache.
>
> I know, but it shouldn't be orders of magnitude as in this example, should it?
>
>>
>>Furthermore, for enum the difference between no prefix and some prefix is 
>>huge. As enum iterates values first (as opposed to fc, which iterates hits 
>>first), limiting to only the values that start with 'a' ought to speed up 
>>retrieval by a factor of 10 or more.
>
> Thanks.  That is what we sort of figured but it's good to know for sure.  Of 
> course it raises the question of whether there is a way to speed this up.
>
>>
>>> And as side note: facet.method=fc makes the queries run 'forever' and 
>>> eventually
>>> fail with org.apache.solr.common.SolrException: Too many values for 
>>> UnInvertedField faceting on field CONTENT.
>>
>>An internal memory structure optimization in Solr limits the number of 
>>possible unique values when using fc. It is not a bug as such, but more a 
>>consequence of a choice. Unfortunately the enum-solution is normally quite 
>>slow when there are enough unique values to trigger the "too many 
>>values"-exception. I know too little about the structures for DocValues to 
>>say if they will help here, but you might want to take a look at those.
>
> What is DocValues?  Haven't heard of it yet.  And yes, the fc method was 
> terribly slow in a case where it did work.  Something like 20 minutes whereas 
> enum returned within a few seconds.
>
> Michael
>
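
Toke's point above, that enum iterates values first while fc iterates hits first, can be illustrated with a toy model. This is only a sketch under stated assumptions, not Solr's actual implementation: the tiny in-memory index, the function names facet_enum and facet_fc, and the terms_visited counter are all made up for illustration. It shows why a non-empty facet.prefix shrinks the work for enum (fewer terms to visit), while an empty prefix forces a pass over every unique term in the field.

```python
from collections import Counter

# Hypothetical toy inverted index: term -> ids of documents containing it.
POSTINGS = {
    "apple": {1, 2, 3},
    "avocado": {2},
    "banana": {1, 3},
    "cherry": {3, 4},
}


def uninvert(postings):
    """Build doc id -> set of terms from term -> set of doc ids."""
    doc_terms = {}
    for term, docs in postings.items():
        for doc in docs:
            doc_terms.setdefault(doc, set()).add(term)
    return doc_terms


def facet_enum(postings, hits, prefix=""):
    """Values first: visit each term matching the prefix, intersect with hits."""
    counts = Counter()
    terms_visited = 0
    for term in sorted(postings):
        if not term.startswith(prefix):
            continue
        terms_visited += 1
        n = len(postings[term] & hits)
        if n:  # mimics facet.mincount=1
            counts[term] = n
    return counts, terms_visited


def facet_fc(doc_terms, hits, prefix=""):
    """Hits first: visit each matching document and count its terms."""
    counts = Counter()
    for doc in hits:
        for term in doc_terms.get(doc, ()):
            if term.startswith(prefix):
                counts[term] += 1
    return counts


hits = {1, 2}  # documents matching the main query
all_counts, all_visited = facet_enum(POSTINGS, hits)
a_counts, a_visited = facet_enum(POSTINGS, hits, prefix="a")
fc_counts = facet_fc(uninvert(POSTINGS), hits, prefix="a")
```

In this model the cost of enum scales with the number of terms visited, so on a tokenized full-text field like CONTENT with a very large vocabulary, an empty prefix is the worst case. The fc variant scales with the number of hits and the terms per document instead, and needs the uninverted doc-to-terms structure in memory first.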
