Hi Tom,

I tried again with:
  <queryResultCache class="solr.LRUCache" size="10000" initialSize="10000"
        autowarmCount="10000"/>

and even now the hitratio is still 0. What could be wrong with my setup?

('free -m' shows that the machine still has over 2 GB of memory free.)
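(In case it is relevant: the query-result-related settings in my solrconfig.xml are still at something like the example defaults -- the values below are a sketch, not necessarily exactly what ships with 1.4:)

```xml
<!-- Sketch of the queryResultCache-related settings from the stock
     example solrconfig.xml; the exact values are illustrative and
     may differ per install. -->
<queryResultWindowSize>50</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
```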

Regards,
Peter.

> Hi Peter,
>
> Can you give a few more examples of slow queries?  
> Are they phrase queries? Boolean queries? prefix or wildcard queries?
> If one-word queries are your slow queries, then CommonGrams won't help.  
> CommonGrams will only help with phrase queries.
>
> How are you using termvectors?  That may be slowing things down.  I don't 
> have experience with termvectors, so someone else on the list might speak to 
> that.
>
> When you say the query time for common terms stays slow, do you mean if you 
> re-issue the exact query, the second query is not faster?  That seems very 
> strange.  You might restart Solr, and send a first query (the first query 
> always takes a relatively long time.)  Then pick one of your slow queries and 
> send it 2 times.  The second time you send the query it should be much faster 
> due to the Solr caches and you should be able to see the cache hit in the 
> Solr admin panel.  If you send the exact query a second time (without enough 
> intervening queries to evict data from the cache, ) the Solr queryResultCache 
> should get hit and you should see a response time in the .01-5 millisecond 
> range.
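> (To illustrate: the queryResultCache is keyed on the exact query plus its
> parameters, so only a byte-identical repeat is a hit. Below is a toy sketch
> in Python -- not Solr's actual code; the class and field names are made up:)

```python
# Toy model of Solr's queryResultCache: the key is the exact query
# plus paging parameters, so only an identical repeat is a cache hit.
from collections import OrderedDict

class QueryResultCache:
    def __init__(self, size):
        self.size = size
        self.data = OrderedDict()
        self.hits = self.lookups = 0

    def get(self, q, start=0, rows=10):
        self.lookups += 1
        key = (q, start, rows)
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)   # LRU: mark as recently used
            return self.data[key]
        return None

    def put(self, q, result, start=0, rows=10):
        self.data[(q, start, rows)] = result
        if len(self.data) > self.size:
            self.data.popitem(last=False)  # evict least recently used

cache = QueryResultCache(size=3)
cache.put("http", [1, 2, 3])
assert cache.get("http") == [1, 2, 3]      # identical query: hit
assert cache.get("http", rows=20) is None  # different rows param: miss
print("hitratio:", cache.hits / cache.lookups)  # 0.5
```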
>
> What settings are you using for your Solr caches?
>
> How much memory is on the machine?  If your bottleneck is disk i/o for 
> frequent terms, then you want to make sure you have enough memory for the OS 
> disk cache.  
>
> I assume that 'http' is not in your stopwords, since CommonGrams will only 
> help with phrase queries.
> CommonGrams was committed and is in Solr 1.4.  If you decide to use 
> CommonGrams you definitely need to re-index and you also need to use both the 
> index time filter and the query time filter.  Your index will be larger.
>
> <fieldType name="foo" ...>
>   <analyzer type="index">
>     <!-- an analyzer needs a tokenizer before its filters; whitespace is just an example -->
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.CommonGramsFilterFactory" words="new400common.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/>
>   </analyzer>
> </fieldType>
>
>
>
> Tom
> -----Original Message-----
> From: Peter Karich [mailto:peat...@yahoo.de] 
> Sent: Tuesday, August 10, 2010 3:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Improve Query Time For Large Index
>
> Hi Tom,
>
> my index is around 3GB in size and I am using 2GB of RAM for the JVM,
> although some more is available.
> If I am looking into the RAM usage while a slow query runs (via
> jvisualvm) I see that only 750MB of the JVM RAM is used.
>
>   
>> Can you give us some examples of the slow queries?
>>     
> for example, the empty query solr/select?q= takes very long,
> as does solr/select?q=http,
> where 'http' is the most common term
>
>   
>> Are you using stop words?  
>>     
> yes, a lot. I stored them into stopwords.txt
>
>   
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
>>     
> this looks interesting. I read through
> https://issues.apache.org/jira/browse/SOLR-908 and it seems to be in 1.4.
> I only need to enable it via:
>
> <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" 
> words="stopwords.txt"/>
>
> right? Do I need to reindex?
>
> Regards,
> Peter.
>
>   
>> Hi Peter,
>>
>> A few more details about your setup would help list members to answer your 
>> questions.
>> How large is your index?  
>> How much memory is on the machine and how much is allocated to the JVM?
>> Besides the Solr caches, Solr and Lucene depend on the operating system's 
>> disk caching for caching of postings lists.  So you need to leave some 
>> memory for the OS.  On the other hand if you are optimizing and refreshing 
>> every 10-15 minutes, that will invalidate all the caches, since an optimized 
>> index is essentially a set of new files.
>>
>> Can you give us some examples of the slow queries?  Are you using stop 
>> words?  
>>
>> If your slow queries are phrase queries, then you might try either adding 
>> the most frequent terms in your index to the stopwords list  or try 
>> CommonGrams and add them to the common words list.  (Details on CommonGrams 
>> here: 
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>>
>> Tom Burton-West
>>
>> -----Original Message-----
>> From: Peter Karich [mailto:peat...@yahoo.de] 
>> Sent: Tuesday, August 10, 2010 9:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Improve Query Time For Large Index
>>
>> Hi,
>>
>> I have 5 Million small documents/tweets (=> ~3GB) and the slave index
>> replicates itself from master every 10-15 minutes, so the index is
>> optimized before querying. We are using solr 1.4.1 (patched with
>> SOLR-1624) via SolrJ.
>>
>> Right now the search speed is slow (>2s) for common terms, which hit more 
>> than 2 million docs, and acceptable (<0.5s) for others. For those numbers I 
>> don't use highlighting or facets. I am using the following schema [1] and 
>> from the luke handler I know that numTerms is ~20 million. The query for 
>> common terms stays slow if I retry it again and again (no cache improvements).
>>
>> How can I improve the query time for the common terms without using
>> Distributed Search [2]?
>>
>> Regards,
>> Peter.
>>
>>
>> [1]
>> <field name="id" type="tlong" indexed="true" stored="true"
>> required="true" />
>> <field name="date" type="tdate" indexed="true" stored="true" />
>> <!-- term* attributes to prepare faster highlighting. -->
>> <field name="txt" type="text" indexed="true" stored="true"
>>                termVectors="true" termPositions="true" termOffsets="true"/>
>>
>> [2]
>> http://wiki.apache.org/solr/DistributedSearch
>>
>>
>>   
>>     
>
>   


-- 
http://karussell.wordpress.com/
