Hi Peter,

If you aren't getting any queryResultCache hits even when the exact same query
is repeated, something is very wrong. I'd suggest first getting the query
result cache working, and then moving on to look at other possible bottlenecks.

What are your settings for queryResultWindowSize and queryResultMaxDocsCached?
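
For reference, both settings sit in the <query> section of solrconfig.xml,
next to the cache definitions. Something like the following, where the numbers
are purely illustrative and not a recommendation for your index:

  <query>
    <!-- illustrative values only; your queryResultCache definition lives in this section too -->
    <queryResultWindowSize>50</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  </query>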

Following up on Robert's point, you might also try running a few queries in the
admin interface with the debug flag on (debugQuery=on) to see whether the query
parser is creating phrase queries (assuming you have queries like
http://foo.bar.baz). The parsedquery entry in the debug output will show
whether the parsed query is a PhraseQuery.

Tom



-----Original Message-----
From: Peter Karich [mailto:peat...@yahoo.de] 
Sent: Thursday, August 12, 2010 5:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Improve Query Time For Large Index

Hi Tom,

I tried again with:
  <queryResultCache class="solr.LRUCache" size="10000" initialSize="10000"
        autowarmCount="10000"/>

and even now the hitratio is still 0. What could be wrong with my setup?

('free -m' shows that over 2 GB of memory is still free for the OS cache.)

Regards,
Peter.

> Hi Peter,
>
> Can you give a few more examples of slow queries?  
> Are they phrase queries? Boolean queries? prefix or wildcard queries?
> If one-word queries are your slow queries, then CommonGrams won't help.
> CommonGrams will only help with phrase queries.
>
> How are you using termvectors?  That may be slowing things down.  I don't 
> have experience with termvectors, so someone else on the list might speak to 
> that.
>
> When you say the query time for common terms stays slow, do you mean that if you
> re-issue the exact query, the second query is not faster? That seems very
> strange. You might restart Solr and send a first query (the first query
> always takes a relatively long time). Then pick one of your slow queries and
> send it twice. The second time you send the query it should be much faster
> due to the Solr caches, and you should be able to see the cache hit in the
> Solr admin panel. If you send the exact query a second time (without enough
> intervening queries to evict data from the cache), the Solr queryResultCache
> should get hit and you should see a response time in the .01-5 millisecond
> range.
>
> What settings are you using for your Solr caches?
>
> How much memory is on the machine? If your bottleneck is disk I/O for
> frequent terms, then you want to make sure you have enough memory for the OS
> disk cache.
>
> I assume that http is not in your stopwords.
> CommonGrams was committed and is in Solr 1.4.  If you decide to use 
> CommonGrams you definitely need to re-index and you also need to use both the 
> index time filter and the query time filter.  Your index will be larger.
>
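> <fieldType name="foo" ...>
> <analyzer type="index">
> <!-- an analyzer needs a tokenizer; WhitespaceTokenizer is only an example -->
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.CommonGramsFilterFactory" words="new400common.txt"/>
> </analyzer>
>
> <analyzer type="query">
> <!-- use the same tokenizer (and any earlier filters) as the index analyzer -->
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/>
> </analyzer>
> </fieldType>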
>
>
>
> Tom
> -----Original Message-----
> From: Peter Karich [mailto:peat...@yahoo.de] 
> Sent: Tuesday, August 10, 2010 3:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Improve Query Time For Large Index
>
> Hi Tom,
>
> My index is around 3 GB, and I am allocating 2 GB of RAM to the JVM, although
> some more is available.
> If I look at the RAM usage while a slow query runs (via
> jvisualvm), I see that only 750 MB of the JVM heap is used.
>
>   
>> Can you give us some examples of the slow queries?
>>     
> For example, the empty query solr/select?q= takes very long, as does
> solr/select?q=http, where 'http' is the most common term.
>
>   
>> Are you using stop words?  
>>     
> Yes, a lot. I stored them in stopwords.txt.
>
>   
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
>>     
> This looks interesting. I read through
> https://issues.apache.org/jira/browse/SOLR-908 and it seems to be in 1.4.
> I only need to enable it via:
>
> <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" 
> words="stopwords.txt"/>
>
> right? Do I need to reindex?
>
> Regards,
> Peter.
>
>   
>> Hi Peter,
>>
>> A few more details about your setup would help list members to answer your 
>> questions.
>> How large is your index?  
>> How much memory is on the machine and how much is allocated to the JVM?
>> Besides the Solr caches, Solr and Lucene depend on the operating system's 
>> disk caching for caching of postings lists.  So you need to leave some 
>> memory for the OS. On the other hand, if you are optimizing and refreshing
>> every 10-15 minutes, that will invalidate all the caches, since an optimized
>> index is essentially a set of new files.
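>>
>> One common way to soften that, sketched here as an untested assumption about
>> your setup, is to have Solr warm each new searcher with a few known-slow
>> queries right after replication, using a newSearcher listener in
>> solrconfig.xml. The query value below is a hypothetical placeholder:
>>
>>   <listener event="newSearcher" class="solr.QuerySenderListener">
>>     <arr name="queries">
>>       <!-- hypothetical placeholder: replace with one of your slow common-term queries -->
>>       <lst><str name="q">your-common-term</str></lst>
>>     </arr>
>>   </listener>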
>>
>> Can you give us some examples of the slow queries?  Are you using stop 
>> words?  
>>
>> If your slow queries are phrase queries, then you might try either adding 
>> the most frequent terms in your index to the stopwords list  or try 
>> CommonGrams and add them to the common words list.  (Details on CommonGrams 
>> here: 
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
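>>
>> For the stopwords route, the relevant piece of the field type is a
>> StopFilterFactory, typically in both the index and query analyzers, pointing
>> at a plain-text list with one word per line (stopwords.txt is just the
>> conventional file name). Changing the index-time list means reindexing.
>>
>>   <!-- in each <analyzer> of the field type, after the tokenizer -->
>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>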
>>
>> Tom Burton-West
>>
>> -----Original Message-----
>> From: Peter Karich [mailto:peat...@yahoo.de] 
>> Sent: Tuesday, August 10, 2010 9:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Improve Query Time For Large Index
>>
>> Hi,
>>
>> I have 5 million small documents/tweets (~3 GB), and the slave index
>> replicates from the master every 10-15 minutes, so the index is
>> optimized before querying. We are using Solr 1.4.1 (patched with
>> SOLR-1624) via SolrJ.
>>
>> The search is slow (>2 s) for common terms that hit more than 2
>> million docs, and acceptable (<0.5 s) for others. For those numbers I don't
>> use highlighting or facets. I am using the following schema [1], and from the
>> Luke handler I know that numTerms is ~20 million. The query for common terms
>> stays slow if I retry it again and again (no cache improvement).
>>
>> How can I improve the query time for the common terms without using
>> Distributed Search [2] ?
>>
>> Regards,
>> Peter.
>>
>>
>> [1]
>> <field name="id" type="tlong" indexed="true" stored="true"
>> required="true" />
>> <field name="date" type="tdate" indexed="true" stored="true" />
>> <!-- term* attributes to prepare faster highlighting. -->
>> <field name="txt" type="text" indexed="true" stored="true"
>>                termVectors="true" termPositions="true" termOffsets="true"/>
>>
>> [2]
>> http://wiki.apache.org/solr/DistributedSearch
>>
>
>   


-- 
http://karussell.wordpress.com/
