Hi Peter,

Can you give a few more examples of slow queries? Are they phrase queries? Boolean queries? Prefix or wildcard queries? If one-word queries are your slow queries, then CommonGrams won't help. CommonGrams will only help with phrase queries.
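To make the phrase-query point concrete, here is a toy sketch (plain Python, not Lucene's actual implementation) of what the index-time CommonGrams filter does: it keeps every unigram and additionally emits a joined bigram for each adjacent pair containing a common word. A phrase query can then be answered from the much rarer bigram term instead of the huge postings list for the common word, but a single-term query like q=http never touches the bigrams, which is why it cannot be sped up this way.

```python
def common_grams(tokens, common_words):
    """Toy sketch of index-time CommonGrams: keep all unigrams, and add a
    joined bigram for every adjacent pair that contains a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # the unigram is always kept
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is not None and (tok in common_words or nxt in common_words):
            out.append(tok + "_" + nxt)  # rare bigram term for phrase matching
    return out

# "http" is common, so its neighbours get merged into bigram terms:
print(common_grams(["slow", "http", "queries"], {"http"}))
# ['slow', 'slow_http', 'http', 'http_queries', 'queries']
```

A phrase query for "slow http" can hit the single term slow_http; a bare query for http still has to walk the full http postings list.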
How are you using termvectors? That may be slowing things down. I don't have experience with termvectors, so someone else on the list might speak to that.

When you say the query time for common terms stays slow, do you mean that if you re-issue the exact query, the second query is not faster? That seems very strange. You might restart Solr and send a first query (the first query always takes a relatively long time). Then pick one of your slow queries and send it twice. The second time you send the query it should be much faster due to the Solr caches, and you should be able to see the cache hit in the Solr admin panel. If you send the exact query a second time (without enough intervening queries to evict data from the cache), the Solr queryResultCache should get hit and you should see a response time in the 0.01-5 millisecond range.

What settings are you using for your Solr caches? How much memory is on the machine? If your bottleneck is disk I/O for frequent terms, then you want to make sure you have enough memory for the OS disk cache. I assume that "http" is not in your stopwords.

CommonGrams was committed and is in Solr 1.4. If you decide to use CommonGrams, you definitely need to re-index, and you also need to use both the index-time filter and the query-time filter. Your index will be larger.

<fieldType name="foo" ...>
  <analyzer type="index">
    <!-- your tokenizer here -->
    <filter class="solr.CommonGramsFilterFactory" words="new400common.txt"/>
  </analyzer>
  <analyzer type="query">
    <!-- your tokenizer here -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/>
  </analyzer>
</fieldType>

Tom

-----Original Message-----
From: Peter Karich [mailto:peat...@yahoo.de]
Sent: Tuesday, August 10, 2010 3:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Improve Query Time For Large Index

Hi Tom,

my index is around 3GB large and I am using 2GB RAM for the JVM, although some more is available.
If I am looking into the RAM usage while a slow query runs (via jvisualvm) I see that only 750MB of the JVM RAM is used.

> Can you give us some examples of the slow queries?

for example the empty query

  solr/select?q=

takes very long, or

  solr/select?q=http

where 'http' is the most common term

> Are you using stop words?

yes, a lot. I stored them in stopwords.txt

> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2

this looks interesting. I read through
https://issues.apache.org/jira/browse/SOLR-908 and it seems to be in 1.4.
I only need to enable it via:

<filter class="solr.CommonGramsFilterFactory" ignoreCase="true" words="stopwords.txt"/>

right? Do I need to reindex?

Regards,
Peter.

> Hi Peter,
>
> A few more details about your setup would help list members to answer your
> questions.
> How large is your index?
> How much memory is on the machine and how much is allocated to the JVM?
> Besides the Solr caches, Solr and Lucene depend on the operating system's
> disk caching for caching of postings lists. So you need to leave some memory
> for the OS. On the other hand, if you are optimizing and refreshing every
> 10-15 minutes, that will invalidate all the caches, since an optimized index
> is essentially a set of new files.
>
> Can you give us some examples of the slow queries? Are you using stop words?
>
> If your slow queries are phrase queries, then you might try either adding the
> most frequent terms in your index to the stopwords list or try CommonGrams
> and add them to the common words list.
> (Details on CommonGrams here:
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>
> Tom Burton-West
>
> -----Original Message-----
> From: Peter Karich [mailto:peat...@yahoo.de]
> Sent: Tuesday, August 10, 2010 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Improve Query Time For Large Index
>
> Hi,
>
> I have 5 million small documents/tweets (=> ~3GB) and the slave index
> replicates itself from the master every 10-15 minutes, so the index is
> optimized before querying. We are using Solr 1.4.1 (patched with
> SOLR-1624) via SolrJ.
>
> Now the search speed is slow (>2s) for common terms which hit more than
> 2 million docs, and acceptable for others (<0.5s). For those numbers I
> don't use highlighting or facets. I am using the following schema [1],
> and from the luke handler I know that numTerms =~ 20 million. The query
> time for common terms stays slow if I retry again and again (no cache
> improvements).
>
> How can I improve the query time for the common terms without using
> Distributed Search [2]?
>
> Regards,
> Peter.
>
> [1]
> <field name="id" type="tlong" indexed="true" stored="true" required="true"/>
> <field name="date" type="tdate" indexed="true" stored="true"/>
> <!-- term* attributes to prepare faster highlighting. -->
> <field name="txt" type="text" indexed="true" stored="true"
>        termVectors="true" termPositions="true" termOffsets="true"/>
>
> [2]
> http://wiki.apache.org/solr/DistributedSearch

--
http://karussell.wordpress.com/