Hi Peter,

Can you give a few more examples of slow queries? Are they phrase queries? Boolean queries? Prefix or wildcard queries? If one-word queries are your slow queries, then CommonGrams won't help. CommonGrams will only help with phrase queries.
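To make the phrase-query point concrete, here is a toy sketch (plain Python, not Lucene's actual implementation) of what the index-time CommonGrams filter does: it keeps every unigram and additionally emits a joined bigram for each adjacent pair containing a common word. A phrase query can then be answered from the much rarer bigram term instead of the huge postings list for the common word, but a single-term query like q=http never touches the bigrams, which is why it cannot be sped up this way.

```python
def common_grams(tokens, common_words):
    """Toy sketch of index-time CommonGrams: keep all unigrams, and add a
    joined bigram for every adjacent pair that contains a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # the unigram is always kept
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is not None and (tok in common_words or nxt in common_words):
            out.append(tok + "_" + nxt)  # rare bigram term for phrase matching
    return out

# "http" is common, so its neighbours get merged into bigram terms:
print(common_grams(["slow", "http", "queries"], {"http"}))
# ['slow', 'slow_http', 'http', 'http_queries', 'queries']
```

A phrase query for "slow http" can hit the single term slow_http; a bare query for http still has to walk the full http postings list.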
How are you using termvectors? That may be slowing things down. I don't have experience with termvectors, so someone else on the list might speak to that.

When you say the query time for common terms stays slow, do you mean that if you re-issue the exact query, the second query is not faster? That seems very strange. You might restart Solr and send a first query (the first query always takes a relatively long time). Then pick one of your slow queries and send it twice. The second time you send the query it should be much faster due to the Solr caches, and you should be able to see the cache hit in the Solr admin panel. If you send the exact query a second time (without enough intervening queries to evict data from the cache), the Solr queryResultCache should get hit and you should see a response time in the 0.01-5 millisecond range.

What settings are you using for your Solr caches? How much memory is on the machine? If your bottleneck is disk I/O for frequent terms, then you want to make sure you have enough memory for the OS disk cache. I assume that "http" is not in your stopwords.

CommonGrams was committed and is in Solr 1.4. If you decide to use CommonGrams, you definitely need to re-index, and you also need to use both the index-time filter and the query-time filter. Your index will be larger.

<fieldType name="foo" ...>
  <analyzer type="index">
    <!-- your tokenizer here -->
    <filter class="solr.CommonGramsFilterFactory" words="new400common.txt"/>
  </analyzer>
  <analyzer type="query">
    <!-- your tokenizer here -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/>
  </analyzer>
</fieldType>

Tom

-----Original Message-----
From: Peter Karich [mailto:peat...@yahoo.de]
Sent: Tuesday, August 10, 2010 3:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Improve Query Time For Large Index

Hi Tom,

my index is around 3GB large and I am using 2GB RAM for the JVM, although some more is available.
If I am looking into the RAM usage while a slow query runs (via jvisualvm) I see that only 750MB of the JVM RAM is used.

> Can you give us some examples of the slow queries?

for example the empty query

  solr/select?q=

takes very long, or

  solr/select?q=http

where 'http' is the most common term

> Are you using stop words?

yes, a lot. I stored them in stopwords.txt

> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2

this looks interesting. I read through
https://issues.apache.org/jira/browse/SOLR-908 and it seems to be in 1.4.
I only need to enable it via:

<filter class="solr.CommonGramsFilterFactory" ignoreCase="true" words="stopwords.txt"/>

right? Do I need to reindex?

Regards,
Peter.

> Hi Peter,
>
> A few more details about your setup would help list members to answer your
> questions.
> How large is your index?
> How much memory is on the machine and how much is allocated to the JVM?
> Besides the Solr caches, Solr and Lucene depend on the operating system's
> disk caching for caching of postings lists. So you need to leave some memory
> for the OS. On the other hand, if you are optimizing and refreshing every
> 10-15 minutes, that will invalidate all the caches, since an optimized index
> is essentially a set of new files.
>
> Can you give us some examples of the slow queries? Are you using stop words?
>
> If your slow queries are phrase queries, then you might try either adding the
> most frequent terms in your index to the stopwords list or try CommonGrams
> and add them to the common words list.
> (Details on CommonGrams here:
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>
> Tom Burton-West
>
> -----Original Message-----
> From: Peter Karich [mailto:peat...@yahoo.de]
> Sent: Tuesday, August 10, 2010 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Improve Query Time For Large Index
>
> Hi,
>
> I have 5 million small documents/tweets (=> ~3GB) and the slave index
> replicates itself from the master every 10-15 minutes, so the index is
> optimized before querying. We are using Solr 1.4.1 (patched with
> SOLR-1624) via SolrJ.
>
> Now the search speed is slow (>2s) for common terms which hit more than
> 2 million docs, and acceptable for others (<0.5s). For those numbers I
> don't use highlighting or facets. I am using the following schema [1],
> and from the luke handler I know that numTerms =~ 20 million. The query
> time for common terms stays slow if I retry again and again (no cache
> improvements).
>
> How can I improve the query time for the common terms without using
> Distributed Search [2]?
>
> Regards,
> Peter.
>
> [1]
> <field name="id" type="tlong" indexed="true" stored="true" required="true"/>
> <field name="date" type="tdate" indexed="true" stored="true"/>
> <!-- term* attributes to prepare faster highlighting. -->
> <field name="txt" type="text" indexed="true" stored="true"
>        termVectors="true" termPositions="true" termOffsets="true"/>
>
> [2]
> http://wiki.apache.org/solr/DistributedSearch

--
http://karussell.wordpress.com/