Hi Tom,

I tried again with:

<queryResultCache class="solr.LRUCache" size="10000" initialSize="10000" autowarmCount="10000"/>
and even now the hit ratio is still 0. What could be wrong with my setup?
('free -m' shows that over 2 GB of memory is still free.)

Regards,
Peter.

> Hi Peter,
>
> Can you give a few more examples of slow queries?
> Are they phrase queries? Boolean queries? Prefix or wildcard queries?
> If one-word queries are your slow queries, then CommonGrams won't help.
> CommonGrams will only help with phrase queries.
>
> How are you using term vectors? That may be slowing things down. I don't
> have experience with term vectors, so someone else on the list might speak
> to that.
>
> When you say the query time for common terms stays slow, do you mean that
> if you re-issue the exact query, the second query is not faster? That seems
> very strange. You might restart Solr and send a first query (the first
> query always takes a relatively long time). Then pick one of your slow
> queries and send it twice. The second time you send the query it should be
> much faster due to the Solr caches, and you should be able to see the cache
> hit in the Solr admin panel. If you send the exact query a second time
> (without enough intervening queries to evict data from the cache), the Solr
> queryResultCache should get hit and you should see a response time in the
> 0.01-5 millisecond range.
>
> What settings are you using for your Solr caches?
>
> How much memory is on the machine? If your bottleneck is disk I/O for
> frequent terms, then you want to make sure you have enough memory for the
> OS disk cache.
>
> I assume that "http" is not in your stopwords. CommonGrams will only help
> with phrase queries.
>
> CommonGrams was committed and is in Solr 1.4. If you decide to use
> CommonGrams you definitely need to re-index, and you also need to use both
> the index-time filter and the query-time filter. Your index will be larger.
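Tom's point above, that an exact repeat of a query should hit the queryResultCache, can be illustrated with a toy LRU result cache (plain Python for illustration only, not Solr internals). It also shows why a hit ratio of 0 is possible: no lookup ever found its query cached, e.g. because every optimize/replication emptied the cache first.

```python
from collections import OrderedDict

class LRUResultCache:
    """Toy analogue of Solr's queryResultCache: maps a query string to a
    cached result, evicting the least recently used entry when full."""

    def __init__(self, size):
        self.size = size
        self.data = OrderedDict()
        self.lookups = 0
        self.hits = 0

    def get(self, query):
        self.lookups += 1
        if query in self.data:
            self.hits += 1
            self.data.move_to_end(query)  # mark as most recently used
            return self.data[query]
        return None  # a miss: Solr would now run the query against the index

    def put(self, query, result):
        self.data[query] = result
        if len(self.data) > self.size:
            self.data.popitem(last=False)  # evict least recently used

    def hitratio(self):
        return self.hits / self.lookups if self.lookups else 0.0
```

An optimize (or a replicated new index) is like constructing a fresh, empty cache: until queries repeat afterwards, every lookup is a miss and the hit ratio stays at 0 no matter how large the cache is.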
>
> <fieldType name="foo" ...>
>   <analyzer type="index">
>     <filter class="solr.CommonGramsFilterFactory" words="new400common.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <filter class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/>
>   </analyzer>
> </fieldType>
>
> Tom
>
> -----Original Message-----
> From: Peter Karich [mailto:peat...@yahoo.de]
> Sent: Tuesday, August 10, 2010 3:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Improve Query Time For Large Index
>
> Hi Tom,
>
> my index is around 3 GB large and I am using 2 GB RAM for the JVM, although
> some more is available.
> If I look at the RAM usage while a slow query runs (via jvisualvm), I see
> that only 750 MB of the JVM RAM is used.
>
>> Can you give us some examples of the slow queries?
>
> For example, the empty query solr/select?q=
> takes very long, or solr/select?q=http
> where 'http' is the most common term.
>
>> Are you using stop words?
>
> Yes, a lot. I stored them in stopwords.txt.
>
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
>
> This looks interesting. I read through
> https://issues.apache.org/jira/browse/SOLR-908 and it seems to be in 1.4.
> I only need to enable it via:
>
> <filter class="solr.CommonGramsFilterFactory" ignoreCase="true"
>         words="stopwords.txt"/>
>
> right? Do I need to reindex?
>
> Regards,
> Peter.
>
>> Hi Peter,
>>
>> A few more details about your setup would help list members to answer
>> your questions.
>> How large is your index?
>> How much memory is on the machine and how much is allocated to the JVM?
>> Besides the Solr caches, Solr and Lucene depend on the operating system's
>> disk caching for caching of postings lists. So you need to leave some
>> memory for the OS. On the other hand, if you are optimizing and refreshing
>> every 10-15 minutes, that will invalidate all the caches, since an
>> optimized index is essentially a set of new files.
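To see why CommonGrams helps phrase queries containing common words, here is a rough sketch of the index-time token stream it produces (illustrative Python, not the actual Lucene filter; the common-words set is made up):

```python
# Hypothetical common-words list, standing in for new400common.txt.
COMMON = {"the", "of", "http"}

def common_grams(tokens, common=COMMON):
    """Index-time sketch: emit every unigram, plus a joined bigram for each
    adjacent pair in which either member is a common word. A phrase query
    can then match the rare bigram instead of the huge postings list of the
    common unigram."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append(tok + "_" + tokens[i + 1])
    return out

# "the quick fox" -> ["the", "the_quick", "quick", "fox"]
```

This is also why a single-term query like q=http is not helped: with no neighboring term, no bigram is formed, and the query still walks the full postings list of "http".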
>>
>> Can you give us some examples of the slow queries? Are you using stop
>> words?
>>
>> If your slow queries are phrase queries, then you might try either adding
>> the most frequent terms in your index to the stopwords list, or try
>> CommonGrams and add them to the common-words list. (Details on CommonGrams
>> here:
>> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
>>
>> Tom Burton-West
>>
>> -----Original Message-----
>> From: Peter Karich [mailto:peat...@yahoo.de]
>> Sent: Tuesday, August 10, 2010 9:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: Improve Query Time For Large Index
>>
>> Hi,
>>
>> I have 5 million small documents/tweets (=> ~3 GB) and the slave index
>> replicates itself from the master every 10-15 minutes, so the index is
>> optimized before querying. We are using Solr 1.4.1 (patched with
>> SOLR-1624) via SolrJ.
>>
>> The search speed is slow (>2 s) for common terms which hit more than 2
>> million docs, and acceptable for others (<0.5 s). For those numbers I
>> don't use highlighting or facets. I am using the following schema [1],
>> and from the Luke handler I know that numTerms is ~20 million. The query
>> time for common terms stays slow if I retry again and again (no cache
>> improvements).
>>
>> How can I improve the query time for the common terms without using
>> Distributed Search [2]?
>>
>> Regards,
>> Peter.
>>
>> [1]
>> <field name="id" type="tlong" indexed="true" stored="true"
>>        required="true" />
>> <field name="date" type="tdate" indexed="true" stored="true" />
>> <!-- term* attributes to prepare faster highlighting. -->
>> <field name="txt" type="text" indexed="true" stored="true"
>>        termVectors="true" termPositions="true" termOffsets="true"/>
>>
>> [2]
>> http://wiki.apache.org/solr/DistributedSearch

--
http://karussell.wordpress.com/
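For reference, the caches discussed in this thread are configured inside the <query> section of solrconfig.xml; a sketch along these lines (the sizes here are illustrative, not recommendations for this index):

```xml
<!-- solrconfig.xml (Solr 1.4); sizes are illustrative only -->
<query>
  <!-- caches ordered doc-id lists for a given (query, sort) pair;
       this is the cache whose hit ratio the thread is debugging -->
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="128"/>

  <!-- caches doc-id sets for filter queries (fq) -->
  <filterCache class="solr.LRUCache"
               size="512" initialSize="512" autowarmCount="128"/>

  <!-- caches stored fields of documents; internal doc ids change on
       commit, so this cache is not autowarmed -->
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
</query>
```

Note that autowarming only repopulates a cache from the previous searcher's entries after a commit; if the whole index is replaced by replication of an optimized master every 10-15 minutes, repeated identical queries between commits are still what drives the hit ratio above 0.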