Demian, If you omit "spellcheckIndexDir" from the configuration, it will create an in-memory spelling dictionary.
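A minimal sketch of that change, assuming the basicSpell searchComponent from Demian's original message (quoted at the bottom of this thread) and altering nothing except the removal of spellcheckIndexDir:

```xml
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">basicSpell</str>
    <str name="field">spelling</str>
    <str name="accuracy">0.75</str>
    <!-- spellcheckIndexDir removed: per the note above, the spelling
         dictionary is then held in memory instead of in ./spellchecker -->
    <str name="queryAnalyzerFieldType">textSpell</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
```

An in-memory dictionary is not persisted across restarts, so it presumably has to be rebuilt (here through buildOnOptimize, or an explicit spellcheck.build=true request) before suggestions are returned again.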
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Demian Katz [mailto:demian.k...@villanova.edu]
Sent: Tuesday, June 07, 2011 7:59 AM
To: solr-user@lucene.apache.org
Subject: RE: SpellCheckComponent performance

As I may have mentioned before, VuFind is actually doing two Solr queries for every search -- a base query that gets basic spelling suggestions, and a supplemental spelling-only query that gets shingled spelling suggestions. If there's a way to get two different spelling responses in a single query, I'd love to hear about it... but the double-querying doesn't seem to be a huge problem -- the delays I'm talking about are in the spelling portion of the initial query.

Just for the sake of completeness, here are both of my spelling field types:

<!-- Basic Text Field for use with Spell Correction -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- More advanced spell checking field. -->
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

...and here are the fields:

<field name="spelling" type="textSpell" indexed="true" stored="true"/>
<field name="spellingShingle" type="textSpellShingle" indexed="true" stored="true" multiValued="true"/>

As you can probably guess, I'm using spelling in my main query and spellingShingle in my supplemental query.

Here are stats on the spelling field:

{field=spelling,memSize=107830314,tindexSize=249184,time=25747,phase1=25150,nTerms=1343061,bigTerms=231,termInstances=40960454,uses=1}

(I obtained these numbers by temporarily adding the spelling field as a facet to my warming query -- probably not a very smart way to do it, but it was the only way I could figure out! If there's a more elegant and accurate approach, I'd be interested to know what it is.)

I should also note that my basic spelling index is 114MB and my shingled spelling index is 931MB -- not outrageously large. Is there a way to persuade Solr to load these into memory for faster performance?

thanks,
Demian

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, June 06, 2011 6:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SpellCheckComponent performance
>
> Hmmm, how are you configuring your spell checker? The first-time
> slowdown is probably due to cache warming, but subsequent 500 ms
> slowdowns seem odd. How many unique terms are there in your
> spellcheck index?
>
> It'd probably be best if you showed us your fieldtype and field
> definition...
>
> Best
> Erick
>
> On Mon, Jun 6, 2011 at 4:04 PM, Demian Katz <demian.k...@villanova.edu> wrote:
> > I'm continuing to work on tuning my Solr server, and now I'm noticing
> > that my biggest bottleneck is the SpellCheckComponent. This is eating
> > multiple seconds on most first-time searches, and still taking around
> > 500ms even on cached searches. Here is my configuration:
> >
> > <searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
> >   <lst name="spellchecker">
> >     <str name="name">basicSpell</str>
> >     <str name="field">spelling</str>
> >     <str name="accuracy">0.75</str>
> >     <str name="spellcheckIndexDir">./spellchecker</str>
> >     <str name="queryAnalyzerFieldType">textSpell</str>
> >     <str name="buildOnOptimize">true</str>
> >   </lst>
> > </searchComponent>
> >
> > I've done a bit of searching, but the best advice I could find for
> > making the search component go faster involved reducing
> > spellcheck.maxCollationTries, which doesn't even seem to apply to my
> > settings.
> >
> > Does anyone have any advice on tuning this aspect of my
> > configuration? Are there any extra debug settings that might give
> > deeper insight into how the component is spending its time?
> >
> > thanks,
> > Demian