RE: SpellCheckComponent performance

Dyer, James Tue, 07 Jun 2011 07:39:15 -0700

Demian,

If you omit "spellcheckIndexDir" from the configuration, it will create an 
in-memory spelling dictionary.
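
As a rough sketch, assuming everything else in the configuration you posted 
stays as it is, that would look something like the following (untested 
against your setup; the only change is that the "spellcheckIndexDir" line is 
gone):

```xml
<searchComponent name="spellcheck"
    class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">basicSpell</str>
    <str name="field">spelling</str>
    <str name="accuracy">0.75</str>
    <!-- "spellcheckIndexDir" intentionally omitted: with no directory
         configured, the spelling dictionary is held in memory -->
    <str name="queryAnalyzerFieldType">textSpell</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
```

Since an in-memory dictionary is not written to disk, I would expect it to 
need rebuilding after a restart (for example via buildOnOptimize or a 
spellcheck.build=true request).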


James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Demian Katz [mailto:demian.k...@villanova.edu] 
Sent: Tuesday, June 07, 2011 7:59 AM
To: solr-user@lucene.apache.org
Subject: RE: SpellCheckComponent performance

As I may have mentioned before, VuFind is actually doing two Solr queries for 
every search -- a base query that gets basic spelling suggestions, and a 
supplemental spelling-only query that gets shingled spelling suggestions.  If 
there's a way to get two different spelling responses in a single query, I'd 
love to hear about it...  but the double-querying doesn't seem to be a huge 
problem -- the delays I'm talking about are in the spelling portion of the 
initial query.  Just for the sake of completeness, here are both of my spelling 
field types:

    <!-- Basic Text Field for use with Spell Correction -->
    <fieldType name="textSpell" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="schema.UnicodeNormalizationFilterFactory"
                version="icu4j" composed="false" remove_diacritics="true"
                remove_modifiers="true" fold="true"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <!-- More advanced spell checking field. -->
    <fieldType name="textSpellShingle" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="false"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="false"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

...and here are the fields:

   <field name="spelling" type="textSpell" indexed="true" stored="true"/>
   <field name="spellingShingle" type="textSpellShingle" indexed="true"
          stored="true" multiValued="true"/>

As you can probably guess, I'm using spelling in my main query and 
spellingShingle in my supplemental query.

Here are stats on the spelling field:

{field=spelling,memSize=107830314,tindexSize=249184,time=25747,phase1=25150,nTerms=1343061,bigTerms=231,termInstances=40960454,uses=1}

(I obtained these numbers by temporarily adding the spelling field as a facet 
to my warming query -- probably not a very smart way to do it, but it was the 
only way I could figure out!  If there's a more elegant and accurate approach, 
I'd be interested to know what it is.)

I should also note that my basic spelling index is 114MB and my shingled 
spelling index is 931MB -- not outrageously large.  Is there a way to persuade 
Solr to load these into memory for faster performance?

thanks,
Demian

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, June 06, 2011 6:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SpellCheckComponent performance
> 
> Hmmm, how are you configuring your spell checker? The first-time
> slowdown is probably due to cache warming, but subsequent 500 ms
> slowdowns seem odd. How many unique terms are there in your
> spellcheck index?
> 
> It'd probably be best if you showed us your fieldtype and field
> definition...
> 
> Best
> Erick
> 
> On Mon, Jun 6, 2011 at 4:04 PM, Demian Katz <demian.k...@villanova.edu>
> wrote:
> > I'm continuing to work on tuning my Solr server, and now I'm noticing
> > that my biggest bottleneck is the SpellCheckComponent.  This is eating
> > multiple seconds on most first-time searches, and still taking around
> > 500ms even on cached searches.  Here is my configuration:
> >
> >  <searchComponent name="spellcheck"
> >      class="org.apache.solr.handler.component.SpellCheckComponent">
> >    <lst name="spellchecker">
> >      <str name="name">basicSpell</str>
> >      <str name="field">spelling</str>
> >      <str name="accuracy">0.75</str>
> >      <str name="spellcheckIndexDir">./spellchecker</str>
> >      <str name="queryAnalyzerFieldType">textSpell</str>
> >      <str name="buildOnOptimize">true</str>
> >    </lst>
> >  </searchComponent>
> >
> > I've done a bit of searching, but the best advice I could find for
> > making the search component go faster involved reducing
> > spellcheck.maxCollationTries, which doesn't even seem to apply to my
> > settings.
> >
> > Does anyone have any advice on tuning this aspect of my
> > configuration?  Are there any extra debug settings that might give
> > deeper insight into how the component is spending its time?
> >
> > thanks,
> > Demian
> >
