Re: Deleting spell checker index

Lance Norskog Wed, 17 Feb 2010 19:22:30 -0800

This is a quirk of Lucene: when you delete a document, the indexed
terms for the document are not deleted. That is, if 2 documents have
the word 'frampton' in an indexed field, the term dictionary contains
the entry 'frampton' and pointers to those two documents. When you
delete those two documents, the index still contains the entry
'frampton', but its pointers now lead only to deleted documents, which
is effectively an empty list. So the terms are still there even when
you delete all of the documents.
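A toy model in Python may make this concrete. It is only a sketch under simplifying assumptions (one in-memory "segment", whitespace tokenization, and a hypothetical `ToyIndex` class), not Lucene's actual data structures or API. Deleting marks documents as deleted, the term stays in the dictionary, and an optimize-like merge finally drops it.

```python
# Toy model of a term dictionary (hypothetical, NOT Lucene's real API):
# a delete only marks the document; the term survives until a merge.

class ToyIndex:
    def __init__(self):
        self.postings = {}     # term -> set of doc ids containing it
        self.deleted = set()   # doc ids marked deleted but not yet merged away

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def delete(self, doc_id):
        # Like a Lucene delete: mark the doc, leave the postings untouched.
        self.deleted.add(doc_id)

    def live_docs(self, term):
        # Documents a search would actually see for this term.
        return self.postings.get(term, set()) - self.deleted

    def terms(self):
        # What a walk over the term dictionary (e.g. a schema browser) sees.
        return sorted(self.postings)

    def optimize(self):
        # A merge rewrites postings without deleted docs and drops any
        # term whose postings end up empty.
        self.postings = {
            term: docs - self.deleted
            for term, docs in self.postings.items()
            if docs - self.deleted
        }
        self.deleted.clear()


idx = ToyIndex()
idx.add(1, "Frampton Comes Alive")
idx.add(2, "Peter Frampton")
idx.delete(1)
idx.delete(2)
print("frampton" in idx.terms())   # True: the term is still in the dictionary
print(idx.live_docs("frampton"))   # set(): but no live document has it
idx.optimize()
print(idx.terms())                 # []: the merge threw the remnant terms away
```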


Facets and the spellchecking dictionary are built from this term
dictionary, not from the text strings that are 'stored' and returned
when you search for the documents.
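The facet side of this can be sketched the same way. This assumes a toy dict-of-sets "term dictionary" and a hypothetical `facet_counts` helper (not Solr code). It walks every term and counts live documents, so with a `mincount` of 0 (Solr's facet.mincount defaults to 0) the zero-count leftovers still appear, while a `mincount` of 1 hides them, which is the workaround darniz mentions below.

```python
def facet_counts(postings, deleted, mincount=0):
    """Count live documents per term by walking the whole term dictionary.

    postings: dict term -> set of doc ids (toy stand-in for the index)
    deleted:  doc ids marked deleted but not yet merged away
    mincount: analogous to Solr's facet.mincount parameter
    """
    counts = {}
    for term in sorted(postings):
        live = len(postings[term] - deleted)
        if live >= mincount:
            counts[term] = live
    return counts


postings = {"accord": {1, 2}, "civic": {3}}
deleted = {1, 2, 3}   # every document deleted, no optimize yet
print(facet_counts(postings, deleted))              # {'accord': 0, 'civic': 0}
print(facet_counts(postings, deleted, mincount=1))  # {}
```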

The <optimize> command merges the index segments and throws away these
remnant terms.
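For reference, here is a sketch of sending that command from Python with only the standard library. The URL is an assumption (the stock example install on the demo port 8983, with the XML update handler at /solr/update); adjust it for your install. The `optimize_request` helper is hypothetical and only builds the request. With buildOnOptimize set to true, as in darniz's config quoted below, the optimize should also rebuild the "spell" index from the cleaned-up term dictionary.

```python
import urllib.request

# Assumption: Solr example install on the default demo port, XML update
# handler at /solr/update. Change this for your own deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def optimize_request(update_url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST carrying the <optimize/> command."""
    return urllib.request.Request(
        update_url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

With Solr running, `urllib.request.urlopen(optimize_request())` sends it.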

http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/

On Wed, Feb 17, 2010 at 12:24 PM, darniz <rnizamud...@edmunds.com> wrote:
>
> Please bear with my limited understanding.
> I deleted all documents and rebuilt my spell checker using the
> command
> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>
> After this I went to the schema browser and saw that mySpellText still has
> around 2000 values.
> How can I make sure that I clean up that field?
> We had the same issue with facets too: even after we delete all the
> documents, a facet on make still returns values, though we can
> filter those out with facet.mincount>0.
>
> So, coming back to my question: how can I make the mySpellText field get
> rid of all previous terms?
>
> Thanks a lot
> darniz
>
>
>
> hossman wrote:
>>
>> : But still I can't stop thinking about this.
>> : I deleted my entire index and now I have 0 documents.
>> :
>> : Now if I make a query with accrd I still get a suggestion of accord even
>> : though no documents are returned, since I deleted my entire index. I
>> : hoped it would also clear the spell check index field.
>>
>> There are two Lucene indexes when you use spell checking.
>>
>> There is the "main" index, which is governed by your schema.xml and is what
>> you add your own documents to, and what searches are run against for the
>> result section of Solr responses.
>>
>> There is also the "spell" index, in which each "document" corresponds to a
>> "word" that might be returned as a spelling suggestion: one field holds the
>> word itself, and the other fields contain various start/end/middle ngrams
>> that represent possible misspellings.
>>
>> When you use the spellchecker component, it builds the "spell" index by
>> making a document out of every word it finds in whatever field name you
>> configure it to use.
>>
>> Deleting your entire "main" index won't automatically delete the "spell"
>> index (although you should be able to rebuild the "spell" index using the
>> *empty* "main" index; that should work).
>>
>> : I am copying both fields to a field called
>> : <copyField source="make" dest="mySpellText"/>
>> : <copyField source="model" dest="mySpellText"/>
>>
>> ...at this point your "main" index has a field named mySpellText, and for
>> every document it contains a copy of make and model.
>>
>> :         <lst name="spellchecker">
>> :             <str name="name">default</str>
>> :             <str name="field">mySpellText</str>
>> :             <str name="buildOnOptimize">true</str>
>> :             <str name="buildOnCommit">true</str>
>>
>> ...so whenever you commit or optimize your "main" index, it will take every
>> word from the mySpellText field and use them all as individual documents in
>> the "spell" index.
>>
>> In your previous email you said you changed the copyField declaration and
>> then triggered a commit; that rebuilt your "spell" index, but the data
>> was still all there in the mySpellText field of the "main" index, so the
>> rebuilt "spell" index was exactly the same.
>>
>> : I have buildOnOptimize and buildOnCommit as true, so when I index new
>> : documents I want my dictionary to be created, but how can I make sure I
>> : remove the previously indexed terms?
>>
>> Every time the spellchecker component "builds", it will create a completely
>> new "spell" index... but if the old data is still in the "main" index, then
>> it will also be in the "spell" index.
>>
>> The only reason I can think of why you'd be seeing words in your "spell"
>> index after deleting documents from your "main" index is that even if you
>> delete documents, the Terms are still there in the underlying index until
>> the segments are merged... so if you do an optimize, that will force them
>> to be expunged. But I honestly have no idea if that is what's causing
>> your problem, because quite frankly I really don't understand what your
>> problem is... you have to provide specifics: reproducible steps anyone
>> can take using a clean install of Solr to see the behavior you are
>> seeing that seems incorrect (i.e. modifications to the example schema,
>> and commands to execute against the demo port to see the bug).
>>
>> If you can provide details like that, then it's possible to understand what
>> is going wrong for you, which is a prerequisite to providing useful help.
>>
>>
>>
>> -Hoss
>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27629740.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com
