RE: LowerCaseFilterFactory and spellchecker

Norskog, Lance Wed, 28 Nov 2007 17:28:25 -0800

Oops, sorry, didn't think that through.

The query to the spellchecker is not filtered through the field query
definition. You have to do your own lower-case transformation when you
do the query.  This is a simple thing to resolve. But, I'm working with
international alphabets and I would like 'protege' and 'protege with
both e's accented` to match. The ISOLatin1 filter does this in indexing
& querying. But I have to rip off the code and use it in my app to
preprocess words for spell-checks.


Lance

-----Original Message-----
From: Rob Casson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 28, 2007 5:16 PM
To: solr-user@lucene.apache.org
Subject: Re: LowerCaseFilterFactory and spellchecker

lance,

thanks for the quick reply....looks like 'thorne' is getting added to
the dictionary, as it comes up as a suggestion for 'Thorne'

i could certainly just lowercase in my client, but just confirming that
i'm not just screwing it up in the firstplace :)

thanks again,
rc

On Nov 28, 2007 8:11 PM, Norskog, Lance <[EMAIL PROTECTED]> wrote:
> There are a few parameters for limiting what words are added to the 
> dictionary.  You might be trimming out 'thorne'. See this page:
>
> http://wiki.apache.org/solr/SpellCheckerRequestHandler
>
>
> -----Original Message-----
> From: Rob Casson [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 28, 2007 4:25 PM
> To: solr-user@lucene.apache.org
> Subject: LowerCaseFilterFactory and spellchecker
>
> think i'm just doing something wrong...
>
> was experimenting with the spellcheck handler with the nightly 
> checkout from 11-28; seems my spellchecking is case-sensitive, even 
> tho i think i'm adding the LowerCaseFilterFactory to both the index 
> and query analyzers.
>
> here's a brief rundown of my testing steps.
>
> from schema.xml:
>
> <fieldtype name="spell" class="solr.TextField"
> positionIncrementGap="100">
>         <analyzer type="index">
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.StandardFilterFactory"/>
>                 <filter
> class="solr.RemoveDuplicatesTokenFilterFactory"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>         </analyzer>
>         <analyzer type="query">
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.StandardFilterFactory"/>
>                 <filter
> class="solr.RemoveDuplicatesTokenFilterFactory"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>         </analyzer>
> </fieldtype>
>
> <field name="title" type="text" indexed="true" stored="true"
> multiValued="true"/>
> <field name="spelling" type="spell" indexed="true" stored="stored"
> multiValued="true"/>
>
> <copyField source="title" dest="spelling"/>
>
> --------------------------------
>
> from solrconfig.xml:
>
> <requestHandler name="spellchecker"
> class="solr.SpellCheckerRequestHandler" startup="lazy">
>         <lst name="defaults">
>                 <int name="suggestionCount">1</int>
>                 <float name="accuracy">0.5</float>
>         </lst>
>         <str name="spellcheckerIndexDir">spell</str>
>         <str name="termSourceField">spelling</str>
> </requestHandler>
>
> --------------------------------
>
> adding the doc:
>
> curl http://localhost:8983/solr/update -H "Content-Type: text/xml"
> --data-binary '<add><doc><field
> name="title">Thorne</field></doc></add>'
> curl http://localhost:8983/solr/update -H "Content-Type: text/xml"
> --data-binary '<optimize />'
>
> --------------------------------
>
> building the spellchecker:
>
> http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuil
> d
>
> --------------------------------
>
> querying the spellchecker:
>
> results from 
> http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker
>
> <?xml version="1.0" encoding="UTF-8"?> <response>
>         <lst name="responseHeader">
>                 <int name="status">0</int>
>                 <int name="QTime">1</int>
>         </lst>
>         <str name="words">Thorne</str>
>         <str name="exist">false</str>
>         <arr name="suggestions">
>                 <str>thorne</str>
>         </arr>
> </response>
>
> results from 
> http://localhost:8983/solr/select/?q=thorne&qt=spellchecker
>
> <?xml version="1.0" encoding="UTF-8"?> <response>
>         <lst name="responseHeader">
>                 <int name="status">0</int>
>                 <int name="QTime">2</int>
>         </lst>
>                 <str name="words">thorne</str>
>                 <str name="exist">true</str>
>         <arr name="suggestions"/>
> </response>
>
>
> any pointers as to what i'm doing wrong, misinterpreting?  i suspect
i'm
> just doing something bone-headed in the analyzer sections...
>
> thanks as always,
>
> rob casson
> miami university libraries
>

RE: LowerCaseFilterFactory and spellchecker

Reply via email to