[jira] Commented: (SOLR-630) Spellchecker should not be case-sensitive and should be stopwords-aware

Alex Baranov (JIRA) Tue, 18 Aug 2009 20:43:40 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744860#action_12744860
 ]


Alex Baranov commented on SOLR-630:
-----------------------------------

I propose closing this bug.

1) of->oft
  Whether stopwords are omitted depends on which parameter is used:
    a. If the "q" parameter is used, the "queryAnalyzerFieldType" parameter 
determines the analyzer for the query. If "queryAnalyzerFieldType" is not 
specified, a WhitespaceTokenizer is used.
    b. If the "spellcheck.q" parameter is used, the query analyzer of the 
spellchecker field is used.

2) America->Americaa, america->[none]
  I couldn't reproduce that: the results are the same for "America" as for 
"america". However, the spellchecker really is case-sensitive. For example, if 
the spellchecker index contains "AmErIcAa", this suggestion won't appear for 
either "America" or "america", but it would appear for "AmErIcA".
  The reason both America->Americaa and america->Americaa lies in the n-gram 
method used by the Lucene spellchecker: the same grams are produced for 
"America" and "america" except for the first one, i.e. the "startN" gram. 
There may still be a difference in the results, though: the method boosts the 
relevance of a suggestion if its first N letters are the same as those of the 
word being checked.

  I'm not sure whether case sensitivity is a bug or not. In any case, both 
finding suggestions and building the spellchecker index are delegated to the 
Lucene SpellChecker, so this is a Lucene issue, not a Solr one.
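
The n-gram argument in point 2 can be illustrated with a simplified, 
dependency-free Java model. It is loosely based on the idea behind the Lucene 
SpellChecker, not on its code: the gram size (3), the toy scoring, and the 
start boost value (2.0) are assumptions chosen for illustration. It shows that 
"America" and "america" share every 3-gram except the first (the start gram), 
so both can reach "Americaa", while only the query whose first letters match 
exactly gets the start bonus.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of character n-gram matching, loosely inspired by the
// Lucene SpellChecker; this is NOT its actual implementation or scoring.
class NGramSketch {

    // All contiguous character n-grams of the word, in order (case preserved).
    static List<String> grams(String word, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    // The "start" gram: the first n characters of the word.
    static String startGram(String word, int n) {
        return word.substring(0, Math.min(n, word.length()));
    }

    // Toy score: one point per distinct shared gram, plus startBoost when the
    // start grams match exactly (which is where case differences show up).
    static double score(String query, String candidate, int n, double startBoost) {
        Set<String> candidateGrams = new LinkedHashSet<>(grams(candidate, n));
        double s = 0;
        for (String g : new LinkedHashSet<>(grams(query, n))) {
            if (candidateGrams.contains(g)) {
                s += 1;
            }
        }
        if (startGram(query, n).equals(startGram(candidate, n))) {
            s += startBoost;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(grams("America", 3)); // [Ame, mer, eri, ric, ica]
        System.out.println(grams("america", 3)); // [ame, mer, eri, ric, ica]
        // Both queries share grams with "Americaa"; only one gets the start bonus.
        System.out.println(score("America", "Americaa", 3, 2.0)); // 7.0
        System.out.println(score("america", "Americaa", 3, 2.0)); // 4.0
    }
}
```

The real SpellChecker uses its grams to build a query and applies its own 
scoring, so the numbers above are illustrative only; the point is that the 
start gram is where case leaks into the ranking.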

P.S. I believe the case-sensitivity issue can be avoided by configuring the 
analyzers properly (e.g. the one for the spellchecker field).
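
Point 1 and the P.S. both come down to analysis. The dependency-free Java 
sketch below is not Solr or Lucene code; it only models the effect of two 
analysis chains: whitespace-only tokenization (the fallback when 
queryAnalyzerFieldType is unset) versus a chain that also lowercases and drops 
stopwords. The stopword list and all class and method names are hypothetical. 
It also shows that applying the same chain on the index side and the query 
side makes the comparison case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of two analysis chains; this is not Solr's or
// Lucene's code, only a model of what the tokenizer/filters do to the text.
class AnalysisSketch {

    // Assumed stopword list, for illustration only.
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "of", "the");

    // Whitespace tokenization alone: case and stopwords are kept as-is.
    static List<String> whitespaceOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Whitespace tokenization followed by lowercasing and stopword removal.
    static List<String> lowercaseAndStop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : whitespaceOnly(text)) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "united states of America";
        // "of" survives here, so it would be sent to the spellchecker.
        System.out.println(whitespaceOnly(query));   // [united, states, of, America]
        // "of" is dropped and "America" becomes "america".
        System.out.println(lowercaseAndStop(query)); // [united, states, america]

        // Same chain on the index side and the query side: case no longer matters.
        List<String> indexed = lowercaseAndStop("AmErIcAa");
        List<String> queried = lowercaseAndStop("americaa");
        System.out.println(indexed.equals(queried)); // true
    }
}
```

In Solr itself this would be expressed as analyzer configuration (for example 
solr.LowerCaseFilterFactory and solr.StopFilterFactory in the field type used 
for spellchecking) rather than as Java code.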

> Spellchecker should not be case-sensitive and should be stopwords-aware
> -----------------------------------------------------------------------
>
>                 Key: SOLR-630
>                 URL: https://issues.apache.org/jira/browse/SOLR-630
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 1.5
>
>
> Here are 2 more bugs:
> 1)
> Search for:
>   united states of America
> Suggests:
>  united states oft America
> It looks like the SC doesn't check stopwords, and "of" is a stopword.  Thus, 
> it does not exist in the index,
> but "oft" does, so SC suggests "oft" and thinks "of" is misspelled.  I think 
> the SC component should check the list of
> stopwords, too, no?
> 2)
> Search for:
>  united states of America
> Suggests:
>  united states oft Americaa
> The of->oft is described above.  But note how SC suggested America->Americaa, 
> but it didn't do that for "america".
> This looks like case-sensitivity problem.  Shouldn't the SC be 
> case-insensitive?
> I can't produce a patch now (no src handy), so I'm hoping Grant or somebody 
> else can do it based on this report.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
