[
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605645#action_12605645
]
Grant Ingersoll commented on SOLR-572:
--------------------------------------
{quote}
Why is a WhitespaceTokenizer being used for tokenizing the value for a
spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer
if the index is being built from a Solr field?
The above argument also applies to queryAnalyzerFieldType, which is being used
for QueryConverter.
{quote}
My understanding was that the spellcheck.q parameter was already analyzed and
ready to be checked, so all it needed was a conversion to tokens. As for the
queryAnalyzerFieldType, that assumes the implementation is the
IndexBasedSpellChecker or some other field-based one that the
SpellCheckComponent doesn't have access to. That is my reasoning for handling
it separately and explicitly, which is why it isn't a part of the spellchecker
configuration.
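To make "a conversion to tokens" concrete, here is a minimal sketch of what whitespace-only tokenization of a spellcheck.q value amounts to. It is plain Java and deliberately does not use the Lucene classes; SimpleToken and WhitespaceSplit are hypothetical names for illustration, not anything in the patch. The offsets are kept because a suggestion usually has to be mapped back onto the original string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token: the term text plus its offsets
// into the original parameter value.
final class SimpleToken {
    final String text;
    final int start; // inclusive
    final int end;   // exclusive

    SimpleToken(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical helper: splits on whitespace only, with no lowercasing,
// stemming, or other analysis, on the assumption that the value was
// already analyzed by the caller.
final class WhitespaceSplit {
    static List<SimpleToken> tokenize(String q) {
        List<SimpleToken> tokens = new ArrayList<>();
        int i = 0;
        int n = q.length();
        while (i < n) {
            // Skip any run of whitespace.
            while (i < n && Character.isWhitespace(q.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < n && !Character.isWhitespace(q.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new SimpleToken(q.substring(start, i), start, i));
            }
        }
        return tokens;
    }
}
```

If the value has not been analyzed in the same way as the source field, the tokens produced this way may not match the terms in the spelling index, which is the concern raised in the quoted question.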
{quote}
I see that we can specify our own query converter through the queryConverter
section in solrconfig.xml. But the SpellCheckComponent uses
SpellingQueryConverter directly instead of an interface. We should add a
QueryConverter interface if this needs to be pluggable.
{quote}
I thought about making it an abstract base class, but in my mind it is really
easy to override the SpellingQueryConverter and the component should know how
to deal with it.
{quote}
If name is omitted from two dictionaries in solrconfig.xml, then both get named
Default by the SolrSpellChecker#init method and they overwrite each other
in the spellCheckers map.
{quote}
Hmm, not good. I will fix.
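For illustration, one possible way to guard against the collision is to reject a second registration under the same name instead of silently overwriting it. This is only a sketch under my own assumptions, not what the patch does: SpellCheckerRegistry and register are hypothetical names, the checker is typed as Object to keep the example self-contained, and "Default" is taken from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry standing in for the spellCheckers map.
final class SpellCheckerRegistry {
    // Assumed fallback name, per the report above.
    static final String DEFAULT_NAME = "Default";

    private final Map<String, Object> checkers = new LinkedHashMap<>();

    // Registers a checker under its configured name, falling back to the
    // default name when none is given. A plain Map.put would replace the
    // earlier entry; here a duplicate name fails loudly instead.
    String register(String configuredName, Object checker) {
        String name = (configuredName == null || configuredName.trim().isEmpty())
                ? DEFAULT_NAME
                : configuredName.trim();
        if (checkers.containsKey(name)) {
            throw new IllegalArgumentException(
                    "Duplicate spell checker name: " + name);
        }
        checkers.put(name, checker);
        return name;
    }

    int size() {
        return checkers.size();
    }
}
```

With this check, two unnamed dictionaries produce a configuration error at init time rather than one dictionary quietly disappearing.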
{quote}
How about building the index in the inform() method? I understand that
users can build the index using spellcheck.build=true and that they can also
use QuerySenderListener to build the index, but this limits the user to
FSDirectory because if we use RAMDirectory and Solr is restarted, the
QuerySenderListener never fires and the spell checker is left with no index.
It's not a major inconvenience to always use FSDirectory, but then
RAMDirectory doesn't bring much to the table.
{quote}
I think this gets back to our early discussions about it not working in inform
because we don't have the reader at that point, or something like that. I
really don't know the right answer, but do feel free to try it out. I do think
it belongs in inform, but I am not sure if Solr is ready at that point. As for
the QuerySenderListener, it seems like it should fire on a restart, but I admit
I don't know a whole lot about that functionality.
> Spell Checker as a Search Component
> -----------------------------------
>
> Key: SOLR-572
> URL: https://issues.apache.org/jira/browse/SOLR-572
> Project: Solr
> Issue Type: New Feature
> Components: spellchecker
> Affects Versions: 1.3
> Reporter: Shalin Shekhar Mangar
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the
> following features:
> * Allow creating a spell index on a given field and make it possible to have
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as they are
> * Never give duplicate words in a suggestion
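The two consistency criteria in the description can be sketched as follows. This is a rough illustration under stated assumptions, not the algorithm in the patch: ConsistentSuggestion is a hypothetical name, the dictionary is modelled as a plain set of known-correct words, and the per-token candidates are assumed to arrive already ranked best-first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that builds one suggestion for a multi-word query.
final class ConsistentSuggestion {
    static List<String> suggest(List<String> queryTokens,
                                Set<String> dictionary,
                                Map<String, List<String>> candidates) {
        // Reserve the words that are already correct, so that a correction
        // for another token never duplicates them.
        Set<String> used = new HashSet<>();
        for (String token : queryTokens) {
            if (dictionary.contains(token)) {
                used.add(token);
            }
        }

        List<String> result = new ArrayList<>();
        for (String token : queryTokens) {
            if (dictionary.contains(token)) {
                // Criterion 1: correct words are preserved unchanged.
                result.add(token);
                continue;
            }
            // Criterion 2: take the best-ranked candidate not used so far;
            // fall back to the original token if none is available.
            String chosen = token;
            List<String> ranked =
                    candidates.getOrDefault(token, Collections.emptyList());
            for (String candidate : ranked) {
                if (used.add(candidate)) {
                    chosen = candidate;
                    break;
                }
            }
            result.add(chosen);
        }
        return result;
    }
}
```

The greedy left-to-right choice is the simplest option; a real implementation might instead score whole combinations of candidates before picking one.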