[jira] Updated: (SOLR-572) Spell Checker as a Search Component

Shalin Shekhar Mangar (JIRA) Thu, 15 May 2008 10:46:20 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Shalin Shekhar Mangar updated SOLR-572:
---------------------------------------

    Attachment: SOLR-572.patch

A first cut for this issue. Please consider this as work in progress. I've 
posted this to get feedback on the approach and syntax.

The patch contains the following:
* SpellCheckComponent is an implementation of SearchComponent
* The configuration is specified in solrconfig.xml with multiple "dictionary" 
nodes. Each dictionary must have a name and a type. The name must be specified 
at query time. The type is needed to allow for more than one way of loading 
data into the spell index (Solr field or file). For example:
{code:xml}
<searchComponent name="spellcheck" 
class="org.apache.solr.handler.component.SpellCheckComponent">
        <lst name="dictionary">
                <str name="name">default</str>
                <str name="type">solr</str>
                <str name="field">word</str>
                <str name="indexDir">c:/temp/spellindex</str>
        </lst>
        <lst name="dictionary">
                <str name="name">external</str>
                <str name="type">file</str>
                <str name="path">spellings.txt</str>
        </lst>
</searchComponent>
{code}
* If indexDir is not present in the dictionary's configuration, a 
RAMDirectory is used; otherwise an FSDirectory is used.
* This patch supports dictionaries loaded from Solr fields.
* A separate Lucene SpellChecker is created for each configured dictionary
* Sample query syntax is as follows:
** {{/select/?q=aura&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=10}}
** {{/select/?q=toyata&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.dictionary=default}}
* The value for "q" is analyzed with the Solr field's query analyzer. 
Suggestions for each token are fetched separately.
* Only one suggestion for a query is given by default. This should be used for 
multi-token queries.
* If spellcheck.count is specified then the response has a number of 
suggestions <= spellcheck.count for each token separately.
* Only unique words are returned in the suggestions.
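
To make the spellcheck.count behaviour described above concrete, here is a minimal sketch in plain Java. It is not code from the patch: the class name, the method names, and the map of raw candidates are assumptions made up for illustration, and the real component gets its candidates from the Lucene SpellChecker. The sketch only shows the two rules stated above, that each token gets at most spellcheck.count suggestions and that duplicate words are dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of SOLR-572.patch.
public class PerTokenSuggestionsSketch {

    // Drop duplicate words (keeping first-seen order) and keep at most `count`.
    public static List<String> limitUnique(List<String> candidates, int count) {
        List<String> out = new ArrayList<>();
        for (String c : new LinkedHashSet<>(candidates)) {
            if (out.size() >= count) {
                break;
            }
            out.add(c);
        }
        return out;
    }

    // Build one suggestion list per token; tokens are handled independently.
    // `rawCandidates` stands in for whatever the spell index returns per token.
    public static Map<String, List<String>> suggest(List<String> tokens,
                                                    Map<String, List<String>> rawCandidates,
                                                    int count) {
        Map<String, List<String>> response = new LinkedHashMap<>();
        for (String token : tokens) {
            List<String> candidates = rawCandidates.getOrDefault(token, List.of());
            response.put(token, limitUnique(candidates, count));
        }
        return response;
    }

    public static void main(String[] args) {
        // Assumed candidates for the misspelled token from the sample query.
        Map<String, List<String>> raw =
                Map.of("toyata", List.of("toyota", "toyota", "toyoda"));
        System.out.println(suggest(List.of("toyata"), raw, 10));
    }
}
```

A token with no candidates simply maps to an empty list here; how the actual response represents that case is not specified in this comment.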

Things to be done:
* Add JUnit tests
* Reloading dictionaries. Currently the dictionary is loaded only once during 
the first request.
* Make things more configurable like SpellCheckerRequestHandler
* Add support for onlyMorePopular flag as in SpellCheckerRequestHandler

> Spell Checker as a Search Component
> -----------------------------------
>
>                 Key: SOLR-572
>                 URL: https://issues.apache.org/jira/browse/SOLR-572
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>    Affects Versions: 1.3
>            Reporter: Shalin Shekhar Mangar
>             Fix For: 1.3
>
>         Attachments: SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as they are
> * Never give duplicate words in a suggestion
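
One possible reading of the two consistency criteria in the quoted description, sketched in plain Java. Everything here is an assumption for illustration (the class name, the collate method, the set of known-correct words and the ranked candidate lists); the issue does not prescribe an algorithm. Tokens already found in the dictionary are kept unchanged, and each misspelled token takes its first ranked candidate that does not already appear elsewhere in the suggestion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of SOLR-572.patch.
public class ConsistentSuggestionSketch {

    // `dictionary` holds words treated as correctly spelled (assumed input).
    // `candidates` maps a misspelled token to its ranked suggestions (assumed input).
    public static String collate(List<String> tokens,
                                 Set<String> dictionary,
                                 Map<String, List<String>> candidates) {
        // Reserve every correct word first so a correction never duplicates it.
        Set<String> used = new HashSet<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                used.add(token);
            }
        }
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String chosen = token; // correct words are preserved as they are
            if (!dictionary.contains(token)) {
                for (String c : candidates.getOrDefault(token, List.of())) {
                    if (!used.contains(c)) {
                        chosen = c;
                        break;
                    }
                }
                // If every candidate is taken, the original token is kept.
                used.add(chosen);
            }
            out.add(chosen);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("toyota", "camry");
        Map<String, List<String>> cands =
                Map.of("toyata", List.of("toyota", "toyoda"));
        System.out.println(collate(List.of("toyata", "camry"), dict, cands));
        System.out.println(collate(List.of("toyata", "toyota"), dict, cands));
    }
}
```

In this sketch, preserving correct words takes priority: if the original query itself repeats a correct word, the repetition is left alone.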

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
