[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597930#action_12597930 ]
bosmid edited comment on SOLR-572 at 5/19/08 5:03 AM:
----------------------------------------------------------

Character encodings for file-based dictionaries are now supported through the characterEncoding property. The configuration for such a dictionary would look like this:

{code:xml}
<lst name="dictionary">
  <str name="name">external</str>
  <str name="type">file</str>
  <str name="location">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">c:\spellchecker</str>
</lst>
{code}

The new code needs the latest lucene-spellchecker-2.4*.jar from Lucene trunk.

Since the SolrResourceLoader.getLines method doesn't support configurable encodings (it treats everything as UTF-8), I wasn't sure how to add that support. I could have added an overloaded method to SolrResourceLoader, but there is a TODO comment there, so I decided to create a getLines() method inside the SpellCheckComponent class instead. What do you think of this?
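To make the question concrete, here is a minimal sketch of what an encoding-aware getLines() helper could look like. It is not the code from the attached patch: the class name EncodedLineReader, the method signature, the fallback to UTF-8 when no encoding is given, and the skipping of blank lines and lines starting with '#' are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or of the SOLR-572 patch.
public class EncodedLineReader {

    /**
     * Reads all lines from the stream using the given character encoding.
     * Assumptions: a null or empty encoding falls back to UTF-8, and
     * blank lines and lines starting with '#' are skipped.
     */
    public static List<String> getLines(InputStream in, String characterEncoding)
            throws IOException {
        String name = (characterEncoding == null || characterEncoding.trim().isEmpty())
                ? "UTF-8"
                : characterEncoding.trim();
        // Throws an unchecked exception if the charset name is illegal or unsupported.
        Charset charset = Charset.forName(name);

        List<String> lines = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments (assumed behavior)
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A small dictionary encoded as ISO-8859-1 rather than UTF-8.
        byte[] data = "caf\u00e9\n# a comment\n\nword\n".getBytes("ISO-8859-1");
        List<String> lines = getLines(new ByteArrayInputStream(data), "ISO-8859-1");
        System.out.println(lines.size() + " dictionary entries read");
    }
}
```

Taking an InputStream plus an encoding name keeps the helper independent of how the dictionary file is located, so the same signature could serve either as an overload on SolrResourceLoader or as a private method in SpellCheckComponent.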

> Spell Checker as a Search Component
> -----------------------------------
>
>                 Key: SOLR-572
>                 URL: https://issues.apache.org/jira/browse/SOLR-572
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>    Affects Versions: 1.3
>            Reporter: Shalin Shekhar Mangar
>             Fix For: 1.3
>
>         Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the following features:
> * Allow creating a spell index on a given field and make it possible to have multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.