Hi

I've set it to 2, but a Python implementation of Levenshtein says it's 3
for restraunt -> restaurant.
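
A minimal sketch of such a check, assuming plain Levenshtein distance with
unit cost for each insertion, deletion and substitution. The `levenshtein`
function below is illustrative only: it is not the specific Python library
referred to above, and Solr's internal distance measure may score and rank
candidates differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute; each costs 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        prev = curr
    return prev[-1]


# The word pairs discussed in this thread.
pairs = [
    ("restraunt", "restaurant"),
    ("recruter", "recruted"),
    ("recruter", "recruiter"),
]
for query, candidate in pairs:
    print(f"{query} -> {candidate}: {levenshtein(query, candidate)}")
```

Under these unit costs, restraunt -> restaurant comes out at 3, which is
above a limit of 2, while recruter is at distance 1 from both recruted and
recruiter.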

On Sat, May 3, 2014 at 2:44 PM, Susheel Kumar
<susheel.ku...@thedigitalgroup.net> wrote:
> What value have you set for maxEdits? It should catch the restaurant example 
> with edit distance set to 2.
>
> Thanks,
> Susheel
>
> -----Original Message-----
> From: Maciej Dziardziel [mailto:fied...@gmail.com]
> Sent: Friday, May 02, 2014 7:05 PM
> To: solr-user@lucene.apache.org
> Subject: Spellchecking - looking for general advice
>
> Hi
>
> I was looking at spellcheck (Direct and FileBased) and testing what they can 
> do.
> Direct works fine most of the time, but I'd like to find a solution for a few 
> corner cases:
>
> 1) having "recruted" and "recruiter" in the index, "recruter" should suggest the 
> latter.
>     Obviously the distance to the former is smaller, so it may be completely 
> arbitrary,
>     and perhaps must be handled on the application side rather than in Solr.
> 2) "restraunt" doesn't suggest "restaurant" - I assume that the distance is too 
> big for that.
>
> Those are a few examples of queries that spellcheck gets (according to my 
> requirements) wrong.
> For now I am just looking at possible solutions, and I need to come up with 
> an initial concept to have something to show to users and get more feedback, 
> likely with more cases to correct.
>
> I'd like to know if there are some tweaks to the spellcheck component I could 
> make (or perhaps other ways of doing this with Solr), or am I forced to 
> hardcode a list of all such corrections that go beyond what spellcheck can do?
>
> One solution I am considering is to put a list of those special cases into 
> FileBasedSpellChecker (it seems to be more relaxed, and handles the restraunt 
> case well) and fall back to Direct if this yields no results... though I am not 
> sure yet how well that would work in practice if the list of misspelled words 
> grew beyond the few I have now. It most likely wouldn't scale.
>
> Another possibility would be to analyze the list of queries our users run that 
> yield few results and check if there is a spellchecked version that improves 
> that... but that seems to require a human to review the corrections.
>
> Yet another thing I was thinking about would be to pull terms into a separate 
> spellchecker (like aspell) and see if it does a better job or is more 
> tweakable.
>
> That's a bit of an open-ended problem, so any advice is welcome.
>
> --
> Maciej Dziardziel
> fied...@gmail.com



-- 
Maciej Dziardziel
fied...@gmail.com
