Hi, I've set it to 2, but a Python implementation of Levenshtein says the distance is 3 for restraunt -> restaurant, so maxEdits of 2 would not be enough to catch it.
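For reference, here is a minimal sketch of the kind of check I mean. It is a plain textbook Levenshtein (insertions, deletions, substitutions, no transpositions), and the function name `levenshtein` is just my own illustrative choice; the distance measure Solr uses internally may not match it exactly.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance between a and b.

    Counts single-character insertions, deletions and substitutions.
    Transpositions are NOT treated as a single edit here.
    """
    # previous[j] is the distance between the already processed prefix
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    pairs = [
        ("restraunt", "restaurant"),
        ("recruter", "recruiter"),
        ("recruter", "recruted"),
    ]
    for typed, candidate in pairs:
        print(typed, "->", candidate, levenshtein(typed, candidate))
```

With this measure restraunt -> restaurant comes out as 3, while recruter is 1 edit away from both recruiter and recruted.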
On Sat, May 3, 2014 at 2:44 PM, Susheel Kumar <susheel.ku...@thedigitalgroup.net> wrote:
> What have you set maxEdits to? It should catch the restaurant example
> with edit distance set to 2.
>
> Thanks,
> Susheel
>
> -----Original Message-----
> From: Maciej Dziardziel [mailto:fied...@gmail.com]
> Sent: Friday, May 02, 2014 7:05 PM
> To: solr-user@lucene.apache.org
> Subject: Spellchecking - looking for general advice
>
> Hi
>
> I was looking at spellcheck (Direct and FileBased) and testing what they
> can do.
> Direct works fine most of the time, but I'd like to find a solution for a
> few corner cases:
>
> 1) having "recruted" and "recruiter" in the index, "recruter" should
> suggest the latter.
> Obviously the distance to the former is smaller, so it may be completely
> arbitrary, and perhaps must be handled on the application side rather
> than in Solr.
> 2) "restraunt" doesn't suggest "restaurant" - I assume the distance is
> too big for that.
>
> Those are a few examples of queries that spellcheck gets (according to my
> requirements) wrong.
> For now I am just looking at possible solutions and I'd need to come up
> with an initial concept to have something to show to users and get more
> feedback, likely with more cases to correct.
>
> I'd like to know if there are some tweaks to the spellcheck component I
> could make (or perhaps other ways of doing this with Solr), or am I forced
> to hardcode a list of all such corrections that go beyond what spellcheck
> can do?
>
> One solution I am considering is to put a list of those special cases into
> FileSpellChecker (it seems to be more relaxed, and handles the restraunt
> case well) and fall back to Direct if this yields no results... though I
> am not sure yet how well that would work in practice if the list of
> misspelled words grew beyond the few I have now.
> It most likely wouldn't scale.
>
> Another possibility would be to analyze the list of queries our users use
> that yield few results and check if there is a spellchecked version that
> improves that... but that seems to require a human to review corrections.
>
> Yet another thing I was thinking about would be to pull terms into a
> separate spellchecker (like aspell) and see if it does a better job or is
> more tweakable.
>
> That's a bit of an open-ended problem, so any advice is welcome.
>
> --
> Maciej Dziardziel
> fied...@gmail.com

--
Maciej Dziardziel
fied...@gmail.com