A related JIRA issue is 
https://issues.apache.org/jira/browse/SOLR-2585 . With this patch, Solr will 
consider alternatives in cases where a word is misspelled in its context but 
nevertheless exists in the index and/or dictionary.  This is a work in progress 
and is for trunk only, but it would make for another nice incremental 
improvement to the spellchecker.

This patch won't solve the problem at hand, but it may make the shingle 
workaround function in a few more cases.  Of course, actually building 
word-break analysis into the spellchecker would be the right solution...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Monday, July 25, 2011 10:13 AM
To: solr-user@lucene.apache.org
Cc: Dyer, James
Subject: Re: Spellcheck compounded words

This will indeed work for misspelled compounds, but not when the compound word 
is actually queried as two separate, correctly spelled words. Most likely both 
sail and boat exist in the index as single tokens.

There is a workaround, but it's limited to a scenario where users never use 
more than one query term (or two in the case of misspelled compounds). When 
your index has shingles and you replace the whitespace with a non-whitespace 
character, you get a proper suggestion returned. The compound is then found as 
a suggestion but not in the collation.

When queries contain more than two terms, it most likely will never work this 
way. The results get really strange.
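
A rough sketch of the query-side part of this workaround, in Python. The 
function name, the underscore joiner and the two-term cutoff are assumptions 
made for illustration only; the joiner would have to match whatever separator 
the shingled spelling field actually uses. It sends the rewritten string as 
spellcheck.q and leaves q untouched.

```python
from urllib.parse import urlencode


def shingle_spellcheck_params(user_query, joiner="_", max_terms=2):
    """Build request parameters for the whitespace-replacement workaround.

    joiner and max_terms are illustrative assumptions, not Solr defaults.
    """
    terms = user_query.split()
    if len(terms) > max_terms:
        # Longer queries are passed through unchanged, since the
        # workaround is only expected to help for one or two terms.
        spell_q = " ".join(terms)
    else:
        # Join the terms so the spellchecker sees a single token that can
        # be compared against shingles in the spelling dictionary.
        spell_q = joiner.join(terms)
    return {
        "q": user_query,
        "spellcheck": "true",
        "spellcheck.q": spell_q,
    }


if __name__ == "__main__":
    # Query string to append to the select handler URL.
    print(urlencode(shingle_spellcheck_params("sail booat")))
```

Any suggestion that comes back would still contain the joiner, so the client 
would need to map it back to whitespace before showing it to the user.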

On Monday 25 July 2011 16:49:18 Dyer, James wrote:
> I'm afraid there currently isn't much support for correcting misplaced
> whitespace.  Solr is going to look at each word individually and won't
> even try to combine adjacent words (or split a word into 2 or more).  So
> there is no good way to get these kinds of suggestions.
> 
> One thing that might work in some cases is to create a spelling dictionary
> composed of shingles (2+ words indexed together as 1 token).  This
> approach is described in Smiley&Pugh's Solr book, (1st ed) p.180ff under
> the heading "An alternative approach".  I haven't tried this but it might
> be your best hope if this is a feature you've absolutely got to have.
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> 
> -----Original Message-----
> From: O. Klein [mailto:kl...@octoweb.nl]
> Sent: Friday, July 22, 2011 8:11 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck compounded words
> 
> How do I get spellchecker to suggest compounded words?
> 
> Like: q=sail booat
> 
> and suggestion/collate is "sailboat" and "sail boat"
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3192748.html
> Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
