Related to this is JIRA issue https://issues.apache.org/jira/browse/SOLR-2585. With this patch, Solr will consider alternatives in cases where a word is misspelled in its context but nevertheless exists in the index and/or dictionary. This is a work in progress and is for trunk only, but it would make for another nice incremental improvement to the spellchecker.
This patch won't solve the problem at hand, but it may make the shingle workaround function in a few more cases. Of course, actually building word-break analysis into the spellchecker would be the right solution...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Monday, July 25, 2011 10:13 AM
To: solr-user@lucene.apache.org
Cc: Dyer, James
Subject: Re: Spellcheck compounded words

This will indeed work for misspelled compounds, but not when the compound word is actually queried as two separate, correctly spelled words. Most likely both "sail" and "boat" exist in the index as single tokens.

There is a workaround, but it's limited to a scenario where users never use more than one query term (or two in the case of misspelled compounds). When your index has shingles and you replace the whitespace with a non-whitespace character, you get a proper suggestion returned. The compound is then found as a suggestion, but not in the collation. When queries contain more than two terms, it will most likely never work this way; the results get really strange.

On Monday 25 July 2011 16:49:18 Dyer, James wrote:
> I'm afraid there currently isn't much support for correcting misplaced
> whitespace. Solr is going to look at each word individually and won't
> even try to combine adjacent words (or split a word into two or more). So
> there is no good way to get these kinds of suggestions.
>
> One thing that might work in some cases is to create a spelling dictionary
> composed of shingles (2+ words indexed together as 1 token). This
> approach is described in Smiley & Pugh's Solr book (1st ed.), p. 180ff, under
> the heading "An alternative approach". I haven't tried this, but it might
> be your best hope if this is a feature you've absolutely got to have.
>
> -----Original Message-----
> From: O. Klein [mailto:kl...@octoweb.nl]
> Sent: Friday, July 22, 2011 8:11 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck compounded words
>
> How do I get the spellchecker to suggest compounded words?
>
> Like: q=sail booat
>
> and the suggestion/collation is "sailboat" and "sail boat".

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
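To make the shingle workaround discussed in this thread more concrete, here is a minimal Python sketch of the idea outside of Solr. It is not Solr's spellchecker and not the exact approach from Smiley & Pugh's book. The function names (`shingles`, `build_dictionary`, `suggest`), the choice of the empty string as the "non-whitespace character" joining shingle words, and the use of `difflib.get_close_matches` as a stand-in for Solr's edit-distance lookup are all assumptions for illustration. Both the indexed text and the query are shingled the same way, and only unknown shingles are fuzzy-matched.

```python
from collections import defaultdict
from difflib import get_close_matches

# Assumption: the character replacing the space inside a shingle is the empty
# string, so the shingle "sail boat" and the single word "sailboat" share a key.
SEPARATOR = ""


def shingles(tokens, max_size=2):
    """Yield (key, surface) pairs for unigrams and word n-grams up to max_size."""
    for n in range(1, max_size + 1):
        for i in range(len(tokens) - n + 1):
            words = tokens[i:i + n]
            yield SEPARATOR.join(words), " ".join(words)


def build_dictionary(indexed_texts, max_size=2):
    """Map each shingle key to the surface forms seen in the (hypothetical) index."""
    dictionary = defaultdict(set)
    for text in indexed_texts:
        for key, surface in shingles(text.lower().split(), max_size):
            dictionary[key].add(surface)
    return dictionary


def suggest(query, dictionary, max_size=2, cutoff=0.85):
    """Shingle the query the same way, then fuzzy-match keys not in the dictionary."""
    keys = sorted(dictionary)
    suggestions = {}
    for key, surface in shingles(query.lower().split(), max_size):
        if key in dictionary:
            continue  # already a known term or shingle; nothing to correct
        matches = get_close_matches(key, keys, n=3, cutoff=cutoff)
        if matches:
            # Report every surface form behind each matched key, e.g. both
            # "sail boat" and "sailboat" for the key "sailboat".
            suggestions[surface] = sorted(
                form for match in matches for form in dictionary[match]
            )
    return suggestions


if __name__ == "__main__":
    index = ["sailboat for sale", "sail boat rental"]
    print(suggest("sail booat", build_dictionary(index)))
```

With the two sample strings, the script prints `{'booat': ['boat'], 'sail booat': ['sail boat', 'sailboat']}`, so the misspelled bigram maps back to both spellings from the original question. A correctly spelled query such as "sail boat" yields no suggestions at all, which mirrors the limitation Markus describes, and the sketch does nothing about collation or queries longer than `max_size` words.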