Aha, good example, Sean. What's the explanation? Note that doing:
http://www.google.com/search?q=abdur+choudhury
offers this alternative:
http://www.google.com/search?q=abdur+chowdhury
Also note that the number of hits is roughly the same in both cases, and that
Google is smart enough to search for and highlight chowdhury even when the
search was for choudhury.
Google's spelling corrections/suggestions are driven by massive query
(refinement) logs, whereas Solr's suggestions are based on the content of an
index field.
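To make the index-driven side of that contrast concrete, here is a minimal Java sketch of suggesting corrections only from terms that already occur in an index field. It assumes the field's terms are available as a plain `Set<String>` and ranks candidates by Levenshtein edit distance. The class `IndexTermSuggester` and the `maxDistance` cutoff are hypothetical, and this is not Solr's actual implementation (Lucene's contrib SpellChecker, for instance, finds candidates through a character n-gram index instead of scanning every term).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: suggest corrections only from terms that already
// exist in an index field (the "dictionary"), ranked by edit distance.
final class IndexTermSuggester {
    private final Set<String> indexTerms;
    private final int maxDistance;

    IndexTermSuggester(Set<String> indexTerms, int maxDistance) {
        this.indexTerms = indexTerms;
        this.maxDistance = maxDistance;
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Index terms within maxDistance of the input, closest first, ties broken
    // alphabetically. A term the index does not contain can never be suggested.
    List<String> suggest(String word) {
        List<String> hits = new ArrayList<>();
        for (String term : indexTerms) {
            if (levenshtein(word, term) <= maxDistance) hits.add(term);
        }
        hits.sort(Comparator.comparingInt((String t) -> levenshtein(word, t))
                .thenComparing(Comparator.naturalOrder()));
        return hits;
    }
}
```

For example, with index terms `chowdhury` and `church` and a cutoff of 2, `suggest("choudhury")` returns only `chowdhury`. Whatever users actually type, only spellings present in the indexed field can come back, which is the difference from log-driven suggestions described above.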
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: Sean Timm <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, February 22, 2008 4:03:58 PM
> Subject: Re: Spell checking ?'s
>
> Sometimes context can play into the correct spelling of a term. I
> haven't looked at the 1.3 spell check stuff, but it would be nice to do
> term n-gramming in order to check the terms in context.
>
> Since Otis brought up Google, here is an example of putting the term
> into context.
> http://www.google.com/search?q=choudhury
> http://www.google.com/search?q=abdur+choudhury
>
> -Sean
>
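The term n-gram idea can be sketched roughly as follows. This is a minimal Java example assuming a hypothetical table of word-bigram counts (whether built from index shingles or from query logs is left open) and a list of candidate corrections for one query term already produced by some spell checker. `BigramContextChooser` and its methods are hypothetical names, not part of the 1.3 spell check code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of context-sensitive correction using word bigrams.
final class BigramContextChooser {
    // Counts of adjacent word pairs keyed as "left right"; the source of
    // these counts is an assumption, not something Solr provides here.
    private final Map<String, Integer> bigramCounts;

    BigramContextChooser(Map<String, Integer> bigramCounts) {
        this.bigramCounts = bigramCounts;
    }

    // Word bigrams (term n-grams with n = 2) of a tokenized query.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    // Picks the candidate for position pos whose surrounding bigrams are most
    // frequent. Bigrams not touching pos are identical for every candidate, so
    // summing all of them still ranks candidates correctly. Ties keep the
    // earlier candidate; with no candidates the original token is returned.
    String choose(List<String> tokens, int pos, List<String> candidates) {
        String best = tokens.get(pos);
        long bestScore = -1;
        for (String cand : candidates) {
            List<String> trial = new ArrayList<>(tokens);
            trial.set(pos, cand);
            long score = 0;
            for (String bg : bigrams(trial)) {
                score += bigramCounts.getOrDefault(bg, 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = cand;
            }
        }
        return best;
    }
}
```

With made-up counts where "abdur chowdhury" is far more frequent than "abdur choudhury", choosing between `choudhury` and `chowdhury` at position 1 of `[abdur, choudhury]` picks `chowdhury`, whereas the lone word `choudhury` gives no such signal.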
> Otis Gospodnetic wrote:
> > Haven't used SCRH (SpellCheckerRequestHandler) in a while, but what you are
> > describing sounds right (thinking about how Google does it): each word should
> > be checked separately, and we shouldn't assume splitting on whitespace. I'm
> > trying to think if there are cases where you'd want to look at the surrounding
> > terms instead of looking at each term in isolation... can't think of anything
> > exciting... maybe ensure that words with dashes are properly handled.
> >
> > Otis
> >
> > ----- Original Message ----
> >
> >> From: Grant Ingersoll
> >> To: [email protected]
> >> Sent: Thursday, February 21, 2008 3:13:20 PM
> >> Subject: Spell checking ?'s
> >>
> >> Hi,
> >>
> >> I've been looking a bit at the spell checker and the implementation in
> >> the SpellCheckerRequestHandler and I have some questions.
> >>
> >> In looking at the code and the wiki, the SpellChecker seems to treat
> >> multiword queries differently depending on whether extendedResults is
> >> true or not. Is the use case a multiword query or a single word
> >> query? It seems like one would want to pass the whole query to the
> >> spell checker and have it come back with results for each word, by
> >> default. Otherwise, the application would need to do the tokenization
> >> and send each term one by one to the spell checker. However, the app
> >> likely doesn't have access to the spell check tokenizer, so this is
> >> difficult.
> >>
> >> Which leads me to my next question: in the extendedResults case, shouldn't
> >> it use the query analyzer for the spellcheck field to tokenize the
> >> terms instead of splitting on the space character?
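Here is a rough Java illustration of the difference. `splitOnSpace` mimics splitting on the space character; `analyzeLike` is a hypothetical stand-in for a field's query analyzer that lowercases and breaks on any character that is not a letter or digit (so it splits on dashes). A real Solr field runs whatever analyzer chain its schema configures, so actual tokens may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class QueryTokenizing {
    // Splitting on the space character: case and punctuation stay attached,
    // so "Spell-checking," would be looked up verbatim.
    static List<String> splitOnSpace(String query) {
        List<String> out = new ArrayList<>();
        for (String s : query.split(" ")) {
            if (!s.isEmpty()) out.add(s); // drop empties from repeated spaces
        }
        return out;
    }

    // Hypothetical analyzer stand-in: lowercase, then break on any run of
    // characters that are neither letters nor digits (dashes included).
    static List<String> analyzeLike(String query) {
        List<String> out = new ArrayList<>();
        for (String s : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!s.isEmpty()) out.add(s); // a leading delimiter yields one empty piece
        }
        return out;
    }
}
```

On `"Spell-checking,  in Solr"` the first method yields `[Spell-checking,, in, Solr]` and the second `[spell, checking, in, solr]`. Whether hyphenated words stay whole or get split is an analyzer configuration decision, which argues for letting the field's analyzer, rather than a hard-coded space split, do the tokenization.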
> >>
> >> Would it make sense, for extendedResults anyway, to do the following:
> >>   tokenize the query using the query analyzer for the spelling field
> >>   for each token:
> >>     spell check the token
> >>     add the results
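The proposed steps can be sketched in Java as follows. The analyzer and the spell checker are passed in as plain `Function`s because their real Solr/Lucene types are beside the point here; `PerTokenSpellCheck` and both parameters are hypothetical placeholders, not the SpellCheckerRequestHandler's actual API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of the proposed extendedResults flow: analyze the whole query,
// then spell check each token and collect the suggestions per token.
final class PerTokenSpellCheck {
    static Map<String, List<String>> check(
            String query,
            Function<String, List<String>> queryAnalyzer,   // hypothetical: the field's query analyzer
            Function<String, List<String>> spellChecker) {  // hypothetical: suggestions for one term
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String token : queryAnalyzer.apply(query)) {   // tokenize with the query analyzer
            results.put(token, spellChecker.apply(token));  // spell check the token, add the results
        }
        return results;
    }
}
```

Keying results by token in a `LinkedHashMap` keeps them in query order; a token repeated in the query collapses into a single entry, which a real handler may or may not want.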
> >>
> >> I see that extendedResults is a 1.3 addition, so we would be fine to
> >> change it, if it makes sense.
> >>
> >> Perhaps, for back compatibility, we keep the existing behavior for
> >> non-extendedResults requests. However, it seems like multiword queries
> >> should be split even in the non-extended results, but I am not sure. How
> >> are others using it?
> >>
> >> Thanks,
> >> Grant