subject:"combining open office spellchecker with Lucene"

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-20 Thread Morus Walter

David Spencer writes:
> > could you put the current version of your code on that website as a Java
> Weblog entry updated:
> http://searchmorph.com/weblog/index.php?id=23
thanks
> Great suggestion and thanks for that idiom - I should know such things
> by now.
To clarify the "issu

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-16 Thread David Spencer

Morus Walter wrote: Hi David, Based on this mail I wrote a "ngram speller" for Lucene. It runs in 2 phases. First you build a "fast lookup index" as mentioned above. Then to correct a word you do a query in this index based on the ngrams in the misspelled word. Let's see. [1] Source is attached
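The two-phase approach described above (build an n-gram lookup index once, then query it with the n-grams of the misspelled word) can be illustrated with a small in-memory sketch. It assumes bigrams and trigrams, uses a Python dict as a stand-in for the separate Lucene lookup index, and ranks candidates simply by the number of shared n-grams rather than by Lucene's scoring. `ngrams` and `NGramIndex` are hypothetical names, not the contributed code.

```python
from collections import defaultdict


def ngrams(word, n):
    """Character n-grams of `word` (empty if the word is shorter than n)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


class NGramIndex:
    """Phase 1: inverted index mapping each n-gram to the words containing it."""

    def __init__(self, words, sizes=(2, 3)):
        self.sizes = sizes  # assumed gram sizes
        self.postings = defaultdict(set)
        for w in words:
            for n in sizes:
                for g in ngrams(w, n):
                    self.postings[g].add(w)

    def suggest(self, misspelled, max_results=5):
        """Phase 2: score every word by how many query n-grams it shares."""
        scores = defaultdict(int)
        for n in self.sizes:
            for g in set(ngrams(misspelled, n)):
                for w in self.postings.get(g, ()):
                    scores[w] += 1
        # Highest overlap first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [w for w, _ in ranked[:max_results]]


idx = NGramIndex(["recursive", "descent", "parser", "recurse"])
print(idx.suggest("recursize"))  # ['recursive', 'recurse', 'parser']
```

Only words that share at least one n-gram with the input are ever scored, which is what makes this cheaper than comparing against every term.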

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-16 Thread Morus Walter

Hi David,
> Based on this mail I wrote a "ngram speller" for Lucene. It runs in 2
> phases. First you build a "fast lookup index" as mentioned above. Then
> to correct a word you do a query in this index based on the ngrams in
> the misspelled word.
> Let's see.
> [1] Source is attached

RE: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-15 Thread Aad Nales

Also, You can also use an alternative spellchecker for the 'checking part' and use the Ngram algorithm for the 'suggestion' part. Only if the spell 'check' declares a word illegal the 'suggestion' part would perform its magic. cheers, Aad Doug Cutting wrote: > David Spencer wrote: > >> [1] Th

Re: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-15 Thread David Spencer

Doug Cutting wrote: David Spencer wrote: [1] The user enters a query like: recursize descent parser [2] The search code parses this and sees that the 1st word is not a term in the index, but the next 2 are. So it ignores the last 2 terms ("descent" and "parser") and suggests alternatives t
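The behaviour in steps [1] and [2] (leave terms that exist in the index alone, suggest only for the unknown ones) can be sketched as below. Lower-casing and whitespace splitting stand in for the real analyzer, the vocabulary is a plain set instead of index term lookups, and `suggest` is a hypothetical callback for whatever suggestion method is used.

```python
from typing import Callable, Optional, Set


def did_you_mean(query: str, vocabulary: Set[str],
                 suggest: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return a corrected query, or None if nothing needed changing.

    Terms already in `vocabulary` (i.e. in the index) are kept as-is;
    only unknown terms are passed to `suggest`.
    """
    corrected = []
    changed = False
    for term in query.lower().split():
        if term in vocabulary:
            corrected.append(term)
            continue
        alternative = suggest(term)
        if alternative is None:
            corrected.append(term)  # no better guess: keep the user's term
        else:
            corrected.append(alternative)
            changed = True
    return " ".join(corrected) if changed else None


vocab = {"recursive", "descent", "parser"}
fixes = {"recursize": "recursive"}
print(did_you_mean("recursize descent parser", vocab, fixes.get))
# recursive descent parser
```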

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread David Spencer

Andrzej Bialecki wrote: David Spencer wrote: To restate the question for a second. The misspelled word is: "conts". The suggestion expected is "const", which seems reasonable enough as it's just a transposition away, thus the string distance is low. But - I guess the problem w/ the algorithm is

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread David Spencer

om/kat/spell.jsp?s=conts&min=2&max=5&maxd=5&maxr=10&bstart=2.0&bend=1.0&btranspose=10.0&popular=1 -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Wednesday, 15 September, 2004 12:23 To: Lucene Users List Subject: Re: NGramSpeller cont
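The demo URL above exposes tuning knobs named `min`, `max`, `bstart`, `bend` and `btranspose`. Their exact meaning is not spelled out in this thread; the sketch below assumes they are the smallest and largest n-gram size, extra weight for a word's leading and trailing n-grams, and a bonus for candidates exactly one adjacent transposition away. It is a plain Python illustration of that assumption, not the demo's code, and all function names are hypothetical.

```python
from collections import Counter


def weighted_grams(word, min_n=2, max_n=5, b_start=2.0, b_end=1.0):
    """Map (kind, gram) features of `word` to weights.

    Every n-gram of size min_n..max_n gets weight 1.0; the first and last
    gram of each size also produce 'start' / 'end' features weighted by
    b_start / b_end (assumed meaning of the bstart / bend parameters).
    """
    feats = Counter()
    for n in range(min_n, min(max_n, len(word)) + 1):
        grams = [word[i:i + n] for i in range(len(word) - n + 1)]
        for g in grams:
            feats[("gram", g)] += 1.0
        feats[("start", grams[0])] += b_start
        feats[("end", grams[-1])] += b_end
    return feats


def transpositions(word):
    """All strings reachable from `word` by swapping two neighbouring chars."""
    return {word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)}


def score(misspelled, candidate, b_transpose=10.0, **gram_opts):
    """Sum the query-side weights of features the candidate shares,
    plus a bonus when the candidate is one adjacent swap away."""
    query = weighted_grams(misspelled, **gram_opts)
    cand = weighted_grams(candidate, **gram_opts)
    total = sum(w for feat, w in query.items() if feat in cand)
    if candidate in transpositions(misspelled):
        total += b_transpose
    return total


print(score("conts", "counts"))                  # 8.0
print(score("conts", "const", b_transpose=0.0))  # 7.0
print(score("conts", "const"))                   # 17.0
```

In this toy scoring, "counts" outranks "const" for the input "conts" when the transposition bonus is off; the bonus reverses that order.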

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread Andrzej Bialecki

David Spencer wrote: To restate the question for a second. The misspelled word is: "conts". The suggestion expected is "const", which seems reasonable enough as it's just a transposition away, thus the string distance is low. But - I guess the problem w/ the algorithm is that for short words lik
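A quick way to see the short-word problem is to count the n-grams "conts" and "const" have in common. The sketch below uses distinct n-gram sets and the Dice coefficient as a simple overlap measure (a choice made here for illustration, not necessarily what NGramSpeller uses). A swap of the last two letters of a five-letter word leaves little shared, while a similar adjacent swap inside a longer word keeps most trigrams intact.

```python
def ngram_set(word, n):
    """Distinct character n-grams of `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n):
    """Dice coefficient of the two n-gram sets: 2|A & B| / (|A| + |B|)."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


for n in (2, 3, 4):
    shared = sorted(ngram_set("conts", n) & ngram_set("const", n))
    print(n, shared, round(dice("conts", "const", n), 2))
# 2 ['co', 'on'] 0.5
# 3 ['con'] 0.33
# 4 [] 0.0

print(round(dice("transposition", "transpoistion", 3), 2))  # 0.64
```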

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread David Spencer

Andrzej Bialecki wrote: Aad Nales wrote: David, Perhaps I misunderstand something so please correct me if I do. I used http://www.searchmorph.com/kat/spell.jsp to look for conts without changing any of the default values. What I got as results did not include 'const' which has quite a high frequenc

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread David Spencer

itions to the code and will report back if anything of interest changes here. -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Wednesday, 15 September, 2004 12:23 To: Lucene Users List Subject: Re: NGramSpeller contribution -- Re: combining open office spellchecker w

RE: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread Aad Nales

y expectations (most likely ;-) 2. something in the code.. -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Wednesday, 15 September, 2004 12:23 To: Lucene Users List Subject: Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene Aad N

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread Andrzej Bialecki

Aad Nales wrote: David, Perhaps I misunderstand something so please correct me if I do. I used http://www.searchmorph.com/kat/spell.jsp to look for conts without changing any of the default values. What I got as results did not include 'const' which has quite a high frequency in your index and ???

RE: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-15 Thread Aad Nales

uld have a pretty low Levenshtein distance. Any idea what causes this behavior? cheers, Aad -Original Message- From: David Spencer [mailto:[EMAIL PROTECTED] Sent: Tuesday, 14 September, 2004 21:23 To: Lucene Users List Subject: NGramSpeller contribution -- Re: combining open office spellch

Re: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

Doug Cutting wrote: David Spencer wrote: [1] The user enters a query like: recursize descent parser [2] The search code parses this and sees that the 1st word is not a term in the index, but the next 2 are. So it ignores the last 2 terms ("descent" and "parser") and suggests alternatives t

Re: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-14 Thread Doug Cutting

David Spencer wrote: [1] The user enters a query like: recursize descent parser [2] The search code parses this and sees that the 1st word is not a term in the index, but the next 2 are. So it ignores the last 2 terms ("descent" and "parser") and suggests alternatives to "recursize"...thu

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread Doug Cutting

Andrzej Bialecki wrote: I was wondering about the way you build the n-gram queries. You basically don't care about their position in the input term. Originally I thought about using PhraseQuery with a slop - however, after checking the source of PhraseQuery I realized that this probably wouldn't

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

Andrzej Bialecki wrote: David Spencer wrote: ...or prepare in advance a fast lookup index - split all existing terms to bi- or trigrams, create a separate lookup index, and then simply for each term ask a phrase query (phrase = all n-grams from an input term), with a slop > 0, to get similar existi

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread Andrzej Bialecki

David Spencer wrote: ...or prepare in advance a fast lookup index - split all existing terms to bi- or trigrams, create a separate lookup index, and then simply for each term ask a phrase query (phrase = all n-grams from an input term), with a slop > 0, to get similar existing terms. This should be
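The phrase-query idea (all n-grams of the input term as a phrase, with slop > 0) differs from a plain n-gram OR query in that positions matter. The toy comparison below contrasts a position-blind count with a per-gram position check. It is a simplification written for illustration, not Lucene's PhraseQuery (whose slop applies to the phrase as a whole), and the function names are hypothetical.

```python
def bag_matches(query, candidate, n=2):
    """Position-blind: distinct query n-grams present anywhere in candidate."""
    cand = {candidate[i:i + n] for i in range(len(candidate) - n + 1)}
    return len({query[i:i + n] for i in range(len(query) - n + 1)} & cand)


def positional_matches(query, candidate, n=2, slop=1):
    """Count query n-grams that occur in `candidate` within `slop` positions
    of where they occur in `query` (a per-gram stand-in for phrase slop)."""
    positions = {}
    for i in range(len(candidate) - n + 1):
        positions.setdefault(candidate[i:i + n], []).append(i)
    hits = 0
    for i in range(len(query) - n + 1):
        if any(abs(p - i) <= slop for p in positions.get(query[i:i + n], [])):
            hits += 1
    return hits


print(bag_matches("tops", "stop"))                 # 2
print(positional_matches("tops", "stop", slop=0))  # 0
print(positional_matches("tops", "stop", slop=1))  # 2
```

With slop 0, grams that are shifted by one position no longer count, so anagram-like candidates drop out; raising the slop lets small shifts (such as those caused by an insertion or deletion) through again.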

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

List Subject: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene Andrzej Bialecki wrote: David Spencer wrote: I can/should send the code out. The logic is that for any terms in a query that have zero matches, go thru all the terms(!) and calculate the Levenshtein string

RE: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread Tate Avery

: combining open office spellchecker with Lucene
Andrzej Bialecki wrote:
> David Spencer wrote:
>> I can/should send the code out. The logic is that for any terms in a
>> query that have zero matches, go thru all the terms(!) and calculate
>> the Levenshtein s

NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

Andrzej Bialecki wrote: David Spencer wrote: I can/should send the code out. The logic is that for any terms in a query that have zero matches, go thru all the terms(!) and calculate the Levenshtein string distance, and return the best matches. A more intelligent way of doing this is to instead

RE: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-11 Thread Aad Nales

Doug Cutting wrote:
> David Spencer wrote:
>> Doug Cutting wrote:
>>> And one should not try correction at all for terms which occur in a
>>> large proportion of the collection.
>> I keep thinking over this one and I don't understand it. If a user
>> misspells a word and the "did yo

Re: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-10 Thread David Spencer

Doug Cutting wrote: David Spencer wrote: Doug Cutting wrote: And one should not try correction at all for terms which occur in a large proportion of the collection. I keep thinking over this one and I don't understand it. If a user misspells a word and the "did you mean" spelling correction algo

Re: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-10 Thread Doug Cutting

David Spencer wrote: Doug Cutting wrote: And one should not try correction at all for terms which occur in a large proportion of the collection. I keep thinking over this one and I don't understand it. If a user misspells a word and the "did you mean" spelling correction algorithm determines th
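The rule quoted here (do not attempt correction for terms that occur in a large proportion of the collection) reduces to a document-frequency ratio test. The 1% cut-off below is an arbitrary placeholder, the dictionary of document frequencies stands in for index statistics, and `terms_to_correct` is a hypothetical name.

```python
def terms_to_correct(query_terms, doc_freq, num_docs, max_fraction=0.01):
    """Return the query terms worth sending to the speller.

    A term found in more than `max_fraction` of all documents is assumed to
    be what the user meant and is skipped. Terms missing from `doc_freq`
    count as frequency 0 and are always kept.
    """
    if num_docs <= 0:
        raise ValueError("num_docs must be positive")
    return [t for t in query_terms
            if doc_freq.get(t, 0) / num_docs <= max_fraction]


df = {"the": 9000, "parser": 40}
print(terms_to_correct(["the", "recursize", "parser"], df, 10000))
# ['recursize', 'parser']
```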

frequent terms - Re: combining open office spellchecker with Lucene

2004-09-10 Thread David Spencer

Doug Cutting wrote: Aad Nales wrote: Before I start reinventing wheels I would like to do a short check to see if anybody else has already tried this. A customer has requested us to look into the possibility to perform a spell check on queries. So far the most promising way of doing this seems to b

Re: combining open office spellchecker with Lucene

2004-09-10 Thread David Spencer

eks dev wrote: Hi Doug, Perhaps. Are folks really better at spelling the beginning of words? Yes they are. There were some comprehensive empirical studies on this topic. The Winkler modification of the Jaro string distance is based on this assumption (boosting similarity if first n, I think 4, chars mat

Re: combining open office spellchecker with Lucene

2004-09-10 Thread eks dev

Hi Doug,
> Perhaps. Are folks really better at spelling the
> beginning of words?
Yes they are. There were some comprehensive empirical studies on this topic. The Winkler modification of the Jaro string distance is based on this assumption (boosting similarity if first n, I think 4, chars match). Jaro-W
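The Jaro-Winkler measure mentioned here can be written out in a few lines. The sketch follows the common textbook formulation (matching window of half the longer length minus one, common-prefix bonus capped at 4 characters, scaling factor 0.1); those constants are the usual defaults, not something this thread specifies.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        # Look for an unused equal character within the matching window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    t, k = 0, 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Boost the Jaro score by the length of the common prefix (up to 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(round(jaro_winkler("conts", "const"), 4))    # 0.9533
```

Because the bonus depends only on the shared prefix, a typo near the end of a word (as in "conts") is penalized less than the same typo near the start.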

Re: combining open office spellchecker with Lucene

2004-09-09 Thread Doug Cutting

David Spencer wrote: Good heuristics but are there any more precise, standard guidelines as to how to balance or combine what I think are the following possible criteria in suggesting a better choice: Not that I know of. - ignore (penalize?) terms that are rare I think this one is easy to threshol
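Since there is no standard recipe, one simple combination is to apply the rare-term criterion as a hard threshold and then order the remaining candidates by edit distance, using frequency only as a tie-breaker. The threshold, the ordering and the candidate numbers below are made up for illustration; `rank_candidates` is a hypothetical name.

```python
def rank_candidates(candidates, min_doc_freq=2):
    """Order spelling candidates.

    `candidates` holds (term, edit_distance, doc_freq) tuples. Terms rarer
    than `min_doc_freq` are dropped (hard threshold); the rest are sorted by
    distance, then by descending frequency, then alphabetically.
    """
    kept = [c for c in candidates if c[2] >= min_doc_freq]
    kept.sort(key=lambda c: (c[1], -c[2], c[0]))
    return [term for term, _, _ in kept]


cands = [("recursive", 1, 120), ("recurse", 2, 300),
         ("recursivee", 2, 1), ("recurses", 2, 300)]
print(rank_candidates(cands))  # ['recursive', 'recurse', 'recurses']
```

Penalizing instead of thresholding would mean folding frequency into a single score, which needs a weight that this sketch does not attempt to choose.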

Re: combining open office spellchecker with Lucene

2004-09-09 Thread David Spencer

Doug Cutting wrote: Aad Nales wrote: Before I start reinventing wheels I would like to do a short check to see if anybody else has already tried this. A customer has requested us to look into the possibility to perform a spell check on queries. So far the most promising way of doing this seems to b

Re: combining open office spellchecker with Lucene

2004-09-09 Thread Doug Cutting

Aad Nales wrote: Before I start reinventing wheels I would like to do a short check to see if anybody else has already tried this. A customer has requested us to look into the possibility to perform a spell check on queries. So far the most promising way of doing this seems to be to create an Analy

Re: combining open office spellchecker with Lucene

2004-09-09 Thread David Spencer

Andrzej Bialecki wrote: David Spencer wrote: I can/should send the code out. The logic is that for any terms in a query that have zero matches, go thru all the terms(!) and calculate the Levenshtein string distance, and return the best matches. A more intelligent way of doing this is to instead

Re: combining open office spellchecker with Lucene

2004-09-09 Thread Andrzej Bialecki

David Spencer wrote: I can/should send the code out. The logic is that for any terms in a query that have zero matches, go thru all the terms(!) and calculate the Levenshtein string distance, and return the best matches. A more intelligent way of doing this is to instead look for terms that also
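The brute-force approach described here (compute the Levenshtein distance from the unknown word to every term and keep the best matches) looks roughly like the sketch below. A plain Python list stands in for iterating over the index's terms, and the function names are hypothetical. Every lookup touches the whole vocabulary, which is why the message goes on to propose a more selective way of finding candidates.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def brute_force_suggest(word, all_terms, max_results=3):
    """Compare `word` against every term and return the closest ones."""
    scored = sorted((levenshtein(word, t), t) for t in all_terms if t != word)
    return [t for _, t in scored[:max_results]]


terms = ["recursive", "descent", "parser", "recurse"]
print(brute_force_suggest("recursize", terms, max_results=2))
# ['recursive', 'recurse']
```

Note that plain Levenshtein counts the swap in "conts" / "const" as two edits (distance 2), since it has no transposition operation.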

Re: combining open office spellchecker with Lucene

2004-09-09 Thread David Spencer

Aad Nales wrote: Hi All, Before I start reinventing wheels I would like to do a short check to see if anybody else has already tried this. A customer has requested us to look into the possibility to perform a spell check on queries. So far the most promising way of doing this seems to be to create

combining open office spellchecker with Lucene

2004-09-09 Thread Aad Nales

Hi All, Before I start reinventing wheels I would like to do a short check to see if anybody else has already tried this. A customer has requested us to look into the possibility to perform a spell check on queries. So far the most promising way of doing this seems to be to create an Analyzer base

