RE: Spell checking street names

2008-01-31 Thread Max Metral
…it's obviously not the best metric. Is there an appropriate edit-distance metric that takes phonetics into account?
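One way to approximate a phonetics-aware distance, sketched below under the assumption that Apache Commons Codec and Commons Lang are on the classpath, is to take the Levenshtein distance between Double Metaphone encodings rather than between the raw strings. The class and method names come from those libraries; the combined weighting is purely illustrative, not something proposed in the thread.

import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.commons.lang.StringUtils;

// Sketch: combine plain edit distance with edit distance over phonetic codes.
// The weights are illustrative, not tuned values from this thread.
public class PhoneticDistance {

    private static final DoubleMetaphone METAPHONE = new DoubleMetaphone();

    /** Lower scores mean the two street names are more likely the same word. */
    public static double distance(String a, String b) {
        int literal = StringUtils.getLevenshteinDistance(a.toLowerCase(), b.toLowerCase());
        int phonetic = StringUtils.getLevenshteinDistance(
                METAPHONE.doubleMetaphone(a), METAPHONE.doubleMetaphone(b));
        // Phonetic agreement dominates; literal distance breaks ties.
        return phonetic * 2.0 + literal * 0.5;
    }

    public static void main(String[] args) {
        System.out.println(distance("Commonwealth", "Communwealth")); // small
        System.out.println(distance("Commonwealth", "Charlesgate"));  // large
    }
}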

Re: Spell checking street names

2008-01-31 Thread Karl Wettin
On 30 Jan 2008, at 17:34, Max Metral wrote: Part of the reason is that if we look at some common mistakes for Commonwealth: Communwealth, Comonwealth, Common wealth. If they are common mistakes, you can pick them up using reinforcement learning.
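Karl doesn't spell out how the learning would work. One minimal interpretation, sketched below with an invented CorrectionLog helper, is to record which suggestion a user actually accepts for each misspelled input and prefer that suggestion the next time the same input appears. Everything here is a hypothetical illustration, not an API from the thread.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "reinforcement" for a spell checker: count which
// correction users accept for each misspelled input, and prefer the most
// frequently accepted one before falling back to the normal suggester.
public class CorrectionLog {

    // misspelling -> (accepted correction -> times accepted)
    private final Map<String, Map<String, Integer>> counts =
            new HashMap<String, Map<String, Integer>>();

    public void recordAccepted(String misspelling, String correction) {
        String key = misspelling.toLowerCase();
        Map<String, Integer> byCorrection = counts.get(key);
        if (byCorrection == null) {
            byCorrection = new HashMap<String, Integer>();
            counts.put(key, byCorrection);
        }
        Integer n = byCorrection.get(correction);
        byCorrection.put(correction, n == null ? 1 : n + 1);
    }

    /** Returns the most frequently accepted correction, or null if none recorded. */
    public String bestCorrection(String misspelling) {
        Map<String, Integer> byCorrection = counts.get(misspelling.toLowerCase());
        if (byCorrection == null) return null;
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : byCorrection.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}

With that in place, "Communwealth" would return "Commonwealth" once a user has accepted that correction, and the stored counts could be used to seed or re-rank the regular suggestions.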

Re: Spell checking street names

2008-01-31 Thread eks dev
Hmmm, "untokenized n-gram spell checker"... does that really make sense? "lucene" as 2-grams: lu uc ce en ne, but all as a single token? No, I don't think that…
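For reference, the decomposition eks dev is describing is the kind of n-gram split the contrib SpellChecker performs when it indexes a word; the stand-alone helper below is my own illustration of the 2-gram case, not code taken from Lucene.

import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration of the 2-gram split mentioned above:
// "lucene" -> [lu, uc, ce, en, ne]
public class NGrams {
    public static List<String> bigrams(String word) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("lucene")); // [lu, uc, ce, en, ne]
    }
}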

Re: Spell checking street names

2008-01-30 Thread Otis Gospodnetic

Spell checking street names

2008-01-30 Thread Max Metral
I'm using Lucene to spell check street names. Right now, I'm using Double Metaphone on the street name (we have a sophisticated regex to parse out the NAME as opposed to the unit, number, street type, or suffix). I think that Double Metaphone is probably overkill/wrong, and a spell-checking approach…
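A spell-checking alternative along the lines the replies point toward is Lucene's contrib SpellChecker, which indexes a dictionary of known words as n-grams and returns the closest matches. The sketch below assumes the contrib spellchecker jar is on the classpath and that street-names.txt (a hypothetical file, one name per line) holds the parsed-out NAME values.

import java.io.File;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Sketch: build a spell-check index over known street names, then ask for
// the closest candidates for a possibly misspelled input.
public class StreetNameSpeller {
    public static void main(String[] args) throws Exception {
        Directory spellIndex = new RAMDirectory();
        SpellChecker checker = new SpellChecker(spellIndex);

        // street-names.txt: one canonical street NAME per line (hypothetical path).
        checker.indexDictionary(new PlainTextDictionary(new File("street-names.txt")));

        String[] suggestions = checker.suggestSimilar("Communwealth", 5);
        for (String s : suggestions) {
            System.out.println(s); // e.g. Commonwealth
        }
    }
}

Because street names are a closed set, restricting suggestions to terms that actually exist in the index (for example via a LuceneDictionary over the street-name field) tends to be a better fit than a phonetic encoder alone.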