Another theoretical answer to this question is the n-gram approach. You can
index each word along with its trigrams, then query the index with both the
string and its trigrams using a percentage-match search. You then pass the
exhaustive result set through a more expensive scoring function such as
Smith-Waterman.
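A minimal Python sketch of that two-stage pipeline follows. It assumes an in-memory vocabulary instead of a real Solr/Lucene index. The helper names (`trigrams`, `build_trigram_index`, `candidates`, `smith_waterman`), the 30% trigram-share threshold, and the match/mismatch/gap scores are all illustrative assumptions, not anything Solr provides.

```python
from collections import defaultdict


def trigrams(word: str) -> set[str]:
    """Return the set of overlapping 3-character substrings of `word`."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


def build_trigram_index(vocabulary: list[str]) -> dict[str, set[str]]:
    """Map each trigram to the vocabulary terms that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for term in vocabulary:
        for gram in trigrams(term):
            index[gram].add(term)
    return index


def candidates(index: dict[str, set[str]], vocabulary: list[str],
               query: str, min_share: float = 0.3) -> set[str]:
    """Cheap first pass: the exact word plus terms sharing >= min_share of its trigrams."""
    found = {query} if query in vocabulary else set()
    query_grams = trigrams(query)
    if not query_grams:  # query shorter than 3 characters: exact match only
        return found
    shared: dict[str, int] = defaultdict(int)
    for gram in query_grams:
        for term in index.get(gram, ()):
            shared[term] += 1
    found.update(t for t, n in shared.items() if n / len(query_grams) >= min_share)
    return found


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Expensive second pass: best local-alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    vocabulary = ["fight", "figth", "feight", "sight", "light", "table"]
    index = build_trigram_index(vocabulary)
    query = "fight"
    hits = candidates(index, vocabulary, query)
    # Rescore the candidate set; ties are broken alphabetically for stable output.
    ranked = sorted(((t, smith_waterman(query, t)) for t in hits),
                    key=lambda pair: (-pair[1], pair[0]))
    print(ranked)
```

With these toy scores the run prints `[('fight', 10), ('feight', 9), ('light', 8), ('sight', 8), ('figth', 7)]`. The trigram pass cheaply drops unrelated terms like "table", but the raw character alignment still ranks "light" and "sight" above the transposition "figth", so the scoring parameters and any cut-off would need tuning on real data.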

Thanks,

Jagdish


On Sat, Jun 8, 2013 at 11:03 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:

> n-grams might help, followed by a string similarity metric such as Jaro-Winkler
> or Smith-Waterman-Gotoh to further filter the results.
>
>
> On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>
> > Interesting problem.  The first thing that comes to mind is to do
> > "word expansion" during indexing.  Kind of like synonym expansion, but
> > maybe a bit more dynamic. If you can have a dictionary of correctly
> > spelled words, then for each token emitted by the tokenizer you could
> > look up the dictionary and expand the token to all other words that
> > are similar/close enough.  This would not be super fast, and you'd
> > likely have to add some custom heuristic for figuring out what
> > "similar/close enough" means, but it might work.
> >
> > I'd love to hear other ideas...
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> > On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల
> > <kamesh...@gmail.com> wrote:
> > > Hi,
> > >
> > > I have a problem where the text corpus we need to search contains
> > > many misspelled words. The same word could also be misspelled in
> > > several different ways. The corpus also contains documents with correct
> > > spellings. However, the search term that we give in the query would
> > > always be spelled correctly. When we search on a term, we would like to
> > > get all the documents that contain either the correct or a misspelled
> > > form of the search term.
> > > We tried fuzzy search, but it doesn't work as we expected. It returns
> > > any close match, not specifically misspelled words. For example, if
> > > I'm searching for a word like "fight", I would like to return the
> > > documents that have words like "figth" and "feight", not documents with
> > > words like "sight" and "light".
> > > Is there any suggested approach for doing this?
> > >
> > > regards,
> > > Kamesh
> >
>
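Shashi's suggestion of filtering n-gram candidates with Jaro-Winkler can be sketched in Python as follows. This is a plain textbook Jaro-Winkler (prefix scale 0.1, common prefix capped at 4 characters); the 0.9 cut-off and the candidate pool are hypothetical values picked for Kamesh's "fight" example, not tuned settings.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only "match" if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * scale * (1 - score)


if __name__ == "__main__":
    query = "fight"
    # Hypothetical candidate pool, e.g. the output of an n-gram first pass.
    pool = ["figth", "feight", "sight", "light"]
    keep = [w for w in pool if jaro_winkler(query, w) >= 0.9]
    print(keep)  # ['figth', 'feight']
```

On this toy pool the two misspellings score about 0.95 while "sight" and "light" score about 0.87: the shared "f" prefix earns a boost, and the swapped "th" in "figth" is counted as a transposition rather than two mismatches.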
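Otis's index-time "word expansion" idea could look roughly like the Python sketch below. The dictionary, the documents, and the rule "only expand tokens that are not already in the dictionary" are hypothetical. The similarity heuristic is simply the standard library's `difflib.get_close_matches` with an assumed cut-off of 0.75, standing in for whatever custom "similar/close enough" heuristic a real deployment would need.

```python
import difflib
import re
from collections import defaultdict

# Hypothetical dictionary of correctly spelled words.
DICTIONARY = ["fight", "sight", "light", "night"]


def expand(token: str, dictionary: list[str], cutoff: float = 0.75) -> list[str]:
    """Return the token plus any close dictionary words it should also be indexed as."""
    if token in dictionary:
        # Assumed heuristic: correctly spelled words are left alone,
        # so "sight" is never expanded to "fight".
        return [token]
    return [token] + difflib.get_close_matches(token, dictionary, n=3, cutoff=cutoff)


def build_index(docs: dict[int, str], dictionary: list[str]) -> dict[str, set[int]]:
    """Inverted index: term -> ids of documents containing it (after expansion)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            for term in expand(token, dictionary):
                index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], term: str) -> set[int]:
    """Query time is a plain exact lookup, since the query is spelled correctly."""
    return index.get(term, set())


if __name__ == "__main__":
    docs = {
        1: "A big figth broke out",
        2: "The feight was long",
        3: "Out of sight",
        4: "A fight at night",
    }
    index = build_index(docs, DICTIONARY)
    print(sorted(search(index, "fight")))  # [1, 2, 4]
    print(sorted(search(index, "sight")))  # [3]
```

The cost moves to indexing: every token that is not in the dictionary is compared against the whole dictionary, which matches Otis's caveat that this "would not be super fast". In Solr this logic would presumably live in a custom token filter.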



-- 
Jagdish Nomula
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com

www.simplyhired.com