I'm working on a solution that will act as a dictionary validator by 
performing the following:

- input: phrase
- processing: shingle-based phrase matching with fuzziness
- output: rewritten phrase

- data: dictionary-like, with entries that are short phrases of up to 5 words 
(e.g. "know it all", "merry go round")
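To make the intended behaviour concrete, here is a minimal pure-Python sketch of the matching step. It is not Elasticsearch code. It assumes word-level shingles of 1 to 5 words and a plain character-level edit distance; `shingles()`, `levenshtein()` and `fuzzy_lookup()` are hypothetical helpers, and the rewrite/selection step is left out.

```python
# Minimal sketch, NOT Elasticsearch: shingle a phrase into word n-grams
# and fuzzy-match each shingle against a small in-memory dictionary.
# shingles(), levenshtein() and fuzzy_lookup() are hypothetical helpers.

def shingles(phrase: str, max_n: int = 5) -> list[str]:
    """Return all word n-grams of the phrase, from 1 up to max_n words."""
    words = phrase.lower().split()
    out = []
    for n in range(1, min(max_n, len(words)) + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_lookup(phrase: str, dictionary: list[str],
                 max_edits: int = 1, max_n: int = 5) -> list[tuple[str, str, int]]:
    """Return (shingle, entry, distance) for every shingle within max_edits of an entry."""
    hits = []
    for sh in shingles(phrase, max_n):
        for entry in dictionary:
            d = levenshtein(sh, entry)
            if d <= max_edits:
                hits.append((sh, entry, d))
    return hits


print(fuzzy_lookup("a mery go round ride", ["merry go round", "know it all"]))
# prints [('mery go round', 'merry go round', 1)]
```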

What's particular about this use case is that we don't care about TF/IDF: 
we have another mechanism in mind for selecting an entry (but that's not 
the issue).

Everything started well with queries combining a phrase suggester, a 
direct generator and collation, but that's where we hit a snag: fuzzy 
matches (edit distance > 0) rank higher than exact matches.

I've been discussing this in another thread 
(https://groups.google.com/forum/#!searchin/elasticsearch/bose/elasticsearch/dLdT90j1x74/zqJQiSlgHv8J), 
but I wanted to present my use case more clearly and see if anyone has 
advice on how to achieve it.

I tried FLT (fuzzy_like_this), as kindly recommended by Mark Harwood, but 
couldn't figure out how to use it as a phrase suggester.

The key here, I think, is to control the suggester's scoring: ignore 
TF/IDF and instead rank suggestions with an n-gram formula based on edit 
distance, then apply our own processing to pick the right entry. I looked 
at the smoothing models, but they all seem to rely, to a greater or lesser 
extent, on TF/IDF.
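As a rough illustration of the kind of ranking I mean, here is a plain-Python sketch (not an Elasticsearch scoring script). It assumes raw character-level edit distance is the only signal, with no term statistics at all, so an exact match (distance 0) can never be outranked by a fuzzy one. `rank_candidates()` and the `max_edits` cutoff are hypothetical; the final choice would still go through our own selection mechanism.

```python
# Minimal sketch: rank dictionary entries purely by edit distance to the
# query, with no TF/IDF or other term statistics, so exact matches
# (distance 0) always come first. rank_candidates() is hypothetical.

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rank_candidates(query: str, entries: list[str],
                    max_edits: int = 2) -> list[tuple[str, int]]:
    """Return (entry, distance) pairs within max_edits, closest first."""
    scored = [(e, levenshtein(query, e)) for e in entries]
    kept = [pair for pair in scored if pair[1] <= max_edits]
    # Distance is the primary key; alphabetical order only breaks ties
    # so the output is deterministic.
    return sorted(kept, key=lambda pair: (pair[1], pair[0]))


print(rank_candidates("merry go round",
                      ["merry go rounds", "merry go round", "marry go round", "know it all"]))
# prints [('merry go round', 0), ('marry go round', 1), ('merry go rounds', 1)]
```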

Any advice would be appreciated!

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/82ed7214-0659-4140-a5cc-27c5905f1d7e%40googlegroups.com.