I'm working on a solution that will act as a dictionary validator by performing the following:
- input: a phrase
- processing: shingle-based phrase matching with fuzziness
- output: the rewritten phrase
- data: dictionary-like, with entries that are short phrases of up to 5 words (e.g. "know it all", "merry go round")

What's particular about this use case is that we don't care about TF/IDF and have another mechanism in mind to select an entry (but that's not the issue).

It all started well, with queries involving a phrase suggester, direct generator and collation, but then we hit a snag: fuzzy matches (edit distance > 0) rank higher than exact matches.

I've been discussing this in another thread (https://groups.google.com/forum/#!searchin/elasticsearch/bose/elasticsearch/dLdT90j1x74/zqJQiSlgHv8J), but I wanted to present my use case a bit more clearly and see if there is any advice on how to achieve this.

I tried to use FLT, as kindly recommended by Mark Harwood, but couldn't figure out how to use it as a phrase suggester.

The key here, I think, is to control the scoring of the suggester: ignore TF/IDF and instead rank candidates by an n-gram formula involving edit distance, so that further custom processing can select the right suggester entry. I looked at the smoothing models, but everything seems to be based, to a greater or lesser extent, on TF/IDF.

Any advice would be appreciated!
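To make the ranking I have in mind more concrete, here is a minimal sketch in plain Python, outside Elasticsearch. It is only an illustration of the idea, not how the phrase suggester scores things: the function names (edit_distance, shingles, rank_candidates), the max_edits threshold and the tie-breaking order are all assumptions of mine. Candidates are ordered by edit distance alone, so an exact match (distance 0) can never rank below a fuzzy one, and term frequency plays no role.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def shingles(tokens: List[str], max_size: int = 5) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for every word n-gram of 1..max_size tokens."""
    out = []
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            end = start + size
            out.append((start, end, " ".join(tokens[start:end])))
    return out


def rank_candidates(phrase: str, dictionary: List[str],
                    max_edits: int = 2) -> List[Tuple[str, str, int]]:
    """Rank (shingle, entry, distance) hits without any TF/IDF component.

    Order (an assumption for this sketch): lowest edit distance first,
    then longer shingles, then earlier position in the phrase.
    """
    tokens = phrase.lower().split()
    hits = []
    for start, end, text in shingles(tokens):
        for entry in dictionary:
            dist = edit_distance(text, entry)
            if dist <= max_edits:
                hits.append((dist, -(end - start), start, text, entry))
    hits.sort()
    return [(text, entry, dist) for dist, _, _, text, entry in hits]


if __name__ == "__main__":
    entries = ["merry go rounds", "merry go round", "know it all"]
    for shingle, entry, dist in rank_candidates("a merry go round", entries):
        print(dist, repr(shingle), "->", repr(entry))
```

The ranked list is what I would hand over to the separate selection mechanism mentioned above. Comparing every shingle against every entry is obviously too slow for a real dictionary; the point is only the ordering I would like to get back from the suggester.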