Using ES as a dictionary server - need advice

2015-01-21 Thread kaspersky_us via elasticsearch
I'm working on a solution that will act as a dictionary validator by performing the following:
- input: phrase
- processing: shingles phrase match with fuzziness
- output: rewritten phrase
- data: dictionary like, with entries that are short phrases up to 5 words (e.g. "know it all", "merry go r
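
The shingling step described in this message can be illustrated outside Elasticsearch. The following is a minimal sketch, assuming plain whitespace tokenization and lowercasing; the function name `word_shingles` and its parameters are hypothetical and are not taken from the thread. It only shows which word n-grams (up to 5 words, matching the dictionary entry length mentioned above) would be candidates for a fuzzy lookup.

```python
def word_shingles(phrase, min_size=1, max_size=5):
    """Return all contiguous word n-grams (shingles) of the phrase.

    Assumption: tokens are separated by whitespace and compared in lowercase.
    max_size=5 mirrors the "up to 5 words" dictionary entries in the message.
    """
    words = phrase.lower().split()
    shingles = []
    # Never ask for shingles longer than the phrase itself.
    for size in range(min_size, min(max_size, len(words)) + 1):
        for start in range(len(words) - size + 1):
            shingles.append(" ".join(words[start:start + size]))
    return shingles


if __name__ == "__main__":
    for shingle in word_shingles("he is a know it all", min_size=2, max_size=3):
        print(shingle)
```

Inside Elasticsearch, the same idea would normally be handled by an analyzer with a shingle token filter rather than by client code; this sketch is only meant to make the candidate set concrete.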

Re: When searching for 'Boss' with fuzziness, get higher score for 'Bose' than 'Boss'. ???? How Comes !?!?

2015-01-19 Thread kaspersky_us via elasticsearch
Thanks Mark. Sounds like this issue affects a lot of people. I looked at your suggestion about FLT, and the ignore_tf parameter should help, however unless I'm missing something, it doesn't seem like this would address the IDF, and results could be biased. But I will experiment. Ultimately I th
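
For reference, a request body using the Fuzzy Like This (FLT) query with `ignore_tf` might look like the sketch below. This assumes an Elasticsearch 1.x cluster, where the `fuzzy_like_this` query was available; the helper name `build_flt_query` and the field name `name` are hypothetical, and the parameter names should be checked against the documentation for the version in use. As the message notes, `ignore_tf` concerns term frequency only, so this sketch does not claim to remove the IDF effect.

```python
import json


def build_flt_query(text, field, ignore_tf=True):
    """Build a search body for the ES 1.x fuzzy_like_this query.

    Assumption: `field` is a hypothetical analyzed string field in the index.
    """
    return {
        "query": {
            "fuzzy_like_this": {
                "fields": [field],
                "like_text": text,
                # Ask the query to disregard term frequency when scoring.
                "ignore_tf": ignore_tf,
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON that would be sent to the _search endpoint.
    print(json.dumps(build_flt_query("Boss", "name"), indent=2))
```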

Re: When searching for 'Boss' with fuzziness, get higher score for 'Bose' than 'Boss'. ???? How Comes !?!?

2015-01-19 Thread kaspersky_us via elasticsearch
I have the same problem: some results with a higher edit distance are ranked above other results that are closer in terms of edit distance. I suspect it does have to do with document frequency, as you think, Adrien. In my case I want to ignore document frequency completely. Any suggesti
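
One way to see what "ignore document frequency completely" would mean is to re-rank candidate terms by edit distance alone on the client side. The sketch below is an illustration under that assumption, not a feature of Elasticsearch and not something proposed in the thread; the names `levenshtein` and `rank_by_edit_distance` are hypothetical, and comparison is case-insensitive by choice.

```python
def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def rank_by_edit_distance(query, candidates, max_distance=2):
    """Order candidates purely by edit distance to the query.

    Term and document frequencies play no role; ties keep their input order
    because Python's sort is stable.
    """
    q = query.lower()
    scored = [(levenshtein(q, c.lower()), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] <= max_distance]
    return [c for _, c in sorted(kept, key=lambda pair: pair[0])]


if __name__ == "__main__":
    print(rank_by_edit_distance("Boss", ["Bose", "Boss", "Bossa", "Best"]))
```

With this ordering an exact match such as 'Boss' always precedes a one-edit neighbour such as 'Bose', regardless of how rare either term is in the index.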