I'm working on a solution that will act as a dictionary validator by
performing the following:
- input: phrase
- processing: shingle-based phrase matching with fuzziness
- output: rewritten phrase
- data: dictionary-like, with entries that are short phrases of up to 5 words
(e.g. "know it all", "merry go r
Thanks Mark. Sounds like this issue affects a lot of people.
I looked at your suggestion about FLT, and the ignore_tf parameter should
help. However, unless I'm missing something, it doesn't seem to address
IDF, so results could still be biased. I will experiment, though.
Ultimately I th
I have the same problem, where some results with higher edit distance are
ranked higher than other results that are closer in terms of edit distance.
I suspect it does have to do with document frequency, as you think, Adrien.
In my case I want to ignore document frequency completely. Any suggesti