Thanks for your answer, Rahul. I think I have explained similarity with the
example, assuming the natural order.
I would assume this is a common task for people who use Solr and build
search-based systems.
I am basically looking for any design patterns that people use to achieve the
results as exp
How do you define similarity? Different methods work for different use
cases. In Solr, whether two company names are treated as similar depends on
which index-time analyzer / tokenizer you are using: the same pair can match
in one configuration and not in another.
This seems like a case of data deduplication
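
To illustrate the point about analyzers, here is a minimal sketch in plain
Python (not Solr itself). The two key functions are only loose analogies for
a keyword-style analyzer and a word tokenizer with lowercasing and a stop
filter. The sample names, the LEGAL_SUFFIXES list, and all function names are
assumptions made up for this illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore; extend it for your data.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "corporation", "co", "limited"}


def whole_string_key(name):
    """Treat the whole lowercased name as a single token (keyword-style)."""
    return name.strip().lower()


def token_key(name):
    """Split into lowercase word tokens and drop legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a, b, key):
    """Character-level similarity ratio in [0, 1] after applying `key`."""
    return SequenceMatcher(None, key(a), key(b)).ratio()


# Made-up sample names, not taken from the datasets in this thread.
a, b = "Acme Inc.", "ACME Corporation"

print(similarity(a, b, whole_string_key))  # below 1.0: suffixes differ
print(similarity(a, b, token_key))         # 1.0: both reduce to "acme"
```

The same pair is an exact match under one normalization and only a partial
match under the other, so the definition of "similar" has to be fixed before
choosing how to link the two datasets.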
Hi Team
This is what I want to do:
1. I have 2 datasets, each with the schema id-number and company-name.
2. I ultimately want to link (by a join or any other means) the 2 datasets
based on the similarity between their company-name fields.
Example:
Dataset 1
Id | Company