Hello,

I'm considering using Solr with learning to rank to build a product matcher. For example, it should match the titles:

- Apple iPhone 6 16 Gb,
- iPhone 6 16 Gb,
- Smartphone IPhone 6 16 Gb,
- iPhone 6 black 16 Gb,

to the same internal reference, a unique identifier.

With Solr, each document would have one field for the product title and one for its class, which is the unique identifier of the product. Solr would then be used to perform matching as follows:

1. A search is performed with a given product title.
2. The first three results are considered (this requires an initial product title database).
3. The most frequent identifier among them is returned.

This method corresponds roughly to a k-Nearest Neighbor approach with the cosine metric, k = 3, and a TF-IDF model. I've done some preliminary tests with scikit-learn, and the results are good, but not as good as those of more sophisticated learning algorithms. Then I noticed that Solr supports learning to rank.

First, do you think that such a use of Solr makes sense?

Second, is there a relatively simple way to build a learning model using a sparse representation of the query TF-IDF vector?

Kind regards,
Xavier Schepler
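
P.S. To make the three matching steps concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the small reference list, the identifiers P1 and P2, and the function names are made up, and the TF-IDF weighting is hand-rolled with a smoothed IDF, so it reproduces neither Solr's scoring nor any particular library's exactly.

```python
import math
import re
from collections import Counter

# Hypothetical reference database: (product title, internal identifier).
REFERENCE = [
    ("Apple iPhone 6 16 Gb", "P1"),
    ("iPhone 6 16 Gb", "P1"),
    ("Smartphone IPhone 6 16 Gb", "P1"),
    ("Samsung Galaxy S5 16 Gb", "P2"),
    ("Galaxy S5 black 16 Gb", "P2"),
    ("Smartphone Samsung Galaxy S5", "P2"),
]


def tokenize(title):
    """Lowercase the title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def build_idf(titles):
    """Smoothed inverse document frequency for every token in the titles."""
    n = len(titles)
    df = Counter(tok for t in titles for tok in set(tokenize(t)))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(title, idf):
    """Sparse TF-IDF vector as a dict; tokens unknown to the index are dropped."""
    counts = Counter(tokenize(title))
    return {tok: c * idf[tok] for tok, c in counts.items() if tok in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match(title, reference=REFERENCE, k=3):
    """Step 1: search; step 2: keep the top k; step 3: majority vote."""
    idf = build_idf([t for t, _ in reference])
    query = tfidf(title, idf)
    scored = sorted(
        ((cosine(query, tfidf(t, idf)), pid) for t, pid in reference),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(pid for _, pid in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(match("iPhone 6 black 16 Gb"))  # P1
```

With Solr, the search and top-three selection would be done by the query itself, and only the vote over the returned identifiers would remain on the client side.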