Hello,

I'm considering using Solr with learning to rank to build a product matcher.
For example, it should match the titles:
- Apple iPhone 6 16 Gb,
- iPhone 6 16 Gb,
- Smartphone IPhone 6 16 Gb,
- iPhone 6 black 16 Gb,
to the same internal reference, a unique identifier.

With Solr, each document would then have a field for the product title and
one for its class, which is the unique identifier of the product.
Solr would then be used to perform matching as follows.

   1. A search is performed with a given product title.
   2. The first three results are considered (this requires an initial
   product title database).
   3. The most frequent identifier is returned.

This method corresponds roughly to a k-Nearest Neighbor approach with the
cosine metric, k = 3, and a TF-IDF model.
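
To make the procedure concrete, here is a minimal sketch of the three steps
in plain Python, using only the standard library rather than Solr. It keeps
each TF-IDF vector as a sparse dict of term weights. The identifiers, the
Galaxy titles, the smoothed IDF formula and all function names are
assumptions made up for illustration, and Solr's own scoring will differ.

```python
import math
from collections import Counter


def tokenize(title):
    # Naive lowercase whitespace tokenizer; a real analyzer would do more.
    return title.lower().split()


def to_vector(counts, idf):
    # Sparse, L2-normalized TF-IDF vector stored as {term: weight}.
    # Terms never seen in the reference titles are dropped.
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(titles):
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    # Document frequency: number of titles containing each term.
    df = Counter(term for d in docs for term in d)
    # Smoothed IDF (an assumed variant, one of several in common use).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [to_vector(d, idf) for d in docs]


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def match(title, idf, vectors, labels, k=3):
    # Step 1: "search" by scoring every reference title against the query.
    q = to_vector(Counter(tokenize(title)), idf)
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(q, vectors[i]), reverse=True)
    # Step 2: keep the identifiers of the first k results.
    top = [labels[i] for i in ranked[:k]]
    # Step 3: return the most frequent identifier among them.
    return Counter(top).most_common(1)[0][0]


# Hypothetical initial product title database.
titles = [
    "Apple iPhone 6 16 Gb",
    "iPhone 6 16 Gb",
    "Smartphone IPhone 6 16 Gb",
    "Samsung Galaxy S5 16 Gb",
    "Galaxy S5 16 Gb",
    "Smartphone Samsung Galaxy S5",
]
labels = ["iphone-6-16gb"] * 3 + ["galaxy-s5-16gb"] * 3

idf, vectors = build_index(titles)
print(match("iPhone 6 black 16 Gb", idf, vectors, labels))
```

With Solr, step 1 would be an ordinary query against the title field, and
only steps 2 and 3 would remain on the client side.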

I've done some preliminary tests with scikit-learn and the results are
good, but not as good as those of more sophisticated learning algorithms.

I then noticed that Solr supports learning to rank.

First, do you think that such a use of Solr makes sense?
Second, is there a relatively simple way to build a learning model using a
sparse representation of the query TF-IDF vector?

Kind regards,

Xavier Schepler
