comparing feature vectors using Solr/Lucene

2014-11-26 Thread Upayavira
Hi, I've been asked how to use Solr as a component in a machine learning system, doing document comparison based upon feature vectors. If I have two vectors, one in the index (in some form) and one in the query (in some form), how can I do, for example, a vector multiplication of the two vectors

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Nicholas Ding
I'm not sure Solr is the right tool for this task. You probably need a machine learning library like Mahout or Weka. PS: Lucene doesn't really use cosine similarity; it uses a practical TF-IDF similarity. Nicholas Ding
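
To make the distinction between a raw vector product and cosine similarity concrete, here is a minimal Python sketch. It is not Lucene's scoring formula; the `dot` and `cosine` helpers and the sample vectors are hypothetical names chosen for illustration, and vectors are assumed to be sparse `{feature: weight}` dicts.

```python
import math

def dot(a, b):
    """Dot product of two sparse vectors stored as {feature: weight} dicts."""
    # Iterate over the smaller dict so the cost follows the sparser vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[f] for f, w in a.items() if f in b)

def cosine(a, b):
    """Cosine similarity: the dot product divided by both vector lengths."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for empty or all-zero vectors.
    return dot(a, b) / (norm_a * norm_b)

# Illustrative data only: doc points in the same direction as query.
query = {"solr": 1.0, "vector": 2.0}
doc = {"solr": 2.0, "vector": 4.0}
raw_product = dot(query, doc)
similarity = cosine(query, doc)
```

The raw product grows with the magnitude of the weights, while the cosine depends only on direction, which is one reason the two can rank documents differently.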

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Upayavira
Thanks Nicholas, there is a sense in which Solr isn't the right tool. However, we already have lots of business rules encapsulated in filter queries, and we already have ingestion pipelines for our content in place. TF-IDF similarity is pluggable (even just by sorting on function queries),

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Paul Libbrecht
Upayavira, on the Lucene list, two tools are sometimes mentioned that might do some of what you are looking for:
- semanticvectors (https://code.google.com/p/semanticvectors)
- word2vec https://github.com/kojisekig/word2vec-lucene/i
Maybe it helps? I'm under the impression that you are

Re: comparing feature vectors using Solr/Lucene

2014-11-26 Thread Mikhail Khludnev
Hello, Lucene rocks at calculating the scalar product (a score under whatever similarity) of sparse feature vectors. That's it. Note that 'feature' usually means a term, and a 'feature vector' is a document, which might be the opposite of your problem definition. You can either expand the definition of your
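
The idea of scoring as a scalar product over sparse vectors, with terms as features and documents as vectors, can be sketched roughly as follows. This is a toy inverted index in plain Python, not Lucene code; `build_index`, `score`, and the sample documents are hypothetical and only illustrate the accumulation pattern.

```python
from collections import defaultdict

def build_index(docs):
    """Map each feature (term) to a postings list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vector in docs.items():
        for feature, weight in vector.items():
            index[feature].append((doc_id, weight))
    return index

def score(index, query):
    """Accumulate query_weight * doc_weight per document (a sparse dot product)."""
    scores = defaultdict(float)
    for feature, q_weight in query.items():
        # Only documents that contain the feature are ever touched.
        for doc_id, d_weight in index.get(feature, []):
            scores[doc_id] += q_weight * d_weight
    return dict(scores)

# Illustrative data only: each document is a sparse feature vector.
docs = {
    "d1": {"red": 0.5, "round": 1.0},
    "d2": {"red": 1.0, "square": 2.0},
}
index = build_index(docs)
query = {"red": 2.0, "round": 3.0}
ranked = sorted(score(index, query).items(), key=lambda kv: kv[1], reverse=True)
```

Because only the postings of the query's features are visited, the cost depends on how many documents share features with the query rather than on the full dimensionality of the vectors.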