Vector based queries

Pat Ferrel Sat, 10 Mar 2012 09:59:30 -0800

I have a case where I'd like to get documents which most closely match a particular vector. The RowSimilarityJob of Mahout is ideal for precalculating similarity between existing documents, but in my case the query is constructed at run time. So the UI constructs a vector to be used as a query. We have this running in prototype using a run-time calculation of cosine similarity, but the implementation is not scalable to large doc stores.
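For reference, a run-time cosine query like the prototype described above usually amounts to a brute-force scan. The following is a minimal sketch under assumptions of mine (dense `double[]` vectors held in memory; the class and method names are hypothetical, not Mahout APIs). It shows why it does not scale: every query touches every document, so the cost is O(numDocs × dims).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical brute-force cosine search; not part of Mahout.
public class BruteForceCosine {

    // cosine(a, b) = dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Scores every document and keeps the k best in a min-heap of {score, docId}.
    static int[] topK(double[][] docs, double[] query, int k) {
        PriorityQueue<double[]> heap =
            new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[0]));
        for (int d = 0; d < docs.length; d++) {
            heap.offer(new double[] {cosine(docs[d], query), d});
            if (heap.size() > k) heap.poll(); // drop the current worst match
        }
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = (int) heap.poll()[1]; // worst comes out first, so fill from the back
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 0, 0}, {0, 1, 0}, {1, 1, 0}, {0, 0, 1}};
        double[] query = {1, 0.2, 0};
        System.out.println(Arrays.toString(topK(docs, query, 2))); // [0, 2]
    }
}
```

The heap keeps memory at O(k), but the scan itself is the bottleneck; the ideas below are ways to shrink the set of documents that gets scored.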

One thought is to calculate fairly small clusters. The UI will know which cluster to target for the vector query, so we might be able to narrow down the number of docs per query to a reasonable size.
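The cluster idea can be sketched as a cluster-pruned index: documents are grouped by a precomputed cluster assignment (for example, from an offline k-means run; how those centroids are produced is left as an assumption here), and a query only scores documents in the `nProbe` closest clusters. All names below are hypothetical, and the result is approximate, because a good match sitting in an unprobed cluster is missed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical cluster-pruned search; centroids and assignments are assumed precomputed.
public class ClusterPrunedSearch {
    private final double[][] docs;
    private final double[][] centroids;
    private final List<List<Integer>> members = new ArrayList<>(); // cluster id -> doc ids

    public ClusterPrunedSearch(double[][] docs, double[][] centroids, int[] assignment) {
        this.docs = docs;
        this.centroids = centroids;
        for (int c = 0; c < centroids.length; c++) members.add(new ArrayList<>());
        for (int d = 0; d < docs.length; d++) members.get(assignment[d]).add(d);
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Ids of the nProbe centroids most similar to the query, best first.
    public int[] nearestClusters(double[] query, int nProbe) {
        Integer[] ids = new Integer[centroids.length];
        for (int c = 0; c < ids.length; c++) ids[c] = c;
        Arrays.sort(ids, Comparator.comparingDouble((Integer c) -> -cosine(centroids[c], query)));
        int n = Math.min(nProbe, ids.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = ids[i];
        return out;
    }

    // Exact cosine ranking, but only over documents in the probed clusters.
    public List<Integer> search(double[] query, int k, int nProbe) {
        List<Integer> candidates = new ArrayList<>();
        for (int c : nearestClusters(query, nProbe)) candidates.addAll(members.get(c));
        candidates.sort(Comparator.comparingDouble((Integer d) -> -cosine(docs[d], query)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9}};
        double[][] centroids = {{1, 0}, {0, 1}};
        int[] assignment = {0, 0, 1, 1};
        ClusterPrunedSearch index = new ClusterPrunedSearch(docs, centroids, assignment);
        System.out.println(index.search(new double[] {1, 0.05}, 2, 1)); // [0, 1]
    }
}
```

If the UI already knows the target cluster, the centroid step can be skipped and only that cluster's member list scored. Raising `nProbe` trades speed for recall.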

It seems like a place for multiple hash functions, maybe? Could we use some kind of hack of the boost feature of Solr, or some other approach?
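One well-known family of "multiple hash functions" for cosine similarity is random-hyperplane locality-sensitive hashing (Charikar's SimHash). Each hash bit records which side of a random hyperplane a vector falls on. Several bits form a signature, and several independent tables raise the chance that similar vectors collide somewhere. The sketch below is my illustration of that technique, not a Mahout or Solr feature. Class names and parameters (`numTables`, `bitsPerTable`, `seed`) are hypothetical, and candidates are re-ranked with exact cosine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical random-hyperplane LSH index for cosine similarity.
public class HyperplaneLsh {
    private final double[][] docs;
    private final double[][][] planes; // [table][bit][dimension], Gaussian hyperplane normals
    private final List<Map<Integer, List<Integer>>> tables = new ArrayList<>();

    // bitsPerTable must be at most 32 so a signature fits in an int.
    public HyperplaneLsh(double[][] docs, int numTables, int bitsPerTable, long seed) {
        this.docs = docs;
        int dims = docs[0].length;
        Random rnd = new Random(seed);
        planes = new double[numTables][bitsPerTable][dims];
        for (int t = 0; t < numTables; t++) {
            for (int b = 0; b < bitsPerTable; b++)
                for (int i = 0; i < dims; i++) planes[t][b][i] = rnd.nextGaussian();
            tables.add(new HashMap<>());
        }
        // Index every document in every table under its signature.
        for (int d = 0; d < docs.length; d++)
            for (int t = 0; t < numTables; t++)
                tables.get(t).computeIfAbsent(signature(t, docs[d]), key -> new ArrayList<>()).add(d);
    }

    // One bit per hyperplane: set if the vector lies on the non-negative side.
    int signature(int table, double[] v) {
        int sig = 0;
        for (int b = 0; b < planes[table].length; b++) {
            double dot = 0;
            for (int i = 0; i < v.length; i++) dot += planes[table][b][i] * v[i];
            if (dot >= 0) sig |= 1 << b;
        }
        return sig;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Union of the buckets the query lands in, re-ranked by exact cosine.
    public List<Integer> search(double[] query, int k) {
        Set<Integer> candidates = new HashSet<>();
        for (int t = 0; t < tables.size(); t++)
            candidates.addAll(tables.get(t).getOrDefault(signature(t, query), Collections.emptyList()));
        List<Integer> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble((Integer d) -> -cosine(docs[d], query)));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 0, 0}, {0.9, 0.1, 0}, {0, 1, 0}, {0, 0, 1}};
        HyperplaneLsh index = new HyperplaneLsh(docs, 4, 3, 42L);
        System.out.println(index.search(new double[] {1, 0, 0}, 1)); // [0]
    }
}
```

For two vectors at angle θ, a single bit agrees with probability 1 − θ/π. More bits per table make buckets more selective, and more tables recover recall. Both would need tuning against a real doc store.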


Does anyone have a suggestion?
