This is a pretty big hole in Lucene-based search right now, and one that
many practitioners have struggled with.

I know a couple of people who have worked on solutions. And I've used a
couple of hacks:

- You can hack together something that does cosine similarity using term
frequencies and query boosts; see DelimitedTermFrequencyTokenFilterFactory.
Basically the term frequency becomes a feature weight on the document, and
the boost becomes the query weight. If you massage things correctly with the
similarity, the resulting boolean similarity is a dot product (which is
cosine similarity when both vectors are normalized to unit length).

- Erik Hatcher has done some great work with payloads which you might want
to check out. See the delimited payload filter factory and the payload score
function queries.

- Simon Hughes's Activate talk (slides/video not yet posted) covers this
topic in some depth.

- Rene Kriegler's Haystack Talk discusses encoding Inception model
vectorizations of images:
https://opensourceconnections.com/events/haystack-single/haystack-relevance-scoring/
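
To make the first hack above a bit more concrete, here is a minimal sketch of
the arithmetic only. It is plain Python, not Lucene or Solr code. It assumes
the similarity has been set up so each matching term contributes
boost * term frequency (the default BM25 scoring does not do this), that
feature values are non-negative, and that they get quantized to integers
because term frequencies are integers. The f0, f1, ... term names, the "|"
delimiter, and the scale factor are all hypothetical choices for
illustration.

```python
# Sketch only: emulates the scoring arithmetic, not the Lucene implementation.

def encode_doc_vector(vector, scale=100):
    """Encode a non-negative vector as 'term|freq' tokens, one per dimension.

    The 'f<i>' term names and the '|' delimiter are illustrative assumptions.
    """
    tokens = []
    for i, value in enumerate(vector):
        freq = int(round(value * scale))  # term frequencies must be integers
        if freq > 0:                      # zero-weight features are simply omitted
            tokens.append(f"f{i}|{freq}")
    return " ".join(tokens)


def parse_doc_field(field_text):
    """Recover the term -> frequency map that the index would hold."""
    freqs = {}
    for token in field_text.split():
        term, freq = token.split("|")
        freqs[term] = int(freq)
    return freqs


def score(query_vector, field_text):
    """Sum boost * term frequency over the query terms.

    This mirrors a boolean OR query with one boosted clause per dimension,
    assuming each clause scores as boost * tf.
    """
    freqs = parse_doc_field(field_text)
    return sum(boost * freqs.get(f"f{i}", 0)
               for i, boost in enumerate(query_vector))


doc_field = encode_doc_vector([0.5, 0.0, 0.25])
query = [2.0, 1.0, 4.0]
result = score(query, doc_field)
```

The result is the dot product of the two vectors multiplied by the scale
factor, so ranking order is preserved. If both vectors are normalized to unit
length before encoding, the same number is the (scaled) cosine similarity, up
to quantization error.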

If this is of huge importance to you, I might also suggest looking at Vespa,
which makes tensors a first-class citizen and makes matrix math pretty
seamless: http://vespa.ai

Hope that helps
-Doug

On Fri, Oct 19, 2018 at 12:50 PM Ken Krugler <kkrugler_li...@transpac.com>
wrote:

> Hi all,
>
> [I posted on the Lucene list two days ago, but didn’t see any response -
> checking here for completeness]
>
> I’ve been looking at directly storing feature vectors and providing
> scoring/filtering support.
>
> This is for vectors consisting of (typically 300 - 2048) floats or doubles.
>
> It’s following the same pattern as geospatial support - so a new field
> type and query/parser, plus plumbing to hook it into Solr.
>
> Before I go much further, is there anything like this already done, or in
> the works?
>
> Thanks,
>
> — Ken
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
> --
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug
