Hi,
I've been experimenting a bit with "neural" / dense vector search in Solr 9
and find it very promising.
Currently I am calculating the vectors outside of Solr, at both indexing
and search time, with a bunch of scripts using NLP models (text in,
vectors out). Especially at search time, that's not exactly a handy
solution, because every client application would have to do this. The
alternative would be some sort of proxy application between the client
applications and Solr that manipulates requests on their way to Solr
(search terms out, vector in).
That's OK for my very basic prototype, but not for anything else.
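For illustration, here is a minimal Python sketch of what each client (or such a proxy) currently has to do. `toy_embed` is a hypothetical stand-in for a real NLP model (deterministic, not semantic). The field name `vector` and the dimension 4 are assumptions that would have to match the DenseVectorField definition in the schema. The query string uses the `{!knn f=... topK=...}[...]` query parser syntax from Solr 9.

```python
import hashlib
import json
from typing import Callable, Dict, List

Vector = List[float]


def toy_embed(text: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for a real NLP model: deterministic, NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [round(b / 255.0, 4) for b in digest[:dim]]


def to_solr_doc(doc_id: str, text: str, embed: Callable[[str], Vector],
                field: str = "vector") -> Dict:
    # Indexing time: text in, vector out, stored next to the original text.
    # Assumes a DenseVectorField named `field` with a matching dimension.
    return {"id": doc_id, "text": text, field: embed(text)}


def to_knn_query(terms: str, embed: Callable[[str], Vector],
                 field: str = "vector", top_k: int = 10) -> str:
    # Search time: search terms out, vector in (Solr 9 knn query parser).
    values = ", ".join(str(x) for x in embed(terms))
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


if __name__ == "__main__":
    print(json.dumps(to_solr_doc("1", "dense vector search", toy_embed)))
    print(to_knn_query("vector search", toy_embed, top_k=5))
```

Every client would have to embed the same model and keep it in sync with the one used at indexing time, which is exactly the duplication a Solr-side component could avoid.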
How are others solving this? Are there any best practices? Or even plans
to make Solr talk directly to ML models?
In Solr's traditional logic, I would imagine something like an analyzer
that does the "dense vector creation" at indexing and search time, just
as traditional analyzers work: it would use an ML model, pass data or
search terms in, get vectors out and put them into a DenseVectorField.
Perhaps the model could be a configurable ONNX model?
Is someone working on something like this? (I only found some related
comments in https://github.com/apache/solr/pull/1213)
Till
--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
http://www.gbv.de/