btw, I have done some tests now with the sentence-transformer models
"all-roberta-large-v1" and "all-mpnet-base-v2"
https://huggingface.co/sentence-transformers/all-roberta-large-v1
https://huggingface.co/sentence-transformers/all-mpnet-base-v2
whereas also see
On Tue, Feb 15, 2022 at 2:33 PM Michael Wechner wrote:
>
> There seems to be no light at the end of the tunnel for the JDK vector
> API, I think OpenJDK will incubate this API until the sun supernovas and
> Java is dead :)
> It is frustrating, as that could give the current implementation a needed
>
On 15.02.22 at 19:48, Robert Muir wrote:
Sure, but lucene should be able to have limits. We have this discussion
with every single limit we attempt to implement :)
There will always be extreme use cases using too many dimensions or
whatever.
It is open source! I think if what you are doing is strange enough, you can
modify the sources.
I understand, but if Lucene itself allowed overriding the default max size
programmatically, then I think it should be clear that you do this at your
own risk :-)
Thanks for the links to your blog posts, which sound very interesting.
Thanks
Michael
On 15.02.22 at 17:25, … wrote:
I believe it could make sense, but as Michael pointed out in the Jira
ticket related to the Solr integration, we'll get complaints like "I set it
to 1.000.000 and my Solr instance doesn't work anymore" (I kept everything
super simple just to simulate a realistic scenario).
So I tend to agree
fair enough, but wouldn't it make sense to allow increasing it
programmatically, e.g.
.setVectorMaxDimension(2028)?
Thanks
Michael
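A minimal sketch of how such an opt-in might look at the call site. Note that no
such setter exists in Lucene today (the limit is a hardcoded constant), and the
method name is simply the one proposed above, so treat it as hypothetical:

import org.apache.lucene.index.IndexWriterConfig;

public class HypotheticalVectorConfig {
  public static void main(String[] args) {
    IndexWriterConfig config = new IndexWriterConfig();
    // Hypothetical opt-in, not an existing Lucene API; raising the vector
    // dimension limit explicitly would be "at your own risk".
    // config.setVectorMaxDimension(2028);
  }
}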
On 14.02.22 at 23:34, Michael Sokolov wrote:
I think we picked the 1024 number as something that seemed so large
nobody would ever want to exceed it! Obviously that was naive. Still
the limit serves as a cautionary point for users; if your vectors are
bigger than this, there is probably a better way to accomplish what
you are after (eg
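For context, a rough sketch of where that limit currently surfaces, assuming
Lucene 9.x behavior where the KnnVectorField constructor rejects vectors larger
than the hardcoded maximum:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnVectorField;
import org.apache.lucene.index.VectorSimilarityFunction;

public class VectorDimensionLimitDemo {
  public static void main(String[] args) {
    // 1024 dimensions (e.g. all-roberta-large-v1) sits exactly at the limit.
    Document doc = new Document();
    doc.add(new KnnVectorField("embedding", new float[1024], VectorSimilarityFunction.COSINE));

    // Anything larger, e.g. the 12288-dimensional embeddings mentioned
    // elsewhere in this thread, is rejected before it reaches an IndexWriter.
    try {
      new KnnVectorField("embedding", new float[12288], VectorSimilarityFunction.COSINE);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}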
Sounds good, hope the testing goes well! Memory and CPU (largely from more
expensive vector distance calculations) are indeed the main factors to
consider.
Julie
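To put rough numbers on the memory side (raw float storage only, 4 bytes per
dimension, ignoring the HNSW graph and other per-document overhead; a
back-of-the-envelope sketch with an assumed corpus size, not a measurement):

public class VectorMemoryEstimate {
  public static void main(String[] args) {
    long docs = 1_000_000L; // assumed corpus size, purely illustrative
    for (int dims : new int[] {768, 1024, 12288}) {
      long bytes = docs * dims * Float.BYTES; // 4 bytes per float dimension
      System.out.printf("%,d docs x %d dims ~ %.1f GB of raw vectors%n",
          docs, dims, bytes / 1e9);
    }
  }
}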
On Mon, Feb 14, 2022 at 1:02 PM Michael Wechner wrote:
> Hi Julie
>
> Thanks again for your feedback!
>
> I will do some more tests
Hi Julie
Thanks again for your feedback!
I will do some more tests with "all-mpnet-base-v2" (768) and
"all-roberta-large-v1" (1024), so 1024 is enough for me for the moment :-)
But yes, I could imagine that eventually it might make sense to allow
more dimensions than 1024.
Beside memory
Hello Michael, the max number of dimensions is currently hardcoded and
can't be changed. I could see an argument for increasing the default a bit
and would be happy to discuss if you'd like to file a JIRA issue.
However, 12288 dimensions still seems high to me; this is much larger than
most
Hi Julie
Thanks very much for this link, which is very interesting!
Btw, do you have an idea how to increase the default max size of 1024?
https://lists.apache.org/thread/hyb6w5c4x5rjt34k3w7zqn3yp5wvf33o
Thanks
Michael
On 14.02.22 at 17:45, Julie Tibshirani wrote:
Hello Michael, I don't have personal experience with these models, but I
found this article insightful:
https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9.
It evaluates the OpenAI models against a variety of existing
Re the OpenAI embeddings, the following recent paper might be of interest:
https://arxiv.org/pdf/2201.10005.pdf
(Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022)
Thanks
Michael
On 13.02.22 at 00:14, Michael Wechner wrote:
Here is a concrete example where I combine the OpenAI model
"text-similarity-ada-001" with Lucene vector search.
INPUT sentence: "What is your age this year?"
Result sentences:
1) How old are you this year?
score '0.98860765'
2) What was your age last year?
score '0.97811764'
3) What is your
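For anyone who wants to reproduce the Lucene side of this, a minimal sketch,
assuming Lucene 9.x; the float[] values are random placeholders standing in for
the 1024-dimensional vectors returned by text-similarity-ada-001, only there to
keep the snippet self-contained:

import java.util.Random;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnVectorField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SentenceSimilarityDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      // In the real setup each vector comes from the embedding model;
      // here they are random placeholders of the same dimensionality.
      addSentence(writer, "How old are you this year?", placeholderEmbedding(1));
      addSentence(writer, "What was your age last year?", placeholderEmbedding(2));
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      float[] queryVector = placeholderEmbedding(3); // embedding of the INPUT sentence
      TopDocs hits = searcher.search(new KnnVectorQuery("embedding", queryVector, 2), 2);
      for (ScoreDoc hit : hits.scoreDocs) {
        System.out.println(searcher.doc(hit.doc).get("sentence") + "  score " + hit.score);
      }
    }
  }

  private static void addSentence(IndexWriter writer, String sentence, float[] vector)
      throws Exception {
    Document doc = new Document();
    doc.add(new StoredField("sentence", sentence));
    doc.add(new KnnVectorField("embedding", vector, VectorSimilarityFunction.COSINE));
    writer.addDocument(doc);
  }

  private static float[] placeholderEmbedding(int seed) {
    float[] v = new float[1024];
    Random random = new Random(seed);
    for (int i = 0; i < v.length; i++) {
      v[i] = random.nextFloat();
    }
    return v;
  }
}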
Hi Alessandro
I am mainly interested in detecting similarity, for example whether the
following two sentences are similar, i.e. likely to mean the same thing
"How old are you?"
"What is your age?"
and that the following two sentences are not similar, i.e. do not mean
the same thing
"How
Hi Michael, experience to what extent?
We have been exploring the area for a while given we contributed the first
neural search milestone to Apache Solr.
What is your curiosity? Performance? Relevance impact? How to integrate it?
Regards
On Fri, 11 Feb 2022, 22:38 Michael Wechner wrote:
> Hi
>
Hi
Does anyone have experience using OpenAI embeddings in combination with
Lucene vector search?
https://beta.openai.com/docs/guides/embeddings
for example comparing performance re vector size
https://api.openai.com/v1/engines/text-similarity-ada-001/embeddings
and
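In case it helps others reproduce the comparison, a rough sketch of calling that
engine endpoint from Java; the request and response JSON shape is assumed from
the embeddings guide linked above and may change, and in real code you would
parse the body with a JSON library instead of printing it:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OpenAiEmbeddingRequest {
  public static void main(String[] args) throws Exception {
    String apiKey = System.getenv("OPENAI_API_KEY"); // assumed to be set
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.openai.com/v1/engines/text-similarity-ada-001/embeddings"))
        .header("Authorization", "Bearer " + apiKey)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("{\"input\": \"What is your age this year?\"}"))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    // The response body is expected to contain data[0].embedding, a float array
    // whose length is the model's vector size (1024 for text-similarity-ada-001).
    System.out.println(response.body());
  }
}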