Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-03-31 Thread Adrien Grand
I'm supportive of bumping the limit on the maximum dimension for vectors to something that is above what the majority of users need, but I'd like to keep a limit. We have limits for other things like the max number of docs per index, the max term length, the max number of dimensions of points,

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-03-31 Thread Michael Wechner
OpenAI reduced their size to 1536 dimensions https://openai.com/blog/new-and-improved-embedding-model so 2048 would work :-) but other services also provide higher dimensions, sometimes with slightly better accuracy. Thanks, Michael. On 31.03.23 at 14:45, Adrien Grand wrote: I'm

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-03-31 Thread Alessandro Benedetti
I am also curious what the worst-case scenario would be if we removed the constant altogether (so the limit automatically becomes Java's Integer.MAX_VALUE). I.e., right now, if you exceed the limit you get:
> if (dimension > ByteVectorValues.MAX_DIMENSIONS) {
>   throw new IllegalArgumentException(
>
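To make the question above concrete, here is a minimal, self-contained sketch of a configurable dimension check together with a back-of-the-envelope size estimate. It is not the Lucene source: the class name DimensionLimitSketch, the helper names, and the value 1024 used for MAX_DIMENSIONS are assumptions chosen for illustration, and the estimate covers only the raw float32 storage of a single vector, ignoring any index overhead.

```java
// Illustrative sketch only; not the Lucene implementation.
final class DimensionLimitSketch {

    // Hypothetical default limit, chosen for illustration.
    static final int MAX_DIMENSIONS = 1024;

    // Returns the dimension unchanged if it is within (0, maxDimensions],
    // otherwise throws, mirroring the kind of check quoted in the thread.
    static int checkDimension(int dimension, int maxDimensions) {
        if (dimension <= 0) {
            throw new IllegalArgumentException(
                "vector dimension must be positive; got " + dimension);
        }
        if (dimension > maxDimensions) {
            throw new IllegalArgumentException(
                "vector dimension " + dimension
                    + " exceeds the limit of " + maxDimensions);
        }
        return dimension;
    }

    // Raw storage for one float32 vector of the given dimension, in bytes.
    // Computed in long arithmetic so large dimensions do not overflow int.
    static long bytesPerFloatVector(int dimension) {
        return (long) dimension * Float.BYTES;
    }
}
```

Under these assumptions, checkDimension(1536, 2048) passes while checkDimension(1536, DimensionLimitSketch.MAX_DIMENSIONS) throws, and a single float32 vector at Integer.MAX_VALUE dimensions would need 8,589,934,588 bytes (roughly 8 GiB) of raw storage.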

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-03-31 Thread Michael Wechner
Thanks Alessandro for summarizing the discussion below! I understand that there is no clear reasoning about what the best embedding size is, but I think heuristic approaches like the one described at the following link can be helpful

[Proposal] Remove max number of dimensions for KNN vectors

2023-03-31 Thread Alessandro Benedetti
I've been monitoring various discussions on Pull Requests about changing the max number of dimensions allowed for Lucene HNSW vectors: https://github.com/apache/lucene/pull/12191 https://github.com/apache/lucene/issues/11507 I would like to set up a discussion and potentially a vote about