I'm supportive of bumping the limit on the maximum dimension for
vectors to something that is above what the majority of users need,
but I'd like to keep a limit. We have limits for other things, like the
max number of docs per index, the max term length, and the max number
of dimensions of points.
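For reference, a minimal sketch of where some of these existing limits
live in the Lucene API (constant names as in Lucene 9.x; the printed
values depend on the release on the classpath):

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.PointValues;

public class LuceneLimits {
  public static void main(String[] args) {
    // Hard cap on the number of documents per index.
    System.out.println("max docs per index:   " + IndexWriter.MAX_DOCS);
    // Maximum length of an indexed term, in UTF-8 bytes.
    System.out.println("max term length:      " + IndexWriter.MAX_TERM_LENGTH);
    // Maximum number of dimensions for point values.
    System.out.println("max point dimensions: " + PointValues.MAX_DIMENSIONS);
  }
}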
OpenAI reduced their embedding size to 1536 dimensions
https://openai.com/blog/new-and-improved-embedding-model
so 2048 would work :-)
but other services also provide higher dimensions, sometimes with
slightly better accuracy.
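As a concrete illustration (my own sketch, not from the thread, and the
field name and vector values are placeholders): indexing a
1536-dimensional embedding with Lucene 9.x's KnnFloatVectorField trips
exactly the limit check quoted later in this thread while the cap is
1024, and would pass with a 2048 cap:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.VectorSimilarityFunction;

public class EmbeddingDimensionDemo {
  public static void main(String[] args) {
    // Stand-in for a real 1536-dimensional OpenAI embedding.
    float[] embedding = new float[1536];

    Document doc = new Document();
    // With the 1024-dimension cap this is rejected with an
    // IllegalArgumentException; a 2048 cap would accept it.
    doc.add(new KnnFloatVectorField("embedding", embedding,
        VectorSimilarityFunction.COSINE));
  }
}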
Thanks
Michael
On 31.03.23 at 14:45, Adrien Grand wrote:
I am also curious what the worst-case scenario would be if we removed
the constant altogether (so that the limit automatically becomes Java's
Integer.MAX_VALUE).
I.e. right now if you exceed the limit you get:

> if (dimension > ByteVectorValues.MAX_DIMENSIONS) {
>   throw new IllegalArgumentException(
>       "cannot index vectors > " + ByteVectorValues.MAX_DIMENSIONS
>           + " dimensions; got " + dimension);
> }
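To put a rough number on that worst case (back-of-the-envelope
arithmetic, not from the thread): a float vector costs about 4 bytes
per dimension per document, so a single Integer.MAX_VALUE-dimensional
vector would need roughly 8 GiB before any HNSW graph overhead:

public class VectorFootprint {
  public static void main(String[] args) {
    long dims = Integer.MAX_VALUE; // the implicit limit with no constant
    long bytes = dims * Float.BYTES; // 4 bytes per float dimension
    System.out.printf("one %d-dim float vector ~ %.1f GiB%n",
        dims, bytes / (1024.0 * 1024.0 * 1024.0));
    // Prints ~8.0 GiB: one misconfigured vector would exhaust a
    // typical heap long before search ever runs.
  }
}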
Thanks Alessandro for summarizing the discussion below!
I understand that there is no clear reasoning regarding what the best
embedding size is, but I think heuristic approaches like the one
described at the following link can be helpful
I've been monitoring various discussions on Pull Requests about changing
the max number of dimensions allowed for Lucene HNSW vectors:
https://github.com/apache/lucene/pull/12191
https://github.com/apache/lucene/issues/11507
I would like to set up a discussion and potentially a vote about