Custom SliceExecutor and slices computation in IndexSearcher

2023-05-18 Thread SorabhApache
Hi All, For concurrent segment search, Lucene uses the *slices* method to compute the number of work units that can be processed concurrently. a) It calculates *slices* in the constructor of *IndexSearcher*
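
To make the idea of slices as work units concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class name, the greedy grouping rule, and both caps (`maxDocsPerSlice`, `maxSegmentsPerSlice`) are assumptions chosen for illustration. It groups per-segment document counts into slices, each of which could be handed to one task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SliceSketch {
    // Hypothetical helper: greedily groups segment doc counts into slices.
    // A slice is closed when adding the next segment would exceed
    // maxDocsPerSlice, or when it already holds maxSegmentsPerSlice segments.
    static List<List<Integer>> computeSlices(int[] segmentDocCounts,
                                             int maxDocsPerSlice,
                                             int maxSegmentsPerSlice) {
        int[] sorted = segmentDocCounts.clone();
        Arrays.sort(sorted); // ascending; iterated from the end for largest-first

        List<List<Integer>> slices = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docsInCurrent = 0;

        for (int i = sorted.length - 1; i >= 0; i--) {
            int docs = sorted[i];
            boolean full = !current.isEmpty()
                && (docsInCurrent + docs > maxDocsPerSlice
                    || current.size() >= maxSegmentsPerSlice);
            if (full) {
                slices.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
            // An oversized segment still gets a slice of its own.
            current.add(docs);
            docsInCurrent += docs;
        }
        if (!current.isEmpty()) {
            slices.add(current);
        }
        return slices;
    }

    public static void main(String[] args) {
        int[] segments = {300, 100, 50, 40, 10};
        System.out.println(computeSlices(segments, 200, 3));
    }
}
```

The number of slices returned is the number of units that could run concurrently under these assumed caps.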

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Nicholas Knize
Difficult to keep up with this topic when it's spread across issues, PRs, and email lists. My poll response is option 3. -1 to option 2, I think the configuration should be moved to the HNSW specific implementation. At this point of technical maturity, it doesn't make sense (to me) to have the

Re: Allowing tests to use multiple cores

2023-05-18 Thread Michael McCandless
Hmm, I think that setting just tells the JVM to pretend the underlying hardware has only one core? I.e. forcing "Runtime.getRuntime().availableProcessors()" to return 1. But your test is still free to launch multiple threads to test concurrency and they should run on multiple actual CPU cores if
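
The distinction described in this message can be sketched in plain Java. The message does not name the setting; one JVM option with this effect is `-XX:ActiveProcessorCount=1`, which is an assumption here. The class and method names below are hypothetical. The sketch prints the processor count the JVM reports and then starts several threads regardless of that value.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class CoreCountSketch {
    // Starts threadCount threads and returns the names of those that ran.
    // Nothing here consults availableProcessors(), so the reported core
    // count does not limit how many threads are launched.
    static Set<String> runThreads(int threadCount) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(threadCount);
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                names.add(Thread.currentThread().getName());
                done.countDown();
            }, "worker-" + i);
            t.start();
        }
        try {
            done.await(); // wait until every worker has recorded its name
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Value the JVM reports; a flag such as the assumed
        // -XX:ActiveProcessorCount=1 would make this print 1.
        System.out.println("availableProcessors = "
            + Runtime.getRuntime().availableProcessors());
        System.out.println("threads that ran = " + runThreads(4).size());
    }
}
```

Whether those threads are actually scheduled on separate physical cores is decided by the operating system, which this sketch does not measure.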

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
On 18.05.23 at 12:22, Michael McCandless wrote: I love all the energy and passion going into debating all the ways to poke at this limit, but please let's also spend some of this passion on actually improving the scalability of our aKNN implementation! E.g. Robert opened an exciting

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
It is basically the code which Michael Sokolov posted at https://markmail.org/message/kf4nzoqyhwacb7ri except - that I have replaced KnnVectorField by KnnFloatVectorField, because KnnVectorField is deprecated. - that I don't hard code the dimension as 2048 and the metric as EUCLIDEAN, but

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael McCandless
This isn't really a VOTE (no specific code change is being proposed), but rather a poll? Anyway, I would prefer Option 3: put the limit check into the HNSW algorithm itself. This is the right place for the limit check, since HNSW has its own scaling behaviour. It might have other limits, like

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Alessandro Benedetti
That's great and a good plan B, but let's try to focus this thread on collecting votes for a week (let's keep discussions on the nice PR opened by David or the discussion thread we have in the mailing list already :) On Thu, 18 May 2023, 10:10 Ishan Chattopadhyaya, wrote: > That sounds

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Ishan Chattopadhyaya
That sounds promising, Michael. Can you share scripts/steps/code to reproduce this? On Thu, 18 May, 2023, 1:16 pm Michael Wechner, wrote: > I just implemented it and tested it with OpenAI's text-embedding-ada-002, > which is using 1536 dimensions and it works very fine :-) > > Thanks > >

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
I just implemented it and tested it with OpenAI's text-embedding-ada-002, which is using 1536 dimensions and it works very fine :-) Thanks Michael On 18.05.23 at 00:29, Michael Wechner wrote: IIUC KnnVectorField is deprecated and one is supposed to use KnnFloatVectorField when using