Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Gus Heck
10 MB hard drive, wow I'll never need another floppy disk ever... Neural nets... nice idea, but there will never be enough CPU power to run them... etc. Is it possible to make it a configurable limit? On Wed, Apr 5, 2023 at 4:51 PM Jack Conradson wrote: > I don't want to get too far off

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Jack Conradson
I don't want to get too far off topic, but I think one of the problems here is that HNSW doesn't really fit well as a Lucene data structure. Given the way it behaves, it would be better supported as a live, in-memory data structure instead of segmented and written to disk for tiny graphs that then need

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Robert Muir
I'd ask anyone voting +1 to raise this limit to at least try to index a few million vectors with 756 or 1024, which is allowed today. IMO, based on how painful it is, the limit already seems too high. I realize that will sound controversial, but please at least try it out! voting +1 without

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Michael Wechner
Thanks for your feedback! I agree that it should not crash. So far we have not experienced crashes ourselves, but we have not indexed millions of vectors. I will try to reproduce the crash; maybe this will help us move forward. Thanks Michael On 05.04.23 at 18:30, Dawid Weiss wrote: Can

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Dawid Weiss
> Can you describe your crash in more detail?
I can't. That experiment was a while ago and a quick test to see if I could index rather large-ish USPTO (patent office) data as vectors. Couldn't do it then.
> How much RAM?
My indexing jobs run with rather smallish heaps to give space for I/O

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Michael Wechner
Hi Dawid, can you describe your crash in more detail? How many million vectors exactly? What was the vector dimension? How much RAM? etc. Thanks Michael On 05.04.23 at 17:48, Dawid Weiss wrote: Ok, so what should we do then? I don't know, Alessandro. I just wanted to point out the fact

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Dawid Weiss
> Ok, so what should we do then?
I don't know, Alessandro. I just wanted to point out the fact that by Apache rules a committer's veto to a code change counts as a no-go. It does not specify any way to "override" such a veto, perhaps counting on disagreeing parties to resolve conflicting points

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Alessandro Benedetti
Ok, so what should we do then? This space is moving fast, and in my opinion we should act fast to release and guarantee we attract as many users as possible. At the same time, I am not saying we should proceed blindly: if there's concrete evidence for setting one limit rather than another, or that a

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Dawid Weiss
> Should we create a VOTE thread, where we propose some values with a justification and we vote?
Technically, a vote thread won't help much if there's no full consensus - a single veto will make the patch unacceptable for merging. https://www.apache.org/foundation/voting.html#Veto Dawid

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Michael Wechner
On 05.04.23 at 12:34, Alessandro Benedetti wrote: Thanks Mike for the insight! What would be the next steps then? I see agreement but also the necessity of identifying a candidate MAX. Should we create a VOTE thread, where we propose some values with a justification and we vote? +1 Thanks

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Alessandro Benedetti
Thanks Mike for the insight! What would be the next steps then? I see agreement but also the necessity of identifying a candidate MAX. Should we create a VOTE thread, where we propose some values with a justification and we vote? In this way we can create a pull request and merge relatively soon.