Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread David Smiley
We agree backwards compatibility with the index should be maintained and that checkIndex should work. And we agree on a number of other things, but I want to focus on configurability. As long as the index contains the number of dimensions actually used in a specific segment & field, why couldn't c

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Ok pushed an attempt at a clearer message. LMK what you think. On Tue, May 16, 2023 at 11:30 PM Gus Heck wrote: > Ok reading my last message I realize it still might not be clear. Here's > what I observed: > > The class Codec clearly loaded, (from the lucene-core jar) when > Codec$Holder tried t

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
OK, reading my last message I realize it still might not be clear. Here's what I observed: The class Codec clearly loaded (from the lucene-core jar). When Codec$Holder tried to load, the class initializer code went looking for the service definitions. It failed to find any of the META-INF/servic
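The failure mode described above (a class loads fine, but the service lookup quietly finds nothing) can be illustrated with the JDK's `java.util.ServiceLoader`. This is a minimal sketch, not Lucene's code: the `HypotheticalCodec` interface, the class name, and the wording of the message are assumptions made for illustration. It shows that a lookup with no visible provider file yields an empty iteration rather than an exception, so the caller has to raise a descriptive error itself.

```java
import java.util.ServiceLoader;

public final class ServiceLookupSketch {
    /** Hypothetical service interface standing in for a codec-like SPI. */
    public interface HypotheticalCodec {
        String name();
    }

    private ServiceLookupSketch() {}

    /**
     * Returns the first registered provider. ServiceLoader does not throw when
     * no provider-configuration file is visible; it simply iterates over nothing,
     * so we throw our own exception that says where to look.
     */
    public static HypotheticalCodec loadFirst() {
        for (HypotheticalCodec codec : ServiceLoader.load(HypotheticalCodec.class)) {
            return codec;
        }
        throw new IllegalStateException(
            "No provider found for " + HypotheticalCodec.class.getName()
                + ". Check that the META-INF/services file inside the jar is present"
                + " and readable (for example, that file permissions allow reading it).");
    }

    public static void main(String[] args) {
        try {
            loadFirst();
        } catch (IllegalStateException e) {
            // No provider is registered in this sketch, so the lookup fails here.
            System.out.println(e.getMessage());
        }
    }
}
```

The design choice here is only about where the hint lives: the lookup itself stays unchanged, and the exception that is already thrown carries a more specific message.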

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Oh hmm, the Google UI hid the quoted bit. If you don't like the message, let's improve it. (Actually, it should probably say the "file in the jar"... or something a little more specific... not the jar entirely. The class loads, but the service loader can't access the file in the same jar without the FileP

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
I propose to improve the message on an exception already thrown. On Tue, May 16, 2023 at 11:04 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > You propose to throw an exception containing this, right? > > > Java does not throw SecurityException if this > is the case, it just ignores

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Ishan Chattopadhyaya
You propose to throw an exception containing this, right? > Java does not throw SecurityException if this is the case, it just ignores the jar! Are you serious? On Wed, 17 May, 2023, 8:02 am Gus Heck, wrote: > Blaming? > > On Tue, May 16, 2023 at 10:05 PM Ishan Chattopadhyaya < > ichattopadhy.

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Hi Robert, If you read the issue I opened more carefully you'll see I had all the service loading stuff sorted just fine. It's the silent eating of the security exceptions by URLClassPath that I think is a useful thing to point out. If anything, that ticket is more about being surprised by Securit

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
My problem is that it impacts the default codec, which has been supported by our backwards compatibility policy for many years. We can't just let the user determine backwards compatibility with a sysprop. How will CheckIndex work? We have to have bounds and also allow for more performant implementations t

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread David Smiley
Robert, I have not heard from you (or anyone) an argument against System property based configurability (as I described in Option 4 via a System property). Uwe notes wisely some care must be taken to ensure it actually works. Sure, of course. What concerns do you have with this? ~ David Smiley

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Blaming? On Tue, May 16, 2023 at 10:05 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > > Having that explicitly called out would have been SUPER helpful. > > Blaming Java in an exception thrown by Lucene is a ridiculous idea. > > On Wed, 17 May, 2023, 3:33 am Gus Heck, wrote: > >>

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Ishan Chattopadhyaya
> Having that explicitly called out would have been SUPER helpful. Blaming Java in an exception thrown by Lucene is a ridiculous idea. On Wed, 17 May, 2023, 3:33 am Gus Heck, wrote: > Found it. > > It's a solr thing made worse by the interaction of lucene testutils and > jdk.internal.loader.URL

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
By the way, I agree with the idea to MOVE THE LIMIT UNCHANGED to the HNSW-specific code. This way, someone can write an alternative codec with vectors using some other completely different approach that incorporates a different, more appropriate limit (maybe lower, maybe higher) depending upon their t

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
Gus, I think I explained myself multiple times on issues and in this thread. The performance is unacceptable; everyone knows it, but nobody is talking about it. I don't need to explain myself time and time again here. You don't seem to understand the technical issues (at least you sure as fuck don't k

RE: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Pandu Kerr
Hi all, Great to have this discussion! My votes are for 2 and 4! Best, Pandu On 2023/05/16 08:50:24 Alessandro Benedetti wrote: > Hi all, > we have finalized all the options proposed by the community and we are > ready to vote for the preferred one and then proceed with the > implementation.

Allowing tests to use multiple cores

2023-05-16 Thread Jonathan Ellis
Hi all, I found out last week that my concurrent HNSW [1] was not as bug-free as I had thought. It was passing the same tests as the serial HNSW, but the gradle configuration was limiting the test JVMs to a single core. I had a much more interesting time debugging when I hacked out that limitati

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Jonathan Ellis
My non-binding vote: Option 2 = Option 4 > Option 1 > Option 3 Explanation: Lucene's somewhat arbitrary limit of 1024 does not currently affect the raw, low-level HNSW, which is what I am plugging into Cassandra. The only option that would break this code is option 3. P.S. I mentioned this in a

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Found it. It's a Solr thing made worse by the interaction of Lucene testutils and jdk.internal.loader.URLClassPath's decision to hide anything that goes wrong when checking a URL: /* * Checks whether the resource URL should be returned. * Returns null on security check failure. * Call

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
Even if the options can basically be summarised in two groups (making it configurable vs. not making it configurable and leaving it be), when I collected the options from people I ended up with these four and I didn't want to collapse any of them (potentially making the proposer feel diminished). -

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Actually, I had wondered if this is a proper vote thread or not, normally those are yes/no on a single option. On Tue, May 16, 2023 at 10:47 AM Alessandro Benedetti wrote: > Hi Marcus, > I am afraid at this stage Robert's opinion counts just as any other > opinion, a single vote for option 1. >

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
Hi Marcus, I am afraid at this stage Robert's opinion counts just like any other opinion: a single vote for option 1. We are collecting the community's feedback here; we are not changing any code or voting yes/no. Once the voting is finished, we'll take action depending on the community's

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Marcus Eagan
Given that Robert has put in his veto, aren’t we clear on what we need to do for him to change his mind? He’s been pretty clear and the rules of veto are cut and dry. Most of the people that have contributed to kNN vectors recently are not even on the thread. I think improving the feature should b

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Houston Putman
+1 on the combination of #3 and #4. Also good things to make sure of Uwe, thanks for calling those out. (Especially about the limit only being used on write, not on read). - Houston On Tue, May 16, 2023 at 9:57 AM Uwe Schindler wrote: > I agree with Dawid, > > I am +1 for those two options in

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Uwe Schindler
I agree with Dawid, I am +1 for those two options in combination: * option 3 (make the limit an HNSW-specific thing). New formats may use other limits (lower or higher). * option 4 (make a system property with HNSW prefix). Adding the system property must be done in the same way as new propert
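The combination discussed here (an HNSW-specific limit whose default can be tuned through a system property, applied on write but not on read) might look roughly like the sketch below. The class name, the property name `hypothetical.hnsw.maxDimensions`, and the method names are all assumptions made for illustration; only the default of 1024 comes from the thread. It is not Lucene's implementation.

```java
public final class HnswDimensionLimitSketch {
    /** Hypothetical property name; the thread only asks for a system property with an HNSW prefix. */
    public static final String MAX_DIMS_PROPERTY = "hypothetical.hnsw.maxDimensions";

    /** The currently hardcoded limit mentioned in the thread. */
    public static final int DEFAULT_MAX_DIMS = 1024;

    private HnswDimensionLimitSketch() {}

    /** Reads the limit from the system property, falling back to the default when unset or unparsable. */
    public static int resolveMaxDims() {
        return Integer.getInteger(MAX_DIMS_PROPERTY, DEFAULT_MAX_DIMS);
    }

    /** Write path: reject vectors whose dimension count exceeds the configured limit. */
    public static void checkOnWrite(int dims) {
        int max = resolveMaxDims();
        if (dims < 1 || dims > max) {
            throw new IllegalArgumentException(
                "vector dimension " + dims + " must be between 1 and " + max
                    + " (limit taken from -D" + MAX_DIMS_PROPERTY + " or the default)");
        }
    }

    /** Read path: trust the dimension count recorded in the index; no limit is applied. */
    public static int dimsOnRead(int storedDims) {
        return storedDims;
    }

    public static void main(String[] args) {
        System.out.println("max dims = " + resolveMaxDims());
        checkOnWrite(768);                                      // within the limit
        System.out.println("read dims = " + dimsOnRead(2048));  // reads are never rejected
    }
}
```

Keeping the check on the write path only means an index written under a higher limit stays readable after the property is lowered or removed, which is the backwards-compatibility concern raised elsewhere in the thread.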

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Dawid Weiss
I'm for option 3 (limit at algorithm level), with the default there tunable via property (option 4). I understand Robert's concerns and I'd love to contribute a faster implementation but the reality is - I can't do it at the moment. I feel like experiments are good though and we shouldn't just ban

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Benjamin Trent
My vote is for option 3. Prevents Lucene from having the limit increased. Allows others who implement a different codec to set a limit of their choosing. Though I don't know the historical reasons for putting specific configuration items at the codec level. This limit is performance related and va

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
+1 to Gus' reply. I think that Robert's veto or anyone else's veto is fair enough, but I also think that anyone who is vetoing should be very clear about the objectives / goals to be achieved, in order to get a +1. If no clear objectives / goals can be defined and agreed on, then the whole t

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Robert, Can you explain in clear technical terms the standard that must be met for performance? A benchmark that must run in X time on Y hardware for example (and why that test is suitable)? Or some other reproducible criteria? So far I've heard you give an *opinion* that it's unusable, but that's

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
My non-binding vote goes to Option 2 or Option 4. Thanks, Michael Wechner On 16.05.23 at 10:51, Alessandro Benedetti wrote: My vote goes to *Option 4*. -- *Alessandro Benedetti* Director @ Sease Ltd. /Apache Lucene/Solr Committer/ /Apache Solr PMC Member/ e-mail: a.

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
I am still -1 (veto) on increasing this limit. Sending more emails does not change the technical facts or make the veto go away. On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti wrote: > Hi all, > we have finalized all the options proposed by the community and we are > ready to vote for th

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
For simplicity's sake, let's consider Option 2 and 4 as equivalent as they are not mutually exclusive and just differ on a minor implementation detail. On Tue, 16 May 2023, 10:24 Alessandro Benedetti, wrote: > Option 4 also aim to refactor the limit in an appropriate place for the > code (short

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
Option 4 also aims to refactor the limit into an appropriate place in the code (short answer is Yes, implementation details) Cheers On Tue, 16 May 2023, 10:04 Michael Wechner, wrote: > Hi Alessandro > > Thank you very much for summarizing and starting the vote. > > I am not sure whether I really

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
Hi Alessandro, Thank you very much for summarizing and starting the vote. I am not sure whether I really understand the difference between Option 2 and Option 4, or is it just about implementation details? Thanks, Michael On 16.05.23 at 10:50, Alessandro Benedetti wrote: Hi all, we have f

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
My vote goes to *Option 4*. -- *Alessandro Benedetti* Director @ Sease Ltd. *Apache Lucene/Solr Committer* *Apache Solr PMC Member* e-mail: a.benede...@sease.io *Sease* - Information Retrieval Applied Consulting | Training | Open Source Website: Sease.io

[VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
Hi all, we have finalized all the options proposed by the community and we are ready to vote for the preferred one and then proceed with the implementation. *Option 1* Keep it as it is (dimension limit hardcoded to 1024) *Motivation*: We are close to improving on many fronts. Given the criticality