Re: [DISCUSS] New data type for vector search

2023-04-26 Thread Andrés de la Peña
If we are going to use FLOAT[N] as sugar for another CQL data type, maybe tuples are more convenient than lists. So FLOAT[N] could be equivalent to TUPLE. Unlike collections, tuples have a fixed size, they are always frozen, and I think they don't support random access. These properties see

Re: [DISCUSS] New data type for vector search

2023-04-26 Thread Mick Semb Wever
> My inclination then would be to say you declare an ARRAY (which is semantic sugar for FROZEN>). This is very consistent with our existing style. We then simply permit such columns to define ANN indexes.
So long as nulls aren't a problem, as David questions, an alternative is: FLOAT[N

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-26 Thread J. D. Jordan
If we look to PostgreSQL, it allows defining arrays using FLOAT[N] or FLOAT ARRAY[N]. So that is an extra point for me in favor of just using FLOAT[N]. From my quick search, neither Oracle* nor MySQL directly supports arrays in columns. * Oracle supports declaring a custom type using VARRAY and then using

Re: [DISCUSS] New data type for vector search

2023-04-26 Thread David Capwell
Benedict's comments also make me question: can any of the values in the vector be null? The patch sent works with float arrays, so null isn’t possible… is null not valid for a vector type? If so, this would help justify why a vector is not an array or a list (both allow null). > On Apr 26, 2023,

Re: [DISCUSS] New data type for vector search

2023-04-26 Thread David Capwell
Thanks for starting this thread!
> In the initial commits and thread, this was DENSE FLOAT32. Nobody really loved that, so we considered a bunch of alternatives, including
> - `FLOAT[N]`: This minimal option resembles C and Java array syntax, which would make it familiar for many users. H

Re: [DISCUSS] New data type for vector search

2023-04-26 Thread Benedict Elliott Smith
I think we need to briefly step back and think about what the syntax means and how it fits into existing syntax. It seems that the dimensionality verbiage assumes we’re logically introducing N vector fields, so that each row adopts a value for all of the vector fields or none. But in practice we ar

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-26 Thread David Capwell
> DENSE seems to just be an array? So very similar to a frozen list, but with a fixed size?
As I read the doc, DENSE = ARRAY, but I knew that couldn’t be the case; when I read the code, it's a fixed-size array… So the real syntax was “DENSE FLOAT32[42]”. Not a fan of the type naming, and feel

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-26 Thread Patrick McFadin
I guess this is an excellent example to explore the minima of what constitutes a CEP. So far, CEPs have been some large changes, so where does something like this fit? (Wait. Did I beat Benedict to a Bike Shed? I think I did.) This is a list of everything needed for a CEP: Status Scope Goals Appr

[DISCUSS] New data type for vector search

2023-04-26 Thread Jonathan Ellis
Hi all, Splitting this out per the suggestion in the initial VS thread so we can work on driver support in parallel with the server-side changes. I propose adding a new data type for vector search indexes: FLOAT VECTOR[N_DIMENSIONS] In the initial commits and thread, this was DENSE FLOAT32. Nob

Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond published EOL

2023-04-26 Thread Mick Semb Wever
On Sat, 15 Apr 2023 at 03:17, C. Scott Andreas wrote: > If there’s lack of clarity around EOL policy and dates, we should > absolutely make this clear. > Fix is here: https://github.com/thelastpickle/cassandra-website/tree/mck/update-5-0_dates_download_page w/ html generated here: https://raw

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-26 Thread Benedict
We probably at least need to bike shed naming, as we already have FLOAT, DOUBLE, and LIST, which are similar/overlapping types, and we should be consistent. If we introduce FLOAT32 we probably need that to be an alias of FLOAT and introduce FLOAT64 to alias DOUBLE for consistency. DENSE seem