Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-10 Thread Alessandro Benedetti
I think Gus's points are on target. I recommend we move this forward as follows: we stop the discussion, and everyone interested proposes an option with a motivation; then we aggregate the options and perhaps hold a vote. I also agree that a veto should come with a

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-10 Thread Michael Sokolov
I poked around on Hugging Face looking at various models that are being promoted there; this is the highest-performing text model they list, which is expected to take sentences as input; it uses so-called "attention" to capture the context of words:

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-10 Thread Michael Sokolov
I think concatenating word-embedding vectors is a reasonable thing to do. It captures information about the sequence of tokens which is being lost by the current approach (summing them). Random article I found in a search
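
To make the sum-versus-concatenate point concrete, here is a minimal sketch. The 2-dimensional embeddings and the helper names (`EMBEDDINGS`, `sum_embedding`, `concat_embedding`) are invented for illustration and do not come from Lucene or from any real model. It only shows that summing gives the same vector for two sentences with the same words in a different order, while concatenation keeps them apart, at the cost of a dimension count that grows with the number of tokens.

```python
# Hypothetical toy embeddings (2 dimensions each), made up for this example.
EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [1.0, 1.0],
}


def sum_embedding(tokens):
    """Add the token vectors component-wise; token order has no effect."""
    dims = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dims
    for token in tokens:
        for i, value in enumerate(EMBEDDINGS[token]):
            total[i] += value
    return total


def concat_embedding(tokens):
    """Append the token vectors one after another; token order is preserved."""
    out = []
    for token in tokens:
        out.extend(EMBEDDINGS[token])
    return out


a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]

print(sum_embedding(a) == sum_embedding(b))        # same vector for both orders
print(concat_embedding(a) == concat_embedding(b))  # different vectors
print(len(concat_embedding(a)))                    # tokens * per-token dimensions
```

With real word embeddings of a few hundred dimensions each, concatenating even a short sentence could quickly exceed a fixed dimension limit, which is presumably why this comes up in a thread about that limit.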