Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-10 Thread Gus Heck
Do you anticipate that the vector engine would be changed in a way that fundamentally precluded larger vectors (intentionally)? I would think that the ability to support larger vectors should be a key criterion for any changes to be made. Certainly if there are optimizations to be had at specific

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-10 Thread Jonathan Ellis
I did track down a weird bug I was seeing to our cosine similarity returning NaN with high dimension vectors. Fix is here: https://github.com/apache/lucene/pull/12281 On Tue, May 9, 2023 at 12:15 PM Jonathan Ellis wrote: > I'm adding Lucene HNSW to Cassandra for vector search. One of my test
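
The preview above does not say what caused the NaN, and the sketch below is not the change made in the linked pull request. It only illustrates, under that caveat, one common way a cosine similarity can become undefined: when either input has zero norm, the denominator is 0. The function name `cosine_similarity` and the choice to return 0.0 in that case are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats.

    Hypothetical helper, not Lucene code. When either vector has zero
    norm the result is mathematically undefined (0 / 0); this sketch
    returns 0.0 in that case instead of dividing.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # math.fsum limits rounding error when summing many terms,
    # which matters more as the dimension grows.
    dot = math.fsum(x * y for x, y in zip(a, b))
    norm_a = math.fsum(x * x for x in a)
    norm_b = math.fsum(y * y for y in b)
    denom = math.sqrt(norm_a) * math.sqrt(norm_b)
    if denom == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot / denom
```

A caller that prefers to reject zero vectors outright could raise an exception in the guarded branch instead of returning 0.0.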

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Michael Wechner
I assumed that you would wrap Lucene into a minimal REST service or use Solr or Elasticsearch. On 09.05.23 at 19:07, jim ferenczi wrote: Lucene is a library. I don’t see how it would be exposed in this plugin which is about services. On Tue, 9 May 2023 at 18:00, Jun Luo wrote: The pr

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Michael Wechner
Yes, you would split the document into multiple chunks. The ChatGPT retrieval plugin does this by itself, and AFAIK the default chunk size is 200 tokens (https://github.com/openai/chatgpt-retrieval-plugin/blob/main/services/chunks.py). Also it creates a unique ID for each document
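
The chunking described above can be sketched as follows. This is a simplified illustration and not the logic in the plugin's chunks.py: whitespace-separated words stand in for real tokenizer tokens, and the names `chunk_tokens`, `chunk_document`, and the `"<document_id>_<n>"` chunk ID scheme are assumptions. Only the 200-token default comes from the message.

```python
import uuid


def chunk_tokens(tokens, chunk_size=200):
    """Split a token list into consecutive pieces of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def chunk_document(text, chunk_size=200, document_id=None):
    """Return one dict per chunk, each carrying its parent document ID.

    Hypothetical helper. A random UUID is generated when the caller
    does not supply a document ID.
    """
    doc_id = document_id or str(uuid.uuid4())
    # Assumption: whitespace tokens approximate a real tokenizer.
    tokens = text.split()
    chunks = []
    for n, piece in enumerate(chunk_tokens(tokens, chunk_size)):
        chunks.append({
            "id": f"{doc_id}_{n}",       # hypothetical chunk ID scheme
            "document_id": doc_id,
            "text": " ".join(piece),
        })
    return chunks
```

Each chunk would then be embedded and indexed separately, with `document_id` kept alongside it so that hits can be traced back to the source document.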

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Jonathan Ellis
It looks like the framework is designed to support self-hosted plugins. On Tue, May 9, 2023 at 12:13 PM jim ferenczi wrote: > Lucene is a library. I don’t see how it would be exposed in this plugin which is about services. > On Tue, 9 May 2023 at 18:00, Jun Luo wrote: >> The pr

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Jonathan Ellis
I'm adding Lucene HNSW to Cassandra for vector search. One of my test harnesses loads 50k OpenAI embeddings. Works as expected; as someone pointed out, it should be linear wrt vector size and that is what I see. I would not be afraid of increasing the max size. In parallel, Cassandra is also

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread jim ferenczi
Lucene is a library. I don’t see how it would be exposed in this plugin which is about services. On Tue, 9 May 2023 at 18:00, Jun Luo wrote: > The PR mentioned an Elasticsearch PR that increased the dim to 2048 in Elasticsearch.

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Jun Luo
The PR mentioned an Elasticsearch PR that increased the dim to 2048 in Elasticsearch. Curious how you use Lucene's KNN search. Lucene's KNN supports one vector per document. Usually multiple vectors are needed for a document's content. We
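
The preview is cut off before it says how the author handles this. One common workaround, shown here purely as an assumption, is to index each chunk as its own entry carrying a parent document ID and then collapse chunk-level hits to one result per document. The function name `collapse_hits_by_document`, the `(document_id, score)` hit format, and the keep-the-best-score rule are all hypothetical.

```python
def collapse_hits_by_document(hits, top_k=10):
    """Collapse chunk-level hits to one entry per parent document.

    `hits` is an iterable of (document_id, score) pairs, one per
    matching chunk, where a higher score means a closer match. Each
    document keeps the score of its best chunk.
    """
    best = {}
    for document_id, score in hits:
        if document_id not in best or score > best[document_id]:
            best[document_id] = score
    # Rank documents by their best chunk score, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

Because several chunks of the same document can appear among the nearest neighbours, a caller would typically request more than `top_k` chunk hits before collapsing.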

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Alessandro Benedetti
I tried my best in the previous thread to set a plan of action to decide what should be done with that limit. I tried to summarise the possible next steps multiple times, but the discussion steered in other directions (fierce opposition, benchmarking, etc.). I created a new thread:

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Bruno Roustant
I agree with Robert Muir that an increase of the 1024 limit as it is currently in FloatVectorValues or ByteVectorValues would bind the API: we could not decrease it afterwards, even if we needed to change the vector engine. Would it be possible to move the limit definition to an HNSW-specific

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Michael Wechner
There is already a pull request for Elasticsearch which also mentions the max size 1024: https://github.com/openai/chatgpt-retrieval-plugin/pull/83 On 06.05.23 at 19:00, Michael Wechner wrote: Hi all, I recently set up the ChatGPT retrieval plugin locally

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Christian Moen
Hello Michael, I agree. I think it makes sense to support OpenAI embeddings. Best, Christian On Sat, May 6, 2023 at 7:03 PM Michael Wechner wrote: > Hi all, > > I recently set up the ChatGPT retrieval plugin locally > > https://github.com/openai/chatgpt-retrieval-plugin > > I think it would

Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Michael Wechner
Hi all, I recently set up the ChatGPT retrieval plugin locally: https://github.com/openai/chatgpt-retrieval-plugin I think it would be nice to consider submitting a Lucene implementation for this plugin: https://github.com/openai/chatgpt-retrieval-plugin#future-directions The plugin is using