Re: Announcing githubsearch!

2024-02-19 Thread Michael Wechner
thank you very much! On 19.02.24 at 17:39, Michael McCandless wrote: Hi Team, ~1.5 years ago (August 2022) we migrated our Lucene issue tracking from Jira to GitHub. Thank you Tomoko for all the hard work doing such a complex, multi-phased, high-fidelity migration! I finally finished also

Re: The need for a Lucene 9.9.2 release

2024-01-23 Thread Michael Wechner
thanks for discovering and fixing! Michael On 23.01.24 at 18:36, Chris Hegarty wrote: Hi, We’ve encountered a serious issue with the recent Lucene 9.9.1 release, which warrants a 9.9.2. The issue is an NPE when sampling for quantization in Lucene99HnswScalarQuantizedVectorsFormat [1].

Re: Welcome Stefan Vodita as Lucene committer

2024-01-19 Thread Michael Wechner
Hi Stefan, thank you very much for your contributions and helping to improve Lucene! All the best Michael On 19.01.24 at 20:03, Stefan Vodita wrote: Thank you all! It's an honor to join the project as a committer. I'm originally from a small town in southern Romania

Re: SPLADE implementation

2023-11-15 Thread Michael Wechner
feature,0.3F), BooleanClause.Occur.SHOULD));} BooleanQuery termExpansionQuery = bqb.build(); log.info("Term expansion query: " + termExpansionQuery); return termExpansionQuery; }else { log.info("Regular query: " + questionQuery); return questionQuery; } Am

Re: SPLADE implementation

2023-11-15 Thread Michael Wechner
(term, weight) pair, you add a "new BooleanClause(FeatureField.newLinearQuery(fieldName, term, weight))" to your BooleanQuery. On Wed, Nov 15, 2023 at 11:08 AM Michael Wechner wrote: Hi Adrien Ah ok, I did not realize this, thanks for pointing this out! I don't quite un

Re: SPLADE implementation

2023-11-15 Thread Michael Wechner
34 Adrien Grand wrote: Hi Michael, What functionality are you missing? Lucene already supports indexing/querying weighted terms using FeatureField. On Wed, Nov 15, 2023 at 10:03 AM Michael Wechner wrote: Hi I have found the following issue re a possible SPLADE implementation

SPLADE implementation

2023-11-15 Thread Michael Wechner
Hi I have found the following issue re a possible SPLADE implementation https://github.com/apache/lucene/issues/11799 Is somebody still working on this? Thanks Michael

Re: Quantization for vector search

2023-11-04 Thread Michael Wechner
Thanks! Ben On Sat, Nov 4, 2023 at 4:07 AM Michael Wechner wrote: Hi If I understand correctly some devs are working on introducing quantization for vector search or at least considering it https://github.com/apache/lucene/issues/12497 Just being curious what

Quantization for vector search

2023-11-04 Thread Michael Wechner
Hi If I understand correctly some devs are working on introducing quantization for vector search or at least considering it https://github.com/apache/lucene/issues/12497 Just being curious what is the status on this resp. is somebody working on this actively? It came to my mind, because

Re: Update TermInSetQuery Example?

2023-10-21 Thread Michael Wechner
be fixed. There may be other combinations not detected because of source code formatting. Uwe On 20.10.2023 at 23:46, Michael Wechner wrote: Hi I recently found TermInSetQuery example at https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/TermInSetQuery.html but if I

Update TermInSetQuery Example?

2023-10-20 Thread Michael Wechner
Hi I recently found the TermInSetQuery example at https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/TermInSetQuery.html but if I understand correctly one should now use BooleanQuery.Builder instead of BooleanQuery itself, right? BooleanQuery.Builder bqb = new

Re: Multimodal search

2023-10-16 Thread Michael Wechner
don't think this really addresses the problem of accuracy at its core. Thanks Michael On 15.10.23 at 21:05, Michael Wechner wrote: Hi Navneet I also observe that various "vector search DBs" are implementing hybrid search, because the accuracy with embeddings is often not go

Re: Multimodal search

2023-10-15 Thread Michael Wechner
lts there have been techniques like re-ranking that are very popular. Thanks Navneet On Fri, Oct 13, 2023 at 8:53 AM Michael Wechner wrote: Thanks for your feedback and the link to the OpenSearch implementation! I think the embedding approach as it exists today is not and

Re: Multimodal search

2023-10-13 Thread Michael Wechner
eally need to be multimodal? Wherever the embeddings come from, Lucene can index the vectors and combine with textual queries, right? Thanks, Froh On Thu, Oct 12, 2023 at 12:59 PM Michael Wechner wrote: Hi Did anyone of the Lucene committers consider making Lucene multimodal? Wi

Multimodal search

2023-10-12 Thread Michael Wechner
Hi Did anyone of the Lucene committers consider making Lucene multimodal? With a quick Google search I found for example https://dl.acm.org/doi/abs/10.1145/3503161.3548768 https://sigir-ecom.github.io/ecom2018/ecom18Papers/paper7.pdf Thanks Michael

Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner
Hi all, you might be interested in this paper / article https://arxiv.org/abs/2308.14963 Thanks Michael

Re: Lucene 9.7 release

2023-06-09 Thread Michael Wechner
Thank you very much, Adrien! On 09.06.23 at 18:20, Tomás Fernández Löbbe wrote: +1 Thanks Adrien On Fri, Jun 9, 2023 at 9:19 AM Michael McCandless wrote: +1, thanks Adrien! Mike McCandless http://blog.mikemccandless.com On Fri, Jun 9, 2023 at 12:11 PM Patrick Zhai

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
topadhyaya, wrote: That sounds promising, Michael. Can you share scripts/steps/code to reproduce this? On Thu, 18 May, 2023, 1:16 pm Michael Wechner, wrote: I just implemented it and tested it with OpenAI's text-embedding-ada-002, which is

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
Chattopadhyaya: That sounds promising, Michael. Can you share scripts/steps/code to reproduce this? On Thu, 18 May, 2023, 1:16 pm Michael Wechner, wrote: I just implemented it and tested it with OpenAI's text-embedding-ada-002, which is using 1536 dimensions and it works very fine

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
I just implemented it and tested it with OpenAI's text-embedding-ada-002, which is using 1536 dimensions and it works very fine :-) Thanks Michael On 18.05.23 at 00:29, Michael Wechner wrote: IIUC KnnVectorField is deprecated and one is supposed to use KnnFloatVectorField when using

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Wechner
IIUC KnnVectorField is deprecated and one is supposed to use KnnFloatVectorField when using float as vector values, right? On 17.05.23 at 16:41, Michael Sokolov wrote: see https://markmail.org/message/kf4nzoqyhwacb7ri On Wed, May 17, 2023 at 10:09 AM David Smiley wrote: > easily be

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Wechner
I try to better understand the code, so IIUC vector MAX_DIMENSIONS is currently used inside lucene/core/src/java/org/apache/lucene/document/FieldType.java lucene/core/src/java/org/apache/lucene/document/KnnFloatVectorField.java

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
+1 to Gus' reply. I think that Robert's veto or anyone else's veto is fair enough, but I also think that anyone who is vetoing should be very clear about the objectives / goals to be achieved, in order to get a +1. If no clear objectives / goals can be defined and agreed on, then the whole

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
my non-binding vote goes to Option 2 resp. Option 4 Thanks Michael Wechner On 16.05.23 at 10:51, Alessandro Benedetti wrote: My vote goes to *Option 4*. -- *Alessandro Benedetti* Director @ Sease Ltd. /Apache Lucene/Solr Committer/ /Apache Solr PMC Member/ e-mail

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
Hi Alessandro Thank you very much for summarizing and starting the vote. I am not sure whether I really understand the difference between Option 2 and Option 4, or is it just about implementation details? Thanks Michael On 16.05.23 at 10:50, Alessandro Benedetti wrote: Hi all, we have

Re: Dimensions Limit for KNN vectors - Next Steps

2023-05-09 Thread Michael Wechner
+1 Michael Wechner On 09.05.23 at 14:08, Alessandro Benedetti wrote: *Proposed option*: make the limit configurable *Motivation*: The system administrator can enforce a limit its users need to respect that it's in line with whatever the admin decided to be acceptable for them. The default

Re: Connecting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Michael Wechner
vector index. If we don't need the inverted index, will it be better to use other vector dbs? For example, PostgreSQL also added vector support recently. Thanks, Jun On Sat, May 6, 2023 at 1:44 PM Michael Wechner wrote: there is already a pull request for Elasticsearch

Re: Connecting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Michael Wechner
ill it be better to use other vector dbs? For example, PostgreSQL also added vector support recently. Thanks, Jun On Sat, May 6, 2023 at 1:44 PM Michael Wechner wrote: there is already a pull request for Elasticsearch which is also mentioning the max size 1024 https://github.co

Re: Connecting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Michael Wechner
there is already a pull request for Elasticsearch which is also mentioning the max size 1024 https://github.com/openai/chatgpt-retrieval-plugin/pull/83 On 06.05.23 at 19:00, Michael Wechner wrote: Hi all, I recently set up the ChatGPT retrieval plugin locally https://github.com/openai

Connecting Lucene with ChatGPT Retrieval Plugin

2023-05-06 Thread Michael Wechner
Hi all, I recently set up the ChatGPT retrieval plugin locally https://github.com/openai/chatgpt-retrieval-plugin I think it would be nice to consider submitting a Lucene implementation for this plugin https://github.com/openai/chatgpt-retrieval-plugin#future-directions The plugin is using

Re: Seeking Tools and Methods to Measure Lucene's Indexing Performance

2023-05-06 Thread Michael Wechner
thanks for the pointer! I have added it to the Lucene FAQ https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-HowisLucene'sindexingandsearchperformancemeasured? Thanks Michael On 06.05.23 at 06:18, Ishan Chattopadhyaya wrote: Check Lucene bench:

Re: Concurrent HNSW index

2023-04-27 Thread Michael Wechner
+1 for a pull request Thanks Michael On 27.04.23 at 20:53, Ishan Chattopadhyaya wrote: +1, please contribute to Lucene. Thanks! On Thu, 27 Apr, 2023, 10:59 pm Jonathan Ellis, wrote: Hi all, I've created an HNSW index implementation that allows for concurrent build and

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-21 Thread Michael Wechner
yes, they are, whereas it should help us to test performance and scalability :-) On 21.04.23 at 09:24, Ishan Chattopadhyaya wrote: Seems like they were all 768 dimensions. On Fri, 21 Apr, 2023, 11:48 am Michael Wechner, wrote: Hi all, Cohere just published approx. 100 million

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-21 Thread Michael Wechner
Michael On 13.04.23 at 07:58, Michael Wechner wrote: Hi Kent Great, thank you very much! Will download it later today :-) All the best Michael On 13.04.23 at 01:35, Kent Fitch wrote: Hi Michael (and anyone else who wants just over 240K "real world" ada-002 vectors of dime

Re: Lucene 9.6 release

2023-04-19 Thread Michael Wechner
+1 Thanks! Michael On 19.04.23 at 18:09, Benjamin Trent wrote: +1 ! You rock Alan! On Wed, Apr 19, 2023, 9:54 AM Ignacio Vera wrote: +1 Thanks Alan! On Wed, Apr 19, 2023 at 1:27 PM Alan Woodward wrote: Hi all, It’s been a while since our last release,

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-12 Thread Michael Wechner
em for any purpose you like. best wishes, Kent Fitch On Wed, Apr 12, 2023 at 4:37 PM Michael Wechner wrote: thank you very much for your feedback! In a previous post (April 7) you wrote you could make available the 47K ada-002 vectors, which would be great! Would it

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-12 Thread Michael Wechner
thank you very much for your feedback! In a previous post (April 7) you wrote you could make available the 47K ada-002 vectors, which would be great! Would it make sense to set up a public GitHub repo, such that others could use or also contribute vectors? Thanks Michael Wechner On

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-11 Thread Michael Wechner
. As I said above, after surveying huggingface I couldn't find any text-based model using more than 768 dimensions. So far we have some ideas of generating higher-dimensional data by dithering or concatenating existing data, but it seems artificial. On Tue, Apr 11, 2023 at 9:31 AM Michael Wechner

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-11 Thread Michael Wechner
complain that hnsw sucks it doesn't scale, but when I show it scales linearly with dimension you just ignore that and complain about something entirely different. You demand that people run all kinds of tests

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Michael Wechner
I think for testing the performance and scalability one can also use synthetic data and it does not have to be real world data in the sense of vectors generated from real world text. But I think the more people revisit the testing of performance and scalability the better and any help on this

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Michael Wechner
. Then you complain about people not meeting you half way. Wow On Sat, Apr 8, 2023, 12:40 PM Robert Muir wrote: On Sat, Apr 8, 2023 at 8:33 AM Michael Wechner wrote: What exactly do you consider reasonable? Let's begin a real discussion by being HONEST about the current status. Please pu

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-08 Thread Michael Wechner
remains the same. On Fri, Apr 7, 2023 at 10:57 PM Michael Wechner wrote: sorry to interrupt, but I think we get side-tracked from the original discussion to increase the vector dimension limit. I think improving the vector indexing performance is one thing and making sure Lucene does not crash when

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Michael Wechner
sorry to interrupt, but I think we get side-tracked from the original discussion to increase the vector dimension limit. I think improving the vector indexing performance is one thing and making sure Lucene does not crash when increasing the vector dimension limit is another. I think it is

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Michael Wechner
till won't be free, but it would be 99% cheaper than paying the LLM companies if we can be slow. On Thu, Apr 6, 2023 at 9:42 PM Michael Wechner wrote: Great, thank you! How much RAM; etc. did you run this test on? Do the vectors really have to be based on real data f

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Wechner
use for testing? If all else fails I can test with noise, but that tends to lead to meaningless results On Thu, Apr 6, 2023 at 3:52 PM Michael Wechner wrote: On 06.04.23 at 17:47, Robert Muir wrote: Well, I'm asking ppl actually try to test using such high dimensions. Based on my own

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Wechner
but please at least try it out! voting +1 without at least doing this is really the "weak/unscientifically minded" approach. On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner wrote: Thanks for your feedback! I agree, that it should not crash. So far we did not experience crashes o

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Wechner
the vectors en masse into RAM while merging. On Thu, Apr 6, 2023 at 10:20 AM Michael Wechner wrote: thanks very much for these insights! Does it make a difference re RAM when I do a batch import, for example import 1000 documents and close the IndexWriter and do a forceMerge or import 1 million

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Wechner
thanks very much for these insights! Does it make a difference re RAM when I do a batch import, for example import 1000 documents and close the IndexWriter and do a forceMerge or import 1 million documents at once? I would expect so, or do I misunderstand this? Thanks Michael On 06.04.23 at

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Michael Wechner
I think we should focus on testing where the limits are and what might cause the limits. Let's get out of this fog :-) Thanks Michael On 06.04.23 at 11:47, Michael McCandless wrote: > We shouldn't accept weakly/not scientifically motivated vetos anyway right? In fact we must accept all

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Michael Wechner
Thanks for your feedback! I agree, that it should not crash. So far we did not experience crashes ourselves, but we did not index millions of vectors. I will try to reproduce the crash, maybe this will help us to move forward. Thanks Michael On 05.04.23 at 18:30, Dawid Weiss wrote: Can

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Michael Wechner
Hi Dawid Can you describe your crash in more detail? How many million vectors exactly? What was the vector dimension? How much RAM? etc. Thanks Michael On 05.04.23 at 17:48, Dawid Weiss wrote: Ok, so what should we do then? I don't know, Alessandro. I just wanted to point out the fact

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Michael Wechner
Michael In this way we can create a pull request and merge relatively soon. Cheers On Tue, 4 Apr 2023, 14:47 Michael Wechner, wrote: IIUC we all agree that the limit could be raised, but we need some solid reasoning what limit makes sense, resp. why do we set this particular

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-04 Thread Michael Wechner
07:29 Michael Wechner, wrote: btw, what was the reasoning to set the current limit to 1024? Thanks Michael On 01.04.23 at 14:47, Michael Sokolov wrote: I'm also in favor of raising this limit. We do see some datasets with higher than 1024 d

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-02 Thread Michael Wechner
> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd> On Fri, 31 Mar 2023 at 16:12, Michael Wechner wrote: OpenAI reduced their size to 1536 dimensions

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-03-31 Thread Michael Wechner
in the low thousands, e.g. top-k KNN searches would still look at a very small subset of the full dataset. So overall, my vote would be to bump the limit to 2048 as suggested by Mayya on the issue that you linked. On Fri, Mar 31, 2023 at 2:38 PM Michael Wechner wrote: Thanks Alessandro for summarizing

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-03-31 Thread Michael Wechner
Thanks Alessandro for summarizing the discussion below! I understand that there is no clear reasoning re what is the best embedding size, whereas I think heuristic approaches like described by the following link can be helpful

Re: Lucene 9.5.0 release

2023-01-23 Thread Michael Wechner
shortly? Thanks Luca On Sat, Jan 21, 2023 at 11:41 AM Michael Wechner wrote: I tried to understand the issue described on github, but unfortunately do not really understand it. Can you explain a little more? Thanks Michael On 21.01.

Re: Lucene 9.5.0 release

2023-01-21 Thread Michael Wechner
I tried to understand the issue described on github, but unfortunately do not really understand it. Can you explain a little more? Thanks Michael On 21.01.23 at 11:00, Alessandro Benedetti wrote: Hi, this would be nice to have in 9.5 : https://github.com/apache/lucene/issues/12099 It's

Re: Release Lucene 9.4.2

2022-11-09 Thread Michael Wechner
Thank you! +1 :-) On 09.11.22 at 16:38, Adrien Grand wrote: Hello all, A bad integer overflow has been discovered in the KNN vectors format, which affects segments that have more than ~16M vectors. I'd like to do a bugfix release when the bug
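
The kind of integer overflow mentioned above can be illustrated with plain Java arithmetic. This is only a sketch under stated assumptions: the per-node size of 132 bytes is an illustrative value that is not given in the thread, and `OffsetOverflowSketch`, `offsetAsInt`, and `offsetAsLong` are hypothetical names, not Lucene code.

```java
// Hypothetical illustration of a 32-bit offset overflow; not Lucene's actual code.
final class OffsetOverflowSketch {
    // Assumed number of bytes stored per graph node (illustrative value only).
    static final int BYTES_PER_NODE = 132;

    // Buggy variant: the multiplication happens in 32-bit int arithmetic and can wrap around.
    static int offsetAsInt(int nodeOrdinal) {
        return nodeOrdinal * BYTES_PER_NODE;
    }

    // Fixed variant: widen to long before multiplying, so the product cannot wrap.
    static long offsetAsLong(int nodeOrdinal) {
        return (long) nodeOrdinal * BYTES_PER_NODE;
    }

    public static void main(String[] args) {
        int ordinal = 17_000_000; // somewhat above ~16M entries
        System.out.println("int offset:  " + offsetAsInt(ordinal));
        System.out.println("long offset: " + offsetAsLong(ordinal));
    }
}
```

With these assumed numbers the int product is still correct for 16,000,000 entries but becomes negative for 17,000,000, which matches the "more than ~16M vectors" order of magnitude only by construction of the assumed node size.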

Re: Raising the Value of MAX_DIMENSIONS of Vector Values

2022-10-20 Thread Michael Wechner
of the migration -- then we'll discuss once it's been moved to GitHub!) Julie On Mon, Aug 8, 2022 at 10:05 PM Michael Wechner wrote: I agree that Lucene should support vector sizes depending on the model one is choosing. For example Weaviate seems to do this https://weaviate.slack.com

Re: call for 9.4.1 release (bug in vectors format)

2022-10-18 Thread Michael Wechner
+1 :-) Thanks Michael On 18.10.22 at 19:52, Julie Tibshirani wrote: Hi everyone, We recently discovered a severe bug in the 9.4 release in the kNN vectors format: https://github.com/apache/lucene/issues/11858. Explaining the problem: when ingesting a lot of data, or when performing a

Re: Raising the Value of MAX_DIMENSIONS of Vector Values

2022-08-08 Thread Michael Wechner
I agree that Lucene should support vector sizes depending on the model one is choosing. For example Weaviate seems to do this https://weaviate.slack.com/archives/C017EG2SL3H/p1659981294040479 Thanks Michael On 07.08.22 at 22:48, Marcus Eagan wrote: Hi Lucene Team, In general, I have

Re: Generate autocomplete predictions

2022-03-14 Thread Michael Wechner
can help show how things tie together. I guess the thing you want to avoid is to spend hours on the prototype but otherwise either is fine. On Sun, 13 Mar 2022, 23:01, Michael Wechner wrote: Hi Adrien Thanks for your feedback! From a "project management" point o

Re: Generate autocomplete predictions

2022-03-13 Thread Michael Wechner
Hi Michael, This sounds like a good fit for Lucene to me. On Fri, Mar 11, 2022 at 11:15 PM Michael Wechner wrote: Hi I recently implemented auto-suggest based on https://lucene.apache.org/core/9_0_0/suggest/index.html whereas I am currently managing the terms / predictions (e.g. "autocompleti

Generate autocomplete predictions

2022-03-11 Thread Michael Wechner
Hi I recently implemented auto-suggest based on https://lucene.apache.org/core/9_0_0/suggest/index.html whereas I am currently managing the terms / predictions (e.g. "autocompletion using lucene suggesters dev") contained by the index manually. I would like now to generate the terms /

Re: Lucene 9.1 release soon?

2022-02-23 Thread Michael Wechner
I think this would be great :-) thank you very much for your efforts! Michael On 24.02.22 at 00:28, Julie Tibshirani wrote: Hello everyone, Would there be support for releasing Lucene 9.1 soon? It has been ~2.5 months since 9.0 was released and we already have a long list of new features,

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-20 Thread Michael Wechner
   What was your age last year? 0.4658360947506338      What is your age? 0.4859953687958164        How old are you? So both models do not "understand" the question. As Alessandro suggested a "well-curated fine-tuning step" might improve this, whereas I have not been able to try thi

Re: How to Increase max vector size?

2022-02-17 Thread Michael Wechner
available under the Apache license. Thanks Michael On 16.02.22 at 19:51, Michael Sokolov wrote: Fair enough - are you planning to offer such a service;) sounds exciting! -Mike On Tue, Feb 15, 2022 at 6:00 PM Michael Wechner wrote: true :-) when you are the one controlling the input

Re: How to Increase max vector size?

2022-02-15 Thread Michael Wechner
v: I don't think it makes sense to have a static variable maximum that you can change by calling a method. What purpose would it serve? On Tue, Feb 15, 2022, 2:39 PM Michael Wechner wrote: Hi Alessandro No, I have not created a Jira ticket, but I would be happy to create one, just l

Re: How to Increase max vector size?

2022-02-15 Thread Michael Wechner
-- Alessandro Benedetti Apache Lucene/Solr PMC member and Committer Director, R&D Software Engineer, Search Consultant www.sease.io On Sat, 12 Feb 2022 at 22:53, Michael Wechner wrote: Hi I just tried to test the OpenAI model

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-15 Thread Michael Wechner
the door open to alternative algorithms that might have better performance. It would be great if Lucene would provide alternative algorithms in the future and one can choose the algorithm based on one's requirements Thanks Michael On Tue, Feb 15, 2022 at 12:21 PM Michael Wechner wrote

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-15 Thread Michael Wechner
y-mismatch-with-document-expansion.html Cheers -- Alessandro Benedetti Apache Lucene/Solr PMC member and Committer Director, R&D Software Engineer, Search Consultant www.sease.io On Tue, 15 Feb 2022 at 09:10, Michael Wechner wrote: fair enou

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-15 Thread Michael Wechner
calculations) are indeed the main factors to consider. Julie On Mon, Feb 14, 2022 at 1:02 PM Michael Wechner wrote: Hi Julie Thanks again for your feedback! I will do some more tests with "all-mpnet-base-v2" (768) and "all-roberta-large-v1" (1024), so 1024 is enough f

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-14 Thread Michael Wechner
ault a bit and would be happy to discuss if you'd like to file a JIRA issue. However 12288 dimensions still seems high to me, this is much larger than most well-established embedding models and could require a lot of memory. Julie On Mon, Feb 14, 2022 at 12:08 PM Michael Wechner wrote:
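
As a rough back-of-the-envelope check of the memory concern above, the raw size of a set of float vectors is count × dimensions × 4 bytes. The sketch below assumes 32-bit floats and ignores any graph or index overhead; `VectorMemorySketch`, `rawVectorBytes`, and the one-million-vector count are illustrative choices, not figures from the thread.

```java
// Back-of-the-envelope estimate of raw float vector storage; names and counts are illustrative.
final class VectorMemorySketch {
    // Raw bytes needed for vectorCount vectors of the given dimension, at 4 bytes per float.
    static long rawVectorBytes(long vectorCount, int dimensions) {
        return vectorCount * dimensions * Float.BYTES;
    }

    public static void main(String[] args) {
        long vectorCount = 1_000_000L; // assumed corpus size
        int[] dimensionOptions = {768, 1024, 12288}; // dimensions mentioned in these threads
        for (int dimensions : dimensionOptions) {
            long bytes = rawVectorBytes(vectorCount, dimensions);
            System.out.println(dimensions + " dims -> " + bytes + " bytes");
        }
    }
}
```

Under these assumptions, moving from 1024 to 12288 dimensions multiplies the raw vector storage by twelve, from about 4.1 GB to about 49.2 GB per million vectors.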

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-14 Thread Michael Wechner
On Sun, Feb 13, 2022 at 1:55 AM Michael Wechner wrote: Re the OpenAI embedding the following recent paper might be of interest https://arxiv.org/pdf/2201.10005.pdf (Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022) Thanks Michael On 13.02.22 at

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-13 Thread Michael Wechner
Re the OpenAI embedding the following recent paper might be of interest https://arxiv.org/pdf/2201.10005.pdf (Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022) Thanks Michael On 13.02.22 at 00:14, Michael Wechner wrote: Here a concrete example where I combine OpenAI model

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-12 Thread Michael Wechner
as "understanding", but I hope it makes it clearer what I am looking for :-) Thanks Michael On 12.02.22 at 22:38, Michael Wechner wrote: Hi Alessandro I am mainly interested in detecting similarity, for example whether the following two sentences are similar resp. likely to mean the

How to Increase max vector size?

2022-02-12 Thread Michael Wechner
Hi I just tried to test the OpenAI model "text-similarity-davinci-001" with 12288 dimensions and receive the following error java.lang.IllegalArgumentException: vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (=1024); got 12288     at
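
The error above is the result of an upper-bound check on the vector dimension. A minimal sketch of such a check follows, assuming a hypothetical `DimensionLimitSketch.checkDimensions` helper. This is not Lucene's actual code; only the limit of 1024 and the message text are taken from the error quoted above.

```java
// Hypothetical stand-in for a dimension limit check; not Lucene's implementation.
final class DimensionLimitSketch {
    // Limit as reported in the quoted error message.
    static final int MAX_DIMENSIONS = 1024;

    // Rejects vectors whose dimension exceeds the limit, mirroring the quoted message.
    static void checkDimensions(int numDimensions) {
        if (numDimensions > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (="
                    + MAX_DIMENSIONS + "); got " + numDimensions);
        }
    }

    public static void main(String[] args) {
        checkDimensions(1024); // accepted: exactly at the limit
        try {
            checkDimensions(12288); // rejected: the dimension reported in the thread
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the limit in this sketch is a compile-time constant, raising it means changing the code, which is the situation the thread title asks about.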

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-12 Thread Michael Wechner
andro Benedetti: Hi Michael, experience to what extent? We have been exploring the area for a while given we contributed the first neural search milestone to Apache Solr. What is your curiosity? Performance? Relevance impact? How to integrate it? Regards On Fri, 11 Feb 2022, 22:38 Michael Wechner,

Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-11 Thread Michael Wechner
Hi Does anyone have experience using OpenAI embeddings in combination with Lucene vector search? https://beta.openai.com/docs/guides/embeddings for example comparing performance re vector size https://api.openai.com/v1/engines/text-similarity-ada-001/embeddings and

Re: Searching Lucene FAQ with Lucene

2021-12-21 Thread Michael Wechner
y, whereas the Katie frontend already provides this functionality https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en but I have to enhance the Javascript client used at https://lucene-faq.ukatie.com/ Thanks Michael On Mon, Dec 20, 2021 at 5:05 AM Michael Wechner wrote: Hi I

Searching Lucene FAQ with Lucene

2021-12-20 Thread Michael Wechner
Hi I am working on a webapp called "Katie" in order to detect duplicated questions https://ukatie.com/ As a test case I have imported the Lucene FAQ https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ to https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en and made

Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Michael Wechner
Hi Patrick/Haoyu, congratulations! On 19.12.21 at 21:05, Patrick Zhai wrote: Thanks everyone! It's a great honor to become a lucene committer and thank you everyone for building such a friendly community and specially thank you to who has replied email/ commented on issues/ reviewed PRs

Re: Article link at Lucene FAQ does not exist anymore

2021-12-15 Thread Michael Wechner
I have removed it now :-) On 22.11.21 at 18:18, Michael Wechner wrote: Hi The QnA https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowcanIindexXMLdocuments? is pointing to (See also this article Parsing, indexing, and searching XML with Digester and Lucene <h

Re: Welcome Julie Tibshirani to the Lucene PMC

2021-12-01 Thread Michael Wechner
great to hear, congratulations, Julie! On 01.12.21 at 14:29, Mayya Sharipova wrote: Congratulations, Julie ! Very well deserved! On Wed, Dec 1, 2021 at 2:45 PM Ignacio Vera wrote: Congratulations Julie! On Wed, Dec 1, 2021 at 10:03 AM Alan Woodward wrote:

Article link at Lucene FAQ does not exist anymore

2021-11-22 Thread Michael Wechner
Hi The QnA https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowcanIindexXMLdocuments? is pointing to (See also this article Parsing, indexing, and searching XML with Digester and Lucene .)

Re: Time to write an open-source book?

2021-11-17 Thread Michael Wechner
I think this would be great and I would be very happy to contribute. For example I am currently trying to understand how the autocomplete / auto suggest functionality of Lucene works and I could contribute my learnings. All the best Michael On 16.11.21 at 20:49, Dongyu Xu wrote: Hi Devs,

Re: VectorField renamed to KnnVectorField?

2021-11-02 Thread Michael Wechner
n> - Vigya On Mon, Nov 1, 2021 at 2:44 PM Michael Wechner wrote: I was able to update my code -    FieldType vectorFieldType = VectorField.createHnswType(vector.length, VectorValues.SimilarityFunction.DOT_PRODUCT, 16, 500); -

Re: VectorField renamed to KnnVectorField?

2021-11-01 Thread Michael Wechner
)); +    return new TopDocScorer(this, context.reader().searchNearestVectors(field, vector, topK, null)); the indexing and searching works again :-) Thanks Michael On 01.11.21 at 18:53, Michael Wechner wrote: Hi In May 2021 I have done a Vector Search implementation based on Lucene

VectorField renamed to KnnVectorField?

2021-11-01 Thread Michael Wechner
Hi In May 2021 I have done a Vector Search implementation based on Lucene 9.0.0-SNAPSHOT with the following code FieldType vectorFieldType = VectorField.createHnswType(vector.length, VectorValues.SimilarityFunction.DOT_PRODUCT,16,500); VectorField vectorField =new VectorField(VECTOR_FIELD,

Re: Not able to subscribe to Developer Lists

2021-07-29 Thread Michael Wechner
Hi Praveen I think you managed https://lists.apache.org/list.html?dev@lucene.apache.org and otherwise I would not have received this email :-) HTH Michael On 29.07.21 at 09:15, Praveen Nishchal wrote: Hi Dev Community, I have sent multiple emails to dev-subscr...@lucene.apache.org

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-17 Thread Michael Wechner
Hi Tomoko Just noticed that you resolved the issue and also did some additional improvements :-) Thanks a lot! Michael On 14.07.21 at 07:52, Michael Wechner wrote: sure, I understand, but I just wanted to ask whether such a change makes sense actually. I have created a Jira ticket https

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
Tomoko Uchida: We don't accept patches by email... please open a Jira. On Wed, Jul 14, 2021 at 5:58, Michael Wechner wrote: would the following patch make sense? git diff lucene/luke/src/ diff --git a/lucene/luke/src/java/org/apache/lucene/luke/app/IndexHandler.java b/lucene/luke/src/java/org/apache/lucene

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
.message.index_opened_ro=Index successfully opened. (read-only) Thanks Michael On 13.07.21 at 22:43, Michael Wechner wrote: I analyzed the logs and the class/method lucene/luke/src/java/org/apache/lucene/luke/models/util/IndexUtils.java#openIndex(String, String) and realized that the pr

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
xist, then the error message reads "No such directory" Or that the dropdown "Index Path" is checking whether the previously opened directories still exist. Thanks Michael On 13.07.21 at 10:47, Michael Wechner wrote: thanks again for your feedback! I will give it a try and

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
e to start (haven't tried to read indexes that include vector values with Luke). The stack traces you might see should include full information to fix or improve it. Tomoko On Tue, Jul 13, 2021 at 14:22, Michael Wechner wrote: On 13.07.21 at 04:22, Tomoko Uchida wrote: There aren't any plans for that, and I'm not

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-12 Thread Michael Wechner
(Tue) 16:23, Michael Wechner wrote: Hi I just created a Lucene vector search index with Lucene-9.0.0-SNAPSHOT based on train-v2.0.json of SQuAD (https://rajpurkar.github.io/SQuAD-explorer/), which are 86'831 QnAs (for the embedding I used SentenceBERT). It took a couple of hours on my Mac laptop, but

Does Luke already support vector search or are there any plans to support vector search?

2021-07-06 Thread Michael Wechner
Hi I just created a Lucene vector search index with Lucene-9.0.0-SNAPSHOT based on train-v2.0.json of SQuAD (https://rajpurkar.github.io/SQuAD-explorer/), which are 86'831 QnAs (for the embedding I used SentenceBERT). It took a couple of hours on my Mac laptop, but it worked in the end and

Re: Luke (Lucene 9): ls: ../analysis: No such file or directory

2021-07-04 Thread Michael Wechner
ah ok, thanks for explaining! Michael On 05.07.21 at 00:36, Tomoko Uchida wrote: No. The gradle task is a shortcut to start luke (directly after source checkout) for developers, and the shell/bat scripts are still valid after packaging. Tomoko Tomoko On Mon, Jul 5, 2021 at 4:18, Michael Wechner

Re: Luke (Lucene 9): ls: ../analysis: No such file or directory

2021-07-04 Thread Michael Wechner
:10 Michael Wechner : Hi I have built Lucene 9.0.0-SNAPSHOT locally (https://github.com/apache/lucene.git) and use the Lucene core libraries successfully. In order to introspect the Lucene index I tried to run Luke on Mac OS X sh bin/luke.sh (from within the luke directory: lucene/lucene/luke
