Re: Source Control

2012-10-28 Thread Michael Wechner
On 27.10.12 01:10, Robert Muir wrote: On Fri, Oct 26, 2012 at 7:02 PM, Mark Miller markrmil...@gmail.com wrote: So, it's not everyone's favorite tool, but it sure seems to be the most popular tool. My main question is, is it really git that's popular, or github? if git would really bring

Re: Source Control

2012-10-28 Thread Michael Wechner
On 28.10.12 10:57, Robert Muir wrote: On Sun, Oct 28, 2012 at 2:59 AM, Michael Wechner michael.wech...@wyona.com wrote: I also had/have quite some trouble getting used to the git commandline, although or maybe because I used SVN commandline for many years, but I am very glad now to be using

Re: more sql-like commands for solr

2012-02-07 Thread Michael Wechner
On 07.02.12 10:24, Li Li wrote: hi all, we have used solr to provide searching service in many products. I found for each product, we have to do some configurations and query expressions. our users are not used to this. they are familiar with sql and they may describe like this: I

Re: more sql-like commands for solr

2012-02-07 Thread Michael Wechner
, Michael Wechner michael.wech...@wyona.com wrote: On 07.02.12 10:24, Li Li wrote: hi all, we have used solr to provide searching service in many products. I found for each product, we have to do some configurations and query

Re: more sql-like commands for solr

2012-02-07 Thread Michael Wechner
something from http://jsqlparser.sourceforge.net/ I have experimented with it to check sql injections. On Tue, Feb 7, 2012 at 5:54 PM, Michael Wechner michael.wech...@wyona.com wrote: On 07.02.12 10:43, Li Li wrote: I just want solr providing this new feature

Running tests incrementally based on changed files

2014-12-02 Thread Michael Wechner
Hi I am trying to set up a testing environment, which is running tests incrementally based on changed files, instead of running whole builds. I would like to try this for Lucene/Solr, hence I am trying to understand better the relationship between individual code changes and associated tests. For
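One way to picture the relationship between a changed file and its associated test is a purely name-based mapping. The following is a minimal sketch, not the approach discussed in the thread: it assumes the source layout `<module>/src/java/...` and `<module>/src/test/...` and a `Test<ClassName>.java` naming convention, and the class `ChangedFileTestMapper` and its methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps changed source paths to candidate test paths by name only.
final class ChangedFileTestMapper {
  // Assumed directory markers for main and test sources.
  private static final String MAIN = "/src/java/";
  private static final String TEST = "/src/test/";

  /** Returns the candidate test path for a changed file, or null if none can be derived. */
  static String candidateTest(String changedPath) {
    if (!changedPath.endsWith(".java")) {
      return null; // e.g. CHANGES.txt or schema.xml: no name-based candidate
    }
    if (changedPath.contains(TEST)) {
      return changedPath; // a changed test is its own candidate
    }
    int mainIndex = changedPath.indexOf(MAIN);
    if (mainIndex < 0) {
      return null;
    }
    int lastSlash = changedPath.lastIndexOf('/');
    String module = changedPath.substring(0, mainIndex);
    String packageDir = changedPath.substring(mainIndex + MAIN.length(), lastSlash + 1);
    String fileName = changedPath.substring(lastSlash + 1);
    return module + TEST + packageDir + "Test" + fileName;
  }

  /** Collects the candidates for a list of changed paths, skipping paths without one. */
  static List<String> candidateTests(List<String> changedPaths) {
    List<String> result = new ArrayList<>();
    for (String path : changedPaths) {
      String candidate = candidateTest(path);
      if (candidate != null) {
        result.add(candidate);
      }
    }
    return result;
  }
}
```

A name-based mapping like this only finds the directly corresponding test; it says nothing about tests that exercise the changed class indirectly, so it would at best narrow down a first test run.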

Re: Running tests incrementally based on changed files

2014-12-02 Thread Michael Wechner
Hi Shawn Thanks very much for your feedback. Please see my comments inline below On 03.12.14 at 00:10, Shawn Heisey wrote: On 12/2/2014 3:31 PM, Michael Wechner wrote: I am trying to set up a testing environment, which is running tests incrementally based on changed files, instead of running

Re: Running tests incrementally based on changed files

2014-12-02 Thread Michael Wechner
popularizers community: https://www.linkedin.com/groups?gid=6713853 On 3 December 2014 at 00:47, Michael Wechner michael.wech...@wyona.com wrote: Hi Shawn Thanks very much for your feedback. Please see my comments inline below On 03.12.14 at 00:10, Shawn Heisey wrote: On 12/2/2014 3:31 PM

Re: Running tests incrementally based on changed files

2014-12-03 Thread Michael Wechner
On 03.12.14 at 07:26, Shawn Heisey wrote: On 12/2/2014 10:47 PM, Michael Wechner wrote: I think you are right that in certain cases, like for example in the case of CompressingStoredFieldsFormat it makes a lot of sense to run the entire test suite (even multiple times considering

Re: Running tests incrementally based on changed files

2014-12-03 Thread Michael Wechner
On 03.12.14 at 07:26, Shawn Heisey wrote: On 12/2/2014 10:47 PM, Michael Wechner wrote: I think you are right that in certain cases, like for example in the case of CompressingStoredFieldsFormat it makes a lot of sense to run the entire test suite (even multiple times considering

Re: Running tests incrementally based on changed files

2014-12-04 Thread Michael Wechner
that is a great suggestion. Maybe the following failure of today http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/1927/ and in particular http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/1927/testReport/junit/org.apache.solr.cloud/OverseerStatusTest/testDistribSearch/ illustrates

what is the rule for updating CHANGES.txt

2014-12-06 Thread Michael Wechner
Hi Yesterday morning I have noticed the following changes - lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java - lucene/core/src/java/org/apache/lucene/index/MergePolicy.java - lucene/core/src/test/org/apache/lucene/index/TestConcurrentMergeScheduler.java and with these

Re: Minimum test set for idempotent changes in schema.xml

2014-12-06 Thread Michael Wechner
Hi David I do not really understand what you are suggesting. I guess Alex was talking about changes on files like for example ./solr/contrib/morphlines-core/src/test-files/solr/minimr/conf/schema.xml ./solr/contrib/uima/src/test-files/uima/solr/collection1/conf/schema.xml

Re: what is the rule for updating CHANGES.txt

2014-12-07 Thread Michael Wechner
be tested. It is a guard against jvm bugs like buggy G1GC collector. On Sat, Dec 6, 2014 at 11:25 PM, Michael Wechner michael.wech...@wyona.com wrote: Hi Yesterday morning I have noticed the following changes - lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java

Re: what is the rule for updating CHANGES.txt

2014-12-07 Thread Michael Wechner
are right, that is indeed difficult. Thanks for making this clear Michael On Sun, Dec 7, 2014 at 5:19 AM, Michael Wechner michael.wech...@wyona.com wrote: Hi Robert Thanks very much for your feedback. I guess you mean the following change: git diff 1f69438a17a0ad33bb9c209113b0fe83fb68f860

Re: what is the rule for updating CHANGES.txt

2014-12-07 Thread Michael Wechner
On 07.12.14 at 14:31, Robert Muir wrote: On Sun, Dec 7, 2014 at 5:19 AM, Michael Wechner michael.wech...@wyona.com wrote: About missing tests, I guess you experienced a buggy G1GC collector or other JVM bugs and hence you made this change, right? If so, I guess you will run the new code

Re: what is the rule for updating CHANGES.txt

2014-12-07 Thread Michael Wechner
as a developer have as knowledge in your brain and this way it will get documented and more accessible to others. Thanks Michael ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On Sat, Dec 6, 2014 at 11:25 PM, Michael Wechner michael.wech

Re: what is the rule for updating CHANGES.txt

2014-12-07 Thread Michael Wechner
://www.linkedin.com/in/davidwsmiley On Sat, Dec 6, 2014 at 11:25 PM, Michael Wechner michael.wech...@wyona.com wrote: Hi Yesterday morning I have noticed the following changes - lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java - lucene/core/src/java/org/apache/lucene

Re: Parallel beast testing.

2014-12-12 Thread Michael Wechner
thanks very much for sharing the script. Just to make sure I understand correctly, there can be test failures because of timing issues of the test itself, but these are not actual functional failures, right? Or can you explain a bit more? Thanks Michael On 12.12.14 at 05:17, Mark Miller wrote:

Re: 2B tests

2015-03-16 Thread Michael Wechner
run them by default when you run ant test. Look for the @Monster annotation to see all of them... Mike McCandless http://blog.mikemccandless.com On Sun, Mar 15, 2015 at 11:42 PM, Michael Wechner michael.wech...@wyona.com wrote: what are the 2B tests? I guess the entry point is lucene

Re: 2B tests

2015-03-16 Thread Michael Wechner
what are the 2B tests? I guess the entry point is lucene/core/src/test/org/apache/lucene/index/Test2BTerms.java or where would you start to learn more about these tests? Thanks Michael On 15.03.15 at 21:58, Michael McCandless wrote: I confirmed 2B tests are passing on 4.10.x. Took 17

Re: Testing Lucene Solr using IBM JVM

2015-06-03 Thread Michael Wechner
Hi Mesbah You might want to have a look at http://jenkins.thetaphi.de/ HTH Michael On 03.06.15 at 17:36, Mesbah Alam wrote: Hi I work for the IBM JVM team. We are looking into the possibility to create an environment within IBM that mirrors the one used in the community to run

Re: Questions about Solr Search

2020-07-02 Thread Michael Wechner
Hi Gautam Please find my answers inline below On 02.07.20 at 16:19, Gautam K wrote: Dear Team, Hope you all are doing well. Can you please help with the following question? We are using Solr search in our Organisation and now checking whether Solr provides search capabilities like Google

Luke (Lucene 9): ls: ../analysis: No such file or directory

2021-06-17 Thread Michael Wechner
Hi I have built Lucene 9.0.0-SNAPSHOT locally (https://github.com/apache/lucene.git) and use the Lucene core libraries successfully. In order to introspect the Lucene index I tried to run Luke on Mac OS X sh bin/luke.sh (from within the luke directory: lucene/lucene/luke) but then I

Re: Moving usage documentation of Luke to Lucene Website

2021-05-12 Thread Michael Wechner
Hi Dmitry Sure, I think it is valuable to have this information available. All the best Michael On 12.05.21 at 08:19, Dmitry Kan wrote: Hi Michael, The page looks great, thanks for including the link to my fork with older releases! On Mon, 10 May 2021 at 13:24, Michael Wechner

Re: Moving usage documentation of Luke to Lucene Website

2021-05-12 Thread Michael Wechner
to me; thanks for considering my suggestions. It seems better to move the conversation to github; once you make a PR I'll take care of it. Tomoko On Mon, May 10, 2021 at 19:24, Michael Wechner michael.wech...@wyona.com wrote: Hi Together I have made a draft of the README (includ

Re: Moving usage documentation of Luke to Lucene Website

2021-05-10 Thread Michael Wechner
mphasize specific 3rd party projects on our official website. > - Brief project history This section is already included in Luke itself - its "About" dialog. Tomoko On Sun, May 2, 2021 at 15:20, Michael Wechner michael.wech...@wyona.com wrote: I would like to sug

Re: Moving usage documentation of Luke to Lucene Website

2021-05-12 Thread Michael Wechner
, it looks fine to me; thanks for considering my suggestions. It seems better to move the conversation to github; once you make a PR I'll take care of it. Tomoko On Mon, May 10, 2021 at 19:24, Michael Wechner michael.wech...@wyona.com wrote: Hi Together I have made a draft of t

Re: What is the status and what are the next steps re k-nn search?

2021-05-24 Thread Michael Wechner
<https://home.apache.org/~mikemccand/lucenebench/> (hmm, they haven't run for the past few nights ... I'll try to fix). Mike McCandless http://blog.mikemccandless.com <http://blog.mikemccandless.com> On Sun, May 23, 2021 at 7:39 AM Michael Wechner mailto:michael.wech...@wyona.com>> w

java 16 gradle not working

2021-05-23 Thread Michael Wechner
Hi I just installed OpenJDK 16 and tried to build Lucene, but received the following error * What went wrong: Could not compile settings file '/Users/michaelwechner/src/apache/lucene/settings.gradle'. > startup failed:   General error during semantic analysis: Unsupported class file major

Re: Text search in Arabic

2021-05-20 Thread Michael Wechner
Hi Mete You might also want to try the java-u...@lucene.apache.org mailing list https://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg Re languages other than english you might find more information at

What is the status and what are the next steps re k-nn search?

2021-05-23 Thread Michael Wechner
Hi If I understand correctly the source version of Lucene contains an HNSW implementation for k-nn search https://issues.apache.org/jira/browse/LUCENE-9004 and another algorithm based on coarse quantization is in development https://issues.apache.org/jira/browse/LUCENE-9322 Also there are

Re: java 16 gradle not working

2021-05-23 Thread Michael Wechner
<https://github.com/apache/lucene/blob/main/help/jvms.txt> D. On Sun, May 23, 2021 at 6:02 PM Michael Wechner mailto:michael.wech...@wyona.com>> wrote: Hi I just installed OpenJDK 16 and tried to build Lucene, but received the following error * What went wron

VectorField: double versus float

2021-05-24 Thread Michael Wechner
Hi From SentenceTransformer of sbert.net I receive double values for embeddings, whereas I noticed that VectorField only accepts float values https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/VectorField.java What is the reason, that VectorField only
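If the embeddings arrive as double values and the field only accepts float values, one possible workaround is to narrow each component before indexing. The following is a minimal sketch under the assumption that the loss of precision from double to float is acceptable for similarity search; the class `EmbeddingConversion` and the method `toFloats` are hypothetical names, not part of Lucene.

```java
// Hypothetical helper: converts a double-precision embedding into a float array.
final class EmbeddingConversion {
  /** Narrows every component from double to float; values are rounded to the nearest float. */
  static float[] toFloats(double[] embedding) {
    float[] result = new float[embedding.length];
    for (int i = 0; i < embedding.length; i++) {
      result[i] = (float) embedding[i];
    }
    return result;
  }
}
```

The resulting array has the same length as the input, so the vector dimension is unchanged and only the per-component precision is reduced.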

Re: Luke (Lucene 9): ls: ../analysis: No such file or directory

2021-07-04 Thread Michael Wechner
:10 Michael Wechner : Hi I have built Lucene 9.0.0-SNAPSHOT locally (https://github.com/apache/lucene.git) and use the Lucene core libraries successfully. In order to introspect the Lucene index I tried to run Luke on Mac OS X sh bin/luke.sh (from within the luke directory: lucene/lucene/luke

Re: Luke (Lucene 9): ls: ../analysis: No such file or directory

2021-07-04 Thread Michael Wechner
ah ok, thanks for explaining! Michael On 05.07.21 at 00:36, Tomoko Uchida wrote: No. The gradle task is a shortcut to start luke (directly after source checkout) for developers, and the shell/bat scripts are still valid after packaging. Tomoko Tomoko On Mon, Jul 5, 2021 at 4:18 Michael Wechner

Does Luke already support vector search or are there any plans to support vector search?

2021-07-06 Thread Michael Wechner
Hi I just created a Lucene vector search index with Lucene-9.0.0-SNAPSHOT based on train-v2.0.json of SQuAD (https://rajpurkar.github.io/SQuAD-explorer/), which are 86'831 QnAs (for the embedding I used SentenceBERT). It took a couple of hours on my Mac laptop, but it worked in the end and

Re: Helping to update Lucene FAQ

2021-04-27 Thread Michael Wechner
require some exciting iterating :) Thank you David Smiley for pointing me in the right direction! Mike McCandless http://blog.mikemccandless.com On Tue, Apr 20, 2021 at 12:38 PM Michael Wechner michael.wech...@wyona.com wrote: great!

Re: Helping to update Lucene FAQ

2021-04-27 Thread Michael Wechner
On 27.04.21 at 10:10, Michael Wechner wrote: Hi Mike It is working :-) great, thanks! To test I have updated the links of the first QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-HowdoIstartusingLucene? I understand that we would like to have links which go

Moving usage documentation of Luke to Lucene Website

2021-04-30 Thread Michael Wechner
Hi I just noticed that Luke became a Lucene Module https://github.com/DmitryKey/luke#important-notice https://mocobeta.medium.com/luke-become-an-apache-lucene-module-as-of-lucene-8-1-7d139c998b2 https://lucene.apache.org/core/8_8_2/luke/index.html Wouldn't it make sense to move the "Usage

Re: Moving usage documentation of Luke to Lucene Website

2021-05-01 Thread Michael Wechner
the app. https://github.com/apache/lucene/pull/120 Tomoko On Sat, May 1, 2021 at 8:15, Robert Muir rcm...@gmail.com wrote: please submit a PR if you have the time! On Fri, Apr 30, 2021 at 3:38 PM Michael Wechner mailto:michael.we

Re: Moving usage documentation of Luke to Lucene Website

2021-05-01 Thread Michael Wechner
ckport the documentation change into the branch for 8.x. Tomoko On Sat, May 1, 2021 at 23:04, Michael Wechner michael.wech...@wyona.com wrote: I am a little confused now :-) where exactly are these changes now? If I understand correctly that PR has already been merged (https://gi

Re: Moving usage documentation of Luke to Lucene Website

2021-05-01 Thread Michael Wechner
59> Tomoko On Sat, May 1, 2021 at 23:42, Michael Wechner michael.wech...@wyona.com wrote: ah ok :-) Do you have an idea when 9.0 will be released approximately? Do you think it would still make sense to add some documentation as well to https://githu

Re: Moving usage documentation of Luke to Lucene Website

2021-05-02 Thread Michael Wechner
I think it would need a bit of work. https://issues.apache.org/jira/browse/LUCENE-9459 Tomoko On Sat, May 1, 2021 at 23:42, Michael Wechner michael.wech...@wyona.com wrote: ah ok :-)

Re: Moving usage documentation of Luke to Lucene Website

2021-04-30 Thread Michael Wechner
. 2021 at 08:41, Michael Wechner wrote: Hi I just noticed that Luke became a Lucene Module https://github.com/DmitryKey/luke#important-notice https://mocobeta.medium.com/luke-become-an-apache-lucene-module-as-of-lucene-8-1-7d139c998b2 https://lucene.apache.org/core/8_8_2/luke/index.html Wouldn't

Helping to update Lucene FAQ

2021-04-19 Thread Michael Wechner
Hi I am currently doing some work re FAQ in general and noticed that the following Lucene FAQ https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ is not quite up to date. I read that you are looking for help https://cwiki.apache.org/confluence/display/lucene and I would be happy

Re: Helping to update Lucene FAQ

2021-04-20 Thread Michael Wechner
McCandless http://blog.mikemccandless.com On Mon, Apr 19, 2021 at 12:20 PM Michael Wechner michael.wech...@wyona.com wrote: Hi I am currently doing some work re FAQ in general and noticed that the following Lucene FAQ https:/

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-12 Thread Michael Wechner
日(火) 16:23 Michael Wechner : Hi I just created a Lucene vector search index with Lucene-9.0.0-SNAPSHOT based on train-v2.0.json of SQuAD (https://rajpurkar.github.io/SQuAD-explorer/), which are 86'831 QnAs (for the embedding I used SentenceBERT). It took a couple of hours on my Mac laptop, but

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
e to start (haven't tried to read indexes that includes vector values with Luke). The stack traces you might see should include full information to fix or improve it. Tomoko On Tue, Jul 13, 2021 at 14:22, Michael Wechner wrote: On 13.07.21 at 04:22, Tomoko Uchida wrote: There isn't any plans for that, and I'm not

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
xist, then the error message reads "No such directory" Or that the dropdown "Index Path" is checking whether the previously opened directories still exist. Thanks Michael On 13.07.21 at 10:47, Michael Wechner wrote: thanks again for your feedback! I will give it a try and

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
.message.index_opened_ro=Index successfully opened. (read-only) Thanks Michael On 13.07.21 at 22:43, Michael Wechner wrote: I analyzed the logs and the class/method lucene/luke/src/java/org/apache/lucene/luke/models/util/IndexUtils.java#openIndex(String, String) and realized that the pr

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-13 Thread Michael Wechner
Tomoko Uchida: We don't accept patches by email... please open a Jira. On Wed, Jul 14, 2021 at 5:58, Michael Wechner wrote: would the following patch make sense? git diff lucene/luke/src/ diff --git a/lucene/luke/src/java/org/apache/lucene/luke/app/IndexHandler.java b/lucene/luke/src/java/org/apache/lucene

Re: Not able to subscribe to Developer Lists

2021-07-29 Thread Michael Wechner
Hi Praveen I think you managed https://lists.apache.org/list.html?dev@lucene.apache.org and otherwise I would not have received this email :-) HTH Michael On 29.07.21 at 09:15, Praveen Nishchal wrote: Hi Dev Community, I have sent multiple emails to dev-subscr...@lucene.apache.org

Searching Lucene FAQ with Lucene

2021-12-20 Thread Michael Wechner
Hi I am working on a webapp called "Katie" in order to detect duplicated questions https://ukatie.com/ As a test case I have imported the Lucene FAQ https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ to https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en and made

Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Michael Wechner
Hi Patrick/Haoyu, congratulations! On 19.12.21 at 21:05, Patrick Zhai wrote: Thanks everyone! It's a great honor to become a lucene committer and thank you everyone for building such a friendly community and specially thank you to who has replied email/ commented on issues/ reviewed PRs

Re: Searching Lucene FAQ with Lucene

2021-12-21 Thread Michael Wechner
y, whereas the Katie frontend already provides this functionality https://ukatie.com/#/faq/9f206aec-5223-4e03-a2fc-c16e4b885ef8/en but I have to enhance the Javascript client used at https://lucene-faq.ukatie.com/ Thanks Michael On Mon, Dec 20, 2021 at 5:05 AM Michael Wechner wrote: Hi I

Re: Time to write an open-source book?

2021-11-17 Thread Michael Wechner
I think this would be great and I would be very happy to contribute. For example I am currently trying to understand how the autocomplete / auto suggest functionality of Lucene works and I could contribute my learnings. All the best Michael On 16.11.21 at 20:49, Dongyu Xu wrote: Hi Devs,

Article link at Lucene FAQ does not exist anymore

2021-11-22 Thread Michael Wechner
Hi The QnA https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowcanIindexXMLdocuments? is pointing to (See also this article Parsing, indexing, and searching XML with Digester and Lucene .)

VectorField renamed to KnnVectorField?

2021-11-01 Thread Michael Wechner
Hi In May 2021 I have done a Vector Search implementation based on Lucene 9.0.0-SNAPSHOT with the following code FieldType vectorFieldType = VectorField.createHnswType(vector.length, VectorValues.SimilarityFunction.DOT_PRODUCT, 16, 500); VectorField vectorField = new VectorField(VECTOR_FIELD,

Re: VectorField renamed to KnnVectorField?

2021-11-01 Thread Michael Wechner
)); +    return new TopDocScorer(this, context.reader().searchNearestVectors(field, vector, topK, null)); the indexing and searching works again :-) Thanks Michael On 01.11.21 at 18:53, Michael Wechner wrote: Hi In May 2021 I have done a Vector Search implementation based on Lucene

Re: VectorField renamed to KnnVectorField?

2021-11-02 Thread Michael Wechner
n> - Vigya On Mon, Nov 1, 2021 at 2:44 PM Michael Wechner mailto:michael.wech...@wyona.com>> wrote: I was able to update my code -    FieldType vectorFieldType = VectorField.createHnswType(vector.length, VectorValues.SimilarityFunction.DOT_PRODUCT, 16, 500); - 

Re: Welcome Julie Tibshirani to the Lucene PMC

2021-12-01 Thread Michael Wechner
great to hear, congratulations, Julie! On 01.12.21 at 14:29, Mayya Sharipova wrote: Congratulations, Julie ! Very well deserved! On Wed, Dec 1, 2021 at 2:45 PM Ignacio Vera wrote: Congratulations Julie! On Wed, Dec 1, 2021 at 10:03 AM Alan Woodward wrote:

Re: Article link at Lucene FAQ does not exist anymore

2021-12-15 Thread Michael Wechner
I have removed it now :-) On 22.11.21 at 18:18, Michael Wechner wrote: Hi The QnA https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowcanIindexXMLdocuments? is pointing to (See also this article Parsing, indexing, and searching XML with Digester and Lucene <h

Re: Does Luke already support vector search or are there any plans to support vector search?

2021-07-17 Thread Michael Wechner
Hi Tomoko Just noticed that you resolved the issue and also did some additional improvement :-) Thanks a lot! Michael On 14.07.21 at 07:52, Michael Wechner wrote: sure, I understand, but I just wanted to ask whether such a change makes sense actually. I have created a Jira ticket https

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-13 Thread Michael Wechner
Re the OpenAI embedding the following recent paper might be of interest https://arxiv.org/pdf/2201.10005.pdf (Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022) Thanks Michael On 13.02.22 at 00:14, Michael Wechner wrote: Here a concrete example where I combine OpenAI model

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-15 Thread Michael Wechner
calculations) are indeed the main factors to consider. Julie On Mon, Feb 14, 2022 at 1:02 PM Michael Wechner wrote: Hi Julie Thanks again for your feedback! I will do some more tests with "all-mpnet-base-v2" (768) and "all-roberta-large-v1" (1024), so 1024 is enough f

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-15 Thread Michael Wechner
y-mismatch-with-document-expansion.html Cheers -- Alessandro Benedetti Apache Lucene/Solr PMC member and Committer Director, R&D Software Engineer, Search Consultant www.sease.io On Tue, 15 Feb 2022 at 09:10, Michael Wechner wrote: fair enou

Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-11 Thread Michael Wechner
Hi Does anyone have experience using OpenAI embeddings in combination with Lucene vector search? https://beta.openai.com/docs/guides/embeddings for example comparing performance re vector size https://api.openai.com/v1/engines/text-similarity-ada-001/embeddings and

Re: How to Increase max vector size?

2022-02-15 Thread Michael Wechner
v: I don't think it makes sense to have a static variable maximum that you can change by calling a method. What purpose would it serve? On Tue, Feb 15, 2022, 2:39 PM Michael Wechner wrote: Hi Alessandro No, I have not created a Jira ticket, but I would be happy to create one, just l

Re: Generate autocomplete predictions

2022-03-13 Thread Michael Wechner
Hi Michael, This sounds like a good fit for Lucene to me. On Fri, Mar 11, 2022 at 11:15 PM Michael Wechner wrote: Hi I recently implemented auto-suggest based on https://lucene.apache.org/core/9_0_0/suggest/index.html whereas I am currently managing the terms / predictions (e.g. "autocompleti

Re: Generate autocomplete predictions

2022-03-14 Thread Michael Wechner
can help show how things tie together. I guess the thing you want to avoid is to spend hours on the prototype but otherwise either is fine. On Sun, Mar 13, 2022 at 23:01, Michael Wechner wrote: Hi Adrien Thanks for your feedback! From a "project management" point o

Generate autocomplete predictions

2022-03-11 Thread Michael Wechner
Hi I recently implemented auto-suggest based on https://lucene.apache.org/core/9_0_0/suggest/index.html whereas I am currently managing the terms / predictions (e.g. "autocompletion using lucene suggesters dev") contained by the index manually. I would like now to generate the terms /

Re: How to Increase max vector size?

2022-02-17 Thread Michael Wechner
available under the Apache license. Thanks Michael On 16.02.22 at 19:51, Michael Sokolov wrote: Fair enough - are you planning to offer such a service;) sounds exciting! -Mike On Tue, Feb 15, 2022 at 6:00 PM Michael Wechner wrote: true :-) when you are the one controlling the input

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-14 Thread Michael Wechner
ault a bit and would be happy to discuss if you'd like to file a JIRA issue. However 12288 dimensions still seems high to me, this is much larger than most well-established embedding models and could require a lot of memory. Julie On Mon, Feb 14, 2022 at 12:08 PM Michael Wechner wrote:

Re: Lucene 9.1 release soon?

2022-02-23 Thread Michael Wechner
I think this would be great :-) thank you very much for your efforts! Michael On 24.02.22 at 00:28, Julie Tibshirani wrote: Hello everyone, Would there be support for releasing Lucene 9.1 soon? It has been ~2.5 months since 9.0 was released and we already have a long list of new features,

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-15 Thread Michael Wechner
the door open to alternative algorithms that might have better performance. It would be great if Lucene would provide alternative algorithms in the future and one can choose the algorithm based on one's requirements Thanks Michael On Tue, Feb 15, 2022 at 12:21 PM Michael Wechner wrote

Re: How to Increase max vector size?

2022-02-15 Thread Michael Wechner
-- Alessandro Benedetti Apache Lucene/Solr PMC member and Committer Director, R&D Software Engineer, Search Consultant www.sease.io On Sat, 12 Feb 2022 at 22:53, Michael Wechner wrote: Hi I just tried to test the OpenAI model

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-12 Thread Michael Wechner
andro Benedetti: Hi Michael, experience to what extent? We have been exploring the area for a while given we contributed the first neural search milestone to Apache Solr. What is your curiosity? Performance? Relevance impact? How to integrate it? Regards On Fri, 11 Feb 2022, 22:38 Michael Wechner,

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-14 Thread Michael Wechner
On Sun, Feb 13, 2022 at 1:55 AM Michael Wechner wrote: Re the OpenAI embedding the following recent paper might be of interest https://arxiv.org/pdf/2201.10005.pdf (Text and Code Embeddings by Contrastive Pre-Training, Jan 24, 2022) Thanks Michael On 13.02.22 at

How to Increase max vector size?

2022-02-12 Thread Michael Wechner
Hi I just tried to test the OpenAI model "text-similarity-davinci-001" with 12288 dimensions and receive the following error java.lang.IllegalArgumentException: vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (=1024); got 12288     at
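To fail early with a clearer message, the dimension of an embedding can be checked before it is handed to the index. The following is a minimal sketch under stated assumptions: the limit of 1024 is copied from the error message quoted above rather than read from Lucene at runtime, and the class `VectorDimensionCheck` and the method `requireIndexable` are hypothetical names.

```java
// Hypothetical guard: rejects embeddings that exceed an assumed dimension limit.
final class VectorDimensionCheck {
  // Assumption: value taken from the quoted error message, not looked up in Lucene.
  static final int MAX_DIMENSIONS = 1024;

  /** Throws IllegalArgumentException if the vector has more components than the assumed limit. */
  static void requireIndexable(float[] vector) {
    if (vector.length > MAX_DIMENSIONS) {
      throw new IllegalArgumentException(
          "embedding has " + vector.length + " dimensions, limit is " + MAX_DIMENSIONS);
    }
  }
}
```

With this guard, a 768-dimensional vector passes silently, while a 12288-dimensional vector is rejected before any indexing work is done.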

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-12 Thread Michael Wechner
as "understanding", but I hope it makes it clearer what I am looking for :-) Thanks Michael On 12.02.22 at 22:38, Michael Wechner wrote: Hi Alessandro I am mainly interested in detecting similarity, for example whether the following two sentences are similar resp. likely to mean the

Re: Experience re OpenAI embeddings in combination with Lucene vector search

2022-02-20 Thread Michael Wechner
   What was your age last year? 0.4658360947506338      What is your age? 0.4859953687958164        How old are you? So both models do not "understand" the question. As Alessandro suggested a "well-curated fine-tuning step" might improve this, whereas I have not been able to try thi

Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner
Hi Together You might be interested in this paper / article https://arxiv.org/abs/2308.14963 Thanks Michael

Re: Quantization for vector search

2023-11-04 Thread Michael Wechner
Thanks! Ben On Sat, Nov 4, 2023 at 4:07 AM Michael Wechner wrote: Hi If I understand correctly some devs are working on introducing quantization for vector search or at least considering it https://github.com/apache/lucene/issues/12497 Just being curious what

Quantization for vector search

2023-11-04 Thread Michael Wechner
Hi If I understand correctly some devs are working on introducing quantization for vector search or at least considering it https://github.com/apache/lucene/issues/12497 Just being curious what is the status on this resp. is somebody working on this actively? It came to my mind, because

Multimodal search

2023-10-12 Thread Michael Wechner
Hi Did anyone of the Lucene committers consider making Lucene multimodal? With a quick Google search I found for example https://dl.acm.org/doi/abs/10.1145/3503161.3548768 https://sigir-ecom.github.io/ecom2018/ecom18Papers/paper7.pdf Thanks Michael

Update TermInSetQuery Example?

2023-10-20 Thread Michael Wechner
Hi I recently found TermInSetQuery example at https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/TermInSetQuery.html but if I understand correctly one should now use BooleanQuery.Builder instead of BooleanQuery itself, right? BooleanQuery.Builder bqb = new

Re: Update TermInSetQuery Example?

2023-10-21 Thread Michael Wechner
be fixed. There may be other combinations not detected because of source code formatting. Uwe On 20.10.2023 at 23:46, Michael Wechner wrote: Hi I recently found TermInSetQuery example at https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/TermInSetQuery.html but if I

Re: Multimodal search

2023-10-13 Thread Michael Wechner
eally need to be multimodal? Wherever the embeddings come from, Lucene can index the vectors and combine with textual queries, right? Thanks, Froh On Thu, Oct 12, 2023 at 12:59 PM Michael Wechner wrote: Hi Did anyone of the Lucene committers consider making Lucene multimodal? Wi

Re: Multimodal search

2023-10-15 Thread Michael Wechner
lts there have been techniques like re-ranking that are very popular. Thanks Navneet On Fri, Oct 13, 2023 at 8:53 AM Michael Wechner wrote: Thanks for your feedback and the link to the OpenSearch implementation! I think the embedding approach as it exists today is not and

Re: Multimodal search

2023-10-16 Thread Michael Wechner
don't think this really addresses the problem of accuracy at its core. Thanks Michael On 15.10.23 at 21:05, Michael Wechner wrote: Hi Navneet, I also observe that various "vector search DBs" are implementing hybrid search, because the accuracy with embeddings is often not go

Re: call for 9.4.1 release (bug in vectors format)

2022-10-18 Thread Michael Wechner
+1 :-) Thanks Michael On 18.10.22 at 19:52, Julie Tibshirani wrote: Hi everyone, We recently discovered a severe bug in the 9.4 release in the kNN vectors format: https://github.com/apache/lucene/issues/11858. Explaining the problem: when ingesting a lot of data, or when performing a

Re: Raising the Value of MAX_DIMENSIONS of Vector Values

2022-10-20 Thread Michael Wechner
of the migration -- then we'll discuss once it's been moved to GitHub!) Julie On Mon, Aug 8, 2022 at 10:05 PM Michael Wechner wrote: I agree that Lucene should support vector sizes depending on the model one is choosing. For example Weaviate seems to do this https://weaviate.slack.com

Re: Raising the Value of MAX_DIMENSIONS of Vector Values

2022-08-08 Thread Michael Wechner
I agree that Lucene should support vector sizes depending on the model one is choosing. For example, Weaviate seems to do this: https://weaviate.slack.com/archives/C017EG2SL3H/p1659981294040479 Thanks Michael On 07.08.22 at 22:48, Marcus Eagan wrote: Hi Lucene Team, In general, I have

Re: Lucene 9.5.0 release

2023-01-21 Thread Michael Wechner
I tried to understand the issue described on GitHub, but unfortunately I do not really understand it. Can you explain a little more? Thanks Michael On 21.01.23 at 11:00, Alessandro Benedetti wrote: Hi, this would be nice to have in 9.5: https://github.com/apache/lucene/issues/12099 It's

Re: Lucene 9.5.0 release

2023-01-23 Thread Michael Wechner
shortly? Thanks Luca On Sat, Jan 21, 2023 at 11:41 AM Michael Wechner wrote: I tried to understand the issue described on GitHub, but unfortunately I do not really understand it. Can you explain a little more? Thanks Michael On 21.01.

Re: Release Lucene 9.4.2

2022-11-09 Thread Michael Wechner
Thank you! +1 :-) On 09.11.22 at 16:38, Adrien Grand wrote: Hello all, A bad integer overflow has been discovered in the KNN vectors format, which affects segments that have more than ~16M vectors. I'd like to do a bugfix release when the bug

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-12 Thread Michael Wechner
em for any purpose you like. best wishes, Kent Fitch On Wed, Apr 12, 2023 at 4:37 PM Michael Wechner wrote: Thank you very much for your feedback! In a previous post (April 7) you wrote that you could make the 47K ada-002 vectors available, which would be great! Would it

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-21 Thread Michael Wechner
Michael On 13.04.23 at 07:58, Michael Wechner wrote: Hi Kent, great, thank you very much! Will download it later today :-) All the best Michael On 13.04.23 at 01:35, Kent Fitch wrote: Hi Michael (and anyone else who wants just over 240K "real world" ada-002 vectors of dime

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-21 Thread Michael Wechner
Yes, they are, though it should help us to test performance and scalability :-) On 21.04.23 at 09:24, Ishan Chattopadhyaya wrote: Seems like they were all 768 dimensions. On Fri, 21 Apr, 2023, 11:48 am Michael Wechner, wrote: Hi all, Cohere just published approx. 100 million
