Re: HNSW search with threshold

2022-11-07 Thread Alexey Gorlenko
Thanks, Michael! Yes, I will try. On Tue, Nov 8, 2022 at 03:31, Michael Sokolov wrote: > +1 to adding a scoring threshold. I think it could be another > parameter to KnnVectorQuery. Do you want to have a try at adding this? > If so, please feel free to open a PR and I will be happy to guide you.

Re: HNSW search with threshold

2022-11-07 Thread Michael Sokolov
+1 to adding a scoring threshold. I think it could be another parameter to KnnVectorQuery. Do you want to have a try at adding this? If so, please feel free to open a PR and I will be happy to guide you. On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko wrote: > > Hi! > > There are some use cases

Re: [VOTE] Release PyLucene 9.4.1-rc3

2022-11-07 Thread Andi Vajda
This vote has passed. Thank you to all who voted! PyLucene 9.4.1 is now available (or will be once the download mirrors show it). Andi.. On Tue, 1 Nov 2022, Andi Vajda wrote: The PyLucene 9.4.1 (rc3) release tracking the recent release of Apache Lucene 9.4.1 is ready. A release candidate is

Re: [VOTE] Release PyLucene 9.4.1-rc3

2022-11-07 Thread Dawid Weiss
+1 to release, thanks Andi! Dawid On Tue, Nov 1, 2022 at 9:37 PM Andi Vajda wrote: > > The PyLucene 9.4.1 (rc3) release tracking the recent release of > Apache Lucene 9.4.1 is ready. > > A release candidate is available from: >

Re: [VOTE] Release PyLucene 9.4.1-rc3

2022-11-07 Thread Nelia Vb
+1 On Tue, 1 Nov 2022, 21:37 Andi Vajda, wrote: > > The PyLucene 9.4.1 (rc3) release tracking the recent release of > Apache Lucene 9.4.1 is ready. > > A release candidate is available from: > https://dist.apache.org/repos/dist/dev/lucene/pylucene/9.4.1-rc3/ > > PyLucene 9.4.1 is built with

HNSW search with threshold

2022-11-07 Thread Alexey Gorlenko
Hi! There are some use cases where we need to find vectors whose distance (by some metric) to a given vector V is less than a given threshold T. That task is very similar to the kNN problem, but in this case we don't have a fixed number of nearest neighbours *k*. As I see, the current

Re: [VOTE] Release PyLucene 9.4.1-rc3

2022-11-07 Thread Michael McCandless
+1 to release. Sorry for taking so long! I ran my usual smoke test -- index the first 100K docs from an English Wikipedia export, force merge and run a couple of queries. One quirk during installation is I had to add "with_modern_setuptools = True" in JCC's setup.py before all the logic for

Re: Dense union of doc IDs

2022-11-07 Thread LuXugang
+1. If we had a new BulkAdder that could detect long runs of set bits, could it at least be used in LRUQueryCache to cache partially dense sets of docs instead of always building a huge BitSet sized by maxDoc? Xugang https://www.amazingkoala.com.cn > On Nov 4, 2022, at 08:15, Michael Froh

Re: Dense union of doc IDs

2022-11-07 Thread Adrien Grand
I just found the branch I had used to play with this idea: https://github.com/apache/lucene/compare/main...jpountz:lucene:bulk_collection, in case you're interested in having a look. There are a few changes on DefaultBulkScorer and LeafCollector too because I was also interested in other ways to

Re: Dense union of doc IDs

2022-11-07 Thread Adrien Grand
I have been thinking about a similar feature for conjunctions and negations. When you have many low-cardinality fields, a good way to speed up queries on these fields is to configure an index sort on these fields. This automatically creates large gaps in postings between long runs of ones, and

JDK 20 EAb22, ZenGC EA builds, JavaFX 20 EAb5 and several heads-ups!

2022-11-07 Thread David Delabassee
Greetings, With JavaOne in Las Vegas, last month was epically busy! It was great to finally have the ability to meet and discuss the Quality Outreach program with some of you... face-to-face! This installment of the newsletter is packed as we have several heads-ups, including new