Should segment order impact hit count estimation?

2021-05-21 Thread Patrick Zhai
Hi folks For the past few weeks I've been working with Mike McCandless to use the recently introduced IndexRearranger to replace the old way of guaranteeing a deterministic index -- using a single index thread and a LogDocMergePolicy. In the process we found out that with two concurrently built but

Re: Welcome Greg Miller as Lucene committer

2021-05-30 Thread Patrick Zhai
Congrats Greg! Patrick On Sun, May 30, 2021, 20:26 Yonik Seeley wrote: > Congrats Greg! > > -Yonik > > On Sat, May 29, 2021 at 3:47 PM Adrien Grand wrote: > >> I'm pleased to announce that Greg Miller has accepted the PMC's >> invitation to become a committer. >> >> Greg, the tradition is

Re: Can QueryParser parse regexp query?

2021-09-13 Thread Patrick Zhai
ser/classic/package-summary.html#Regexp_Searches > > On Mon, Sep 13, 2021 at 1:22 PM Patrick Zhai wrote: > > > > Hi folks, > > I'm currently trying to benchmark some regexp queries and found that the > default QueryParser (o.a.l.queryparser.classic.QueryPars

Can QueryParser parse regexp query?

2021-09-13 Thread Patrick Zhai
Hi folks, I'm currently trying to benchmark some regexp queries and found that the default QueryParser (o.a.l.queryparser.classic.QueryParser) that luceneutil is using is not able to parse regexps such as: "[abcde]+" or "a{1,5}". The current workaround is to add another "if" logic in the

Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Patrick Zhai
Thanks everyone! It's a great honor to become a Lucene committer, and thank you everyone for building such a friendly community. Special thanks to those who have replied to emails, commented on issues, or reviewed PRs related to my work. It is an enjoyable experience working with the Lucene community and

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-16) - Build # 31986 - Unstable!

2021-12-20 Thread Patrick Zhai
My bad, I should have added 1 to the alphabet size, I assumed it was exclusive. Here's a PR fixing it: https://github.com/apache/lucene/pull/559 On Mon, Dec 20, 2021 at 21:34, Policeman Jenkins Server wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/31986/ > Java: 64bit/jdk-16

Re: Taxonomy backward compatibility tests

2021-11-02 Thread Patrick Zhai
Hey Adrien, I'm making a change that lets the taxonomy use NumericDocValues instead of term positions to store the parent array: https://github.com/apache/lucene/pull/420. It'll also need to pass the backward compatibility check, so for me, it seems it's better to keep it. And if I can make it work

Re: Taxonomy backward compatibility tests

2021-11-03 Thread Patrick Zhai
e to carry over the backward compatibility logic in 10.x, so I'm > inclined to allow merging it if you can get the change merged in the next > couple of days while I'm focusing on the 8.11 release. > > On Tue, Nov 2, 2021 at 6:57 PM Patrick Zhai wrote: > >> Hey Adrien, >> I'

Re: [VOTE] Release Lucene 9.1.0 RC2

2022-03-18 Thread Patrick Zhai
Hi Julie, thanks for summarizing the issues, for 4, as far as I understand, there's nothing that needs to be changed on Lucene's end. We're still having trouble running the smoke test on Amazon's machine, but I've tried it on my own machine and it passed. So here's my +1. SUCCESS!

Re: Welcome Guo Feng as Lucene committer

2022-01-25 Thread Patrick Zhai
Welcome! On Tue, Jan 25, 2022, 07:33 Gus Heck wrote: > Welcome! > > On Tue, Jan 25, 2022 at 9:57 AM Michael McCandless < > luc...@mikemccandless.com> wrote: > >> Welcome Feng! >> >> Mike >> >> On Tue, Jan 25, 2022 at 4:09 AM Adrien Grand wrote: >> >>> I'm pleased to announce that Guo Feng has

Re: [JENKINS] Lucene » Lucene-Check-main - Build # 10242 - Unstable!

2023-09-16 Thread Patrick Zhai
I can reproduce it, opened: https://github.com/apache/lucene/issues/12562 Also not sure why the build shows SUCCESSFUL even though the test failed... On Sat, Sep 16, 2023 at 4:52 PM Apache Jenkins Server < jenk...@builds.apache.org> wrote: > Build: >

Lucene 9.8 Release

2023-09-12 Thread Patrick Zhai
Hi all, It's been a while since the last release and we have quite a few good changes including new APIs, improvements and bug fixes. Should we release the 9.8? If there's no objections I volunteer to be the release manager and will cut the feature branch a week from now, which is Sep. 18th PST.

Re: Weird HNSW merge performance result

2023-10-11 Thread Patrick Zhai
was a bug where > `forceMerge` was not actually using your configured maxConn & beamWidth. > See: https://github.com/mikemccand/luceneutil/pull/232 > > Do you have that commit and rebuilt the KnnGraphTester? > > On Wed, Oct 11, 2023 at 10:10 AM Patrick Zhai wrote: > >>

Re: Lucene 9.8 Release

2023-09-22 Thread Patrick Zhai
ther Lucene users may > find nice wins with this change (I also added a note to that PR). > > Cheers, > -g > > On Thu, Sep 21, 2023 at 10:50 PM Patrick Zhai wrote: > >> Thank you Uwe! >> >> On Thu, Sep 21, 2023 at 3:27 PM Uwe Schindler wrote: >> >>

Re: [JENKINS] Lucene-MMAPv2-Windows (64bit/hotspot/jdk-21-rc) - Build # 801 - Still Unstable!

2023-09-18 Thread Patrick Zhai
Thanks Uwe and Adrien, I haven't cut the branch, will wait for this fix then On Mon, Sep 18, 2023 at 9:28 AM Adrien Grand wrote: > Thanks Uwe for digging. The fork-join pool is optional, I will change > the test to use a ByteBuffersDirectory. > > On Mon, Sep 18, 2023 at 6:15 PM Uwe Schindler

Re: Lucene 9.8 Release

2023-09-21 Thread Patrick Zhai
peedup that nightly benchmarks reported, and > moved this section first as I suspect that users would be especially > interested in these speedups. > > Out of curiosity, do you know when you plan on creating a release > candidate? > > On Thu, Sep 21, 2023 at 7:40 AM Patrick Zhai wrote:

Re: Lucene 9.8 Release

2023-09-21 Thread Patrick Zhai
Thank you Uwe! On Thu, Sep 21, 2023 at 3:27 PM Uwe Schindler wrote: > Hi, > > I also enabled Jenkins jobs for the 9.8 branch today (a bit late, sorry). > See https://jenkins.thetaphi.de for the randomized jobs. > > Uwe > On 21.09.2023 at 19:05, Patrick Zhai wrote: >

[VOTE] Release Lucene 9.8.0 RC1

2023-09-21 Thread Patrick Zhai
Please vote for release candidate 1 for Lucene 9.8.0 The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-9.8.0-RC1-rev-d914b3722bd5b8ef31ccf7e8ddc638a87fd648db You can run the smoke tester directly with this command: python3 -u

New branch and feature freeze for Lucene 9.8.0

2023-09-20 Thread Patrick Zhai
NOTICE: Branch branch_9_8 has been cut and versions updated to 9.9 on stable branch. Please observe the normal rules: * No new features may be committed to the branch. * Documentation patches, build patches and serious bug fixes may be committed to the branch. However, you should submit all

[ANNOUNCE] Apache Lucene 9.8.0 released

2023-09-28 Thread Patrick Zhai
The Lucene PMC is pleased to announce the release of Apache Lucene 9.8.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting,

[RESULT] [VOTE] Release Lucene 9.8.0 RC1

2023-09-27 Thread Patrick Zhai
It's been >72h (plus 2 weekend days) since the vote was initiated and the result is: +1: 11 (9 binding); 0: 0; -1: 0. This vote has PASSED

Re: Lucene 9.8 Release

2023-09-18 Thread Patrick Zhai
Update: Will wait https://github.com/apache/lucene/pull/12568 to be merged to cut the branch On Mon, Sep 18, 2023 at 11:00 AM Michael Sokolov wrote: > +1 for a release soon, and thanks for volunteering, Patrick! > > On Tue, Sep 12, 2023 at 2:08 AM Patrick Zhai wrote: > > >

Re: Override Analyzer.TokenStreamComponents's reader.

2023-09-17 Thread Patrick Zhai
Hi MyCoy, according to the MIGRATE file """ ### TokenStreamComponents is now final Instead of overriding `TokenStreamComponents.setReader()` to customise analyzer initialisation, you should now pass a `Consumer<Reader>` instance to the

Re: Lucene 9.8 Release

2023-09-20 Thread Patrick Zhai
> On Tue, Sep 19, 2023 at 6:22 AM Patrick Zhai wrote: > > > > Update: > > Will wait https://github.com/apache/lucene/pull/12568 to be merged to > cut the branch > > > > > > On Mon, Sep 18, 2023 at 11:00 AM Michael Sokolov > wrote: > >> >

Weird HNSW merge performance result

2023-10-10 Thread Patrick Zhai
Hi folks, I was running the HNSW benchmark today and found some weird results. Want to share it here and see whether people have any ideas. The set up is: the 384 dimension vector that's available in luceneutil, 100k documents. And lucene main branch. max_conn=64, fanout=0, beam_width=250 I

Re: [JENKINS] Lucene-main-Windows (64bit/hotspot/jdk-17.0.5) - Build # 13300 - Unstable!

2023-10-13 Thread Patrick Zhai
I opened: https://github.com/apache/lucene/pull/12678/files to fix it On Fri, Oct 13, 2023 at 8:02 PM Policeman Jenkins Server < jenk...@thetaphi.de> wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-main-Windows/13300/ > Java: 64bit/hotspot/jdk-17.0.5 -XX:-UseCompressedOops

Re: Weird HNSW merge performance result

2023-10-11 Thread Patrick Zhai
as adding vectors to the graph gets more and more expensive as the size of > the graph increases. > > On Wed, Oct 11, 2023, 05:07, Patrick Zhai wrote: > >> Hi folks, >> I was running the HNSW benchmark today and found some weird results. Want >> to share it her

Re: [VOTE] Release Lucene 9.1.0 RC2

2022-03-18 Thread Patrick Zhai
Hi I tried to run the smoke test command on an EC2 instance, and I got several failures like the one below: Exception in thread "Attach Listener" Agent failed to start! > > org.apache.lucene.index.Test2BPostings > classMethod FAILED > java.lang.AssertionError: The test or suite printed 48646

Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 45493 - Unstable!

2023-11-10 Thread Patrick Zhai
It's caused by the PR I just merged and reproduce for me, will dig into it today. On Fri, Nov 10, 2023, 04:38 Policeman Jenkins Server wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45493/ > Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:metronome > > 1 tests

Re: [JENKINS] Lucene-main-Linux (64bit/hotspot/jdk-19) - Build # 45496 - Unstable!

2023-11-10 Thread Patrick Zhai
Fix: https://github.com/apache/lucene/pull/12793 On Fri, Nov 10, 2023 at 10:34 AM Policeman Jenkins Server < jenk...@thetaphi.de> wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/45496/ > Java: 64bit/hotspot/jdk-19 -XX:-UseCompressedOops -XX:+UseSerialGC > > 1 tests failed. >

Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 45493 - Unstable!

2023-11-10 Thread Patrick Zhai
Fix: https://github.com/apache/lucene/pull/12793 On Fri, Nov 10, 2023 at 7:34 AM Patrick Zhai wrote: > It's caused by the PR I just merged and reproduce for me, will dig into it > today. > > On Fri, Nov 10, 2023, 04:38 Policeman Jenkins Server > wrote: > >> Build: htt

Re: [VOTE] Release Lucene 9.2.0 RC1

2022-05-18 Thread Patrick Zhai
+1 SUCCESS! [1:12:17.482213] Thank you Alan! On Wed, May 18, 2022 at 4:31 PM Julie Tibshirani wrote: > +1 SUCCESS! [0:57:28.654665] > > Thanks Alan! > > On Wed, May 18, 2022 at 1:51 PM Anshum Gupta > wrote: > >> +1 SUCCESS! [0:59:19.002348] >> >> On Wed, May 18, 2022 at 5:59 AM Alan Woodward

Re: [VOTE] Migration to GitHub issue from Jira (LUCENE-10557)

2022-05-30 Thread Patrick Zhai
Thank you Tomoko for starting the vote, although I didn't participate in the last discussion but I'd love to see us moving towards the github issue. So here's my +1 (committer, non-PMC) BTW, by "the vote will be effective if it successfully gains more than 15% of voters (>= 15) from committers",

Re: Welcome Chris Hegarty as Lucene committer

2022-06-01 Thread Patrick Zhai
Welcome Chris! Patrick On Wed, Jun 1, 2022, 08:06 Tomoko Uchida wrote: > Congratulations and welcome, Chris! > > Tomoko > > > On Wed, Jun 1, 2022 at 23:17, Nhat Nguyen wrote: > >> Welcome, Chris! >> >> On Wed, Jun 1, 2022 at 8:49 AM Greg Miller wrote: >> >>> Welcome Chris! >>> >>> On Wed, Jun 1, 2022 at 2:04

Re: Welcome Lu Xugang as Lucene committer

2022-06-01 Thread Patrick Zhai
Welcome Xugang! Patrick On Wed, Jun 1, 2022, 09:07 Lu Xugang wrote: > Thanks for your note, Tomoko, the latest signature could be used as my > name:) > > Xugang > > > > > On Jun 1, 2022, at 23:28, Tomoko Uchida > wrote: > > > > Congratulations and welcome, Lu Xugang! > > > > (Can we call you

Re: Adding a new PointDocValuesField

2022-05-24 Thread Patrick Zhai
As pointed out by Rob in the issue I would also suggest to start with the simple > separate-numeric-docvalues-fields case and use similar logic as the > org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc I think that's a preferable solution to me, because: 1. It does not

Re: Adding a new PointDocValuesField

2022-05-23 Thread Patrick Zhai
Hi Marc Thank you for starting the discussion, I think all your points make sense, but I'm wondering if we really need everything packed into one field? And what are the advantages of doing that? I *think* most of the facet related use cases can be satisfied using multiple fields, one field per

Re: Adding a new PointDocValuesField

2022-05-25 Thread Patrick Zhai
> BDV entry. This is where building on BDV started to feel a little icky > to me and it seemed like it might be a good use-case for actually > formalizing a format/encoding, but again, no strong preference. We > could certainly do something more quickly on top of BDV and formalize >

Re: [VOTE] Release Lucene 9.2.0 RC2

2022-05-19 Thread Patrick Zhai
+1 SUCCESS! [1:16:55.786014] On Thu, May 19, 2022 at 4:34 PM Julie Tibshirani wrote: > +1 SUCCESS! [0:59:11.553919] > > On Thu, May 19, 2022 at 3:30 PM Michael Gibney > wrote: > >> +1 (non-binding, java 11 and 17) >> SUCCESS! [2:01:03.089817] >> >> On Thu, May 19, 2022 at 4:24 PM Adrien Grand

Re: Welcome Greg Miller to the Lucene PMC

2022-06-07 Thread Patrick Zhai
Congrats Greg! Patrick On Tue, Jun 7, 2022, 07:53 Julie Tibshirani wrote: > Congratulations Greg!! > > On Tue, Jun 7, 2022 at 7:22 AM Ignacio Vera wrote: > >> Welcome Greg! >> >> On Tue, Jun 7, 2022 at 1:40 PM 陆徐刚 wrote: >> >>> Congratulations, Greg. >>> >>> Xugang >>> >>> > 在

Re: [JENKINS] Lucene-MMAPv2-Windows (64bit/jdk-17.0.3) - Build # 131 - Unstable!

2022-07-25 Thread Patrick Zhai
Seems the error was thrown from HandleLimitFS.java, which enforce an artificial limitation on the number of files opened it seems. But the weird thing is why I only see such errors on Windows builds? I'll try to dig more on this error. On Sun, Jul 24, 2022 at 8:41 PM Policeman Jenkins Server <

Re: [VOTE] Release Lucene 9.3.0 RC1

2022-07-25 Thread Patrick Zhai
+1 (non-binding) SUCCESS! [0:50:34.448483] On Mon, Jul 25, 2022 at 12:26 PM Julie Tibshirani wrote: > +1 SUCCESS! [0:52:18.736747] > > Thanks Ignacio for being release manager! > > Julie > > On Mon, Jul 25, 2022 at 7:53 AM Michael McCandless < > luc...@mikemccandless.com> wrote: > >> +1 >> >>

Re: Welcome Vigya Sharma as Lucene committer

2022-07-28 Thread Patrick Zhai
Welcome! On Thu, Jul 28, 2022, 11:41 Gus Heck wrote: > Welcome! > > On Thu, Jul 28, 2022 at 11:24 AM Julie Tibshirani > wrote: > >> Congratulations Vigya! >> >> On Thu, Jul 28, 2022 at 6:34 AM Mayya Sharipova >> wrote: >> >>> Congratulations and welcome Vigya! >>> >>> >>> On Thu, Jul 28, 2022

Re: [JENKINS] Lucene-MMAPv2-Windows (64bit/jdk-17.0.3) - Build # 131 - Unstable!

2022-07-28 Thread Patrick Zhai
, and thus reaches the limit. I'll try increase the limit and see how it goes On Mon, Jul 25, 2022 at 10:35 AM Patrick Zhai wrote: > Seems the error was thrown from HandleLimitFS.java, which enforce an > artificial limitation on the number of files opened it seems. But the weird > thing

Re: automaton incremental updates

2022-05-05 Thread Patrick Zhai
Hi I'm not sure if I understood your question correctly, but a normal way lucene update an existing automaton is to intersect or union it with another one (the new one with the updated data), those operations' code are collected here:

Idea about faster vector format merge

2022-10-18 Thread Patrick Zhai
Hi Folks I've talked with Mike Sokolov and learnt some KNN knowledge from him (thank you!) during ApacheCon and one thing I learnt was that our KNN implementation was kind of suffering from long merging time because we currently rebuild the graph from scratch every time we merge. I noticed

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-23 Thread Patrick Zhai
(non-binding) +1 SUCCESS! [1:11:00.934249] On Fri, Sep 23, 2022 at 9:44 AM Vigya Sharma wrote: > The smoke tests passed for me too.. > > > (no vote) > > SUCCESS! [1:12:31.588303] > > On Thu, Sep 22, 2022 at 2:27 AM Ignacio Vera wrote: > >> +1 >> >> >> SUCCESS! [0:46:00.508949] >> >> On Thu,

Re: Code coverage check for PRs

2022-10-04 Thread Patrick Zhai
logic. > > Coverage report is just a tool to help us and the moment we do stupid > > shit like that, is the moment people start gaming it just to make the > > build pass. > > > > On Mon, Oct 3, 2022 at 10:57 PM Patrick Zhai wrote: > > > > > > Hi folks, >

Re: Code coverage check for PRs

2022-10-05 Thread Patrick Zhai
we should stick with jacoco and not some commercial stuff for > measuring coverage. Jacoco works great. We just have to put the > reports or stats somewhere useful. > > On Tue, Oct 4, 2022 at 5:45 PM Patrick Zhai wrote: > > > > Hi Robert, thank you for commenting, yeah the fu

Re: Welcome Luca Cavanna as Lucene committer

2022-10-05 Thread Patrick Zhai
Welcome! On Wed, Oct 5, 2022, 13:54 Martin Gainty wrote: > welcome Luca! > -- > *From:* David Smiley > *Sent:* Wednesday, October 5, 2022 2:34 PM > *To:* dev@lucene.apache.org > *Cc:* cavannal...@gmail.com > *Subject:* Re: Welcome Luca Cavanna as Lucene committer

Code coverage check for PRs

2022-10-03 Thread Patrick Zhai
Hi folks, I'm not sure whether people have already discussed this but I'm wondering whether we want to add a workflow that pulls out the code coverage whenever a PR was created? It should be easier for both the reviewers and the contributors to figure out what can be improved, or at least figure

Re: [VOTE] Release Lucene 9.4.0 RC3

2022-09-27 Thread Patrick Zhai
+1 (non-binding) SUCCESS! [0:57:17.821452] On Tue, Sep 27, 2022 at 3:32 PM Michael McCandless < luc...@mikemccandless.com> wrote: > +1, smoke tester: > > SUCCESS! [0:26:01.696388] > > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Sep 27, 2022 at 3:45 PM Anshum Gupta >

Re: Is there a way to customize segment names?

2022-12-30 Thread Patrick Zhai
> No, you can't control them. And we must not open up anything to try to > support this. > > On Fri, Dec 16, 2022 at 7:28 PM Patrick Zhai wrote: > > > > Hi Mike, Robert > > > > Thanks for replying, the system is almost like what Mike has described: > one writer i

Re: [JENKINS] Lucene-main-Linux (64bit/jdk-17.0.3) - Build # 39124 - Still Unstable!

2022-12-30 Thread Patrick Zhai
Seems related to the commit just pushed? I put a quick fix: https://github.com/apache/lucene/pull/12049 On Fri, Dec 30, 2022 at 9:35 AM Policeman Jenkins Server < jenk...@thetaphi.de> wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/39124/ > Java: 64bit/jdk-17.0.3

Re: [JENKINS] Lucene-9.x-Windows (64bit/jdk-18) - Build # 1677 - Still Failing!

2022-12-30 Thread Patrick Zhai
My bad, the fix I backported contains a JDK 17 API, so it can't build on 9.x. I just pushed a fix for it, sorry! On Fri, Dec 30, 2022 at 5:01 PM Policeman Jenkins Server < jenk...@thetaphi.de> wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Windows/1677/ > Java: 64bit/jdk-18

Re: Is there a way to customize segment names?

2022-12-15 Thread Patrick Zhai
e also contains a unique identifier tied to > its commit so that we know everything is intact. > > I would look at the segment replication in lucene/replicator and not > try to play games with files and mixing multiple writers. > > On Thu, Dec 15, 2022 at 5:45 PM Patrick Zhai

Is there a way to customize segment names?

2022-12-15 Thread Patrick Zhai
Hi Folks, We're trying to build a search architecture using segment replication (indexer and searcher are separated and indexer shipping new segments to searchers) right now and one of the problems we're facing is: for availability reason we need to have multiple indexers running, and when the

Re: Is there a way to customize segment names?

2022-12-16 Thread Patrick Zhai
down this > > path (playing tricks with filenames) isn't going to work out well. > > > > On Fri, Dec 16, 2022 at 2:48 AM Patrick Zhai wrote: > > > > > > Hi Robert, > > > > > > Maybe I didn't explain it clearly but we're not going to consta

Re: [JENKINS] Lucene-9.4-Linux (64bit/jdk-17.0.3) - Build # 750 - Unstable!

2022-11-20 Thread Patrick Zhai
I can't reproduce this one, it seems to me when "addIndexes" is called there's some race condition (so that 'ord' is of type null on IW side), but I've checked the code and it is soundly protected by synchronized method it seems. Not sure whether it is a transient error or not. On Sun, Nov 20,

Re: Dense union of doc IDs

2022-11-04 Thread Patrick Zhai
Hi Froh, The idea sounds reasonable to me, altho I wonder whether using CONSTANT_SCORE_BOOLEAN_REWRITE would help with your case since that dense union case should be already handled by disjunction query I suppose? But that boolean rewrite is subject to max clause limit so it may have some other

Re: Lucene PMC Chair Greg Miller

2023-03-07 Thread Patrick Zhai
Thank you Bruno and Greg! On Tue, Mar 7, 2023, 10:40 Mikhail Khludnev wrote: > Thank you, Bruno. Congratulations, Greg. > > On Mon, Mar 6, 2023 at 8:16 PM Bruno Roustant > wrote: > >> Hello Lucene developers, >> >> Lucene Program Management Committee has elected a new chair, Greg >> Miller,

Should IndexWriter.flush return seqNo?

2023-04-19 Thread Patrick Zhai
Hi folks, I just realized that while "commit" returns the sequence number which represents the latest event that committed in the index, "flush" still returns nothing. Since they're essentially the same except fsync I wonder whether there's any specific reason to not do so? Best Patrick

Re: Should IndexWriter.flush return seqNo?

2023-04-21 Thread Patrick Zhai
moves stuff > from RAM to disk but not in a way that indexreader can see it or > anything, right? > > It doesn't make much sense that this method is public in the API, > definitely adding sequence number makes no sense since nothing was > committed here. > > On Thu, Apr 20,

Re: Patch to change murmurhash implementation slightly

2023-04-24 Thread Patrick Zhai
I did a quick run with your patch, but since I turned on the CMS as well as TieredMergePolicy I'm not sure how fair the comparison is. Here's the result: Candidate: Indexer: indexing done (890209 msec); total 2620 docs Indexer: waitForMerges done (71622 msec) Indexer: finished (961877 msec)

Re: Should IndexWriter.flush return seqNo?

2023-04-26 Thread Patrick Zhai
> Patrick maybe you had an interesting use case in mind? I had one, but later on I found out that I don't necessarily use flush to achieve that so it's not really a valid use case that definitely need flush... On Tue, Apr 25, 2023 at 7:26 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com>

Re: Help to find the RC of incompatible analyzers

2023-04-28 Thread Patrick Zhai
It sounds like an EnglishPossessiveFilter is missing and I think it is not relevant to the filters you listed? Are there other Lucene filters you're using? Also what exact versions are you upgrading from and to? On Fri, Apr 28, 2023 at 10:20 AM MyCoy Z wrote: > Hi, Lucene dev community: > >

Re: [JENKINS] Lucene-main-Linux (64bit/openj9/jdk-17.0.5) - Build # 41045 - Unstable!

2023-04-03 Thread Patrick Zhai
Seems it fails just because `System.gc()` does not always force GC for the wanted object, I created a PR to ignore it: https://github.com/apache/lucene/pull/12223 On Mon, Apr 3, 2023 at 8:38 AM Policeman Jenkins Server wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/41045/ > Java:

Re: Lucene 9.7 release

2023-06-09 Thread Patrick Zhai
+1, thank you Adrien! On Fri, Jun 9, 2023, 09:08 Adrien Grand wrote: > Hello all, > > There is some good stuff that is scheduled for 9.7 already, I found the > following changes in the changelog that look especially interesting: > - Concurrent query rewrites for vector queries. > - Speedups

Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-20) - Build # 10739 - Unstable!

2023-05-28 Thread Patrick Zhai
A similar issue has appeared twice recently so I think it might be worth increasing the delta a bit? https://github.com/apache/lucene/pull/12338/ On Sun, May 28, 2023 at 10:44 PM Policeman Jenkins Server < jenk...@thetaphi.de> wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-9.x-Linux/10739/

Re: [JENKINS] Lucene-9.x-MacOSX (64bit/hotspot/jdk-16.0.2) - Build # 2385 - Unstable!

2023-05-28 Thread Patrick Zhai
Can't reproduce in ~10 tries, the test logic seems ok to me too On Sun, May 28, 2023 at 6:48 PM Policeman Jenkins Server < jenk...@thetaphi.de> wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-9.x-MacOSX/2385/ > Java: 64bit/hotspot/jdk-16.0.2 -XX:+UseCompressedOops -XX:+UseShenandoahGC > >

Re: Updates documents using queries

2023-05-31 Thread Patrick Zhai
teDocuments(org.apache.lucene.search.Query...)]. >> >> Implementation behind could be the same. Basically it would do the same >> but just use delQuery using the DocIdSetIterator of the query and >> Iterable for the new block. >> >> Uwe >> >> Am 30.05.20

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 1019 - Unstable!

2023-06-01 Thread Patrick Zhai
This reproduces constantly for me, I even rolled back to some (random) old commit: 9acc6539959 (from last September) and it still reproduces. Found another similar failure: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/8835/ The error seems to be a more common one but somehow (so far)

Updates documents using queries

2023-05-30 Thread Patrick Zhai
Hi folks, Currently the only way to update a block of documents is by identifying them with a term and updating those documents. However we have a case where the child documents do not share the same identifier as the parent documents, and to identify the whole block of documents we need to use at least

Re: [JENKINS] Lucene-9.x-Linux (64bit/openj9/jdk-17.0.5) - Build # 10864 - Unstable!

2023-06-03 Thread Patrick Zhai
Just want to mention that this is using openj9 and we have previously seen quite a lot of unreproducible weird errors (e.g. AIOOB) with it, so not sure whether this is yet another one of them or not.. On Sat, Jun 3, 2023 at 1:14 PM Michael McCandless wrote: > Hmm how is this test case

Re: [VOTE] Release Lucene 9.7.0 RC1

2023-06-24 Thread Patrick Zhai
SUCCESS! [0:53:17.495903] +1 (non-binding), thank you Adrien! Patrick On Sat, Jun 24, 2023 at 3:00 PM Michael McCandless < luc...@mikemccandless.com> wrote: > +1 > > SUCCESS! [0:16:13.144051] > > Mike > > On Fri, Jun 23, 2023, 11:48 PM Gautam Worah > wrote: > >> SUCCESS! [0:32:53.769993] >>

Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-17.0.5) - Build # 11322 - Unstable!

2023-06-27 Thread Patrick Zhai
> > http://blog.mikemccandless.com > > > On Tue, Jun 27, 2023 at 2:34 AM Patrick Zhai wrote: > >> The exception was thrown because TimeLimitingBulkScorer passed in a "max" >> which is larger than the maxDoc in the segment. And then MaxScoreBulkScorer >> d

Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-17.0.5) - Build # 11322 - Unstable!

2023-06-27 Thread Patrick Zhai
The exception was thrown because TimeLimitingBulkScorer passed in a "max" which is larger than the maxDoc in the segment. And then MaxScoreBulkScorer directly returns the rangeEnd as the next estimation here

Re: TermInSetQuery: seekExact vs. seekCeil

2023-05-05 Thread Patrick Zhai
Hi Greg IMO I still think the seekCeil is a better solution for the default posting format, as it could potentially save time on traversing the FST by doing the ping-pong skipping. I can see that in the case of using bloom filter the seekExact might be better but I'm not sure whether there is a

Re: [VOTE] Release Lucene 9.6.0 RC2

2023-05-04 Thread Patrick Zhai
(non-binding) +1 SUCCESS! [0:47:50.033079] On Thu, May 4, 2023 at 1:27 PM Greg Miller wrote: > +1 SUCCESS! [1:02:49.795869] > > Cheers, > -Greg > > On Wed, May 3, 2023 at 5:14 PM Michael McCandless < > luc...@mikemccandless.com> wrote: > >> Er, +1 too ;) >> >> Mike McCandless >> >>

Re: Welcome Stefan Vodita as Lucene committer

2024-01-18 Thread Patrick Zhai
Welcome and Congrats, Stefan. Patrick On Thu, Jan 18, 2024, 08:45 Chris Hegarty wrote: > Welcome Stefan. > > -Chris. > > > On 18 Jan 2024, at 15:53, Michael McCandless > wrote: > > > > Hi Team, > > > > I'm pleased to announce that Stefan Vodita has accepted the Lucene PMC's > invitation to

GDPR compliance

2023-11-28 Thread Patrick Zhai
Hi Folks, In LinkedIn we need to comply with GDPR for a large part of our data, and an important part of it is that we need to be sure we have completely deleted the data the user requested to delete within a certain period of time. The way we have come up with so far is to: 1. Record the segment

Re: GDPR compliance

2023-11-28 Thread Patrick Zhai
3, Robert Muir wrote: > >> > >> I don't think there's any problem with GDPR, and I don't think users > >> should be running unnecessary "optimize". GDPR just says data should > >> be erased without "undue" delay. waiting for a merge to nuk

Re: GDPR compliance

2023-11-28 Thread Patrick Zhai
deleted docs isn't "undue", there is a good reason for it. > > On Tue, Nov 28, 2023 at 2:40 PM Patrick Zhai wrote: > > > > Hi Folks, > > In LinkedIn we need to comply with GDPR for a large part of our data, > and an important part of it is that we need to be sure

Re: [JENKINS] Lucene-9.x-Linux (64bit/hotspot/jdk-11.0.21) - Build # 14204 - Unstable!

2023-12-01 Thread Patrick Zhai
Seems it's because this MockRandomMergePolicy change recently makes ParallelLeafReader unhappy - it's reading two parallel segments from 2 dir and this MP makes

Re: [VOTE] Release Lucene 9.9.0 RC1

2023-11-29 Thread Patrick Zhai
SUCCESS! [1:03:54.880200] +1. Thank you Chris! On Wed, Nov 29, 2023 at 8:45 PM Nhat Nguyen wrote: > SUCCESS! [1:11:30.037919] > > +1. Thanks, Chris! > > On Wed, Nov 29, 2023 at 8:53 AM Chris Hegarty > wrote: > >> Hi, >> >> >> Please vote for release candidate 1 for Lucene 9.9.0 >> >> >> The

Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-12 Thread Patrick Zhai
Thank you everyone! On Sun, Nov 12, 2023, 09:34 Dawid Weiss wrote: > > > Congratulations and welcome, Patrick! > > Dawid > > On Fri, Nov 10, 2023 at 9:05 PM Michael McCandless < > luc...@mikemccandless.com> wrote: > >> I'm happy to announce that

Re: Lucene 9.9.0 Release

2023-11-21 Thread Patrick Zhai
+1, thank you Chris! On Tue, Nov 21, 2023, 06:49 Benjamin Trent wrote: > +1 9.9 will be a stellar release! > > Thank you Chris! > > On Tue, Nov 21, 2023 at 7:31 AM Adrien Grand wrote: > >> +1 9.9 has plenty of great changes indeed! Thanks for volunteering as a >> RM, Chris. >> >> It would be

Re: [Vote] Bump the Lucene main branch to Java 21

2024-02-23 Thread Patrick Zhai
+1 On Fri, Feb 23, 2024 at 9:34 AM Dawid Weiss wrote: > > I'm fine with this requirement. > > +1. > > On Fri, Feb 23, 2024 at 12:24 PM Chris Hegarty > wrote: > >> Hi, >> >> Since the discussion on bumping the Lucene main branch to Java 21 is >> winding down, let's hold a vote on this important

Re: Lucene 10

2024-03-15 Thread Patrick Zhai
Thanks Adrien +1 to the timelines. I'm also willing to work on/ review the Decouple search concurrency from index geometry task, that's a very nice one to have for those latency sensitive applications (rather than have to tune merge policy case by

Re: Lucene Index/Search Synchronization

2024-05-17 Thread Patrick Zhai
Hi Alexander, IndexWriter by default will only keep the segment files of the latest commit and delete the old segments that are not referenced by IndexWriter anymore, so the situation you described does exist. To address the problem, there are several options: 1. You can use Lucene's NRT search

Re: [VOTE] Release Lucene 9.11.0 RC1

2024-06-05 Thread Patrick Zhai
+1 SUCCESS! [1:01:30.064666] On Wed, Jun 5, 2024 at 11:08 AM Houston Putman wrote: > +1 > > SUCCESS! [1:49:36.192513] > > - Houston Putman > > On Wed, Jun 5, 2024 at 12:58 PM Michael McCandless < > luc...@mikemccandless.com> wrote: > >> +1 SUCCESS! [0:24:55.332837] >> >> Mike McCandless >> >>