Re: Query about the GitHub statistics for Lucene

2024-03-05 Thread Robert Muir
On Tue, Mar 5, 2024 at 4:50 AM Chris Hegarty wrote: > It appears that there is no GH activity for 2024! Clearly this is incorrect. > I’ve yet to track down what’s going on with this. Familiar to anyone here? > Last time I looked at this, it appeared it is looking at the incorrect github

Re: [VOTE] Release Lucene 9.10.0 RC1

2024-02-15 Thread Robert Muir
On Thu, Feb 15, 2024 at 9:54 AM Uwe Schindler wrote: > > Hi, > > My Python knowledge is too limited to fix the build script to allow to test > the smoker with arbitrary JAVA_HOME directories next to the baseline (Java > 11). With lots of copypaste I can make it run on Java 21 in addition to 17,

Re: The need for a Lucene 9.9.1 release

2023-12-09 Thread Robert Muir
I don't understand use of the word corruption, isn't it just a bug in intersect() that only affects wildcards etc? e.g. it's not gonna merge into new segments or impact written data in any way. And i don't think we should rush out some bugfix release without any test for this? On Sat, Dec 9, 2023

Re: GDPR compliance

2023-11-28 Thread Robert Muir
d in practice? I > was assuming that if there is not a lot of new indexed content and not a lot > of older documents being deleted, large older segment might never have to be > merged. > > > On Tue 28 Nov 2023 at 20:53, Robert Muir wrote: >> >> I don't think th

Re: GDPR compliance

2023-11-28 Thread Robert Muir
I don't think there's any problem with GDPR, and I don't think users should be running unnecessary "optimize". GDPR just says data should be erased without "undue" delay. waiting for a merge to nuke the deleted docs isn't "undue", there is a good reason for it. On Tue, Nov 28, 2023 at 2:40 PM

Re: Ascii folding

2023-11-10 Thread Robert Muir
at 1:13 PM Robert Muir wrote: > > For visually confusing characters we have the option to expose specific > processing for that, e.g. > https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/com/ibm/icu/text/SpoofChecker.html#getSkeleton-java.lang.CharSequence- > > Maybe there are u

Re: Ascii folding

2023-11-10 Thread Robert Muir
For visually confusing characters we have the option to expose specific processing for that, e.g. https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/com/ibm/icu/text/SpoofChecker.html#getSkeleton-java.lang.CharSequence- Maybe there are use-cases for a search engine, e.g. find me documents with

Re: Bump minimum Java version requirement to 21

2023-11-06 Thread Robert Muir
de sitting on the shelf for years). Run "git blame lucene/CHANGES.txt" if you think I am crazy. Here's a change I made nearly two years ago, it just sits on the shelf. 84e4b85b094c lucene/CHANGES.txt (Robert Muir 2021-12-07 21:39:13 -0500 14) * LUCENE-10010: AutomatonQuery, CompiledA

Re: Bump minimum Java version requirement to 21

2023-11-06 Thread Robert Muir
On Mon, Nov 6, 2023 at 4:22 AM Chris Hegarty wrote: > > Hi, > > Great discussion, I agree with all that you have said. And that we will have > to deal with the intricacies of the MR-JAR regardless of the outcome here, > which is doable. > > I would very much like to avoid supporting Java 17

Re: Squash vs merge of PRs

2023-11-04 Thread Robert Muir
ed to spam. On Sat, Nov 4, 2023 at 8:36 AM Mike Drob wrote: > > We all agree on using Java though, and using a specific version, and even the > style output from gradle tidy. Is that nanny state or community consensus? > > On Sat, Nov 4, 2023 at 7:29 AM Robert Muir wrote: >>

Re: Squash vs merge of PRs

2023-11-04 Thread Robert Muir
example of a nanny state IMO, trying to dictate what git commands to use, or what editor to use. Maybe this works for you in your corporate hellholes, but I think some folks have a bit of a power issue, are accustomed to dictating this stuff to their employees and so on, but this is open-source.

Re: Can we get rid of "Approve & Run" on GitHub PRs by new contributors (non-committers)?

2023-10-24 Thread Robert Muir
> > Ooh, thank you Dawid! And it's now merged, so we now have a decent timeout > protection, so if a bad actor tries to crypto mine or run some distributed > LLM or whatever, at least the wasted resources are bounded by how long a > "typical" legitimate run takes, plus generous buffer. So

Re: Could we allow an IndexInput to read from a still writing IndexOutput?

2023-10-19 Thread Robert Muir
what will happen on windows? sorry, could not resist. On Thu, Oct 19, 2023 at 9:48 AM Michael McCandless wrote: > > Hi Team, > > Today, Lucene's Directory abstraction does not allow opening an IndexInput on > a file until the file is fully written and closed via IndexOutput. We > enforce

Re: Can we get rid of "Approve & Run" on GitHub PRs by new contributors (non-committers)?

2023-10-16 Thread Robert Muir
I think running the builds with a timeout is a good thing to do anyway, for any CI build. I'm sure github actions has some fancy yaml for that, but you can just do "timeout -k 1m 1h ./gradlew..." instead of "./gradlew" too. On Mon, Oct 16, 2023 at 9:58 AM Michael McCandless wrote: > > When a

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.x - Build # 665 - Unstable!

2023-08-31 Thread Robert Muir
this leniency when binding to port 0. nuke it. On Thu, Aug 31, 2023 at 8:46 AM Robert Muir wrote: > > probably a bug in some jvm sockets code that called accept() in its > default blocking mode, when there wasn't any connection to accept? in > that case accept() call will just block and wait

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.x - Build # 665 - Unstable!

2023-08-31 Thread Robert Muir
probably a bug in some jvm sockets code that called accept() in its default blocking mode, when there wasn't any connection to accept? in that case accept() call will just block and wait for someone to make a new connection. On Thu, Aug 31, 2023 at 8:16 AM Dawid Weiss wrote: > > >
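To illustrate the blocking behaviour described in this message, here is a minimal, self-contained sketch using only java.net. It is not the code from the failing test or from the JVM; the class and method names are made up for illustration. It shows that ServerSocket.accept() waits until a client connects, and that the only reason the first call returns at all is the SO_TIMEOUT set on the socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not taken from Lucene or the JDK.
final class BlockingAcceptDemo {

  /** Returns true if accept() gave up because nobody connected within the timeout. */
  static boolean acceptTimesOutWithoutClient(int timeoutMillis) {
    // Port 0 asks the OS for any free port; bind to loopback only.
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      // Without SO_TIMEOUT, accept() would block here indefinitely.
      server.setSoTimeout(timeoutMillis);
      try (Socket unexpected = server.accept()) {
        return false; // somebody connected, which this demo does not expect
      } catch (SocketTimeoutException expected) {
        return true; // no connection arrived, so accept() timed out
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns true if accept() returns once a client has connected. */
  static boolean acceptReturnsWithClient() {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      server.setSoTimeout(5_000); // safety net so the demo cannot hang
      // The connection is queued in the listen backlog before accept() is called.
      try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
          Socket accepted = server.accept()) {
        return accepted.isConnected();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("timed out with no client: " + acceptTimesOutWithoutClient(200));
    System.out.println("accepted with client: " + acceptReturnsWithClient());
  }
}
```

Setting a timeout (or using a non-blocking channel) is one way to keep a test from hanging when no peer ever connects; whether that applies to the failure in this thread is not established by the messages shown here.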

Re: Patch to change murmurhash implementation slightly

2023-08-25 Thread Robert Muir
ads to >>>> https://pastebin.com/kkggV9Vx >>>> >>>> Now, the test vectors in that pastebin do not match either the output of >>>> pre-change Lucene's murmur3, nor the output of the Python mmh3 package. >>>> That said, the pre-change Lucen

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Robert Muir
Consulting | Training | Open Source > > Website: Sease.io <http://sease.io/> > LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter > <https://twitter.com/seaseltd> | Youtube > <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github &

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
he Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Tue, May 16, 2023 at 9:50 PM Robert Muir wrote: > >> by the way, i agree with the idea to MOVE THE LIMIT UNCHANGED to the >> hnsw-specific code. >> >> This way, someone can write a

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
t performs. On Tue, May 16, 2023 at 8:53 PM Robert Muir wrote: > Gus, I think i explained myself multiple times on issues and in this > thread. the performance is unacceptable, everyone knows it, but nobody is > talking about. > I don't need to explain myself time and time again

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
ses. If people hit a hard limit, more of them give up > and never develop the code that will motivate them to look for > optimizations. > > -Gus > > On Tue, May 16, 2023 at 6:04 AM Robert Muir wrote: > >> i still feel -1 (veto) on increasing this limit. sending more em

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
i still feel -1 (veto) on increasing this limit. sending more emails does not change the technical facts or make the veto go away. On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti wrote: > Hi all, > we have finalized all the options proposed by the community and we are > ready to vote for

Re: TermInSetQuery: seekExact vs. seekCeil

2023-05-09 Thread Robert Muir
I remember the benefits from Terms.intersect being pretty huge. Rather than simple ping-pong, the whole monster gets handed off directly to the codec's term dictionary implementation. For the default terms dictionary using blocktree, this saves time seeking to terms you don't care about (because

Re: TermInSetQuery: seekExact vs. seekCeil

2023-05-09 Thread Robert Muir
The better solution is to use Terms.intersect. Then the postings format can do the right thing. But this query doesn't use Terms.intersect today, instead doing ping-ponging itself. That's the problem. We must *not* tune our algorithms for amazon's search but instead what is the best for users

Re: Unnecessary float[256] allocation on every (non-scoring) BM25Scorer

2023-05-02 Thread Robert Muir
On Tue, May 2, 2023 at 3:24 PM Michael Froh wrote: > > > This seems ok if it isn't invasive. I still feel like something is > > "off" if you are seeing GC time from 1KB-per-segment allocation. Do > > you have way too many segments? > > From what I saw, it's 1KB per "leaf query" to create the

Re: Unnecessary float[256] allocation on every (non-scoring) BM25Scorer

2023-05-02 Thread Robert Muir
On Tue, May 2, 2023 at 2:34 PM Robert Muir wrote: > > On Tue, May 2, 2023 at 12:49 PM Michael Froh wrote: > > > > Hi all, > > > > I was looking into a customer issue where they noticed some increased GC > > time after upgrading from Lucene 7.x to 9.x. After t

Re: Unnecessary float[256] allocation on every (non-scoring) BM25Scorer

2023-05-02 Thread Robert Muir
On Tue, May 2, 2023 at 12:49 PM Michael Froh wrote: > > Hi all, > > I was looking into a customer issue where they noticed some increased GC time > after upgrading from Lucene 7.x to 9.x. After taking some heap dumps from > both systems, the big difference was tracked down to the float[256]
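For readers following this thread, a rough sketch of the arithmetic and of one possible mitigation. A float[256] is 256 * 4 = 1024 bytes of payload, which matches the "1KB" figure quoted in the replies. The class below is hypothetical and is not Lucene's BM25Similarity: the names, the placeholder norm decoding, and the lazy-allocation approach are all assumptions for illustration. The formula is the standard BM25 length-normalization term k1 * ((1 - b) + b * length / avgLength).

```java
// Hypothetical sketch; not Lucene's actual scorer code.
final class LazyNormTable {
  private static final int SIZE = 256; // one entry per possible byte-encoded norm

  private final float k1;
  private final float b;
  private final float avgFieldLength;
  private float[] table; // stays null until a score is actually requested

  LazyNormTable(float k1, float b, float avgFieldLength) {
    this.k1 = k1;
    this.b = b;
    this.avgFieldLength = avgFieldLength;
  }

  /** Payload size of the table in bytes: 256 floats * 4 bytes each. */
  static int tableBytes() {
    return SIZE * Float.BYTES;
  }

  boolean isAllocated() {
    return table != null;
  }

  /** Placeholder decoding (assumption): treats the encoded norm as the field length. */
  private static float decodeLength(int encodedNorm) {
    return encodedNorm;
  }

  /** Builds the table on first use, so non-scoring callers never pay for it. */
  float lengthNorm(byte encodedNorm) {
    if (table == null) {
      float[] t = new float[SIZE];
      for (int i = 0; i < SIZE; i++) {
        t[i] = k1 * ((1 - b) + b * decodeLength(i) / avgFieldLength);
      }
      table = t;
    }
    return table[Byte.toUnsignedInt(encodedNorm)];
  }

  public static void main(String[] args) {
    LazyNormTable norms = new LazyNormTable(1.2f, 0.75f, 10f);
    System.out.println("bytes per table: " + tableBytes());
    System.out.println("allocated before scoring: " + norms.isAllocated());
    System.out.println("norm for length 10: " + norms.lengthNorm((byte) 10));
    System.out.println("allocated after scoring: " + norms.isAllocated());
  }
}
```

The sketch is not thread-safe and says nothing about what was eventually changed in Lucene; it only shows why deferring the allocation removes the cost for callers that never score.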

Re: Patch to change murmurhash implementation slightly

2023-04-25 Thread Robert Muir
> of this thread was a friendly request to please point me to instructions for > running a broad range of Lucene indexing benchmarks, so I can gather data for > further discussion; from my perspective, we haven't even gathered any data, > so obviously we haven't seen an

Re: Patch to change murmurhash implementation slightly

2023-04-25 Thread Robert Muir
va that test strings of length > greater than 8, and my change passes them. Could you explain what you want > tested? > > Cheers, > Thomas > > On Tue, Apr 25, 2023 at 4:21 PM Robert Muir wrote: >> >> sure, but "if length > 8 return 1" might pass these same

Re: Patch to change murmurhash implementation slightly

2023-04-25 Thread Robert Muir
producing that data. > > Cheers, > Thomas > > > > On Tue, Apr 25, 2023 at 4:02 PM Robert Muir wrote: >> >> well there is some cost, as it must add additional checks to see if >> its longer than 8. in your patch, additional loops. it increases the >>

Re: Patch to change murmurhash implementation slightly

2023-04-25 Thread Robert Muir
to a crawl. On Tue, Apr 25, 2023 at 9:56 AM Thomas Dullien wrote: > > Ah, I see what you mean. > > You are correct -- the change will not speed up a 5-byte word, but it *will* > speed up all 8+-byte words, at no cost to the shorter words. > > On Tue, Apr 25, 2023 at 3:20 

Re: Patch to change murmurhash implementation slightly

2023-04-25 Thread Robert Muir
4 > isn't quite enough? > > Cheers, > Thomas > > On Tue, Apr 25, 2023 at 3:07 PM Robert Muir wrote: >> >> i think from my perspective it has nothing to do with cpus being >> 32-bit or 64-bit and more to do with the average length of terms in >> most languages

Re: Patch to change murmurhash implementation slightly

2023-04-25 Thread Robert Muir
slower > indexing on (the dwindling) 32 bit CPUs? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Apr 25, 2023 at 7:39 AM Robert Muir wrote: >> >> I think the results of the benchmark will depend on the properties of >> the indexed ter

Re: Patch to change murmurhash implementation slightly

2023-04-25 Thread Robert Muir
I think the results of the benchmark will depend on the properties of the indexed terms. For english wikipedia (luceneutil) the average word length is around 5 bytes so this optimization may not do much. On Tue, Apr 25, 2023 at 1:58 AM Patrick Zhai wrote: > > I did a quick run with your patch,

Re: Should IndexWriter.flush return seqNo?

2023-04-23 Thread Robert Muir
> > Yes thats true, I just have to add: You can still open a NRT reader > directly from IndexWriter. But you don't need a sequence number there as > its hidden completely. So flushing is fine to allow users to get a new > NRT reader with the state up to that point, but it does not need to > return

Re: Should IndexWriter.flush return seqNo?

2023-04-21 Thread Robert Muir
This is not true: if i call IndexWriter.commit, then i can open an indexreader and see the documents. IndexWriter.flush doesn't do anything at all, really, just moves stuff from RAM to disk but not in a way that indexreader can see it or anything, right? It doesn't make much sense that this

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Robert Muir
ne here is saying we > won't address it, it's just a separate discussion. > > > On Sun, 9 Apr 2023, 12:59 Robert Muir, wrote: >> >> Also, please let's only discuss SEARCH. lucene is a SEARCH ENGINE >> LIBRARY. not a vector database or whatever trash is being proposed >

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Robert Muir
with basically anyone on this thread because they are all stating crazy things that don't make sense. On Sun, Apr 9, 2023 at 6:25 AM Robert Muir wrote: > > Yes, its very clear that folks on this thread are ignoring reason > entirely and completely swooned by chatgpt-hype. > And what

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Robert Muir
mance, >> > they may contribute improvements. >> > This is how you make progress. >> > >> > If it's a reputation thing, trust me that not allowing users to play with >> > high dimensional space will equally damage it. >> > >> >

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-08 Thread Robert Muir
> Then you complain about people not meeting you half way. Wow > > On Sat, Apr 8, 2023, 12:40 PM Robert Muir wrote: >> >> On Sat, Apr 8, 2023 at 8:33 AM Michael Wechner >> wrote: >> > >> > What exactly do you consider reasonable? >> >> Let'

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-08 Thread Robert Muir
On Sat, Apr 8, 2023 at 8:33 AM Michael Wechner wrote: > > What exactly do you consider reasonable? Let's begin a real discussion by being HONEST about the current status. Please put politically correct or your own company's wishes aside, we know it's not in a good state. Current status is the

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-08 Thread Robert Muir
ve this without prior >> knowledge of the vectors. Faiss has a nice implementation that fits >> naturally with Lucene called IVF ( >> https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexIVF.html) >> but if we want to avoid running kmeans on every merge we d require t

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-08 Thread Robert Muir
>> > one per cluster. In our case the vectors in each segment could belong to >> > different cluster so I don’t see how we could merge them efficiently. >> > >> > On Fri, 7 Apr 2023 at 22:28, jim ferenczi wrote: >> >> >> >> The inference time (an

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Robert Muir
> Regarding the ram buffer, we could drastically reduce the size by writing >> the vectors on disk instead of keeping them in the heap. With 1k dimensions >> the ram buffer is filled with these vectors quite rapidly. >> >> On Fri, 7 Apr 2023 at 21:59, Robert Muir wrote: >>

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Robert Muir
On Fri, Apr 7, 2023 at 5:13 PM Benjamin Trent wrote: > > From all I have seen when hooking up JFR when indexing a medium number of > vectors(1M +), almost all the time is spent simply comparing the vectors > (e.g. dot_product). > > This indicates to me that another algorithm won't really help

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-07 Thread Robert Muir
On Fri, Apr 7, 2023 at 7:47 AM Michael Sokolov wrote: > > 8M 1024d float vectors indexed in 1h48m (16G heap, IW buffer size=1994) > 4M 2048d float vectors indexed in 1h44m (w/ 4G heap, IW buffer size=1994) > > Robert, since you're the only on-the-record veto here, does this > change your thinking

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-06 Thread Robert Muir
> users and internal data structure optimizations, if any. > > > On Wed, 5 Apr 2023, 18:54 Robert Muir, wrote: >> >> I'd ask anyone voting +1 to raise this limit to at least try to index >> a few million vectors with 756 or 1024, which is allowed today. >> >

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Robert Muir
I'd ask anyone voting +1 to raise this limit to at least try to index a few million vectors with 756 or 1024, which is allowed today. IMO based on how painful it is, it seems the limit is already too high, I realize that will sound controversial but please at least try it out! voting +1 without

Re: Lucene 9.5.0 release

2023-01-17 Thread Robert Muir
+1 to release, thank you for volunteering to be RM! I went thru 9.5 section of CHANGES.txt and tagged all the GH issues in there with milestone too, if they didn't already have it. It looks even bigger now. On Fri, Jan 13, 2023 at 4:54 AM Luca Cavanna wrote: > > Hi all, > I'd like to propose

Re: Is there a way to customize segment names?

2022-12-17 Thread Robert Muir
>> segments etc) and can pick up from there. You would need a mechanism >> to replay the writes the primary never had a chance to commit. >> >> On Fri, Dec 16, 2022 at 5:41 AM Robert Muir wrote: >> > >> > You are still talking "Multiple writers".

Re: Is there a way to customize segment names?

2022-12-16 Thread Robert Muir
ode (main indexer) is down, how would we recover with > a back up indexer? > > Thanks > Patrick > > > On Thu, Dec 15, 2022 at 7:16 PM Robert Muir wrote: > > > This multiple-writer isn't going to work and customizing names won't > > allow it anyway. Each file

Re: Is there a way to customize segment names?

2022-12-15 Thread Robert Muir
This multiple-writer isn't going to work and customizing names won't allow it anyway. Each file also contains a unique identifier tied to its commit so that we know everything is intact. I would look at the segment replication in lucene/replicator and not try to play games with files and mixing

Re: Fw: Need a Jira account in order to create a ticket for lucene-facet

2022-12-07 Thread Robert Muir
Hi Gennadiy, The lucene project has migrated from JIRA to Github Issues for issue tracking. Please create an issue here: https://github.com/apache/lucene/issues On Wed, Dec 7, 2022 at 11:23 AM Gennadiy Vaysman wrote: > > Hello, Lucene developers, > > My email below to iss...@lucene.apache.org

Re: [lucene] 02/03: Fix longstanding bug in path bounds calculation, and hook up efficient isWithin() and distance logic

2022-11-19 Thread Robert Muir
Multiple spatial tests are failing in jenkins... bisected them to this commit. Can you please look into it? https://github.com/apache/lucene/issues/11956 On Sat, Nov 19, 2022 at 8:22 PM wrote: > > This is an automated email from the ASF dual-hosted git repository. > > kwright pushed a commit to

Re: [VOTE] Release Lucene 9.4.2 RC1

2022-11-18 Thread Robert Muir
I think he is running this from jenkins job. I suspect agents have "stacked up" over time take a look with "ps". Every time i run the smoketester, it "leaks" at least an agent or two. On Fri, Nov 18, 2022 at 9:48 AM Adrien Grand wrote: > > Reading Uwe's error message more carefully, I had

Re: [VOTE] Release Lucene 9.4.2 RC1

2022-11-17 Thread Robert Muir
+1 SUCCESS! [1:16:29.706409] On Thu, Nov 17, 2022 at 9:18 AM Adrien Grand wrote: > > Please vote for release candidate 1 for Lucene 9.4.2 > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-9.4.2-RC1-rev-858d9b437047a577fa9457089afff43eefa461db > >

Re: [lucene] branch main updated: Prevent NPEs while still handling the polar case for horizontal planes right off the pole

2022-11-17 Thread Robert Muir
if your machine is really 12 cores and 64GB ram but is that slow, then uninstall that windows shit immediately, that's horrible. On Thu, Nov 17, 2022 at 5:46 AM Karl Wright wrote: > > Thanks - the target I was using was the complete "build" target on the whole > project. This will be a

Re: Release Lucene 9.4.2

2022-11-16 Thread Robert Muir
> I plan on starting the release process tomorrow if there are no objections. > > On Fri, Nov 11, 2022 at 4:22 PM Robert Muir wrote: >> >> These are the 9.4.2 completed issues: >> >> https://github.com/apache/lucene/pull/11905 <-- bug and associated monster >

Re: [JENKINS] Lucene-9.x-MacOSX (64bit/jdk-18) - Build # 1386 - Failure!

2022-11-16 Thread Robert Muir
x on main and checked that it works with error prone, in > process compilation and alt javac. But double checking would be probably > good. :) > > Dawid > > On Wed, Nov 16, 2022 at 12:18 AM Robert Muir wrote: >> >> It is my fault. I will revert my changes and test with &q

Re: Release Lucene 9.4.2

2022-11-11 Thread Robert Muir
te: > > > > +1 from me for a bugfix release once we've solidified testing. Thanks to > > everyone working on improving tests and static analysis -- this now is our > > second time encountering a bad arithmetic bug and it's important to get > > ahead of these issues

Re: Release Lucene 9.4.2

2022-11-09 Thread Robert Muir
Friday is also a public holiday here, > celebrating the end of World War 1. :) > > On Wed, Nov 9, 2022 at 4:41 PM Robert Muir wrote: >> >> Can we please have a few days to improve the test situation? I think >> we need to beef up checkindex to exercise seek() on the

Re: Release Lucene 9.4.2

2022-11-09 Thread Robert Muir
Can we please have a few days to improve the test situation? I think we need to beef up checkindex to exercise seek() on the vectors, also we need to look at static analysis to try to find other similar bugs. This would help prevent "whack-a-mole" and improve correctness going forwards. I want to

Re: Expressions greedy advanceExact implementation

2022-10-26 Thread Robert Muir
I think deferring the advance call like this is fine and harmless, only because this DoubleValues "caches" the result for the current doc, so its idempotent anyway. Yes, about "advancing all the operands" as I mentioned, expressions has no clue about this. If you wanted to change it, you'd have

Re: Expressions greedy advanceExact implementation

2022-10-25 Thread Robert Muir
Iirc the expressions acts like a simple scripting engine where it just compiles bytecode for your expression and you are able to bind variables that you pass to the method... I don't know of an easy way to do this. On Tue, Oct 25, 2022, 1:13 PM Michael Sokolov wrote: >

Re: [VOTE] Release Lucene 9.4.1 RC1

2022-10-21 Thread Robert Muir
ore) The slowest suites (exceeding 1s) during this run: 8512.27s TestManyKnnVectors (:lucene:core) BUILD SUCCESSFUL in 2h 22m 55s 19 actionable tasks: 13 executed, 6 up-to-date On Thu, Oct 20, 2022 at 3:57 PM Robert Muir wrote: > > Thank you Julie for the draft test! I will try to repro

Re: [VOTE] Release Lucene 9.4.1 RC1

2022-10-20 Thread Robert Muir
Thank you Julie for the draft test! I will try to reproduce/test with it. On Thu, Oct 20, 2022 at 3:45 PM Julie Tibshirani wrote: > > Thank you Ignacio for taking over as release manager! I ran into some issues > with my signing key and Ignacio saved the day. > > Robert, I understand your

Re: [VOTE] Release Lucene 9.4.1 RC1

2022-10-20 Thread Robert Muir
+0 SUCCESS! [0:39:31.979476] I say +0 instead of +1, because i am still worried that we release with a bugfix without any test. I am happy to change vote to a +1 if we even have a hacky test in a draft PR. the release artifacts don't need to contain such a test or anything like that. i just want

Re: Code coverage check for PRs

2022-10-05 Thread Robert Muir
oco/ On Wed, Oct 5, 2022 at 8:58 AM Patrick Zhai wrote: > > Make sense to me, I'll try to look into it! > > On Tue, Oct 4, 2022, 16:50 Robert Muir wrote: >> >> We already have code coverage integrated into the build. See the >> documentation on how to generate the report

Re: Code coverage check for PRs

2022-10-04 Thread Robert Muir
rwise I can try it a little bit with > my own repo first and then try to add it to lucene. > > Best > Patrick > > > > On Tue, Oct 4, 2022, 06:36 Robert Muir wrote: >> >> btw, you can look at the current reports created by jenkins here: >> https://ci-builds.ap

Re: Code coverage check for PRs

2022-10-04 Thread Robert Muir
btw, you can look at the current reports created by jenkins here: https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/lastBuild/jacoco/ On Tue, Oct 4, 2022 at 6:51 AM Robert Muir wrote: > > we can run the tests with coverage option and produce coverage graph > from t

Re: Code coverage check for PRs

2022-10-04 Thread Robert Muir
we can run the tests with coverage option and produce coverage graph from the github actions, but need to look at the docs to see where to put it so it will be available. I want us to be careful about the word "check" as I'm adamantly against any such automated check (e.g. coverage > N%) in the

Re: [JENKINS] Lucene-9.x-Linux (64bit/jdk-18) - Build # 5866 - Unstable!

2022-09-30 Thread Robert Muir
Github number). > > Uwe > > On 30.09.2022 at 09:51, Robert Muir wrote: > > I've seen this failure before here: > > https://github.com/apache/lucene/issues/11754 > > > > From what I remember, seems something blows up with the multiplier > > that causes the usag

Re: [JENKINS] Lucene-9.x-Linux (64bit/jdk-18) - Build # 5866 - Unstable!

2022-09-30 Thread Robert Muir
I've seen this failure before here: https://github.com/apache/lucene/issues/11754 From what I remember, seems something blows up with the multiplier that causes the usage. On Fri, Sep 30, 2022 at 3:17 AM Uwe Schindler wrote: > > Hi, > > I have never seen this before. It looks like something in

Re: IMPORTANT: Please update your gradle.properties file in your Lucene checkout!

2022-09-27 Thread Robert Muir
the 'gradlew -q javaToolChains' command is useful to see which JVMs gradle knows about. On Tue, Sep 27, 2022 at 3:34 PM Uwe Schindler wrote: > > You just need to recreate Gradle properties, e.g. by deleting the old file. > > If you do not change anything Gradle will just work. On first build it

Re: [VOTE] Release Lucene 9.4.0 RC3

2022-09-27 Thread Robert Muir
+1 Smoketester works for me again without hassles, thanks Uwe. I tested both java 11 and java 17. SUCCESS! [2:49:13.336252] P.S. It would be nice option in the future to be able to test other versions that we have MR-jar'd code for (e.g. 19 in this case). On Tue, Sep 27, 2022 at 9:15 AM

Re: [VOTE] Release Lucene 9.4.0 RC1

2022-09-21 Thread Robert Muir
+1 Ran the smoketester with both java 11 and 17: SUCCESS! [2:41:19.024193] On Tue, Sep 20, 2022 at 10:10 PM Michael Sokolov wrote: > > Please vote for release candidate 1 for Lucene 9.4.0 > > The artifacts can be downloaded from: >

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 759 - Failure!

2022-09-13 Thread Robert Muir
Can also potentially avoid them and reduce the amount of back-n-forth by pulling from the ultimate URL instead of redirecting around: https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar On Tue, Sep 13, 2022 at 3:20 AM Dawid Weiss wrote: > > These 500/503s are

Re: release notes question

2022-09-02 Thread Robert Muir
Take a look here for the older ones: https://cwiki.apache.org/confluence/display/LUCENE/Release+Notes On one hand you have to deal with confluence, but using the wiki has the advantage that other ppl can edit it. So you can basically copy-paste from a previous one as a template and enlist help

Re: [lucene] branch main updated: SimpleText knn vectors; fix searchExhaustively and suppress a byte format test case (#11725)

2022-08-31 Thread Robert Muir
thanks for fixing! On Wed, Aug 31, 2022 at 2:43 PM Michael Sokolov wrote: > > Oh -- sorry, I guess I forgot to backport. Thanks for tracking it down > - I'll push to branch_9x shortly > > On Wed, Aug 31, 2022 at 10:25 AM Robert Muir wrote: > > > > can we backport to 9

Re: [JENKINS] Lucene » Lucene-Check-main - Build # 6584 - Failure!

2022-08-31 Thread Robert Muir
maybe the OOMKiller kicked in. On Wed, Aug 31, 2022 at 3:06 PM Dawid Weiss wrote: > > > I think Lucene tests killed the job runner. :) > > > Task :lucene:analysis:nori:spotlessJavaCheck > > Task :lucene:analysis:nori:spotlessCheck > FATAL: command execution failed > java.io.IOException: Backing

Re: [lucene] branch main updated: SimpleText knn vectors; fix searchExhaustively and suppress a byte format test case (#11725)

2022-08-31 Thread Robert Muir
can we backport to 9.x if you get a chance? I'm still seeing this test trip in 9.x jenkins builds. On Mon, Aug 29, 2022 at 11:50 AM wrote: > > This is an automated email from the ASF dual-hosted git repository. > > sokolov pushed a commit to branch main > in repository

Re: Label vs. Milestone for version management?

2022-08-25 Thread Robert Muir
On Thu, Aug 25, 2022 at 9:47 AM Michael Sokolov wrote: > > I agree; I've always used CHANGES for a quick historical view. What > about the release manager use case? I haven't done a release, but I > think we generally want to know if people are targeting changes for an > upcoming release,

Re: Label vs. Milestone for version management?

2022-08-25 Thread Robert Muir
On Thu, Aug 25, 2022 at 6:11 AM Michael Sokolov wrote: > > The milestone looks appealing since it is prominent and relatively easy to > use. The only drawback I have heard is that it is single valued. It still > seems we could use it to document the first version in which something is >

Re: [JENKINS] Lucene-9.x-MacOSX (64bit/jdk-18) - Build # 978 - Unstable!

2022-08-24 Thread Robert Muir
On Wed, Aug 24, 2022 at 11:40 AM Uwe Schindler wrote: > > Hi, > > this is the MacOS virtualbox. This one often has timeshifts caused by > Virtualbox and the NTP daemon of OSX is bullshit (no chrony). > > Actually earlier versions of MacOS had a bug in their OS libc > segfaulting the app to crash

Re: [JENKINS] Lucene-9.x-MacOSX (64bit/jdk-18) - Build # 978 - Unstable!

2022-08-24 Thread Robert Muir
would indeed have to be significant > for this to fail (and in the middle of the process?!). Anyway, I'll > look into this - thanks for the pointer! > > Dawid > > On Wed, Aug 24, 2022 at 1:39 PM Robert Muir wrote: > > > > Hi Dawid, I looked at this and also >

Re: [JENKINS] Lucene-9.x-MacOSX (64bit/jdk-18) - Build # 978 - Unstable!

2022-08-24 Thread Robert Muir
Hi Dawid, I looked at this and also https://github.com/apache/lucene/issues/7687 If you look at the instances and how sporadic they are, the problem could be caused by TimeoutSuite using wall-clock time in com.carrotsearch.randomizedtesting? Especially in virtual machines, wall-clock time can be
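The concern raised here, that a timeout measured with wall-clock time can misfire when the clock jumps, can be illustrated with a small sketch. This is not the randomizedtesting implementation; the class name and API are hypothetical. It measures elapsed time with System.nanoTime(), which the JDK documents as suitable only for measuring elapsed time and unrelated to wall-clock time, so NTP corrections or VM clock shifts to System.currentTimeMillis() do not affect it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of randomizedtesting or Lucene.
final class MonotonicDeadline {
  private final long startNanos;
  private final long timeoutNanos;

  private MonotonicDeadline(long timeoutNanos) {
    // nanoTime() has an arbitrary origin; only differences between calls are meaningful.
    this.startNanos = System.nanoTime();
    this.timeoutNanos = timeoutNanos;
  }

  static MonotonicDeadline ofMillis(long millis) {
    return new MonotonicDeadline(TimeUnit.MILLISECONDS.toNanos(millis));
  }

  /** True once at least the configured timeout has elapsed since construction. */
  boolean isExpired() {
    // Subtract first, then compare: this stays correct even if nanoTime() wraps around.
    return System.nanoTime() - startNanos >= timeoutNanos;
  }

  public static void main(String[] args) {
    MonotonicDeadline immediate = MonotonicDeadline.ofMillis(0);
    MonotonicDeadline oneHour = MonotonicDeadline.ofMillis(TimeUnit.HOURS.toMillis(1));
    System.out.println("zero timeout expired: " + immediate.isExpired());
    System.out.println("one hour timeout expired: " + oneHour.isExpired());
  }
}
```

A deadline computed from System.currentTimeMillis() would instead appear to expire early (or never) if the system clock is stepped while the suite is running.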

Re: Boolean query regression after migrating from Lucene 8.5 to 9.2

2022-08-19 Thread Robert Muir
On Thu, Aug 18, 2022 at 1:47 PM Alexander Lukyanchikov wrote: > > Currently we are trying to avoid switching to MMAP because there is another > process running on the same host and extensively utilizes the FS cache. > This makes no sense, NIOFSDirectory uses the FS cache the exact same way as

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-18 Thread Robert Muir
rge majority of issues. > Don't we have to gain the consent of each individual to map both accounts? > No, we don't have to ask permission to mention someone with an @username > On Sat, Jun 18, 2022 at 18:52, Robert Muir wrote: > > > > I looked at some related projects on github: > > h

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-18 Thread Robert Muir
sue. (thus obfuscating/splitting >> >> > > > > the very important rich history across systems). >> >> > > > > >> >> > > > > So that's why I feel issues should be completely tracked in the >> >> > > > > system where

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-17 Thread Robert Muir
On Fri, Jun 17, 2022 at 3:27 PM Dawid Weiss wrote: > > I'd be more afraid of what happens to github issues in two years (or longer). > Will it look the same? Will it be different? Will it be gone (and how do we > get a backup of the issue history then)? Contrary to the apache-hosted Jira, >

Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-17 Thread Robert Muir
On Fri, Jun 17, 2022 at 12:08 PM Michael McCandless wrote: > > I agree the embedded links are tricky. Not sure whether we could do a big > rewrite of those links or not ... seems a chicken/egg situation. We could 1) > append a forwarding link comment on the Jira issue to its GitHub version,

Re: exposing per-field storage usage

2022-06-14 Thread Robert Muir
On Tue, Jun 14, 2022 at 10:37 AM Michael Sokolov wrote: > > Oh, yes that's a clever idea. It seems it would take quite a while > (tens of minutes?) for a larger index though? Much faster than the > force-merge solution for sure. I guess to get faster we would have to > instrument each format. I

Re: exposing per-field storage usage

2022-06-13 Thread Robert Muir
On Mon, Jun 13, 2022 at 3:26 PM Nhat Nguyen wrote: > > Hi Michael, > > We developed a similar functionality in Elasticsearch. The DiskUsage API > estimates the storage of each field by iterating its structures (i.e., > inverted index, doc-values, stored fields, etc.) and tracking the number of

Re: [VOTE] Migration to GitHub issue from Jira (LUCENE-10557)

2022-06-07 Thread Robert Muir
+1 On Mon, May 30, 2022 at 11:40 AM Tomoko Uchida wrote: > > Hi everyone! > > As we had previous discussion thread [1], I propose migration to GitHub issue > from Jira. > It'd be technically possible (see [2] for details) and I think it'd be good > for the project - not only for welcoming new

Re: Adding a new PointDocValuesField

2022-05-26 Thread Robert Muir
On Thu, May 26, 2022 at 11:49 AM Greg Miller wrote: > > I agree that technically it's just as good. I also think it's less > clear for a user. The concept of "points" is something we've > established in Lucene, so I think it makes sense for users to think > about indexing points as a doc value as

Re: Adding a new PointDocValuesField

2022-05-25 Thread Robert Muir
On Wed, May 25, 2022 at 2:08 PM Greg Miller wrote: > > > I guess with an “unsorted” numeric DV type we could get there with aligned > indices, as you describe, but that seems less appealing than supporting > multi-dim points directly. > Name one technical reason why? Unsorted would be exactly

Re: Adding a new PointDocValuesField

2022-05-25 Thread Robert Muir
On Wed, May 25, 2022 at 12:17 AM Greg Miller wrote: > > A "two separate field approach" would > consist of indexing year and make separately, and you'd lose the > information that only certain combinations are valid. Am I overlooking > something with your suggestion? Maybe there's something we

Re: Adding a new PointDocValuesField

2022-05-25 Thread Robert Muir
On Wed, May 25, 2022 at 8:04 AM Michael Sokolov wrote: > > Also, there should be examples from other fields. Suppose you are > indexing map data and want to support a UI that shows "hot spots" on > the map where there is a lot of let's say ... activity of some sort. > You'd like to facet on 2-d

Re: Adding a new PointDocValuesField

2022-05-24 Thread Robert Muir
This seems like a really exotic feature to add a dedicated docvalues field for. We should let BINARY be the catchall for stuff like this. On Mon, May 23, 2022 at 10:17 PM Marc D'Mello wrote: > > Hi, > > Some background: I've been working on this PR to add hyper rectangle faceting > capabilities to

Re: [VOTE] Release Lucene 9.2.0 RC1

2022-05-18 Thread Robert Muir
I opened an issue about this. It shouldn't block the release, but it is pretty crazy and something to improve. https://issues.apache.org/jira/browse/LUCENE-10579 On Wed, May 18, 2022 at 3:10 PM Robert Muir wrote: > > It seems strange the way that > confirmAllReleasesAreTestedForBackCompat
