debugging query execution plan

2021-05-06 Thread Michael Sokolov
Do we have a way to understand how BooleanQuery (and other composite queries) are advancing their child queries? For example, a simple conjunction of two queries advances the more restrictive (lower cost()) query first, enabling the more costly query to skip over more documents. But we may not be m
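
To make the conjunction behavior described above concrete, here is a minimal sketch of a leapfrog intersection over two sorted doc-id arrays. It is an illustration only, not Lucene's BooleanQuery or scorer code: the class name ConjunctionSketch, the advance helper, and the use of array length as a stand-in for cost() are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: leapfrog conjunction over two sorted doc-id arrays. */
public class ConjunctionSketch {

  /** Returns the first index i >= from with docs[i] >= target, or docs.length if none. */
  static int advance(int[] docs, int from, int target) {
    int i = from;
    while (i < docs.length && docs[i] < target) {
      i++;
    }
    return i;
  }

  /** Intersects two sorted arrays, letting the shorter ("cheaper") one lead. */
  static List<Integer> intersect(int[] a, int[] b) {
    // Assumption: array length plays the role of cost(); the smaller side leads.
    int[] lead = a.length <= b.length ? a : b;
    int[] follow = (lead == a) ? b : a;

    List<Integer> hits = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < lead.length) {
      int doc = lead[i];
      // The follower jumps straight to the lead's doc, skipping everything before it.
      j = advance(follow, j, doc);
      if (j == follow.length) {
        break; // follower exhausted, no more matches possible
      }
      if (follow[j] == doc) {
        hits.add(doc);
        i++;
      } else {
        // The follower overshot; the lead catches up to the follower's doc.
        i = advance(lead, i, follow[j]);
      }
    }
    return hits;
  }
}
```

With a three-document lead, the follower is advanced only three times regardless of how long its own list is, which is the skipping effect the message refers to.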

Re: Welcome Zach Chen as Lucene committer

2021-04-19 Thread Michael Sokolov
Welcome, Zach! On Mon, Apr 19, 2021, 1:14 PM Gus Heck wrote: > Welcome Zach :) > > On Mon, Apr 19, 2021 at 1:09 PM Xi Chen > wrote: > >> Thanks Adrien for the announcement and everyone for the warm welcome! I’m >> deeply honored to be able to join this great community! >> >> >> I work at Amazon

Re: 9.0 release

2021-04-14 Thread Michael Sokolov
brought back to the lucene repo. Is this an accurate view of the state of >> things? >> >> Now that I'm done with 8.8.2, I would love to see how we can continue to >> make headway on 9.0! >> >> >> >> On Mon, Mar 29, 2021 at 3:25 PM Michael Sokol

Re: 9.0 release

2021-04-14 Thread Michael Sokolov
cene repo. Is this an accurate view of the state of > things? > > Now that I'm done with 8.8.2, I would love to see how we can continue to make > headway on 9.0! > > > > On Mon, Mar 29, 2021 at 3:25 PM Michael Sokolov wrote: >> >> There has been some discussio

Re: Welcome Peter Gromov as Lucene committer

2021-04-06 Thread Michael Sokolov
Welcome, Peter! On Tue, Apr 6, 2021 at 2:39 PM Michael McCandless wrote: > > Welcome Peter! Thank you for the massive Hunspell improvements! > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Apr 6, 2021 at 1:48 PM Robert Muir wrote: >> >> I'm pleased to announce that Peter Gr

Re: Redirect build logs from Solr 8.x branch to bui...@solr.apache.org

2021-04-02 Thread Michael Sokolov
+1 it would be nice to be able to sort these out differently with filters On Fri, Apr 2, 2021 at 3:54 AM Dawid Weiss wrote: > > > Hi folks! > > I know the development repository for 8x stays in the previous location but > can we (should we) update the mailing list address on Solr 8x build jobs t

Re: 9.0 release

2021-03-29 Thread Michael Sokolov
> Since you mentioned the Gradle build, I believe that we still need to >> migrate some of the release tooling from Ant to Gradle, e.g. >> dev-tools/scripts/addBackcompatIndexes.py. These scripts are not easy to >> test without actually doing a release so the 9.0 RM might have so

Re: Questions about the new vector API

2021-03-28 Thread Michael Sokolov
Ugh sorry for misspelling your name, I blame the phone! On Sun, Mar 28, 2021, 6:50 AM Michael Sokolov wrote: > Hi Dimitry, I worked initially from the papers cited in LUCENE-9004, which > I think is also what Tomoko was doing. Later I did refer to nmslib too. > > On Sat, Mar 27, 2

Re: Questions about the new vector API

2021-03-28 Thread Michael Sokolov
> different KNN implementations and blogging about it. > > Did you use nmslib for HNSW implementation or something else? > > On Tue, 16 Mar 2021 at 22:47, Michael Sokolov wrote: > >> Yeah, HNSW is problematic in a few ways: (1) merging is costly due to >> the ne

Re: Questions about the new vector API

2021-03-17 Thread Michael Sokolov
//github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java > > Would a similar approach work in your case? > > On Tue, Mar 16, 2021 at 22:21, Michael Sokolov wrote: >> >> > I was more thinking of moving VectorValues#search to

Re: Questions about the new vector API

2021-03-16 Thread Michael Sokolov
n merging for example. -Mike On Tue, Mar 16, 2021 at 2:15 PM Adrien Grand wrote: > > Hello Mike, > > On Tue, Mar 16, 2021 at 5:05 PM Michael Sokolov wrote: >> >> I think the reason we have search() on VectorValues is that we have >> LeafReader.getVectorValues() (by an

Re: Questions about the new vector API

2021-03-16 Thread Michael Sokolov
n every access. > I don't see any way to even amortize the pain with some kind of bulk merge > trick. > > So if we find algorithms that scale better, I think we should lend a > preference towards them. For example, algorithms that allow > per-segment/sequential index

Re: Questions about the new vector API

2021-03-16 Thread Michael Sokolov
ential iterators and > don't need random access? > > Seems like these should be the ones we initially add to lucene, and HNSW > should be put aside for now? (is it a toy, or can we do it without jazillions > of random accesses?) > > On Tue, Mar 16, 2021 at 12:15 PM Micha

Re: Questions about the new vector API

2021-03-16 Thread Michael Sokolov
maybe we should abandon the iterator API since it is redundant (you can always iterate over a random access API if you know the size)? On Tue, Mar 16, 2021 at 12:10 PM Michael Sokolov wrote: > > Also, Tomoko re:LUCENE-9322, did it succeed? I guess we won't know for > sure unless some
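
The redundancy argument above (an iterator can always be layered on a random-access API once the size is known) can be sketched as follows. The RandomAccessVectors interface and its method names are hypothetical, made up for this example; they are not the actual VectorValues API under discussion.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Illustration only: deriving sequential iteration from random access plus size. */
public class RandomAccessIterationSketch {

  /** Hypothetical random-access view of stored vectors; not a Lucene interface. */
  interface RandomAccessVectors {
    int size();

    float[] vectorValue(int ord);
  }

  /** Wraps an in-memory array as a random-access view (helper for the example). */
  static RandomAccessVectors of(final float[][] data) {
    return new RandomAccessVectors() {
      @Override
      public int size() {
        return data.length;
      }

      @Override
      public float[] vectorValue(int ord) {
        return data[ord];
      }
    };
  }

  /** Iterates ordinals 0..size-1 using nothing but the random-access methods. */
  static Iterator<float[]> iterate(final RandomAccessVectors vectors) {
    return new Iterator<float[]>() {
      private int ord = 0;

      @Override
      public boolean hasNext() {
        return ord < vectors.size();
      }

      @Override
      public float[] next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return vectors.vectorValue(ord++);
      }
    };
  }
}
```

The converse does not hold: a forward-only iterator cannot serve arbitrary ordinals without buffering, which is why the thread treats random access as the more general of the two.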

Re: Questions about the new vector API

2021-03-16 Thread Michael Sokolov
Also, Tomoko re:LUCENE-9322, did it succeed? I guess we won't know for sure unless someone revives https://issues.apache.org/jira/browse/LUCENE-9136 or something like that On Tue, Mar 16, 2021 at 12:04 PM Michael Sokolov wrote: > > Consistent plural naming makes sense to me. I think

Re: Questions about the new vector API

2021-03-16 Thread Michael Sokolov
Consistent plural naming makes sense to me. I think it ended up singular because I am biased to avoid plural names unless there is a useful distinction to be made. But consistency should trump my predilections. I think the reason we have search() on VectorValues is that we have LeafReader.getVecto

Re: Lucene-Solr-cloud2refimpl

2021-03-13 Thread Michael Sokolov
i.de > eMail: u...@thetaphi.de > > > -----Original Message- > > From: Michael Sokolov > > Sent: Friday, March 12, 2021 2:59 PM > > To: Lucene Dev > > Subject: Lucene-Solr-cloud2refimpl > > > > Should t

Re: Lucene (unexpected ) fsync on existing segments

2021-03-12 Thread Michael Sokolov
Also - I should have said - I think the first step here is to write a focused unit test that demonstrates the existence of the extra fsyncs that we want to eliminate. It would be awesome if you were able to create such a thing. On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov wrote: > >

Re: Lucene (unexpected ) fsync on existing segments

2021-03-12 Thread Michael Sokolov
h > might not be feasible. So we directly back the index up from the Solr node to > a remote repository. > > Thanks, > Rahul > > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov wrote: >> >> Well, it certainly doesn't seem necessary to fsync files that are >

Lucene-Solr-cloud2refimpl

2021-03-12 Thread Michael Sokolov
Should these Jenkins builds be directing their email to bui...@solr.apache.org now? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Lucene (unexpected ) fsync on existing segments

2021-03-11 Thread Michael Sokolov
ed my fundamental understanding that segment files once > written are immutable, no matter what (unless picked up for a merge of > course). Hence I thought of reaching out, in case there are scenarios where > this might happen which I might be unaware of. > > Thanks, > Ra

Re: Lucene (unexpected ) fsync on existing segments

2021-03-11 Thread Michael Sokolov
This isn't a support forum; solr-users@ might be more appropriate. On that list someone might have a better idea about how the replication handler gets its list of files. This would be a good list to try if you wanted to propose a fix for the problem you're having. But since you're here -- it looks

Re: Welcome Bruno to the Apache Lucene PMC

2021-03-10 Thread Michael Sokolov
Welcome, Bruno! On Wed, Mar 10, 2021, 7:56 PM Mike Drob wrote: > I am pleased to announce that Bruno has accepted an invitation to join the > Lucene PMC! > > Congratulations, and welcome aboard! > > Mike >

Re: Lucene and Solr repositories mirrored, main branch ready

2021-03-10 Thread Michael Sokolov
Big thank you, Dawid, and Jan and others for taking the bull by the horns! On Wed, Mar 10, 2021, 3:14 PM Dawid Weiss wrote: > > Just tested out the main branch of the new repo, packaged, started, > loaded data, searched from the UI. All looks great. > > Thank you, great to know! > > Dawid > > --

Re: [DISCUSS] Sunset the general@l.a.o mailing list?

2021-03-08 Thread Michael Sokolov
maybe send an email to the list informing anyone who's listening? On Mon, Mar 8, 2021 at 8:07 AM Jan Høydahl wrote: > > It's been one week, and there were 5 respondents. Three explicitly in favour > of sunsetting the list. > This was not a VOTE, but I'll interpret the response as lazy consensus,

Re: Who has access to Google Analytics for Lucene site?

2021-03-03 Thread Michael Sokolov
Before you look, should we have a betting pool on the number of downloads/day? I will arrange for a bottle of some excellent liquid to be sent to the closest guess at the number of redirects to the mirror sites, as determined by Alexandre. Also, has it been increasing over the last year? Finally, i

Re: [DISCUSS] Sunset the general@l.a.o mailing list?

2021-03-01 Thread Michael Sokolov
Yes, let's consolidate On Mon, Mar 1, 2021 at 6:45 AM Tomoko Uchida wrote: > > > I've been sending periodic PyLucene release votes there in order not to spam > > lucene-dev but I guess I can use lucene-dev instead ? > > In my view it's totally okay to send PyLucene release votes to lucene-dev. >

Re: Review request - New Solr website

2021-03-01 Thread Michael Sokolov
I clicked around a bit; didn't do a thorough copy edit or anything, but it seems as if the links are working, content looks accurate. The notices about the new TLP seem good to me, too. Thanks for forging ahead, Jan -Mike On Mon, Mar 1, 2021 at 3:56 AM Jan Høydahl wrote: > > Hi, > > I have been

Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-18 Thread Michael Sokolov
Yes, Congratulations and a big thank you Jan! On Thu, Feb 18, 2021 at 1:56 PM Anshum Gupta wrote: > > Hi everyone, > > I’d like to inform everyone that the newly formed Apache Solr PMC nominated > and elected Jan Høydahl for the position of the Solr PMC Chair and Vice > President. This decision

Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-18 Thread Michael Sokolov
ky that you accepted to take this role. > > On Wed, Feb 17, 2021 at 10:32 PM Anshum Gupta wrote: >> >> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice >> President position. >> >> This year we nominated and elected Michael Sokolov as the Ch

Re: [VOTE] Release Lucene/Solr 8.8.1 RC1

2021-02-16 Thread Michael Sokolov
Hmm, I got a failure on org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory, but it did not reproduce (tried twice). Would that possibly also be addressed by those fixes? On Tue, Feb 16, 2021 at 7:38 AM Ishan Chattopadhyaya wrote: > > > The failure seems to be because of a tim

Re: [GitHub] [lucene-solr] uschindler commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

2021-02-14 Thread Michael Sokolov
So sorry to hear that Uwe; take your time to grieve - that's a big one, I think -Mike On Sat, Feb 13, 2021 at 9:57 AM GitBox wrote: > > > uschindler commented on pull request #2306: > URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778630343 > > >> @uschindler if you want t

Re: Help needed with fixing lucene-site GitHub repo

2021-02-10 Thread Michael Sokolov
Have you considered using a merge commit for this? That won't require force pushing On Wed, Feb 10, 2021 at 2:51 PM Anshum Gupta wrote: > > Hi All, > > Seems like during the last release, we directly committed the website changes > to the production branch, bypassing the master. This is now caus

Re: [JENKINS] Lucene » Lucene-Solr-Check-master - Build # 1518 - Still Unstable!

2021-02-03 Thread Michael Sokolov
Ah, I see it fail intermittently, but can reproduce. Thanks, Dawid, I'll see if I can find why. Hopefully a bad assumption in the test, we'll see! On Wed, Feb 3, 2021 at 3:14 AM Dawid Weiss wrote: > > This reproduces for me. (Mike, help! :). > > gradlew test --tests TestKnnGraph.testMergeProduces

Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-11.0.6) - Build # 29374 - Still Unstable!

2021-01-28 Thread Michael Sokolov
Looks like the randomness in the test case was a little too generous and included a degenerate graph with max-connections = 1 that isn't sufficient for searching. I'll push a fix soon On Thu, Jan 28, 2021 at 8:35 AM Michael Sokolov wrote: > > Oooh I'll dig > > On T

Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-11.0.6) - Build # 29374 - Still Unstable!

2021-01-28 Thread Michael Sokolov
Oooh I'll dig On Thu, Jan 28, 2021 at 8:30 AM Policeman Jenkins Server wrote: > > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/29374/ > Java: 64bit/jdk-11.0.6 -XX:+UseCompressedOops -XX:+UseParallelGC > > 1 tests failed. > FAILED: org.apache.lucene.index.TestKnnGraph.testSearc

Re: [VOTE] Release Lucene/Solr 8.8.0 RC2

2021-01-28 Thread Michael Sokolov
SUCCESS! [0:58:25.213071] +1 better late than never? On Thu, Jan 28, 2021 at 8:04 AM Ishan Chattopadhyaya wrote: > > Thanks Noble! > > On Thu, 28 Jan, 2021, 4:24 pm Noble Paul, wrote: >> >> [+1] 9 (4 binding) >> >> [0] 0 >> >> [-1] 0 >> >> >> This vote has PASSED. >> >> I shall proceed wit

Re: Merging segment parts concurrently (SegmentMerger)

2021-01-27 Thread Michael Sokolov
I thought I remembered the discussion, searched for the issue in jira, but could not find. Probably Mike used his souped up search? On Wed, Jan 27, 2021, 3:07 AM Dawid Weiss wrote: > Darn... I swear sometimes, when I try hard enough, I can hear my brain > cells giving up to atrophy... Sigh. > >

Re: Merging segment parts concurrently (SegmentMerger)

2021-01-25 Thread Michael Sokolov
David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Mon, Jan 25, 2021 at 11:05 AM Michael Sokolov wrote: >> >> It makes sense to me. I don't have the full picture, but I did just >> implement merging for vect

Re: Merging segment parts concurrently (SegmentMerger)

2021-01-25 Thread Michael Sokolov
It makes sense to me. I don't have the full picture, but I did just implement merging for vector format, and that at least, could be done fully concurrent with other formats. I expect the same is true of DocValues, Terms, etc. I'm not sure about the different kinds of DocValues - they might want to

Re: EOF error in VectorValues in Lucene nightly benchmarks

2021-01-23 Thread Michael Sokolov
generated the exceptions. On Fri, Jan 22, 2021 at 4:21 PM Michael Sokolov wrote: > > Thanks Anton! Providing a stack trace is great; we can afford the > black pixels I think ;) > > I found some bugs in a separate effort which might be related, > although I saw a different excepti

Re: EOF error in VectorValues in Lucene nightly benchmarks

2021-01-22 Thread Michael Sokolov
Thanks Anton! Providing a stack trace is great; we can afford the black pixels I think ;) I found some bugs in a separate effort which might be related, although I saw a different exception so I'm not sure. I'll post a patch soon, and if you are able to re-test and see if you can reproduce, that w

Re: Faster advance on Vector Values

2021-01-17 Thread Michael Sokolov
Thanks for the suggestion! This will be a nice improvement for use cases wanting to retrieve vectors for a sparse set of documents, eg when incorporating a vector-based score as a scoring signal. Would you mind opening an issue, Anand? On Sat, Jan 16, 2021 at 9:07 AM Anand Kotriwal wrote: > > Hi

Re: 2021-01 Lucene/Solr Committer meeting

2021-01-17 Thread Michael Sokolov
There was some concern about the tasks needed for the release that are *not* code-related: figure out how to maintain code repository, make any needed changes to build system and release process; create a different web site, and I'm sure other things I'm forgetting. Personally, I hope these don't t

Re: Blog post - Profiling the Lucene nightly benchmarks

2021-01-17 Thread Michael Sokolov
Indeed! Thank you for all the helpful suggestions, especially from my point of view re: HNSW, which is indeed costly to index. I am surprised how much time is spent in SparseBitSet; perhaps a full (non-sparse) bitset is called for, although I had initially shied away from it since this indexing is

Re: 2021-01 Lucene/Solr Committer meeting

2021-01-14 Thread Michael Sokolov
The question came up in the context of a discussion about whether it could be sensible for Lucene 9.0 to be released without Solr 9.0 being released. I think we're struggling a bit to conceptualize how the next release will work: release 9.0 as we do today and delay the split until after, coordinat

Re: [JENKINS-EA] Lucene-Solr-master-Linux (64bit/jdk-16-ea+30) - Build # 29209 - Still Unstable!

2021-01-08 Thread Michael Sokolov
I ran 3000+ iterations with that seed (on JDK11) and it did not reproduce. Then another 2000+ with JDK-16 runtime, and no fails there either. On Fri, Jan 8, 2021 at 10:32 AM David Smiley wrote: > > Hm; this is spooky > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com

Re: Failing gradle precommits

2021-01-08 Thread Michael Sokolov
>> > > >> > IMHO, we should change the command line and pass JVM options to set > heap size as it is written to the settings file. > >> > > >> > Uwe > >> > > >> > On January 8, 2021 6:13:01 PM UTC, David Smiley wrote < > dsmi..

Re: [JENKINS-EA] Lucene-Solr-jdk16panama-Linux (64bit/jdk-16-ea+30) - Build # 11 - Still Unstable!

2021-01-07 Thread Michael Sokolov
we Schindler wrote: > > Hi, > > > > I am a bit confused by this mail: “Lucene-Solr-jdk16panama-Linux” Jenkins > jobs only run „gradlew test -Dtests.directory=MMapDirectory” nothing else!? > > > > Uwe > > > > - > > Uwe Schindler > > Ach

Re: RFC: N-2 compatibility for file formats

2021-01-06 Thread Michael Sokolov
In practice what would this mean? We relax the restriction that David mentions, and we keep old codecs around in backwards-codecs for two major releases instead of one? Are there other implications? Suppose we had a Query that relied on a specific index format, which gets retired. We keep the index

Re: influence of memory access patterns on dot-product performance

2020-12-30 Thread Michael Sokolov
unsafe it is not. > e.g. something like this recent bug: > https://bugs.openjdk.java.net/browse/JDK-8257531 > > You could re-run your bench on a JDK-16 early access as well, with > that fix, and see what happens. > > On Wed, Dec 30, 2020 at 9:00 AM Michael Sokolov wrote: > >

Re: Old programmers do fade away

2020-12-30 Thread Michael Sokolov
Woah! That plan sounds like fun! I might have to join you, but not yet :) On the topic of squirrels, you must have seen this https://www.youtube.com/watch?v=hFZFjoX2cGg, but I share it again because it always deserves a second watch. Also, my own personal attempt at squirrel-proofing didn't go so w

influence of memory access patterns on dot-product performance

2020-12-30 Thread Michael Sokolov
Hi, I've been working on improving performance of vector KNN search, and found some behavior that seemed surprising to me, showing huge differences in some cases comparing on-heap memory access with the way we access data today via IndexInput. I'd love to get some other eyes on this to help me bett
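
As a rough picture of the two access patterns being compared, here is a sketch that computes the same dot product once from an on-heap float[] and once by reading the stored vector out of a direct buffer. It is not the benchmark from the thread: a java.nio FloatBuffer stands in for IndexInput as an assumption, and the class and method names are invented for the example. It makes no claim about which path is faster.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Illustration only: the same dot product via on-heap and buffer-backed access. */
public class DotProductAccessSketch {

  /** Both operands are plain on-heap arrays. */
  static float dotOnHeap(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** The stored vector is read element by element from a buffer (stand-in for IndexInput). */
  static float dotFromBuffer(float[] query, FloatBuffer stored) {
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      sum += query[i] * stored.get(i); // absolute read, does not move the position
    }
    return sum;
  }

  /** Copies a vector into a direct (off-heap) buffer in native byte order. */
  static FloatBuffer toOffHeap(float[] v) {
    FloatBuffer buf =
        ByteBuffer.allocateDirect(v.length * Float.BYTES)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();
    buf.put(v);
    buf.flip();
    return buf;
  }
}
```

Any timing comparison between the two methods would need a proper harness (warm-up, repeated runs) before drawing conclusions.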

9.0 release

2020-12-28 Thread Michael Sokolov
Hi everyone, as we head into a new year full of optimism, is it time to start discussing the next major release? We released 8.0 on Jun 18, 2019, over 18 months ago. Since then we've switched to a gradle-based build. We have added vector-valued fields and an HNSW neighbor search algorithm for them.

Re: Code reformatting

2020-12-24 Thread Michael Sokolov
The Google convention you cited says this, I think? >Braces are used with if, else, for, do and while statements, even when the > body is empty or contains only a single statement. On Thu, Dec 24, 2020 at 8:00 AM Dawid Weiss wrote: > > > Personally I would ban the non block conditional, but

Re: Code reformatting

2020-12-23 Thread Michael Sokolov
Personally I would ban the non block conditional, but I think it's moot in this context since spotless just does what it does and is not configurable, as I understand it. I suppose we could manually "fix" all the conditionals though? On Wed, Dec 23, 2020, 9:07 AM Erick Erickson wrote: > I took a

Re: javac reports broken HTML on multiline mailto links

2020-12-23 Thread Michael Sokolov
Ugh the mailto: breaks it? Seems like a bug to me. Maybe the javadoc parser tries to validate the content of an href attribute? On Wed, Dec 23, 2020, 5:39 AM Dawid Weiss wrote: > Hello and Merry Christmas, > > I discovered this odd javac behavior with jdk8 up to jdk15 (didn't > check the latest

Re: Code reformatting

2020-12-22 Thread Michael Sokolov
Yes, that is what I saw; line breaking choices that are different than what I would manually have chosen. I don't mean to sound negative - this is a nice improvement that gets us away from having to fuss about indentation and other formatting. Even regarding these line breaks, it is sensible to ha

Re: Code reformatting

2020-12-22 Thread Michael Sokolov
I see it there - yes, it makes some occasional widows, and sometimes fails to join up consecutive single-line comments (I think you mentioned this elsewhere) but I can live with it :) On Tue, Dec 22, 2020 at 10:07 AM Dawid Weiss wrote: > > > Looks as if javadoc is likely to be the main challenge?

Re: Code reformatting

2020-12-22 Thread Michael Sokolov
Hmm, I committed my outstanding PR with no conflicts, so I assume you didn't get the reformatting in yet; let me know if *I* can help :) -Mike On Tue, Dec 22, 2020 at 9:06 AM Michael Sokolov wrote: > > Thanks for the heads up. If you commit your changes first, I'll tackle the

Re: Code reformatting

2020-12-22 Thread Michael Sokolov
Thanks for the heads up. If you commit your changes first, I'll tackle the reformatting and let you know if I run into issues. Looks as if javadoc is likely to be the main challenge? On Tue, Dec 22, 2020, 8:56 AM Dawid Weiss wrote: > Hi Mike, Ignacio, > > Just wanted to let you know that I wante

Re: Deterministic index construction

2020-12-19 Thread Michael Sokolov
I don't know about addIndexes. Does that let you say which document goes where somehow? Wouldn't you have to select a subset of documents from each originally indexed segment? On Sat, Dec 19, 2020, 12:11 PM Michael Sokolov wrote: > I think the idea is to exert control over the di

Re: Deterministic index construction

2020-12-19 Thread Michael Sokolov
I think the idea is to exert control over the distribution of documents among the segments, in a deterministic reproducible way. On Sat, Dec 19, 2020, 11:39 AM Adrien Grand wrote: > Have you considered leveraging Lucene's built-in index sorting? It > supports concurrent indexing and is quite fas

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Michael Sokolov
I feel like there could be some considerable overlap with features provided by Luwak, which was contributed to Lucene fairly recently, and I think does the query inversion work required for this; maybe more of it already exists here? I don't know if that module handles the query rewriting, or the t

Re: Welcome Houston Putman to the PMC

2020-12-02 Thread Michael Sokolov
Welcome, Houston! On Wed, Dec 2, 2020, 2:34 PM David Smiley wrote: > Welcome Houston! > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Tue, Dec 1, 2020 at 4:20 PM Mike Drob wrote: > >> I am pleased to announce that Houston Putman has ac

Re: normalizing vectors

2020-11-26 Thread Michael Sokolov
> > are not unit-length. > > For simplicity, we could assume normalized vectors as inputs and just > document it - without any checks? Meanwhile some utility functions (e.g., > o.a.l.util.VectorUtil) for it could be helpful. > > Tomoko > > > Thu, Nov 26, 2020, 6:23 Michae

normalizing vectors

2020-11-25 Thread Michael Sokolov
I have been working on getting benchmarks working on the GloVe public data set and spent a while chasing down a bug with VectorValues.search that turned out to be a bug with the data (sort of)! When comparing vectors using an angular (dot product) measure, one has to normalize by the vectors' lengt
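
The normalization point above can be illustrated with a small sketch: for vectors that are not unit-length, the raw dot product differs from the cosine of the angle between them, and normalizing first makes the two agree. The class and method names here are made up for the example and are not the VectorUtil API mentioned elsewhere in the thread.

```java
/** Illustration only: dot product versus angular similarity on non-unit vectors. */
public class NormalizeSketch {

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Returns a unit-length copy of v; rejects the zero vector. */
  static float[] normalize(float[] v) {
    double norm = Math.sqrt(dot(v, v));
    if (norm == 0) {
      throw new IllegalArgumentException("cannot normalize a zero-length vector");
    }
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = (float) (v[i] / norm);
    }
    return out;
  }

  /** Cosine similarity computed directly, dividing by both vector lengths. */
  static float cosine(float[] a, float[] b) {
    return (float) (dot(a, b) / Math.sqrt((double) dot(a, a) * dot(b, b)));
  }
}
```

For example, (3, 4) and (6, 8) point in the same direction, so their cosine is 1, yet their raw dot product is 50; ranking by raw dot product on such data would favor long vectors over well-aligned ones.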

Re: Possible resource leak in IndexWriter.deleteAll()/FieldNumbers.clear()

2020-11-18 Thread Michael Sokolov
n reusing on flush(). > > Also I was partly motivated by laziness. The production code I'm borrowing > for this prototype doesn't make it easy to recreate the IndexWriterConfig, > and IWC is not reusable across IndexWriter instances. > > On Wed, Nov 18, 2020 at 12:25 PM Micha

Re: Possible resource leak in IndexWriter.deleteAll()/FieldNumbers.clear()

2020-11-18 Thread Michael Sokolov
I'm curious if you tried creating a new IndexWriter for each batch? On Wed, Nov 18, 2020 at 1:18 PM Michael Froh wrote: > > I have some code that is kind of abusing IndexWriter.deleteAll(). In short, > I'm basically experimenting with using tiny (one block of joined parent/child > documents) in

Welcome Julie Tibshirani as Lucene/Solr committer

2020-11-18 Thread Michael Sokolov
I'm pleased to announce that Julie Tibshirani has accepted the PMC's invitation to become a committer. Julie, the tradition is that new committers introduce themselves with a brief bio. I think we may still be sorting out the details of your Apache account (julie@ may have been taken?), but as so

Re: Please set: git config --global pull.rebase true

2020-10-20 Thread Michael Sokolov
My experience has been "git pull" -- starts a merge -- slowly back away, "git reset --hard", and then "git pull --rebase". If this setting would help me avoid that dance, I'm all for it. On Tue, Oct 20, 2020 at 6:55 AM Andrzej Białecki wrote: > > > On 20 Oct 2020, at 09:33, Dawid Weiss wrote: > >

Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-14.0.1) - Build # 28367 - Unstable!

2020-10-19 Thread Michael Sokolov
Hmm looks like a randomized failure in a new test case I just added. I'll take a look On Sun, Oct 18, 2020 at 8:16 PM Policeman Jenkins Server wrote: > > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/28367/ > Java: 64bit/jdk-14.0.1 -XX:-UseCompressedOops -XX:+UseG1GC > > 1 tests

Re: recent test failures in solr.core

2020-10-18 Thread Michael Sokolov
nixFileSystemProvider.java:369) > at java.base/java.nio.file.Files.getFileStore(Files.java:1492) > at org.apache.solr.handler.IndexFetcher.getUsableSpace(IndexFetcher.java:1046) > > > > > On Sun, Oct 18, 2020, 11:06 AM Michael Sokolov wrote: >> >> I ran a full test

recent test failures in solr.core

2020-10-18 Thread Michael Sokolov
I ran a full test suite this morning and got 51 test failures, all in solr.core. It looks like the same has been happening in Jenkins for the last couple of days https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Check-master/. Based on timing, the only commit that seemed likely was LUCENE-95

Re: [VOTE] Release Lucene/Solr 8.6.3 RC1

2020-10-07 Thread Michael Sokolov
+1 (binding) SUCCESS! [0:56:16.661736] On Tue, Oct 6, 2020 at 2:18 AM Tomás Fernández Löbbe wrote: > > +1 (binding) > > SUCCESS! [1:05:14.591357] > > On Mon, Oct 5, 2020 at 1:13 PM Anshum Gupta wrote: >> >> +1 (binding) >> >> SUCCESS! [1:00:37.423566] >> >> Tried basic indexing and search and r

Re: CWiki and IDE instructions

2020-10-04 Thread Michael Sokolov
I'm curious if "gradlew idea" is needed. I seem to remember IDEA just importing a gradlew project with no additional steps? Although it's possible I had run the setup command and forgotten. On Sun, Oct 4, 2020, 3:39 PM Erick Erickson wrote: > The "How to contribute" page here: > > https://cwi

Re: Highlight with Proximity search throws an exception

2020-10-01 Thread Michael Sokolov
I traced this to this block in FuzzyTermsEnum:

    if (ed == 0) { // exact match
      boostAtt.setBoost(1.0F);
    } else {
      final int codePointCount = UnicodeUtil.codePointCount(term);
      int minTermLength = Math.min(codePointCount, termLength);
      float similarity = 1.0f - (float) e

Re: [GitHub] [lucene-solr] uschindler commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-27 Thread Michael Sokolov
Uwe, I don't know if this was a phone transcription problem, or a failed German English pun, but your wife is not an appendix! On Sun, Sep 27, 2020, 10:49 AM GitBox wrote: > > uschindler commented on pull request #1836: > URL: > https://github.com/apache/lucene-solr/pull/1836#issuecomment-69964

Re: Notification of analysis on publicly available project data

2020-09-16 Thread Michael Sokolov
Thanks Christian. I guess "... the findings we obtained from the Community Survey that was run this year" was kind of tantalizing :) On Wed, Sep 16, 2020 at 5:55 AM Christian Grobmeier wrote: > > On Tue, Sep 15, 2020, at 14:17, Michael Sokolov wrote: > > I am curious abou

Re: [JENKINS] Lucene » Lucene-Solr-NightlyTests-master - Build # 36 - Still unstable!

2020-09-16 Thread Michael Sokolov
I guess this IndexWriter has issues with commitment? On Tue, Sep 15, 2020, 12:15 PM Dawid Weiss wrote: > Is this something that should be expected in this test? > > On Tue, Sep 15, 2020 at 7:10 AM Apache Jenkins Server > wrote: > > > > Build: > https://ci-builds.apache.org/job/Lucene/job/Lucene

Re: Notification of analysis on publicly available project data

2020-09-15 Thread Michael Sokolov
I am curious about the Community Survey -- is there a written document presenting the findings (slides maybe?) or just the video? On Thu, Sep 10, 2020 at 12:49 PM Griselda Cuevas wrote: > > Dear PMC, > > > I’m contacting you because your project has been selected by the ASF D&I > committee which

Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-10 Thread Michael Sokolov
A slightly different but related topic is how to manage lots of fields. I agree that sub-fields are a pain and that mashing everything together in an all-field is a mess, but for best performance with a large number of fields/sub-fields, it is the only workable option I can see? Expanding a query o

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-04 Thread Michael Sokolov
A1, D, A2 (binding) On Fri, Sep 4, 2020 at 12:46 AM David Smiley wrote: > > (binding) > vote: D, A1 > > > (thanks Ryan for your thorough vote instructions & preparation)

Re: Approach towards solving split package issues?

2020-09-01 Thread Michael Sokolov
I'm in favor - there may be some difficult choices though. As I recall one issue was around where to put analysis packages? I forget the details, but there was some pretty strong feeling that you should have a functioning system with core only. However some basic analysis tools are required for tha

Re: [VOTE] Lucene logo contest, here we go again

2020-09-01 Thread Michael Sokolov
A1, binding On Mon, Aug 31, 2020 at 8:26 PM Ryan Ernst wrote: > > Dear Lucene and Solr developers! > > In February a contest was started to design a new logo for Lucene > [jira-issue]. The initial attempt [first-vote] to call a vote resulted in > some confusion on the rules, as well the request

Re: [VOTE] Release Lucene/Solr 8.6.2 RC1

2020-08-27 Thread Michael Sokolov
SUCCESS! [0:56:28.589654] +1 On Wed, Aug 26, 2020 at 12:41 PM Nhat Nguyen wrote: > > +1 > > SUCCESS! [0:52:44.607871] > > On Wed, Aug 26, 2020 at 12:12 PM Tomoko Uchida > wrote: >> >> +1 (non-binding) >> SUCCESS! [0:51:55.207272] >> >> >> On Wed, Aug 26, 2020 at 22:42, Ignacio Vera wrote: >>> >>> Please vote

Re: Welcome Namgyu Kim to the PMC

2020-08-15 Thread Michael Sokolov
Welcome, Namgyu! On Thu, Aug 6, 2020 at 9:40 PM Yonik Seeley wrote: > > Congrats Namgyu! > > -Yonik > > > On Sun, Aug 2, 2020 at 7:19 PM Ishan Chattopadhyaya > wrote: >> >> I am pleased to announce that Namgyu Kim has accepted the PMC's invitation >> to join. >> >> Congratulations and welcome

Re: Welcome Munendra SN to the PMC

2020-08-15 Thread Michael Sokolov
Welcome, Munendra! On Thu, Aug 6, 2020 at 9:38 PM Yonik Seeley wrote: > Congrats Munendra! > > -Yonik > > > On Sun, Aug 2, 2020 at 7:20 PM Ishan Chattopadhyaya < > ichattopadhy...@gmail.com> wrote: > >> I am pleased to announce that Munendra SN has accepted the PMC's >> invitation to join. >> >>

Re: Welcome Gus Heck to the PMC

2020-08-15 Thread Michael Sokolov
Welcome, Gus! On Thu, Aug 6, 2020 at 9:39 PM Yonik Seeley wrote: > > Congrats Gus! > > -Yonik > > On Sun, Aug 2, 2020 at 7:21 PM Ishan Chattopadhyaya > wrote: >> >> I am pleased to announce that Gus Heck has accepted the PMC's invitation to >> join. >> >> Congratulations and welcome, Gus! ---

Re: Welcome Mike Drob to the PMC

2020-08-15 Thread Michael Sokolov
Late add from me too, Mike: Welcome! On Fri, Jul 31, 2020 at 8:16 AM Noble Paul wrote: > > Welcome Mike! > > On Thu, Jul 30, 2020 at 12:33 AM Erik Hatcher wrote: > > > > Oh yeah! Welcome, Mike! > > > > > On Jul 24, 2020, at 3:55 PM, Anshum Gupta wrote: > > > > > > I am pleased to announce tha

Re: Welcome Michael Sokolov to the PMC

2020-08-15 Thread Michael Sokolov
apache.org At: 07/03/20 13:15:05 > > To: dev@lucene.apache.org > > Subject: Re: Welcome Michael Sokolov to the PMC > > > > Thanks Adrien, and to the whole PMC > > > > Mike > > > > On Fri, Jul 3, 2020, 7:57 AM Adrien Grand wrote: > >>

Re: [VOTE] Solr to become a top-level Apache project (TLP)

2020-08-15 Thread Michael Sokolov
> PMC Christian Moen (c...@atilika.com) >> -- Gora Mohanty (g...@mimirtech.com) >> PMC Robert Muir (rcm...@gmail.com) >> PMC Nhat Nguyen (nhat.ngu...@elastic.co.invalid) >> PMC Kevin Risden (kris...@apache.org) >> PMC Steven A Rowe (sar...@gmail.com) >> PMC

Re: [jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API

2020-07-18 Thread Michael Sokolov
Thanks, Alex! That's super helpful. Did you by any chance save any of your benchmarking results? I'd be curious to see them. I'm not super current here, but it looks like as of JDK11 VarHandles might be the supported way to do this sort of thing? I'm also curious about whether you have any experien

Re: [VOTE] Release Lucene/Solr 8.6.0 RC1

2020-07-09 Thread Michael Sokolov
+1 SUCCESS! [0:59:20.777306] (tested on Graviton ARM processor) On Thu, Jul 9, 2020 at 1:10 PM Anshum Gupta wrote: > > +1 > > SUCCESS! [1:15:03.975368] > > On Wed, Jul 8, 2020 at 1:56 AM Bruno Roustant > wrote: >> >> Please vote for release candidate 1 for Lucene/Solr 8.6.0 >> >> The artifact

Re: Welcome Michael Sokolov to the PMC

2020-07-03 Thread Michael Sokolov
Thanks Adrien, and to the whole PMC Mike On Fri, Jul 3, 2020, 7:57 AM Adrien Grand wrote: > I am pleased to announce that Michael Sokolov has accepted the PMC's > invitation to join. > > Welcome Michael! > > -- > Adrien >

Re: [GitHub] [lucene-solr] atris commented on pull request #1636: Remove Compound file formation

2020-07-02 Thread Michael Sokolov
Thanks, Jan On Wed, Jul 1, 2020 at 5:25 PM Jan Høydahl wrote: > > And he does not reply to comments either. I closed all his PRs. > > Jan Høydahl > > 1. jul. 2020 kl. 13:56 skrev Michael Sokolov : > > > Seems like trolling > > On Wed, Jul 1, 2020, 6:51

Re: [GitHub] [lucene-solr] atris commented on pull request #1636: Remove Compound file formation

2020-07-01 Thread Michael Sokolov
Seems like trolling On Wed, Jul 1, 2020, 6:51 AM GitBox wrote: > > atris commented on pull request #1636: > URL: > https://github.com/apache/lucene-solr/pull/1636#issuecomment-652347025 > > >I am pretty sure I am missing the point of this PR? > > > ---

Re: 8.6 release

2020-06-26 Thread Michael Sokolov
I think Simon is working on LUCENE-8962 and seems to be close, but we have been here before; it's quite possible we need more time to fully bake this change. I'll let him weigh in on whether he thinks targeting 8.6 is reasonable, but we might need to slip until 8.7 for that one. On Fri, Jun 26, 202

Re: The moment you've all been waiting for PLEASE READ, Gradle builds will start failing on warnings on 9x!

2020-06-23 Thread Michael Sokolov
+1, thanks. Huge step forward for code hygiene. Maybe someday we will agree on a minimal checkstyle enforcement too 😮 ... Sorry, too soon? On Tue, Jun 23, 2020, 6:21 PM Anshum Gupta wrote: > Thanks, Erick! This is awesome :) > > On Tue, Jun 23, 2020 at 2:18 PM Erick Erickson > wrote: > >> As of

Re: Welcome Ilan Ginzburg as Lucene/Solr committer

2020-06-22 Thread Michael Sokolov
Welcome Ilan! Nice beat on lonely boy there. On Mon, Jun 22, 2020, 9:45 AM Ilan Ginzburg wrote: > Thank you, merci, תודה for the trust and the welcome, Noble and everybody! > > I’m based in France near Grenoble, a flat city high tech hub surrounded by > mountains. > > For the past 7 years I’ve
