[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-14 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-841074895 @jpountz It's great to work with you on this optimization :smile: Thanks for taking so much time to help me. -- This is an automated message from the Apache Git Service. To

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-13 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-840385022 Comment addressed.

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-13 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-840373136 > This looks good to me. Can we better check that the sort is actually stable in the tests? E.g. maybe we could verify that the arrays are not only equal after sorting with
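
The stability check suggested in the quoted review comment can be illustrated with a minimal sketch. It is not the test from the PR: the class `StableSortCheck`, both method names, and the local `HISTOGRAM_SIZE` constant are hypothetical, and a single-byte counting sort stands in for the actual sorter. The idea is that entries start in ascending ordinal order, so a stable sort must leave ordinals ascending within each run of equal keys.

```java
import java.util.Arrays;

public class StableSortCheck {
  // Hypothetical constant for this sketch: one bucket per possible byte value.
  static final int HISTOGRAM_SIZE = 256;

  /** Stable counting sort of ords by keys[ord], treating each key as an unsigned byte. */
  static int[] stableSortByKey(byte[] keys, int[] ords) {
    int[] offsets = new int[HISTOGRAM_SIZE + 1];
    for (int ord : ords) {
      offsets[(keys[ord] & 0xFF) + 1]++; // histogram, shifted by one slot
    }
    for (int i = 0; i < HISTOGRAM_SIZE; i++) {
      offsets[i + 1] += offsets[i]; // prefix sums: offsets[b] is the start of bucket b
    }
    int[] out = new int[ords.length];
    for (int ord : ords) { // forward scan keeps equal keys in input order
      out[offsets[keys[ord] & 0xFF]++] = ord;
    }
    return out;
  }

  /** True if the result is ordered by key and ties keep ascending ordinals. */
  static boolean isSortedAndStable(byte[] keys, int[] sortedOrds) {
    for (int i = 1; i < sortedOrds.length; i++) {
      int prev = keys[sortedOrds[i - 1]] & 0xFF;
      int cur = keys[sortedOrds[i]] & 0xFF;
      if (prev > cur) {
        return false; // not sorted
      }
      if (prev == cur && sortedOrds[i - 1] > sortedOrds[i]) {
        return false; // equal keys were reordered, so the sort is not stable
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] keys = {3, 1, 3, 1, 2};
    int[] ords = {0, 1, 2, 3, 4};
    int[] sorted = stableSortByKey(keys, ords);
    if (!isSortedAndStable(keys, sorted)) {
      throw new AssertionError("unstable result: " + Arrays.toString(sorted));
    }
  }
}
```

Comparing only the sorted keys would not catch an unstable sorter, because two orderings of equal keys look identical; checking the ordinals alongside the keys is what makes the difference visible.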

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-12 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-840256710 @jpountz Sorry, I forgot to push :sweat_smile:

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-12 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-839858327 Comments addressed. 1. Reuse offset array with size of `HISTOGRAM_SIZE` in reorder. 2. Update CHANGES document. 3. Remove benchmark test case. 4. Add logic to check doc

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-11 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-839400284 Thanks for taking the time to work on my branch. I merged your change into this PR; the code looks much better. I was wondering which test case I neglected besides

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-27 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-827678981 I spent some time running a real-world benchmark. The speedup of `IndexWriter` is what we expected: it is faster than the main branch. Total time elapsed (including adding doc,

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-23 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-825779387 @jpountz Are there any test cases suitable for verifying the end-to-end performance improvement, e.g. through `IndexWriter`? Maybe I could give it a try.

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-23 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-825776633 I made `StableMSBRadixSorter` the default sorter and use `InPlaceMergeSorter` as the fallback sorter. Please check the latest commit. (I did not squash commits, and save every commit

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-22 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-824891023 I used `TimSort` instead of `InPlaceMergeSorter`, expecting it to be faster, but it turned out to be slower. @jpountz would you check my latest commit to see if I implemented Tim Sort

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-824172567 > For instance I'd expect users who index integers (4 bytes) between 0 and 2^24 to notice speedups that are closer to the one that you computed for bytesPerDim=3 than for

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-824164245 > +1 to always use the stable version of the algorithm. This would only use transient memory and in reasonable amounts, so I'm not concerned with the memory usage. Per

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-823960282 > For instance I'd expect users who index integers (4 bytes) between 0 and 2^24 to notice speedups that are closer to the one that you computed for bytesPerDim=3 than for

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-21 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-823949164 @jpountz Per your advice, I have updated the code. In terms of performance, I refined `TestBKDDisableSortDocId` to make it re-runnable as a benchmark. I have made the

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-20 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-823172724 @jpountz Good advice! Before that, I was still struggling with where to propagate this config up to the index builder layer. I will give it a try; the first thing that comes to mind is