[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

2022-07-07 Thread GitBox
mayya-sharipova commented on PR #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1178089438 Closing this PR in favour of [alternative](https://github.com/apache/lucene/pull/992)

[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

2022-05-25 Thread GitBox
mayya-sharipova commented on PR #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1137833962 @LuXugang Thanks for looking into this. I was thinking to close this issue and this PR. As @jtibshirani noted the problem with this approach is that flush or a segment creation may

[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

2022-03-04 Thread GitBox
mayya-sharipova commented on pull request #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1059210295 @rmuir Thanks a lot for your review and explanation of IndexWriter behavior. > If IndexWriter shouldn't buffer vectors, then can it simply stream vectors to the

[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

2022-03-04 Thread GitBox
mayya-sharipova commented on pull request #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1059202282 @jtibshirani Thanks a lot for your review. > If a user had 100 vector fields, then now we might have 100+ files being written concurrently, multiplied by the number

[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

2022-03-04 Thread GitBox
mayya-sharipova commented on pull request #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1059202174 @msokolov Thanks a lot for your review. >I'm not sure what unset means? I guess it goes to the default 16MB, but I assume you must be doing the same in the other te

[GitHub] [lucene] mayya-sharipova commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

2022-03-03 Thread GitBox
mayya-sharipova commented on pull request #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1058148842 I've benchmarked the results with ann-benchmarks on glove-100-angular (M:16, efConstruction:100) - baseline: main branch where we unset RAMBufferSizeMB, which defau