Lucene Cyborg

2024-07-22 Thread Adrien Grand
Hello everyone, I recently stumbled on this paper after Ishan shared it on LinkedIn: https://github.com/0ctopus13prime/lucene-cyborg-paper/blob/main/LuceneCyborg_Hybrid_Search_Engine_Written_in_Java_and_C%2B%2B.pdf . This is quite impressive: this person did a high-fidelity rewrite of Lucene in C

Re: Lucene Cyborg

2024-07-22 Thread Michael McCandless
Thanks for sharing, Adrien, this is really cool! It's neat that the relative gains of Java vs C are quite a bit less than they were ~11 years ago when I played with a much smaller subset of queries. Also, COUNT on disjunction queries with Lucene Cyborg got slower. What a feat, to port so much of

Re: Lucene Cyborg

2024-07-22 Thread Uwe Schindler
Hi, First of all: we have to be a bit careful with the results. For example, SegmentReader::get_live_docs() returns null, so the code does not take deleted documents into account. Of course this is not relevant if the original Lucene index is also without deletions, but you need to keep an eye on it. What's
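
To illustrate the point about live docs: in Lucene, LeafReader.getLiveDocs() returns null when a segment has no deletions, and otherwise a bit set in which a set bit marks a live document. The following is a minimal, self-contained sketch of why a reader that always returns null can report different counts on an index with deletions. It does not use Lucene or Lucene Cyborg code; the class LiveDocsSketch, the method countHits, and the use of java.util.BitSet in place of Lucene's Bits are assumptions made for illustration only.

```java
import java.util.BitSet;

// Hypothetical stand-in for the live-docs check a query does while counting hits.
final class LiveDocsSketch {

    /**
     * Counts the matching documents that are still live.
     *
     * @param matchingDocIds doc IDs matched by some query (illustrative input)
     * @param liveDocs       set bit = live document; null = segment has no deletions
     */
    static int countHits(int[] matchingDocIds, BitSet liveDocs) {
        int count = 0;
        for (int docId : matchingDocIds) {
            // A null liveDocs means every document is treated as live.
            if (liveDocs == null || liveDocs.get(docId)) {
                count++;
            }
        }
        return count;
    }
}
```

With matches {0, 2, 3} and document 2 marked as deleted, countHits returns 2 when given the real live-docs bits and 3 when given null, so the two paths only agree on an index without deletions.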