Renaud Delbru created LUCENE-10449:
--------------------------------------

             Summary: Unnecessary ByteArrayDataInput introduced with compression on binary doc values introduced
                 Key: LUCENE-10449
                 URL: https://issues.apache.org/jira/browse/LUCENE-10449
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/codecs
    Affects Versions: 9.0
            Reporter: Renaud Delbru
         Attachments: lucene-8.11-no-compression.png, lucene-9.png

LUCENE-9211 introduced a compression mechanism for binary doc values, which was later removed in LUCENE-9843 because it hurt performance on some workloads.

However, LUCENE-9843 did not restore the code to its earlier state. Instead of reading the block directly from the `IndexInput` as in [1], it keeps the `decompressBlock()` call [2], which, as far as we understand, now decompresses a block that is not compressed. The `decompressBlock` method delegates to `LZ4.decompress`, and this appears to add significant overhead (e.g., `readByte`).

This has a large impact on our workloads, which rely heavily on doc values. It can lead to a performance regression of 2x up to 5x. See the samples below.

{code:java}
❯ times_tasks Elasticsearch 7.10.2 (Lucene 8.7) - no binary compression
name                         type                         time_min  time_max  time_p50  time_p90
7.10.2-22.6-SNAPSHOT.json    total                              42        90        45        66
7.10.2-22.6-SNAPSHOT.json    SearchJoinRequest1                 14        32        15        18
7.10.2-22.6-SNAPSHOT.json    SearchTaskBroadcastRequest2        23        53        27        43

❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - with binary compression
name                         type                         time_min  time_max  time_p50  time_p90
7.17.0-27.1-SNAPSHOT.json    total                             253       327       285       310
7.17.0-27.1-SNAPSHOT.json    SearchJoinRequest1                121       154       142       152
7.17.0-27.1-SNAPSHOT.json    SearchTaskBroadcastRequest2       122       173       140       152

❯ times_tasks Elasticsearch 7.17.1 (Lucene 8.11) - lucene_default codec is used to bypass the binary compression
name                         type                         time_min  time_max  time_p50  time_p90
7.17.0-27.1-SNAPSHOT.json.2  total                              48        96        63        75
7.17.0-27.1-SNAPSHOT.json.2  SearchJoinRequest1                 19        44        25        31
7.17.0-27.1-SNAPSHOT.json.2  SearchTaskBroadcastRequest2        23        42        29        37

❯ times_tasks Elasticsearch 8.0 (Lucene 9.0) - no binary compression
name                         type                         time_min  time_max  time_p50  time_p90
8.0.0-28.0-SNAPSHOT.json     total                             260       327       287       313
8.0.0-28.0-SNAPSHOT.json     SearchJoinRequest1                122       168       148       158
8.0.0-28.0-SNAPSHOT.json     SearchTaskBroadcastRequest2       123       165       139       155
{code}

We can clearly see that in Lucene 9.0, even after the removal of the binary doc values compression, performance did not improve. Profiling the execution indicates that the bottleneck is `LZ4.decompress`.

We have attached two screenshots of a flamegraph. The CPU time of the `TermsDict.next` method is around 2 seconds with Lucene 8.11 without compression, while the CPU time of the same method in Lucene 9.0 is 12 seconds. This was measured on a small benchmark that reads a binary doc values field a fixed number of times. Each document was created with a single binary value that represents a UUID.

[1] [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.0/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java#L1159]

[2] [https://github.com/apache/lucene/commit/a7a02519f0a5652110a186f4909347ac3349092d#diff-ab443662a6310fda675a4bd6d01fabf3a38c4c825ec2acef8f9a34af79f0b252R1022]
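
To illustrate the kind of overhead described in this report, here is a minimal, self-contained sketch that contrasts a single bulk read of an uncompressed block with a decoder-style loop that pulls the same bytes one at a time. It is not Lucene code and does not reproduce `LZ4.decompress`: `CountingInput`, `readDirect`, and `readByteByByte` are hypothetical names introduced only for this example, and the call counter is a rough stand-in for per-call cost rather than a measurement.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an index input; NOT a Lucene class. */
final class CountingInput {
    private final byte[] data;
    private int pos;
    int calls; // number of read calls issued against this input

    CountingInput(byte[] data) {
        this.data = data;
    }

    /** Reads one byte; every call is counted. */
    byte readByte() {
        calls++;
        return data[pos++];
    }

    /** Copies len bytes in a single counted call. */
    void readBytes(byte[] dst, int offset, int len) {
        calls++;
        System.arraycopy(data, pos, dst, offset, len);
        pos += len;
    }
}

final class BlockReadSketch {
    /** Bulk path: one call copies the whole uncompressed block. */
    static byte[] readDirect(CountingInput in, int len) {
        byte[] block = new byte[len];
        in.readBytes(block, 0, len);
        return block;
    }

    /** Decoder-style path (assumed shape): same bytes, one call per byte. */
    static byte[] readByteByByte(CountingInput in, int len) {
        byte[] block = new byte[len];
        for (int i = 0; i < len; i++) {
            block[i] = in.readByte();
        }
        return block;
    }

    public static void main(String[] args) {
        byte[] stored = new byte[4096];
        for (int i = 0; i < stored.length; i++) {
            stored[i] = (byte) i;
        }
        CountingInput bulk = new CountingInput(stored);
        CountingInput perByte = new CountingInput(stored);
        byte[] a = readDirect(bulk, stored.length);
        byte[] b = readByteByByte(perByte, stored.length);
        System.out.println("same bytes: " + Arrays.equals(a, b));
        System.out.println("bulk calls: " + bulk.calls
            + ", per-byte calls: " + perByte.calls);
    }
}
```

Both paths return identical bytes; the only difference is how many read calls are needed to produce them, which is the kind of cost the flamegraphs attribute to the decompression path.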