Quanlong Huang created ORC-1150:
-----------------------------------

             Summary: Improve RowReaderImpl::computeBatchSize()
                 Key: ORC-1150
                 URL: https://issues.apache.org/jira/browse/ORC-1150
             Project: ORC
          Issue Type: Improvement
          Components: C++
            Reporter: Quanlong Huang
         Attachments: RowReaderImpl_next_annotation.png, image-2022-04-12-17-11-28-091.png

RowReaderImpl::computeBatchSize() can be on the hot path when sargs exist. The following perf report shows that orc::RowReaderImpl::next() itself takes 1/4 of the scan time. It was measured using orc-scan with the sarg "inv_quantity_on_hand between -1 and 5000", scanning 4 ORC files of the TPCDS-inventory table (768.23MB in total size).

!image-2022-04-12-17-11-28-091.png|width=713,height=251!

Looking into the disassembly, the time is spent in a loop:

!RowReaderImpl_next_annotation.png|width=556,height=465!

The annotation indicates that this is the inlined RowReaderImpl::computeBatchSize() method. Disassembly:
{code:java}
       │ d0:┌─→mov    %r14,%r15
  0.36 │    │  mov    %esi,%ecx
  0.13 │    │  shr    $0x6,%rdx
 22.81 │    │  shl    %cl,%r15
 24.24 │    │  test   %r15,(%r9,%rdx,8)
       │    │↓ je     fb
       │ e2:│  lea    0x1(%rsi),%edx
  0.22 │    │  mov    %r10,%rax
  0.18 │    │  imul   %rdx,%rax
 25.31 │    │  mov    %rdx,%rsi
       │    │  cmp    %rdi,%rax
  0.54 │    │  cmova  %rdi,%rax
  0.04 │    ├──cmp    %r11,%rdx
 23.79 │    └──jb     d0
  0.31 │ fb:   sub    %r8,%rax
{code}
The corresponding loop:
{code:cpp}
endRowInStripe = currentRowInStripe;
uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
for (; rg < includedRowGroups.size(); ++rg) {
  if (!includedRowGroups[rg]) {
    break;
  } else {
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
}
{code}
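
To make the cost of the quoted loop concrete, here is a minimal stand-alone sketch. It is not the ORC implementation: the free functions endRowOriginal() and endRowHoisted() are hypothetical names, and includedRowGroups is assumed to be a std::vector<bool>, which matches the bit test in the disassembly but is not stated in the report. The first function reproduces the quoted loop. The second shows one possible direction, which may differ from whatever fix the project adopts: scan for the first unselected row group, then do the multiplication and std::min only once after the scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone copy of the loop quoted in the report.
// Returns the row offset (within the stripe) where the current run of
// selected row groups ends.
uint64_t endRowOriginal(uint64_t currentRowInStripe,
                        uint64_t rowsInCurrentStripe,
                        uint64_t rowIndexStride,
                        const std::vector<bool>& includedRowGroups) {
  assert(rowIndexStride > 0);
  uint64_t endRowInStripe = currentRowInStripe;
  uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
  for (; rg < includedRowGroups.size(); ++rg) {
    if (!includedRowGroups[rg]) {
      break;
    }
    // Recomputed on every iteration: one multiply and one min per row group.
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
  return endRowInStripe;
}

// Sketch of an equivalent variant (an assumption, not the project's fix):
// the loop body only advances the index, and the end row is computed once.
uint64_t endRowHoisted(uint64_t currentRowInStripe,
                       uint64_t rowsInCurrentStripe,
                       uint64_t rowIndexStride,
                       const std::vector<bool>& includedRowGroups) {
  assert(rowIndexStride > 0);
  const std::size_t first =
      static_cast<std::size_t>(currentRowInStripe / rowIndexStride);
  std::size_t rg = first;
  while (rg < includedRowGroups.size() && includedRowGroups[rg]) {
    ++rg;
  }
  if (rg == first) {
    // The current row group is not selected (or is out of range).
    return currentRowInStripe;
  }
  // rg is now one past the last selected row group in the run.
  return std::min(rowsInCurrentStripe,
                  static_cast<uint64_t>(rg) * rowIndexStride);
}
```

The variant still tests one bit per row group, so it only removes the per-iteration multiply and compare. If the scan itself dominates, caching the end of each selected run when the row groups are first evaluated would avoid rescanning on every next() call; that idea is likewise a suggestion, not something taken from the report.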