Quanlong Huang created ORC-1150:
-----------------------------------

             Summary: Improve RowReaderImpl::computeBatchSize()
                 Key: ORC-1150
                 URL: https://issues.apache.org/jira/browse/ORC-1150
             Project: ORC
          Issue Type: Improvement
          Components: C++
            Reporter: Quanlong Huang
         Attachments: RowReaderImpl_next_annotation.png, image-2022-04-12-17-11-28-091.png

RowReaderImpl::computeBatchSize() can become a hot path when sargs (search
arguments) are present. The following perf report shows that
orc::RowReaderImpl::next() itself takes a quarter of the scan time. It was
measured using orc-scan with the sarg "inv_quantity_on_hand between -1 and 5000",
scanning 4 ORC files of the TPC-DS inventory table (768.23 MB in total).
!image-2022-04-12-17-11-28-091.png|width=713,height=251!

The disassembly of next() shows that this time is spent in a single loop:
!RowReaderImpl_next_annotation.png|width=556,height=465!
The annotation indicates that the loop is the inlined
RowReaderImpl::computeBatchSize() method. Disassembly:
{code:java}
       │ d0:┌─→mov    %r14,%r15
  0.36 │    │  mov    %esi,%ecx
  0.13 │    │  shr    $0x6,%rdx
 22.81 │    │  shl    %cl,%r15
 24.24 │    │  test   %r15,(%r9,%rdx,8)
       │    │↓ je     fb  
       │ e2:│  lea    0x1(%rsi),%edx
  0.22 │    │  mov    %r10,%rax
  0.18 │    │  imul   %rdx,%rax
 25.31 │    │  mov    %rdx,%rsi
       │    │  cmp    %rdi,%rax
  0.54 │    │  cmova  %rdi,%rax
  0.04 │    ├──cmp    %r11,%rdx
 23.79 │    └──jb     d0  
  0.31 │ fb:   sub    %r8,%rax{code}
The corresponding source loop:
{code:cpp}
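// Extend the batch over the consecutive row groups selected by the sargs.
endRowInStripe = currentRowInStripe;
uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
for (; rg < includedRowGroups.size(); ++rg) {
  if (!includedRowGroups[rg]) {
    break;  // the first skipped row group ends the readable range
  } else {
    // Clamp to the stripe end, since the last row group may be partial.
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
} {code}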
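Each call to computeBatchSize() rescans from the current row group to the end of the run of selected row groups. When a stripe has many selected row groups and each batch is smaller than a row group, consecutive calls walk the same run again and again. The hot instructions appear to match that loop body. shr $0x6 / shl / test looks like the std::vector<bool> bit lookup (word index rg/64, mask 1 << (rg % 64)), and imul / cmp / cmova looks like the per-iteration std::min. One possible direction, sketched below, is to precompute the end row of each run once per stripe so that each call becomes a single lookup. This is a hedged sketch and not necessarily the change committed for ORC-1150. The function names endRowByScan, computeRunEnds and endRowByLookup are hypothetical. The sketch assumes rowIndexStride > 0 and that includedRowGroups holds one entry per row group of the current stripe.

{code:cpp}
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference: the loop from computeBatchSize(), wrapped in a free function
// (hypothetical name) so the sketch below can be checked against it.
uint64_t endRowByScan(const std::vector<bool>& includedRowGroups,
                      uint64_t rowIndexStride, uint64_t rowsInCurrentStripe,
                      uint64_t currentRowInStripe) {
  uint64_t endRowInStripe = currentRowInStripe;
  uint64_t rg = currentRowInStripe / rowIndexStride;
  for (; rg < includedRowGroups.size(); ++rg) {
    if (!includedRowGroups[rg]) break;
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
  return endRowInStripe;
}

// Hypothetical helper, run once per stripe after the sargs have filled
// includedRowGroups. runEnd[rg] is the end row of the run of consecutive
// selected row groups starting at rg (left as 0 when rg is not selected).
std::vector<uint64_t> computeRunEnds(const std::vector<bool>& includedRowGroups,
                                     uint64_t rowIndexStride,
                                     uint64_t rowsInCurrentStripe) {
  assert(rowIndexStride > 0);
  const size_t n = includedRowGroups.size();
  std::vector<uint64_t> runEnd(n, 0);
  // Walk backwards so each selected row group reuses its successor's answer.
  for (size_t i = n; i-- > 0;) {
    if (!includedRowGroups[i]) continue;
    if (i + 1 < n && includedRowGroups[i + 1]) {
      runEnd[i] = runEnd[i + 1];
    } else {
      // Last row group of the run: clamp to the (possibly partial) stripe end.
      runEnd[i] = std::min<uint64_t>(rowsInCurrentStripe, (i + 1) * rowIndexStride);
    }
  }
  return runEnd;
}

// Constant-time replacement for the scan; returns the same value as
// endRowByScan() for the same stripe state.
uint64_t endRowByLookup(const std::vector<bool>& includedRowGroups,
                        const std::vector<uint64_t>& runEnd,
                        uint64_t rowIndexStride, uint64_t currentRowInStripe) {
  uint64_t rg = currentRowInStripe / rowIndexStride;
  if (rg < includedRowGroups.size() && includedRowGroups[rg]) {
    return runEnd[rg];
  }
  return currentRowInStripe;  // current row group is skipped: empty range
}
{code}

Because the table is built once per stripe, each batch-size computation would cost one division and two lookups, however many selected row groups follow. The trade-off is one extra uint64_t per row group, plus a rebuild whenever includedRowGroups changes.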



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
