GitHub user KOKOSde added a comment to the discussion: Arrow-Datasets C++: Scanner::Scan visitor is executing serially despite use_threads=true
`use_threads=true` does not mean your visitor body will run in parallel the way it would on a custom thread pool. Arrow can parallelize file reading and decoding, but if the visitor does blocking work, the scan pipeline will still look serial, because each callback has to finish before more work can flow through.

The practical fix is to keep the visitor tiny: push each `RecordBatch` into your own queue and do the heavy processing in a separate pool. If you want direct control, `ScanBatchesAsync` is a better fit than doing real work inside the visitor.

GitHub link: https://github.com/apache/arrow/discussions/49568#discussioncomment-16287480
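
The "tiny visitor plus your own pool" idea can be sketched roughly as below. This is only an illustration under stated assumptions, not Arrow code: it uses the standard library alone so it compiles without Arrow, `Batch` is a hypothetical stand-in for `std::shared_ptr<arrow::RecordBatch>`, and `BatchQueue`, `ProcessBatch`, and `RunPipeline` are made-up names. The loop that feeds `visitor` stands in for whatever the scanner would do when it hands you batches.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for std::shared_ptr<arrow::RecordBatch>.
struct Batch {
  std::vector<std::int64_t> values;
};

// Bounded queue (assumed helper, not part of Arrow).
class BatchQueue {
 public:
  explicit BatchQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Called from the visitor. Blocks only while the queue is full.
  void Push(Batch batch) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push(std::move(batch));
    not_empty_.notify_one();
  }

  // Returns false once the queue has been closed and fully drained.
  bool Pop(Batch* out) {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
    if (items_.empty()) return false;
    *out = std::move(items_.front());
    items_.pop();
    not_full_.notify_one();
    return true;
  }

  // Signals that no more batches will be pushed.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    not_empty_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::queue<Batch> items_;
  std::size_t capacity_;
  bool closed_ = false;
};

// Placeholder for the expensive per-batch work.
std::int64_t ProcessBatch(const Batch& batch) {
  std::int64_t sum = 0;
  for (std::int64_t v : batch.values) sum += v;
  return sum;
}

// Feeds num_batches batches through the visitor and returns the combined result.
std::int64_t RunPipeline(int num_batches, int rows_per_batch, int num_workers) {
  BatchQueue queue(8);
  std::atomic<std::int64_t> total{0};

  // Worker pool: this is where the heavy processing runs in parallel.
  std::vector<std::thread> workers;
  for (int w = 0; w < num_workers; ++w) {
    workers.emplace_back([&queue, &total] {
      Batch batch;
      while (queue.Pop(&batch)) total += ProcessBatch(batch);
    });
  }

  // The visitor only enqueues and returns immediately.
  auto visitor = [&queue](Batch batch) { queue.Push(std::move(batch)); };

  // Stand-in for the scan: batch i holds rows_per_batch copies of the value i.
  for (int i = 0; i < num_batches; ++i) {
    Batch batch;
    batch.values.assign(static_cast<std::size_t>(rows_per_batch),
                        static_cast<std::int64_t>(i));
    visitor(std::move(batch));
  }

  queue.Close();
  for (std::thread& t : workers) t.join();
  return total.load();
}
```

The queue is bounded on purpose: if the workers fall behind, `Push` blocks and the scan slows down instead of buffering every batch in memory. In a real scan the visitor would presumably enqueue the shared pointer to the batch rather than a copy, and would return an OK status right after the push.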
