GitHub user KOKOSde added a comment to the discussion: Arrow-Datasets C++: 
Scanner::Scan visitor is executing serially despite use_threads=true

`use_threads=true` does not make the visitor body run in parallel the way a 
custom thread pool would. Arrow can parallelize file reads and decoding, but 
if the visitor does blocking work, the scan still looks serial, because each 
callback has to finish before more batches can flow through the pipeline. The 
practical fix is to keep the visitor tiny: push each `RecordBatch` into your 
own queue and do the heavy processing in a separate thread pool. If you want 
direct control over how batches are consumed, `ScanBatchesAsync` is a better 
fit than doing real work inside the visitor.
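
A minimal sketch of the queue-plus-pool pattern described above, using only 
the standard library so it compiles without Arrow. The names `Batch`, 
`BatchQueue`, and `SumWithWorkerPool` are hypothetical: `Batch` stands in for 
`std::shared_ptr<arrow::RecordBatch>`, and the plain loop that calls the 
visitor stands in for the scan itself. Treat it as an illustration of the 
idea under those assumptions, not as Arrow's API.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for std::shared_ptr<arrow::RecordBatch>.
struct Batch {
  std::vector<int64_t> values;
};

// Bounded blocking queue. Push blocks while the queue is full, which gives
// backpressure so the scan cannot run arbitrarily far ahead of the workers.
class BatchQueue {
 public:
  explicit BatchQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  void Push(Batch batch) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push_back(std::move(batch));
    not_empty_.notify_one();
  }

  // Returns false once the queue has been closed and fully drained.
  bool Pop(Batch* out) {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
    if (items_.empty()) return false;
    *out = std::move(items_.front());
    items_.pop_front();
    not_full_.notify_one();
    return true;
  }

  // Signals that no more batches will be pushed.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    not_empty_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::deque<Batch> items_;
  std::size_t capacity_;
  bool closed_ = false;
};

// Feeds `input` through a tiny visitor that only enqueues, while
// `num_workers` threads do the (here trivial) heavy work of summing values.
int64_t SumWithWorkerPool(const std::vector<Batch>& input,
                          std::size_t num_workers) {
  assert(num_workers > 0);
  BatchQueue queue(/*capacity=*/4);
  std::atomic<int64_t> total{0};

  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_workers; ++i) {
    workers.emplace_back([&queue, &total] {
      Batch batch;
      while (queue.Pop(&batch)) {
        int64_t local = 0;
        for (int64_t v : batch.values) local += v;  // heavy work goes here
        total.fetch_add(local);
      }
    });
  }

  // The visitor does nothing but hand the batch off.
  auto visitor = [&queue](Batch batch) { queue.Push(std::move(batch)); };
  for (const Batch& batch : input) visitor(batch);  // stand-in for the scan

  queue.Close();
  for (std::thread& worker : workers) worker.join();
  return total.load();
}
```

In a real scan, the lambda bound to `visitor` would be the callback you hand 
to the scanner, and the worker loop would hold your actual processing. The 
bounded capacity is a design choice: it caps memory use, at the cost of 
stalling the scan when the workers fall behind.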

GitHub link: 
https://github.com/apache/arrow/discussions/49568#discussioncomment-16287480
