Hi,

Thanks for the reply and the suggestion on a custom executor. I'll take a look at it.

I was profiling my application (on Windows 10), which deals with millions of rows (roughly 10M) of data, and I found some places where only a few CPUs were engaged and the rest were sitting idle. On investigating further, I found that the time was being spent in Arrow compute calls (take, sort_indices, etc.). In fact, out of 12 cores, only one core was doing work. The CPU thread count was showing 24 (12 * 2).

So I'm trying to find out if there's any way I can improve the performance, as my application is already using TBB for some tasks. Any pointers in this direction would be greatly appreciated.

Thanks,
Surya

On Sat, Apr 22, 2023 at 2:27 AM Weston Pace <[email protected]> wrote:

> No, there's no build-time configuration settings to enable TBB
> specifically.
>
> You can, at runtime, specify a custom executor to use for most
> operations. We use one thread pool for CPU tasks and one for I/O tasks.
> You could replace either or both with a TBB-based executor.
>
> For example, the method for creating a CSV file is defined as:
>
> ```
> static Future<std::shared_ptr<StreamingReader>> MakeAsync(
>     io::IOContext io_context, std::shared_ptr<io::InputStream> input,
>     arrow::internal::Executor* cpu_executor, const ReadOptions&,
>     const ParseOptions&, const ConvertOptions&);
> ```
>
> The `cpu_executor` property specifies which thread pool to use for CPU
> tasks. The I/O executor is a part of the `io_context`.
>
> The executor interface is pretty straightforward. Hiding the utility
> functions it is...
>
> ```
> class ARROW_EXPORT Executor {
>  public:
>   virtual int GetCapacity() = 0;
>  protected:
>   virtual Status SpawnReal(TaskHints hints, FnOnce<void()> task,
>                            StopToken, StopCallback&&) = 0;
> };
> ```
>
> It shouldn't be too much work to create a custom implementation based on
> TBB. Out of curiosity, what is the motivation for using TBB?
>
> -Weston
>
> On Fri, Apr 21, 2023 at 11:04 AM Surya Kiran Gullapalli <[email protected]> wrote:
>
>> Hello,
>> I'm curious to know if c++ sdk of arrow compute functions can use tbb
>> parallelization underneath ?
>> The documentation mentions that arrow uses a threadpool for
>> parallelization. Does compute functions also use threadpool and parallelize
>> computation ?
>>
>> Looking at the .so file created I do not see tbb library as a dependency
>> for arrow library.
>>
>> Is there a configuration variable during build which can activate this ?
>>
>> Thanks,
>> Surya
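
For reference, here is a rough standalone sketch of the two-method executor shape described in the quoted reply (a capacity query plus a way to spawn a task). Everything in it is an assumption made for illustration: `SketchExecutor`, `ThreadPoolExecutor`, and `ParallelSum` are hypothetical names, the class does not derive from `arrow::internal::Executor`, and it uses `std::function` and `std::thread` in place of Arrow's `FnOnce`, `TaskHints`, and `StopToken`. It only shows the general idea of handing tasks to a pluggable pool.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shape of the quoted interface.
// It is NOT Arrow's Executor and does not link against Arrow.
class SketchExecutor {
 public:
  virtual ~SketchExecutor() = default;
  virtual int GetCapacity() = 0;
  virtual void Spawn(std::function<void()> task) = 0;
};

// Fixed-size pool built on std::thread. A TBB-backed variant might hand
// each task to something like tbb::task_group instead of its own queue.
class ThreadPoolExecutor : public SketchExecutor {
 public:
  explicit ThreadPoolExecutor(int num_threads) {
    for (int i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Lets the workers drain the queue, then joins them.
  ~ThreadPoolExecutor() override {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& worker : workers_) worker.join();
  }

  int GetCapacity() override { return static_cast<int>(workers_.size()); }

  void Spawn(std::function<void()> task) override {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // only reached once stopping_ is set
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so other workers can dequeue
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool stopping_ = false;
};

// Sums 1..n by submitting one contiguous chunk per worker to the pool.
std::int64_t ParallelSum(std::int64_t n, int num_threads) {
  std::atomic<std::int64_t> total{0};
  {
    ThreadPoolExecutor pool(num_threads);
    assert(pool.GetCapacity() == num_threads);
    const std::int64_t chunk = (n + num_threads - 1) / num_threads;
    for (int t = 0; t < num_threads; ++t) {
      const std::int64_t begin = t * chunk + 1;
      const std::int64_t end = std::min(n, (t + 1) * chunk);
      pool.Spawn([&total, begin, end] {
        std::int64_t local = 0;
        for (std::int64_t i = begin; i <= end; ++i) local += i;
        total += local;
      });
    }
  }  // the pool's destructor waits for all queued tasks to finish
  return total.load();
}
```

Calling `ParallelSum(1000, 4)` splits the range into four chunks and adds the partial sums. A real Arrow executor would have to subclass the actual `Executor` class and implement `SpawnReal` with the signature shown in the quoted reply, which this sketch does not attempt.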
