Hello! I'm wondering if there's any documentation that describes the concurrency/parallelism architecture of the compute API. I'd also be interested in any recommended approaches for observing how the threads used by Arrow perform. Should I check a processor ID and infer performance from that, or are there particular tools that the community uses?
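
To make the question concrete, here is a minimal sketch of how I might try to infer thread usage without a dedicated tool: compare wall-clock time against process-wide CPU time for the same call. If the ratio of CPU time to wall time is well above 1, more than one core was busy. The names `measure` and `busy_work` are my own, hypothetical helpers and not part of Arrow; `busy_work` is only a single-threaded stand-in for a compute call.

```python
import time
from typing import Callable, Dict


def measure(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Call fn `repeats` times and report wall time, CPU time, and their ratio.

    time.process_time() sums user and system CPU time over all threads of the
    current process, so cpu_s / wall_s roughly estimates how many cores were
    busy while fn was running.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(repeats):
        fn()
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    ratio = cpu_s / wall_s if wall_s > 0 else float("nan")
    return {"wall_s": wall_s, "cpu_s": cpu_s, "cpu_per_wall": ratio}


def busy_work() -> int:
    # Hypothetical single-threaded stand-in for a compute function call.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    stats = measure(busy_work)
    print(f"wall={stats['wall_s']:.3f}s cpu={stats['cpu_s']:.3f}s "
          f"cpu/wall={stats['cpu_per_wall']:.2f}")
```

My assumption is that I could replace `busy_work` with a lambda that applies a `pyarrow.compute` function, once to the concatenated table and once in a loop over the sub-tables, and repeat the measurement after calling `pyarrow.set_cpu_count` with different values. I don't know whether this is a reliable way to see what Arrow's thread pool is doing, which is part of why I'm asking.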

Specifically, I am wondering whether the concurrency differs when the input is a ChunkedArray rather than an Array, or between ChunkedArrays with different numbers of chunks (1 chunk vs tens or hundreds). I see a large difference between the total time to apply compute functions to a single table (concatenated from many small tables) and the total time to apply the same compute functions to each sub-table in the composition. I'm trying to figure out where that difference comes from, and I'm wondering if it's related to parallelism within Arrow.

I searched the GitHub issues and JIRA issues (e.g. [1]) for this information, but I couldn't find anything. The pyarrow API seems to have functions I could use to try to figure it out (cpu_count and set_cpu_count), but that seems like a vague road.

[1]: https://issues.apache.org/jira/browse/ARROW-12726

Thank you!

Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
