Hi Aldrin,

It would be helpful to know what sort of compute operators you are using.
On Thu, Mar 10, 2022, 19:12 Aldrin <[email protected]> wrote:

> I will work on a reproducible example.
>
> As a sneak peek, what I was seeing was the following (pasted in gmail, see
> [1] for the markdown version):
>
> Table ID        Columns   Rows    Rows (slice)   Slice count   Time (ms) total; by slice   Time (ms) total; by table
> E-GEOD-100618       415   20631            299            69                     644.065                         410
> E-GEOD-76312       2152   27120             48           565                   25607.927                        2953
> E-GEOD-106540      2145   24480             45           544                   25193.507                        3088
>
> where "by slice" is the total time, summed from running the function on
> each slice, and "by table" is the time of running the function once on the
> table concatenated from the slices.
>
> The difference was large (but not *so* large) for ~70 iterations (1.5x),
> but for ~550 iterations (and 6x fewer rows, 5x more columns) the
> difference became significant (~10x).
>
> I will follow up here when I have a more reproducible example. I also
> started doing this before tensors were available, so I'll try to see how
> that changes performance.
>
> [1]: https://gist.github.com/drin/4b2e2ea97a07c9ad54647bcdc462611a
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Thu, Mar 10, 2022 at 2:32 PM Weston Pace <[email protected]> wrote:
>
>> As far as I know (and my knowledge here may be dated), the compute
>> kernels themselves do not do any concurrency. There are certainly
>> compute kernels that could benefit from concurrency in this manner
>> (many kernels naively so), and I think things are set up so that, if we
>> decide to tackle this feature, we could do so in a systematic way
>> (instead of writing something for each kernel).
>>
>> I believe that kernels, if given a unique kernel context, should be
>> thread safe.
>>
>> The streaming compute engine, on the other hand, does support
>> concurrency. It is mostly driven by the scanner at the moment (e.g.
>> each batch we fetch from the scanner gets a fresh thread task for
>> running through the execution plan), but there is some intra-node
>> concurrency in the hash join and (I think) the hash aggregate nodes.
>> This has been sufficient to saturate cores on the benchmarks we run.
>> I know there is ongoing interest in understanding and improving our
>> concurrency here.
>>
>> The scanner supports concurrency. It will typically fetch multiple
>> files at once and, for each file, it will fetch multiple batches at
>> once (assuming the file has more than one batch).
>>
>> > I see a large difference between the total time to apply compute
>> > functions to a single table (concatenated from many small tables)
>> > compared to applying compute functions to each sub-table in the
>> > composition.
>>
>> Which one is better? Can you share a reproducible example?
>>
>> On Thu, Mar 10, 2022 at 12:01 PM Aldrin <[email protected]> wrote:
>> >
>> > Hello!
>> >
>> > I'm wondering if there's any documentation that describes the
>> > concurrency/parallelism architecture for the compute API. I'd also be
>> > interested in whether there are recommended approaches for measuring
>> > the performance of threads used by Arrow -- should I try to check a
>> > processor ID and infer performance, or are there particular tools that
>> > the community uses?
>> >
>> > Specifically, I am wondering if the concurrency is going to be
>> > different when using a ChunkedArray as an input compared to an Array,
>> > or for ChunkedArrays with various chunk sizes (1 chunk vs. tens or
>> > hundreds). I see a large difference between the total time to apply
>> > compute functions to a single table (concatenated from many small
>> > tables) compared to applying compute functions to each sub-table in
>> > the composition. I'm trying to figure out where that difference may
>> > come from, and I'm wondering if it's related to parallelism within
>> > Arrow.
>> >
>> > I tried using the GitHub issues and JIRA issues (e.g. [1]) as a way to
>> > sleuth the info, but I couldn't find anything. The pyarrow API seems
>> > to have functions I could try to use to figure it out (cpu_count and
>> > set_cpu_count), but that seems like a vague road.
>> >
>> > [1]: https://issues.apache.org/jira/browse/ARROW-12726
>> >
>> >
>> > Thank you!
>> >
>> > Aldrin Montana
>> > Computer Science PhD Student
>> > UC Santa Cruz
