Hi Aldrin,

It would be helpful to know what sort of compute operators you are using.
On Thu, Mar 10, 2022, 19:12 Aldrin <[email protected]> wrote:

> I will work on a reproducible example.
>
> As a sneak peek, what I was seeing was the following (pasted in gmail, see
> [1] for the markdown version):
>
> Table ID        Columns   Rows    Rows (slice)   Slice count   Time (ms) total; by slice   Time (ms) total; by table
> E-GEOD-100618       415   20631            299            69                     644.065                         410
> E-GEOD-76312       2152   27120             48           565                   25607.927                        2953
> E-GEOD-106540      2145   24480             45           544                   25193.507                        3088
>
> where "by slice" is the total time, summed from running the function on
> each slice, and "by table" is the time of running the function once on the
> table concatenated from the slices.
>
> The difference was large (but not *so* large) for ~70 iterations (1.5x),
> but for ~550 iterations (and 6x fewer rows, 5x more columns) the
> difference became significant (~10x).
>
> I will follow up here when I have a more reproducible example. I also
> started doing this before tensors were available, so I'll try to see how
> that changes performance.
>
> [1]: https://gist.github.com/drin/4b2e2ea97a07c9ad54647bcdc462611a
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Thu, Mar 10, 2022 at 2:32 PM Weston Pace <[email protected]> wrote:
>
>> As far as I know (and my knowledge here may be dated), the compute
>> kernels themselves do not do any concurrency. There are certainly
>> compute kernels that could benefit from concurrency in this manner
>> (many kernels naively so), and I think things are set up so that, if we
>> decide to tackle this feature, we could do so in a systematic way
>> (instead of writing something for each kernel).
>>
>> I believe that kernels, if given a unique kernel context, should be
>> thread safe.
>>
>> The streaming compute engine, on the other hand, does support
>> concurrency. It is mostly driven by the scanner at the moment (e.g.
>> each batch we fetch from the scanner gets a fresh thread task for
>> running through the execution plan), but there is some intra-node
>> concurrency in the hash join and (I think) the hash aggregate nodes.
>> This has been sufficient to saturate cores on the benchmarks we run.
>> I know there is ongoing interest in understanding and improving our
>> concurrency here.
>>
>> The scanner supports concurrency. It will typically fetch multiple
>> files at once and, for each file, it will fetch multiple batches at
>> once (assuming the file has more than one batch).
>>
>> > I see a large difference between the total time to apply compute
>> > functions to a single table (concatenated from many small tables)
>> > compared to applying compute functions to each sub-table in the
>> > composition.
>>
>> Which one is better? Can you share a reproducible example?
>>
>> On Thu, Mar 10, 2022 at 12:01 PM Aldrin <[email protected]> wrote:
>> >
>> > Hello!
>> >
>> > I'm wondering if there's any documentation that describes the
>> > concurrency/parallelism architecture for the compute API. I'd also be
>> > interested in whether there are recommended approaches for measuring
>> > the performance of threads used by Arrow -- should I try to check a
>> > processor ID and infer performance, or are there particular tools that
>> > the community uses?
>> >
>> > Specifically, I am wondering if the concurrency is going to be
>> > different when using a ChunkedArray as an input compared to an Array,
>> > or for ChunkedArrays with various chunk sizes (1 chunk vs. tens or
>> > hundreds). I see a large difference between the total time to apply
>> > compute functions to a single table (concatenated from many small
>> > tables) compared to applying compute functions to each sub-table in
>> > the composition. I'm trying to figure out where that difference may
>> > come from, and I'm wondering if it's related to parallelism within
>> > Arrow.
>> >
>> > I tried using the GitHub issues and JIRA issues (e.g. [1]) as a way to
>> > sleuth the info, but I couldn't find anything. The pyarrow API seems
>> > to have functions I could try to use to figure it out (cpu_count and
>> > set_cpu_count), but that seems like a vague road.
>> >
>> > [1]: https://issues.apache.org/jira/browse/ARROW-12726
>> >
>> >
>> > Thank you!
>> >
>> > Aldrin Montana
>> > Computer Science PhD Student
>> > UC Santa Cruz
