Re: Using arrow/compute/kernels/*internal.h headers

2020-11-18 Thread Wes McKinney
On #2, I think this discussion might be overly speculative until a full-fledged multithreaded hash aggregation is implemented in the Apache Arrow C++ library. There are other analytic database systems in the wild which might provide a blueprint for the way that we should approach this, and I don't

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-18 Thread Benjamin Kietzman
1: Excellent! 2: The open JIRA for grouped aggregation is https://issues.apache.org/jira/browse/ARROW-4124 (though it's out of date since it predates the addition of ScalarAggregateKernel). To summarize: for *grouped* aggregation we want the kernel to do the work of evaluating group condition(s) a

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-17 Thread Niranda Perera
1. This is great. I will follow this JIRA. (better yet, I'll see if I can make that contribution) 2. If we forget about the multithreading case for a moment, this requirement came up while implementing a "groupby + aggregation" operation (single-threaded). Let's assume that a table is not sorted.

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-17 Thread Benjamin Kietzman
Hi Niranda, hastebin: That looks generally correct, though I should warn you that a recent PR ( https://github.com/apache/arrow/pull/8574 ) changed the return type of DispatchExact to Kernel so you'll need to insert an explicit cast to ScalarAggregateKernel. 1: This seems like a feature which mig

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-16 Thread Niranda Perera
Hi Ben and Wes, Based on our discussion, I did the following. https://hastebin.com/ajadonados.cpp It seems to be working fine. Would love to get your feedback on this! :-) But I have a couple of concerns. 1. Say I want to communicate the intermediate state data across multiple processes. Unfortun

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-10 Thread Wes McKinney
Yes, open a Jira and propose a PR implementing the changes you need On Mon, Nov 9, 2020 at 8:31 PM Niranda Perera wrote: > > @wes How should I proceed with this nevertheless? should I open a JIRA? > > On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney wrote: > > > On Mon, Nov 9, 2020 at 9:32 AM Nirand

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-09 Thread Niranda Perera
@wes How should I proceed with this nevertheless? should I open a JIRA? On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney wrote: > On Mon, Nov 9, 2020 at 9:32 AM Niranda Perera > wrote: > > > > @Ben > > Thank you very much for the feedback. But unfortunately, I was unable to > > find a header that e

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-09 Thread Niranda Perera
I see. So, IIUC, I would have to get the "mean" Function from the function registry, cast it to a ScalarAggregateFunction, and access the kernels_ vector which ultimately holds the ScalarAggregateKernel? :-) Or can I use this ScalarAggregateFunction::DispatchExact function for this? On Mon, Nov 9

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-09 Thread Benjamin Kietzman
Hi Niranda, > unfortunately, I was unable to find a header that exposes a SumAggregateKernel Sorry, I misspelled that! Should have written ScalarAggregateKernel, which is in kernel.h On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney wrote: > On Mon, Nov 9, 2020 at 9:32 AM Niranda Perera > wrote: >

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-09 Thread Wes McKinney
On Mon, Nov 9, 2020 at 9:32 AM Niranda Perera wrote: > > @Ben > Thank you very much for the feedback. But unfortunately, I was unable to > find a header that exposes a SumAggregateKernel in the v2.0.0. Maybe I am > checking it wrong. I remember accessing them in v0.16 IINM. > > @Wes > Yes, that wo

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-09 Thread Niranda Perera
@Ben Thank you very much for the feedback. But unfortunately, I was unable to find a header that exposes a SumAggregateKernel in the v2.0.0. Maybe I am checking it wrong. I remember accessing them in v0.16 IINM. @Wes Yes, that would be great. How about adding a CMake compilation flag for such dev

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Wes McKinney
I'm not opposed to installing headers that provide access to some of the kernel implementation internals (with the caveat that changes won't go through a deprecation cycle, so caveat emptor). It might be more sustainable to think about what kind of stable-ish public API could be exported to support

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Ben Kietzman
Hi Niranda, SumImpl is a subclass of KernelState. Given a SumAggregateKernel, one can produce zeroed KernelState using the `init` member, then operate on data using the `consume`, `merge`, and `finalize` members. You can look at ScalarAggExecutor for an example of how to get from a compute functio

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Niranda Perera
Hi Ben, We are building a distributed table abstraction on top of Arrow dataframes called Cylon (https://github.com/cylondata/cylon). Currently we have a simple aggregation and group-by operation implementation. But we felt like we can give more functionality if we can import arrow kernels and sta

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Ben Kietzman
Ni Niranda, What is the context of your work? if you're working inside the arrow repository you shouldn't need to install headers before using them, and we welcome PRs for new kernels. Otherwise, could you provide some details about how your work is using Arrow as a dependency? Ben Kietzman On S

Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Niranda Perera
Hi, I was wondering if I could use the arrow/compute/kernels/*internal.h headers in my work? I would like to reuse some of the kernel implementations and kernel states. With -DARROW_COMPUTE=ON, those headers are not added into the include dir. I see that the *internal.h headers are skipped from t