Hi Gus!

Unfortunately, I haven't yet implemented full expression evaluation in the
Go compute library. So while the library lets you define expressions, you
can't execute them just yet. I've ended up taking a different direction
here: I've been working on implementing execution using Substrait[1]
expressions, with the intention of deprecating the Expression types that
currently exist in the Go compute library and replacing them with the ones
implemented in the substrait-go repo[2].

> Based on my limited understanding of this and reviewing the C++
> documentation it seems I should pass this expression into a Filter node as
> an argument of some sort.

This is part of the reason why I'm going directly to Substrait for compute
definitions. Rather than trying to replicate the full execution framework
that Acero defines in C++, my plan is to allow executing Substrait plans as
they exist, so that it isn't necessary to create nodes and pipelines for
compute in Go (at least not yet).

In the meantime, if you are able to use CGO, you could theoretically
serialize the existing Expressions in Go and pass them, via the C-Data API,
to the C++ libacero library for execution, then use the C-Data API again to
bring the results back into Go without copying the data (only pointers are
passed around). Sorry that this isn't more helpful. I promise this work
*is* coming, and I will most likely have initial PRs with some basic
functionality up within the next few weeks.

Take care!
--Matt

[1]: https://substrait.io
[2]: https://pkg.go.dev/github.com/substrait-io/[email protected]/expr

On Fri, Apr 14, 2023 at 8:47 AM Gus Minto-Cowcher <[email protected]> wrote:

> Hi,
>
> I am using the Golang implementation and am looking to do some basic data
> processing on arrow arrays read from a parquet file. I have been looking at
> using this package
> <https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/arrow/compute>.
>
> While I have figured out how to do a basic filter using CallFunction:
>
> fb := array.NewFloat64Builder(memory.DefaultAllocator)
> fb.AppendValues([]float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, nil)
> array := fb.NewFloat64Array()
>
> c := compute.DefaultExecCtx()
> ctx := context.TODO()
> ctx = compute.SetExecCtx(ctx, c)
>
> out, err := compute.CallFunction(ctx, "greater_equal", nil,
>     compute.NewDatum(array), compute.NewDatum(3))
> if err != nil {
>     log.Fatal(err)
> }
> defer out.Release()
> filter := out.(*compute.ArrayDatum).MakeArray()
> result, err := compute.FilterArray(ctx, array, filter,
>     *compute.DefaultFilterOptions())
> if err != nil {
>     log.Fatal(err)
> }
>
> I was looking at attempting to use the Expressions built within the
> compute library as this appears at first glance to be a much more idiomatic
> way of using the compute library, i.e. something like:
> expr := compute.GreaterEqual(compute.NewRef(compute.FieldRefIndex(0)),
> compute.NewLiteral(3))
>
> However, I cannot figure out how to actually execute the expression. Based
> on my limited understanding of this and reviewing the C++ documentation it
> seems I should pass this expression into a Filter node as an argument of
> some sort. But basically at this stage where I am actually trying to
> execute an expression on data I am lost.
>
> I would really appreciate any input/examples/pointers people might have :)
>
> Thanks in advance,
> Gus
>
