No and no. This filter will not be used for predicate pushdown now or in 8.0.0. It could possibly come after 8.0.0. If Parquet stores statistics for each column of a struct array (I don't know offhand whether it does), then we should create a JIRA to expose this.

On Wed, Apr 20, 2022, 11:01 AM Partha Dutta <[email protected]> wrote:

> That works! Thanks. Do you know off hand if this filter would be used in a
> predicate pushdown for a parquet dataset? Or would it be possibly coming in
> version 8.0.0?
>
> On Wed, Apr 20, 2022 at 3:49 PM Weston Pace <[email protected]> wrote:
>
>> The second argument to `call_function` should be a list (the args to
>> the function). Since `arr3` is iterable it is interpreting it as a
>> list of args and trying to treat each row as an argument to your call
>> (this is the reason it thinks you have 3 arguments). This should
>> work:
>>
>>     pc.call_function("struct_field", [arr3],
>>                      pc.StructFieldOptions(indices=[0]))
>>
>> Unfortunately, that evaluates the function immediately. If you want
>> to create an expression then you need some way to create a call and I
>> don't actually know how to do that. I can do something a little
>> hackish:
>>
>>     table = pa.Table.from_pydict({'values': arr3})
>>     dataset = ds.dataset(table)
>>     sf_call = ds.field('')._call('struct_field', [ds.field('values')],
>>                                  pc.StructFieldOptions(indices=[0]))
>>     dataset.to_table(filter=sf_call < 200)
>>
>> However, I suspect there is probably a better way to create a call
>> object than `ds.field('')._call(...)`
>>
>> On Wed, Apr 20, 2022 at 3:09 AM Partha Dutta <[email protected]>
>> wrote:
>> >
>> > I'm trying to use the compute function struct_field in order to create
>> > an expression for dataset filtering. But running into an error. This is
>> > the code snippet:
>> >
>> >     arr1 = pa.array([100, 200, 300])
>> >     arr2 = pa.array([400, 500, 600])
>> >     arr3 = pa.StructArray.from_arrays([arr1, arr2], ["one", "two"])
>> >     e = pc.call_function("struct_field", arr3,
>> >                          pc.StructFieldOptions(indices=[0])) > 200
>> >
>> >     Traceback (most recent call last):
>> >       File "<stdin>", line 1, in <module>
>> >       File "pyarrow/_compute.pyx", line 531, in pyarrow._compute.call_function
>> >       File "pyarrow/_compute.pyx", line 330, in pyarrow._compute.Function.call
>> >       File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
>> >       File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
>> >     pyarrow.lib.ArrowInvalid: Function 'struct_field' accepts 1 arguments
>> >     but attempted to look up kernel(s) with 3
>> >
>> > If I try to exclude the options, I get
>> >
>> >     pyarrow.lib.ArrowInvalid: Function 'struct_field' cannot be called
>> >     without options
>> >
>> > Any advice? I am using pyarrow 7.0.0
>> > --
>> > Partha Dutta
>> > [email protected]
>>
>
> --
> Partha Dutta
> [email protected]
