Kirill Lykov created ARROW-10197:
------------------------------------

             Summary: [Gandiva][python] Execute expression on filtered data
                 Key: ARROW-10197
                 URL: https://issues.apache.org/jira/browse/ARROW-10197
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++ - Gandiva, Python
            Reporter: Kirill Lykov

It looks like there is no way to execute an expression on filtered data in Python: I cannot pass a `SelectionVector` to the projector's `evaluate` method.

```python
import pyarrow as pa
import pyarrow.gandiva as gandiva

table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
                              pa.array([5., 45., 36., 73., 83., 23., 76.])],
                             ['a', 'b'])

builder = gandiva.TreeExprBuilder()
node_a = builder.make_field(table.schema.field("a"))
node_b = builder.make_field(table.schema.field("b"))
fifty = builder.make_literal(50.0, pa.float64())
eleven = builder.make_literal(11.0, pa.float64())

# Condition: (a < 50 AND a > b) OR b < 11
cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
cond_2 = builder.make_function("greater_than", [node_a, node_b], pa.bool_())
cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
condition = builder.make_condition(cond)

filter = gandiva.make_filter(table.schema, condition)
# filterResult has type SelectionVector
filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
print(filterResult)

# Projection: c = a + b
sum = builder.make_function("add", [node_a, node_b], pa.float64())
field_result = pa.field("c", pa.float64())
expr = builder.make_expression(sum, field_result)
projector = gandiva.make_projector(
    table.schema, [expr], pa.default_memory_pool())

# Problem: I don't know how to use filterResult with the projector.
# This is the call I would like to make:
r, = projector.evaluate(table.to_batches()[0], filterResult)
```

In C++, I see that it is possible to pass a SelectionVector as the second argument to projector::Evaluate:
[https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270]

Meanwhile, it looks like this is impossible in `gandiva.pyx`:
[https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154]
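
For reference, the result the snippet above is after can be worked out in plain Python, without any Gandiva calls. This is only a sketch of the intended semantics (filter first, then evaluate `a + b` on the selected rows). The names `selection` and `projected` are made up for illustration and are not part of the pyarrow API.

```python
# Plain-Python reference for the example data; no Gandiva calls are used.
a = [1., 31., 46., 3., 57., 44., 22.]
b = [5., 45., 36., 73., 83., 23., 76.]

# Same condition as the filter: (a < 50 AND a > b) OR b < 11.
# `selection` (hypothetical name) plays the role of the SelectionVector:
# the indices of the rows that pass the condition.
selection = [i for i, (x, y) in enumerate(zip(a, b))
             if (x < 50.0 and x > y) or y < 11.0]

# Projection "c = a + b", evaluated only on the selected rows.
projected = [a[i] + b[i] for i in selection]

print(selection)   # [0, 2, 5]
print(projected)   # [6.0, 82.0, 67.0]
```

A projector that accepted the selection vector would be expected to return these three values for column `c`, rather than all seven row sums.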