[ https://issues.apache.org/jira/browse/ARROW-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alenka Frim updated ARROW-18433:
--------------------------------
    Summary: [C++][Python] Optimize aggregate functions to work with batches.  (was: Optimize aggregate functions to work with batches.)

> [C++][Python] Optimize aggregate functions to work with batches.
> ----------------------------------------------------------------
>
>                 Key: ARROW-18433
>                 URL: https://issues.apache.org/jira/browse/ARROW-18433
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>    Affects Versions: 10.0.1
>            Reporter: A. Coady
>            Priority: Major
>
> Most compute functions work with the dataset api and don't load columns. But
> aggregate functions which are associative could also work: `min`, `max`,
> `any`, `all`, `sum`, `product`. Even `unique` and `value_counts`.
> A couple of implementation ideas:
>  * expand the dataset api to support expressions which return scalars
>  * add a `BatchedArray` type which is like a `ChunkedArray` but with lazy
> loading

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
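
The associativity argument in the issue can be illustrated with a minimal sketch: each batch is reduced on its own, and only the small partial results are combined, so no full column has to be held in memory. This is not Arrow code. Plain Python lists stand in for record batches, and `read_batches`, `aggregate_in_batches`, and the `batched_*` wrappers are hypothetical names introduced only for this illustration. It assumes at least one batch and, for `min`, that every batch is non-empty.

```python
from collections import Counter
from functools import reduce
import operator


def read_batches():
    """Hypothetical lazy batch source: yields one batch at a time."""
    yield [3, 1, 4]
    yield [1, 5]
    yield [9, 2, 6]


def aggregate_in_batches(batches, partial, combine):
    """Reduce each batch with `partial`, then merge partial results pairwise.

    Correct whenever the aggregate is associative, so that combining
    per-batch results equals aggregating the whole column at once.
    """
    return reduce(combine, (partial(batch) for batch in batches))


def batched_min(batches):
    # min of a batch, then min of two partial minima
    return aggregate_in_batches(batches, min, min)


def batched_sum(batches):
    # sum of a batch, then addition of partial sums
    return aggregate_in_batches(batches, sum, operator.add)


def batched_unique(batches):
    # distinct values per batch, merged by set union
    return aggregate_in_batches(batches, set, operator.or_)


def batched_value_counts(batches):
    # per-batch counts, merged by adding counters
    return aggregate_in_batches(batches, Counter, operator.add)


if __name__ == "__main__":
    print(batched_min(read_batches()))
    print(batched_sum(read_batches()))
    print(sorted(batched_unique(read_batches())))
    print(batched_value_counts(read_batches()))
```

Only one batch plus one partial result is alive at any time, which is the property the proposed lazily loaded `BatchedArray` would need to preserve. For `unique` and `value_counts` the partial result grows with the number of distinct values rather than the number of rows.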