[jira] [Updated] (ARROW-18433) [C++][Python] Optimize aggregate functions to work with batches.

Alenka Frim (Jira) Sun, 11 Dec 2022 21:49:06 -0800


     [ https://issues.apache.org/jira/browse/ARROW-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alenka Frim updated ARROW-18433:
--------------------------------
    Summary: [C++][Python] Optimize aggregate functions to work with batches.  
(was: Optimize aggregate functions to work with batches.)

> [C++][Python] Optimize aggregate functions to work with batches.
> ----------------------------------------------------------------
>
>                 Key: ARROW-18433
>                 URL: https://issues.apache.org/jira/browse/ARROW-18433
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>    Affects Versions: 10.0.1
>            Reporter: A. Coady
>            Priority: Major
>
> Most compute functions work with the dataset API without loading whole 
> columns into memory. Associative aggregate functions could work the same 
> way, batch by batch: `min`, `max`, `any`, `all`, `sum`, `product`. Even 
> `unique` and `value_counts` could.
> A couple of implementation ideas:
>  * expand the dataset API to support expressions which return scalars
>  * add a `BatchedArray` type which is like a `ChunkedArray` but with lazy 
> loading
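
A minimal sketch of the batch-wise idea in the description above, in plain Python rather than the Arrow API. It assumes only that the aggregate is associative, so each batch can be reduced on its own and the partial results combined. The names `aggregate_batches` and `lazy_batches` are hypothetical, and plain lists stand in for record batches; with pyarrow the batches might instead come from something like `Dataset.to_batches()`.

```python
from collections import Counter
from functools import reduce


def lazy_batches(values, size):
    """Hypothetical stand-in for a lazily loaded column: yields one slice at a time."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def aggregate_batches(batches, per_batch, combine):
    """Fold an associative aggregate over batches, one batch at a time.

    per_batch reduces a single batch to a partial result;
    combine merges two partial results. Empty batches are skipped.
    Raises TypeError if there are no non-empty batches at all.
    """
    partials = (per_batch(batch) for batch in batches if len(batch) > 0)
    return reduce(combine, partials)


column = [3, 1, 4, 1, 5, 9, 2, 6]

# sum: partial sums are added together
total = aggregate_batches(lazy_batches(column, 3), sum, lambda a, b: a + b)

# min: the minimum of per-batch minima
smallest = aggregate_batches(lazy_batches(column, 3), min, min)

# value_counts: per-batch counters are merged; the keys give unique values
counts = aggregate_batches(lazy_batches(column, 3), Counter, lambda a, b: a + b)
unique = sorted(counts)
```

Only one batch and one partial result are held at a time, which is the property a lazily loaded `BatchedArray` would need to preserve.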



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
