Having the option of keeping the t-digest separate could be useful. For
instance, Google's SQL dialect (BigQuery) allows some sketch data structures,
such as HyperLogLog++ sketches, to be tracked separately [1].


[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions
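
Below is a minimal pure-Python sketch of what keeping the t-digest separate could look like. Each batch is summarized into its own digest, the digests are merged later, and quantiles are extracted only at the end. This loosely mirrors the init/merge/extract split of the BigQuery HLL_COUNT functions. The sketch makes several assumptions:

- It is a simplified merging t-digest using the k1 scale function.
- The `MergingDigest` class and its `update`, `merge` and `quantile` methods are hypothetical. They are not Arrow's C++ t-digest API or part of pyarrow.
- Being pure Python, it will not be fast. It only shows the data flow.

```python
import math
from dataclasses import dataclass


@dataclass
class Centroid:
    mean: float
    weight: float


class MergingDigest:
    """Simplified merging t-digest (hypothetical; not Arrow's C++ TDigest)."""

    def __init__(self, delta: float = 100.0) -> None:
        self.delta = delta               # compression: larger -> more centroids
        self.centroids: list[Centroid] = []
        self.total = 0.0                 # total weight = number of values seen

    def _k(self, q: float) -> float:
        # k1 scale function; clamp q to [0, 1] to guard against rounding.
        q = min(1.0, max(0.0, q))
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def _compress(self, cs: list[Centroid]) -> list[Centroid]:
        # Sort by mean, then greedily fold neighbours together while the
        # merged centroid spans at most 1 unit on the k scale.
        cs = sorted(cs, key=lambda c: c.mean)
        if not cs:
            return []
        total = sum(c.weight for c in cs)
        out: list[Centroid] = []
        cur = Centroid(cs[0].mean, cs[0].weight)   # copy: inputs are not mutated
        w_before = 0.0                             # weight already emitted
        k_lo = self._k(0.0)
        for c in cs[1:]:
            q_hi = (w_before + cur.weight + c.weight) / total
            if self._k(q_hi) - k_lo <= 1.0:
                new_w = cur.weight + c.weight
                cur.mean += (c.mean - cur.mean) * c.weight / new_w  # weighted mean
                cur.weight = new_w
            else:
                out.append(cur)
                w_before += cur.weight
                k_lo = self._k(w_before / total)
                cur = Centroid(c.mean, c.weight)
        out.append(cur)
        return out

    def update(self, values) -> None:
        # Add raw values as weight-1 centroids, then compress.
        new = [Centroid(float(v), 1.0) for v in values]
        self.total += len(new)
        self.centroids = self._compress(self.centroids + new)

    def merge(self, other: "MergingDigest") -> None:
        # Merging two digests is just compressing the union of their centroids.
        self.total += other.total
        self.centroids = self._compress(self.centroids + other.centroids)

    def quantile(self, q: float) -> float:
        # Interpolate linearly between centroid means, each placed at the
        # midpoint of its cumulative weight.
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.total
        cum = 0.0
        prev_mid = prev_mean = None
        for c in self.centroids:
            mid = cum + c.weight / 2
            if target <= mid:
                if prev_mid is None:
                    return c.mean
                frac = (target - prev_mid) / (mid - prev_mid)
                return prev_mean + frac * (c.mean - prev_mean)
            prev_mid, prev_mean = mid, c.mean
            cum += c.weight
        return self.centroids[-1].mean


# Usage: summarize two batches separately, merge later, then query.
monday, tuesday = MergingDigest(), MergingDigest()
monday.update(range(0, 5000))
tuesday.update(range(5000, 10000))
week = MergingDigest()
week.merge(monday)
week.merge(tuesday)
print(week.quantile(0.5))  # close to the exact median of 0..9999 (4999.5)
```

The point of the design is that `monday` and `tuesday` could be stored and combined at any later time. Their extracted quantiles, by contrast, could not be combined.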

On Mon, Mar 21, 2022 at 7:42 PM Yibo Cai wrote:

> Do you mean you want to call pyarrow.compute.tdigest on different inputs
> over time and continuously merge the results into one t-digest?
>
> pyarrow.compute.tdigest (a Python wrapper of the C++ kernel) is an aggregate
> kernel that consumes an input array and outputs the requested quantiles. It
> is not suited to returning the internal t-digest structure (and how would
> one make use of that structure?).
>
> The C++ t-digest utility (not the kernel) does support merging t-digests [1].
>
> Would it be possible to use the t-digest utility directly?
>
> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/tdigest.h#L79
>
> Yibo
>
> *From:* [email protected] <[email protected]>
> *Sent:* Monday, March 21, 2022 10:06 PM
> *To:* [email protected]
> *Subject:* [Python] pyarrow.compute.tdigest return class
>
> Hello everyone,
>
> Is there any way for the pyarrow.compute.tdigest function to return a
> TDigest structure in such a way that it can be merged?
>
> I have a use case where I would like to store time-series percentile
> distributions. The pyarrow tdigest function is very fast, but its output is
> just quantile values, and these cannot be aggregated.
>
> I have tried the TDigest package (https://github.com/CamDavidsonPilon/tdigest),
> but it is very slow.
>
> Thank you very much.
>
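
The original question notes that the kernel's numeric quantile outputs cannot be aggregated. A tiny pure-Python example with made-up data shows why. Combining per-batch medians does not recover the median of the combined data. This is why the mergeable summary itself, not the quantiles extracted from it, has to be kept.

```python
import statistics

# Two hypothetical batches of samples (made-up numbers for illustration).
batch_1 = [1, 2, 3, 4, 100]
batch_2 = [5, 6, 7, 8, 9, 10, 11]

# What you would have if only the per-batch quantile outputs were stored.
per_batch = [statistics.median(batch_1), statistics.median(batch_2)]
combined_from_summaries = statistics.mean(per_batch)

# The value you actually want: the median of all samples together.
true_median = statistics.median(batch_1 + batch_2)

print(per_batch, combined_from_summaries, true_median)
```

Weighting the per-batch medians by batch size does not help either: (3*5 + 8*7) / 12 = 71/12, about 5.92, while the true median is 6.5.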