Do you mean you want to call pyarrow.compute.tdigest on different inputs over 
time and continuously merge the results into one tdigest?

pyarrow.compute.tdigest (the Python wrapper of the C++ kernel) is an aggregate 
kernel that consumes an input array and outputs the requested quantiles. It is 
not designed to return the internal tdigest structure (and how would one make 
use of the tdigest structure?).

The C++ tdigest utility (not the kernel) does support merging tdigests. [1]
Is it possible to use the tdigest utility directly?

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/tdigest.h#L79

Yibo

From: [email protected] <[email protected]>
Sent: Monday, March 21, 2022 10:06 PM
To: [email protected]
Subject: [Python] pyarrow.compute.tdigest return class

Hello everyone,

Is there any way for the pyarrow.compute.tdigest function to return a TDigest 
structure in such a way that it can be merged?

I have a use case where I would like to store time series percentile 
distributions. The pyarrow function tdigest is very fast, but its output is 
plain quantile values, and these cannot be aggregated.
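
As a small illustration of why already-computed quantiles cannot be aggregated, 
the sketch below uses exact medians from the Python standard library instead of 
a t-digest, with made-up data. Combining the per-batch medians does not 
generally give the median of the combined data.

```python
import statistics

# Two batches of different sizes (made-up values).
day_1 = [1, 2, 3, 4, 100]
day_2 = [5, 6]

# Attempt to aggregate the per-batch medians.
median_of_medians = statistics.median(
    [statistics.median(day_1), statistics.median(day_2)]
)

# Median computed over all raw observations.
true_median = statistics.median(day_1 + day_2)

# The two values differ, so the per-batch outputs alone are not enough.
print(median_of_medians, true_median)
```

A mergeable summary avoids this because it keeps enough of the distribution to 
answer quantile queries after merging, rather than only the final numbers.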

I have tried using TDigest (https://github.com/CamDavidsonPilon/tdigest) but it 
is very slow.

Thank you very much.

