Re: complex data structure aggregators?

2019-08-12 Thread Ted Dunning
Charles, That might work. The t-digest will give us a median estimate. On Mon, Aug 12, 2019 at 4:33 PM Charles Givre wrote: > Hi Ted, > You might want to take a look at this repo: >

Re: complex data structure aggregators?

2019-08-12 Thread Charles Givre
Hi Ted, You might want to take a look at this repo: https://github.com/cgivre/drill-stats-function/blob/master/src/main/java/org/apache/drill/contrib/function/DrillStatsFunctions.java

Re: complex data structure aggregators?

2019-08-12 Thread Paul Rogers
Hi Ted, You are now at the point where you'll have to experiment. Drill provides an annotation for aggregate state: @Workspace. The value must be declared as a "holder". You'll have to check if VarBinaryHolder is allowed, and, if so, how you allocate memory and remember the offset into the

Re: complex data structure aggregators?

2019-08-12 Thread Ted Dunning
I am trying to figure out how to build an approximate percentile estimator. I have a fancy data structure that will do this. It can live in bounded memory with no allocation. I can add numbers to the digest easily enough. And the required results can be extracted from the structure. What I would

Re: complex data structure aggregators?

2019-08-12 Thread Ted Dunning
Can UDFs accumulate a fixed length binary value? On Mon, Aug 12, 2019 at 11:23 AM Paul Rogers wrote: > Hi Ted, > > Thanks for the link; I suspected there was some trick for stddev. The > point still stands that, if the algorithm requires multiple passes over the > data (ML, say), can't be

Re: complex data structure aggregators?

2019-08-12 Thread Charles Givre
Ted, Can we ask what it is you are trying to build a UDF for? --C > On Aug 12, 2019, at 2:23 PM, Paul Rogers wrote: > > Hi Ted, > > Thanks for the link; I suspected there was some trick for stddev. The point > still stands that, if the algorithm requires multiple passes over the data > (ML,

Re: complex data structure aggregators?

2019-08-12 Thread Paul Rogers
Hi Ted, Thanks for the link; I suspected there was some trick for stddev. The point still stands that, if the algorithm requires multiple passes over the data (ML, say), it can't be done in Drill. Each UDF must return exactly one value. It can return a map if you want multiple values (though

Re: complex data structure aggregators?

2019-08-12 Thread Ted Dunning
Is it possible for a UDF to produce multiple scalar results? Can it produce a binary result? Also, as a nit, standard deviation doesn't require buffering all the data. It just requires that you have three accumulators, one for count, one for mean and one for mean squared deviation. There is a
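
The three-accumulator approach described above can be sketched in plain Java. This is a minimal illustration, not Drill UDF code: the class name RunningStats and its methods are hypothetical, and it keeps the running sum of squared deviations (often called M2) instead of the mean squared deviation itself, dividing by the count only when the result is read. The update rule is the one usually attributed to Welford.

```java
// Hypothetical sketch: single-pass standard deviation with three accumulators.
// Not a Drill API; the names here are illustrative assumptions.
final class RunningStats {
    private long count;   // number of values seen
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    // Fold one value into the accumulators (Welford-style update).
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the already-updated mean
    }

    long count() { return count; }

    double mean() { return mean; }

    // Population variance; NaN when no values have been added.
    double populationVariance() {
        return count == 0 ? Double.NaN : m2 / count;
    }

    double populationStdDev() {
        return Math.sqrt(populationVariance());
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(x);
        }
        System.out.println("mean = " + stats.mean());
        System.out.println("stddev = " + stats.populationStdDev());
    }
}
```

No input values are buffered; the state is a fixed 24 bytes (one long and two doubles) regardless of how many rows are aggregated.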

Re: complex data structure aggregators?

2019-08-12 Thread Paul Rogers
Hi Ted, Last I checked (when we wrote the book chapter on the subject), aggregate state is limited to scalars and Drill-defined types. There is no support to spill aggregate state, so that state will be lost if spilling is required to handle large aggregate batches. The current solution works

complex data structure aggregators?

2019-08-12 Thread Ted Dunning
What is the current state of building aggregators that have complex state via UDFs? Is it possible to define multi-level aggregators in a UDF? Can the output of a UDF be a byte array? (these are three different questions)
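
To make the three questions concrete, here is a minimal plain-Java sketch of what a multi-level aggregate with byte-array state could look like. It does not use any Drill API and says nothing about what Drill UDFs actually permit; the class PartialState and all of its methods are hypothetical. First-level aggregators build partial states, each state is packed into a fixed-length byte array, and a second level unpacks and merges them. The merge step uses the standard pairwise combination formula for count, mean, and sum of squared deviations.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: complex aggregate state carried as a fixed-length byte array.
// Not a Drill API; names and layout are illustrative assumptions.
final class PartialState {
    // Layout: 8-byte count, 8-byte mean, 8-byte sum of squared deviations.
    static final int BYTES = Long.BYTES + 2 * Double.BYTES;

    long count;
    double mean;
    double m2;

    // First level: fold one value into this partial state.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Pack the state into exactly BYTES bytes.
    byte[] toBytes() {
        return ByteBuffer.allocate(BYTES)
                .putLong(count)
                .putDouble(mean)
                .putDouble(m2)
                .array();
    }

    // Unpack a state previously produced by toBytes().
    static PartialState fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        PartialState s = new PartialState();
        s.count = buf.getLong();
        s.mean = buf.getDouble();
        s.m2 = buf.getDouble();
        return s;
    }

    // Second level: combine another partial state into this one.
    void merge(PartialState other) {
        if (other.count == 0) {
            return;
        }
        long n = count + other.count;
        double delta = other.mean - mean;
        mean += delta * other.count / n;
        m2 += other.m2 + delta * delta * ((double) count * other.count / n);
        count = n;
    }

    double populationVariance() {
        return count == 0 ? Double.NaN : m2 / count;
    }

    public static void main(String[] args) {
        PartialState a = new PartialState();
        PartialState b = new PartialState();
        for (double x : new double[] {2, 4, 4, 4}) a.add(x);
        for (double x : new double[] {5, 5, 7, 9}) b.add(x);

        // Ship each partial state as bytes, then merge at the second level.
        PartialState total = PartialState.fromBytes(a.toBytes());
        total.merge(PartialState.fromBytes(b.toBytes()));
        System.out.println("mean = " + total.mean);
        System.out.println("variance = " + total.populationVariance());
    }
}
```

The same pattern would apply to a larger fixed-size structure such as a digest, provided it has a well-defined merge operation and a serialized form of known length.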