No, weighted percentile is not supported at the moment.

A good enhancement candidate. :-)
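Until Kylin supports it, one workaround is to compute weighted percentiles outside Kylin on the raw rows. Below is a minimal Python sketch. It assumes the values and their survey weights fit in memory and that all weights are non-negative. `weighted_percentile` and `weighted_median` are hypothetical helpers, not a Kylin or t-digest API. The sketch uses the inverted-CDF rule: sort by value, then return the first value whose cumulative weight reaches q% of the total weight.

```python
def weighted_percentile(values, weights, q):
    """Hypothetical helper (not part of Kylin): weighted percentile by the
    inverted-CDF rule. Returns the smallest value whose cumulative weight
    reaches q percent of the total weight. Assumes non-negative weights."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal length")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    # Sort (value, weight) pairs by value so cumulative weight grows with value.
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("total weight must be positive")
    threshold = total * q / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    # Guard against floating-point rounding leaving cumulative just below total.
    return pairs[-1][0]


def weighted_median(values, weights):
    # The weighted median is the weighted 50th percentile.
    return weighted_percentile(values, weights, 50)


if __name__ == "__main__":
    ages = [20, 30, 40]
    print(weighted_median(ages, [1, 1, 1]))  # unweighted: 30
    print(weighted_median(ages, [1, 1, 5]))  # heavy weight on 40: 40
```

This version does not interpolate. With equal weights and an even count, it returns the lower middle value, similar to selecting the record at half the offset in MySQL.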

On Fri, Oct 14, 2022 at 11:22 PM Will Glass-Husain <wgl...@forio.com> wrote:

> Hi Team,
>
> Thanks again for the recommendations and the articles.
>
> One quick question on the PERCENTILE function.   Does this support weighted
> percentiles?  I see that the t-digest library supports this.    Weighting
> is critical for accurate reporting of medians with survey data.
>
> WILL
>
> On Wed, Oct 12, 2022 at 2:03 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Hi, Will
> >   Glad to see you have completed the 'basic path' of kylin4_on_cloud,
> > which provides some tools that make deploying Kylin much easier than
> > before. But I think that to get satisfying performance from Kylin
> > (response time, concurrency), users must have enough knowledge of
> > Apache Spark and Apache Kylin. I think this article may be helpful:
> > https://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4
> >
> >
> > --
> > Best wishes to you!
> > From: Xiaoxiang Yu
> >
> >
> >
> > At 2022-10-12 02:21:13, "Will Glass-Husain" <wgl...@forio.com> wrote:
> > >Thank you -- very helpful.
> > >
> > >Regarding limits on the number of dimensions: what are the
> > >compute/storage constraints on this? For a given query:
> > >* Where is the data stored?
> > >* Which nodes does the computation occur on?
> > >
> > >I am trying to figure out which part of cloud-based Kylin needs to be
> > >scaled up if we have a large number of dimensions (I'm doing the setup
> > >from the kylin4_on_cloud branch).
> > >
> > >Thanks, WILL
> > >
> > >On Tue, Oct 11, 2022 at 1:20 AM Xiaoxiang Yu <x...@apache.org> wrote:
> > >
> > >> 1) The criteria for filtering (e.g. selecting sex='male') and grouping
> > >> (e.g. group by state) should be dimensions - is this correct?
> > >> Yes. Note, however, that Kylin supports at most 63 dimensions, and you
> > >> should be aware of 'The Curse of Dimensionality'.
> > >>
> > >> 2.1) Items that I would like to sum should be measures, is that right?
> > >> Yes.
> > >>
> > >> 2.2) Is there a limit to the number of measures?
> > >> No, there is no such limit.
> > >>
> > >> 3) Does Kylin support sum(expression)?
> > >> From the MySQL docs
> > >> (https://dev.mysql.com/doc/refman/5.7/en/aggregate-functions.html#function_sum),
> > >> we know MySQL supports it.
> > >> Kylin should support it in Kylin 3.x and in the future version 5.x.
> > >> Unfortunately, Kylin 4.x does not support sum(expression), and Kylin 4.x
> > >> is the version you are using.
> > >>
> > >> 4) Does Kylin support MEDIAN?
> > >>
> > >> Yes, Kylin should support it, but I haven't tested it. In fact, Kylin has
> > >> a PERCENTILE measure, and I think the 50th percentile is equal to the
> > >> median, am I right?
> > >>
> > >> At 2022-10-11 14:03:14, "Will Glass-Husain" <wgl...@forio.com> wrote:
> > >> >Hi,
> > >> >
> > >> >Thanks for the recent help as I set up my first Kylin system. I have a
> > >> >question regarding the proper design of a cube to run some
> > >> >demographic queries. I want to make this accessible in a webapp, with
> > >> >reasonable response time.
> > >> >
> > >> >I have a CSV file with about 80 columns on sex, race, state, age,
> > >> >internet access, job, etc.
> > >> >
> > >> >Can you advise regarding proper cube design?
> > >> >
> > >> >1) The criteria for filtering (e.g. selecting sex='male') and grouping
> > >> >(e.g. group by state) should be dimensions - is this correct?
> > >> >
> > >> >2) Items that I would like to sum should be measures, is that right? Is
> > >> >there a limit to the number of measures? I want to report up to 300
> > >> >different measures aggregated by the dimensions.
> > >> >
> > >> >3) In MySQL, I am querying for different values like this:
> > >> >
> > >> >select SUM((married=1) * weight) as MARRIED_1,
> > >> >       SUM((married=2) * weight) as MARRIED_2
> > >> >from data group by state;
> > >> >
> > >> >This returns the weighted count of records where married is 1 and
> > >> >where married is 2.
> > >> >
> > >> >Question: is there a way to do this in a Kylin query? Or do I need to
> > >> >pre-compute my weights, create columns MARRIED_1 and MARRIED_2 in the
> > >> >source data, and then sum them in Kylin?
> > >> >
> > >> >4) This is a tricky one. Does Kylin support MEDIAN? In MySQL there's no
> > >> >MEDIAN function, but we can calculate it by counting all the records,
> > >> >then selecting the record at an offset of half the record count in
> > >> >sorted order. I want to calculate the median (not the mean) for age and
> > >> >some other variables.
> > >> >
> > >> >Thanks for any tips.
> > >> >
> > >> >Best, WILL
> > >> >
> > >> >--
> > >> >William Glass-Husain   /forio  |  +1 (415) 440 7500 x802  |  forio.com
> > >> ><http://www.forio.com/>
> > >>
> > >>
> > >
> >
> >
>
>
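Question 3 in the thread above notes that Kylin 4.x does not support SUM(expression). The pre-computation route Will describes can be sketched in Python as below. It assumes the source CSV has `state`, `married`, and `weight` columns, as in the MySQL query. `add_weighted_indicators` and `sum_by` are hypothetical helpers, and the sample rows are made up for illustration. Each new column MARRIED_<code> holds the row's weight when `married` equals that code and 0 otherwise. A plain SUM(MARRIED_1) measure grouped by state should therefore match SUM((married=1) * weight).

```python
import csv
import io
from collections import defaultdict


def add_weighted_indicators(rows, column, codes, weight_column="weight"):
    """Hypothetical helper: for each code, add a <COLUMN>_<code> field equal to
    the row's weight if row[column] == code, else 0.0."""
    result = []
    for row in rows:
        new_row = dict(row)
        weight = float(row[weight_column])
        for code in codes:
            # CSV fields are read as strings, so compare against str(code).
            matches = row[column] == str(code)
            new_row[f"{column.upper()}_{code}"] = weight if matches else 0.0
        result.append(new_row)
    return result


def sum_by(rows, group_column, measure):
    """Emulate SUM(measure) ... GROUP BY group_column, to check the new columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_column]] += row[measure]
    return dict(totals)


# Made-up sample data for illustration only.
SAMPLE_CSV = """state,married,weight
CA,1,2.5
CA,2,1.0
CA,1,0.5
NY,2,4.0
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    enriched = add_weighted_indicators(rows, "married", [1, 2])
    print(sum_by(enriched, "state", "MARRIED_1"))  # {'CA': 3.0, 'NY': 0.0}
    print(sum_by(enriched, "state", "MARRIED_2"))  # {'CA': 1.0, 'NY': 4.0}
```

In practice, the enriched rows would be written back out (for example with `csv.DictWriter`) and used as the source table. MARRIED_1 and MARRIED_2 then become ordinary SUM measures in the cube.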
