No, weighted percentile is not supported at the moment. A good enhancement candidate. :-)
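
For reference, a weighted percentile can be computed outside Kylin on raw (value, weight) rows. Below is a minimal Python sketch, not Kylin or t-digest code: the helper name `weighted_percentile` and the sample data are hypothetical, and it assumes the simple definition "smallest value whose cumulative weight reaches fraction q of the total weight" (other definitions interpolate between neighbouring values).

```python
from typing import Iterable, Tuple


def weighted_percentile(pairs: Iterable[Tuple[float, float]], q: float) -> float:
    """Return the smallest value whose cumulative weight reaches q of the total.

    `pairs` is an iterable of (value, weight); q is a fraction in (0, 1].
    Hypothetical helper for illustration only; not a Kylin or t-digest API.
    """
    # Drop non-positive weights and sort by value.
    data = sorted((v, w) for v, w in pairs if w > 0)
    if not data:
        raise ValueError("need at least one record with positive weight")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    threshold = q * sum(w for _, w in data)
    cumulative = 0.0
    for value, weight in data:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return data[-1][0]  # guard against floating-point round-off


# Example: ages with survey weights (made-up numbers).
ages = [(25, 1.0), (40, 1.0), (70, 3.0)]
weighted_median = weighted_percentile(ages, 0.5)
unweighted_median = weighted_percentile([(a, 1.0) for a, _ in ages], 0.5)
```

With these made-up numbers the heavily weighted record pulls the weighted median up to 70, while the unweighted median is 40, which is the kind of difference that matters for survey data.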
On Fri, Oct 14, 2022 at 11:22 PM Will Glass-Husain <wgl...@forio.com> wrote:

> Hi Team,
>
> Thanks again for the recommendations and the articles.
>
> One quick question on the PERCENTILE function. Does this support weighted
> percentiles? I see that the t-digest library supports this. Weighting
> is critical for accurate reporting of medians with survey data.
>
> WILL
>
> On Wed, Oct 12, 2022 at 2:03 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Hi, Will
> > Glad to see you have completed the 'basic path' of kylin4_on_cloud, which
> > provides some tools that make deployment of Kylin much easier than before.
> > But I think that, for Kylin to deliver satisfying performance (response
> > time, concurrency), the user must have enough knowledge of Apache Spark
> > and Apache Kylin. I think this article may be helpful:
> > https://kylin.apache.org/blog/2021/06/17/Why-did-Youzan-choose-Kylin4 .
> >
> > --
> > *Best wishes to you ! *
> > *From :**Xiaoxiang Yu*
> >
> > At 2022-10-12 02:21:13, "Will Glass-Husain" <wgl...@forio.com> wrote:
> > >Thank you -- very helpful.
> > >
> > >Regarding limits on the number of dimensions. What are the
> > >compute/storage constraints on this? For a given query:
> > >* Where is the data stored
> > >* Which nodes is the computation occurring on?
> > >
> > >I am trying to figure out -- if we have a large number of dimensions, what
> > >part of the cloud based kylin needs to be increased (I'm doing the setup
> > >from the kylin4_on_cloud branch)
> > >
> > >Thanks, WILL
> > >
> > >On Tue, Oct 11, 2022 at 1:20 AM Xiaoxiang Yu <x...@apache.org> wrote:
> > >
> > >> 1) The criteria for filtering (e.g. selecting sex='male') and grouping (e.g.
> > >> group by state) should be dimensions - is this correct?
> > >> Yes. Besides, Kylin has a limit of at most 63 dimensions. But you should
> > >> be aware of 'The Curse of Dimensionality'.
> > >>
> > >> 2.1) Items that I would like to sum should be measures, is that right?
> > >> Yes.
> > >>
> > >> 2.2) Is there a limit to the number of measures?
> > >> No, there is no such limit.
> > >>
> > >> 3) Does Kylin support sum(expression)?
> > >> From the MySQL doc
> > >> https://dev.mysql.com/doc/refman/5.7/en/aggregate-functions.html#function_sum ,
> > >> we know MySQL supports it.
> > >> Kylin should support it in Kylin 3.x and in the future version 5.x.
> > >> Unfortunately, Kylin 4.x does not support sum(expression), and Kylin 4.x
> > >> is the version you are using.
> > >>
> > >> 4) Does Kylin support MEDIAN?
> > >>
> > >> Yes, Kylin should support it, but I didn't test it. In fact, Kylin has a
> > >> measure PERCENTILE, and I think the 50th percentile is equal to MEDIAN, am I
> > >> right?
> > >>
> > >> --
> > >> *Best wishes to you ! *
> > >> *From :**Xiaoxiang Yu*
> > >>
> > >> At 2022-10-11 14:03:14, "Will Glass-Husain" <wgl...@forio.com> wrote:
> > >> >Hi,
> > >> >
> > >> >Thanks for the recent help as I set up my first Kylin system. I have a
> > >> >question regarding proper design of a cube to run some
> > >> >demographic queries. I want to make this accessible in a webapp, with
> > >> >reasonable response time.
> > >> >
> > >> >I have a CSV file with about 80 columns on sex, race, state, age, internet
> > >> >access, job, etc.
> > >> >
> > >> >Can you advise regarding proper cube design?
> > >> >
> > >> >1) The criteria for filtering (e.g. selecting sex='male') and grouping
> > >> >(e.g. group by state) should be dimensions - is this correct?
> > >> >
> > >> >2) Items that I would like to sum should be measures, is that right? Is
> > >> >there a limit to the number of measures? I want to report out up to 300
> > >> >different measures aggregated by the dimensions.
> > >> >
> > >> >3)
> > >> >In MySQL, I am querying for different values like this
> > >> >
> > >> >select SUM((married=1) * weight) as MARRIED_1, SUM((married=2) * weight) as
> > >> >MARRIED_2 from data group by state;
> > >> >
> > >> >This returns the total number of weighted records for records where married
> > >> >is 1 and where married is 2.
> > >> >
> > >> >Question - is there a way to do this in the Kylin query? Or do I need to
> > >> >pre-compute my weights and create columns MARRIED_1 and MARRIED_2 in the
> > >> >source data, then sum it in Kylin.
> > >> >
> > >> >4) This is a tricky one. Does Kylin support MEDIAN? In MySQL, there's no
> > >> >MEDIAN function but we can calculate it by counting all the records, then
> > >> >selecting the record at an offset of half the records. I want to
> > >> >calculate "median" (not mean) for age and some other variables.
> > >> >
> > >> >Thanks for any tips.
> > >> >
> > >> >Best, WILL
> > >> >
> > >> >--
> > >> >William Glass-Husain /forio | +1 (415) 440 7500 x802 | forio.com
> > >> ><http://www.forio.com/>
> > >>
> > >
> > >--
> > >William Glass-Husain /forio | +1 (415) 440 7500 x802 | forio.com
> > ><http://www.forio.com/>
> >
>
> --
> William Glass-Husain /forio | +1 (415) 440 7500 x802 | forio.com
> <http://www.forio.com/>