Done. The default GitHub branch has been changed to kylin5.

------------------------
With warm regard
Xiaoxiang Yu
On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:

> A JIRA ticket has been opened and is waiting for INFRA:
> https://issues.apache.org/jira/browse/INFRA-25238 .
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
> On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Thank you Xiaoxiang, please update me when you have changed the default
>> branch. If people are impressed by these numbers, then I hope this will
>> turn the situation around.
>>
>> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>
>>> The default branch is for 4.X, which is a maintained branch; the active
>>> branch is kylin5.
>>> I will change the default branch to kylin5 later.
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> wrote:
>>>
>>>> Hi Xiaoxiang, Sirs / Madams
>>>>
>>>> Can you see the attached photo?
>>>>
>>>> My boss asked why Druid commits code regularly but Kylin has had no
>>>> commits since July.
>>>>
>>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
>>>>
>>>>> I think so.
>>>>>
>>>>> Response time is not the only factor in making a decision. Kylin could
>>>>> be cheaper when the query pattern is suitable for the Kylin model, and
>>>>> Kylin can guarantee reasonable query latency. ClickHouse will be
>>>>> quicker in an ad hoc query scenario.
>>>>>
>>>>> By the way, Youzan and Kyligence combine the two to provide unified
>>>>> data analytics services for their customers.
>>>>>
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi Xiaoxiang, thank you
>>>>>>
>>>>>> In case my client uses a cloud computing service like GCP or AWS,
>>>>>> which will cost more: the precalculation feature of Kylin, or
>>>>>> ClickHouse? (In the case of Kylin, my thinking is that the query
>>>>>> execution is done once and stored in the cube to be reused many
>>>>>> times, so Kylin uses less cloud computation. Is that true?)
>>>>>>
>>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>
>>>>>> > The following text is part of an article
>>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>>>>> >
>>>>>> > ===============================================================================
>>>>>> >
>>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
>>>>>> > of its precalculation technology, for example when the join, group
>>>>>> > by, and where-condition patterns in the SQL are relatively fixed. The
>>>>>> > larger the data volume, the more obvious the advantages of using
>>>>>> > Kylin. In particular, Kylin's advantages in deduplication (count
>>>>>> > distinct), Top N, Percentile, and similar scenarios are especially
>>>>>> > large, and it is used in a large number of scenarios, such as
>>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
>>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell
>>>>>> > Housing, etc. use Kylin to build their data service platforms,
>>>>>> > providing millions to tens of millions of queries per day, and most
>>>>>> > of the queries can be completed within 2 - 3 seconds.
>>>>>> > There is no better alternative for such a high-concurrency scenario.
>>>>>> >
>>>>>> > ClickHouse, because of its MPP architecture, has high computing power
>>>>>> > and is more suitable when query requests are more flexible, or when
>>>>>> > there is a need for detail-level queries with low concurrency.
>>>>>> > Scenarios include: user-label filtering where very many columns and
>>>>>> > where conditions are combined arbitrarily, complex ad hoc queries
>>>>>> > with low concurrency, and so on. If the amount of data and access is
>>>>>> > large, you need to deploy a distributed ClickHouse cluster, which is
>>>>>> > a bigger challenge for operation and maintenance.
>>>>>> >
>>>>>> > If some queries are very flexible but infrequent, it is more
>>>>>> > resource-efficient to compute them on the fly. Since the number of
>>>>>> > queries is small, even if each query consumes a lot of computational
>>>>>> > resources, it is still cost-effective overall. If some queries have a
>>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
>>>>>> > by spending large computational resources up front to save the
>>>>>> > results, the upfront computational cost can be amortized over each
>>>>>> > query, so it is the most economical.
>>>>>> >
>>>>>> > --- Translated with DeepL.com (free version)
>>>>>> >
>>>>>> > ------------------------
>>>>>> > With warm regard
>>>>>> > Xiaoxiang Yu
>>>>>> >
>>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Thank you Xiaoxiang for the near real time streaming feature. That's
>>>>>> >> great.
>>>>>> >>
>>>>>> >> This morning there has been a new challenge to my team: ClickHouse
>>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>>>>>> >> which is faster than my demonstration (I used Kylin to calculate
>>>>>> >> 1 billion rows in 2.9 seconds).
>>>>>> >>
>>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>>>>> >> that I can defend my demonstration?
>>>>>> >>
>>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>> >>
>>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>>>>>> >> > here is that kylin has lag time due to model update of new segment
>>>>>> >> > build, is that correct?"
>>>>>> >> >
>>>>>> >> > You are correct.
>>>>>> >> >
>>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>>>>>> >> > combination of ... "
>>>>>> >> >
>>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
>>>>>> >> > (that is my estimate, but I am quite certain about it).
>>>>>> >> > NRT stands for 'near real-time'; it runs a job that does
>>>>>> >> > micro-batch aggregation and persistence periodically. The price is
>>>>>> >> > that you need to run and monitor a long-running job. This feature
>>>>>> >> > is based on Spark Streaming, so you need knowledge of it.
>>>>>> >> >
>>>>>> >> > I am curious: what is the maximum time lag your customers can
>>>>>> >> > tolerate?
>>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>>>>> >> >
>>>>>> >> > ------------------------
>>>>>> >> > With warm regard
>>>>>> >> > Xiaoxiang Yu
>>>>>> >> >
>>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>>>> >> > wrote:
>>>>>> >> >
>>>>>> >> > > Druid is better in
>>>>>> >> > > - Have a real-time datasource like Kafka etc.
>>>>>> >> > >
>>>>>> >> > > ==========================
>>>>>> >> > >
>>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>>>>>> >> > >
>>>>>> >> > > In this important scenario of realtime analytics, the reason here
>>>>>> >> > > is that kylin has lag time due to model update of new segment
>>>>>> >> > > build, is that correct?
>>>>>> >> > >
>>>>>> >> > > If that is true, then can you suggest a work-around of
>>>>>> >> > > combination of:
>>>>>> >> > >
>>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
>>>>>> >> > > realtime capability?
>>>>>> >> > >
>>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>>>>> >> > > integrate it with (time-lag Kylin cube).
>>>>>> >> > >
>>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org>
>>>>>> >> > > wrote:
>>>>>> >> > >
>>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>>>>>> >> > > > about how Druid has changed in these two years. New features
>>>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
>>>>>> >> > > >
>>>>>> >> > > > Here are some cases where you should consider using Druid
>>>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta with
>>>>>> >> > > > the Druid I used two years ago):
>>>>>> >> > > >
>>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>>>>>> >> > > > - Most queries are small (based on my test results, I think
>>>>>> >> > > >   Druid had better response time for small queries two years
>>>>>> >> > > >   ago).
>>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
>>>>>> >> > > >   K8S/public cloud platform as your deployment platform.
>>>>>> >> > > >
>>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>>>>> >> > > > better, like:
>>>>>> >> > > >
>>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>>>>>> >> > > >   more exact-match/fine-grained Index for queries containing
>>>>>> >> > > >   different `Group By dimensions`.
>>>>>> >> > > > - User-friendly UI for modeling.
>>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>>>>>> >> > > >   show that it supports ODBC well).
>>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>>>>> >> > > >
>>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>>>>> >> > > > I hope this helps; feel free to share your opinion.
>>>>>> >> > > >
>>>>>> >> > > > ------------------------
>>>>>> >> > > > With warm regard
>>>>>> >> > > > Xiaoxiang Yu
>>>>>> >> > > >
>>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>>>>>> >> > > > <na...@vnpay.vn.invalid> wrote:
>>>>>> >> > > >
>>>>>> >> > > >> Dear Xiaoxiang,
>>>>>> >> > > >> Sirs/Madams,
>>>>>> >> > > >>
>>>>>> >> > > >> May I post my boss's question:
>>>>>> >> > > >>
>>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
>>>>>> >> > > >> to Pinot and Druid?
>>>>>> >> > > >>
>>>>>> >> > > >> Please kindly let me know
>>>>>> >> > > >>
>>>>>> >> > > >> Thank you very much and best regards
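
The cost argument in the quoted article (pay a large build cost once and then answer each query cheaply, versus pay the full compute cost on every ad hoc query) can be made concrete with a small break-even calculation. This is only a sketch: the function names and every cost figure below are hypothetical units made up for illustration, not measurements of Kylin or ClickHouse.

```python
import math
from typing import Optional


def cost_precomputed(build_cost: int, per_query_cost: int, n_queries: int) -> int:
    """Total cost when results are built once up front, then served cheaply."""
    return build_cost + per_query_cost * n_queries


def cost_on_the_fly(per_query_cost: int, n_queries: int) -> int:
    """Total cost when every query is computed from scratch."""
    return per_query_cost * n_queries


def break_even_queries(build_cost: int, precomputed_query_cost: int,
                       on_the_fly_query_cost: int) -> Optional[int]:
    """Smallest query count at which precomputation is no more expensive.

    Returns None when a precomputed query is not cheaper than an
    on-the-fly query, because the build cost can then never be recovered.
    """
    saving_per_query = on_the_fly_query_cost - precomputed_query_cost
    if saving_per_query <= 0:
        return None
    return math.ceil(build_cost / saving_per_query)


if __name__ == "__main__":
    # Hypothetical cost units: building costs 1000, a precomputed lookup
    # costs 1, and a full on-the-fly scan costs 51.
    n = break_even_queries(1000, 1, 51)
    print("break-even query count:", n)
    print("precomputed total:", cost_precomputed(1000, 1, n))
    print("on-the-fly total:", cost_on_the_fly(51, n))
```

Below the break-even count the on-the-fly approach is cheaper (few, flexible queries); above it the precomputed approach wins (many queries with a fixed pattern), which is the trade-off the article describes.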