Since 2018 there have been a lot of new features and code refactoring. If you like, you can share your PPT with me privately; maybe I can give some comments.
Here are some references on the advantages of Kylin since 2018:
- https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
- https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
- https://kylin.apache.org/5.0/docs/development/roadmap

------------------------
With warm regard
Xiaoxiang Yu

On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, tomorrow is my team's main presentation comparing Kylin and
> Druid.
>
> I found this article and would like you to update me on the advantages of
> Kylin since 2018 until now (especially with version 5 to be released):
>
> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>
> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>
> > Thank you very much for your prompt response. I still have several
> > questions to ask for your help with later.
> >
> > Best regards and have a good day
> >
> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >
> >> Done. GitHub default branch changed to kylin5.
> >>
> >> ------------------------
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >>
> >> > A JIRA ticket has been opened, waiting for INFRA:
> >> > https://issues.apache.org/jira/browse/INFRA-25238
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >
> >> >> Thank you Xiaoxiang, please update me when you have changed your
> >> >> default branch. In case people are impressed by the numbers, I hope
> >> >> to turn this situation around.
> >> >>
> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >>
> >> >>> The default branch is for 4.X, which is a maintained branch; the
> >> >>> active branch is kylin5.
> >> >>> I will change the default branch to kylin5 later.
> >> >>>
> >> >>> ------------------------
> >> >>> With warm regard
> >> >>> Xiaoxiang Yu
> >> >>>
> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >>>
> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >>>>
> >> >>>> Can you see the attached photo?
> >> >>>>
> >> >>>> My boss asked why Druid commits code regularly but Kylin has had
> >> >>>> no commits since July.
> >> >>>>
> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
> >> >>>>
> >> >>>>> I think so.
> >> >>>>>
> >> >>>>> Response time is not the only factor in making a decision. Kylin
> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> >> >>>>> model, and Kylin can guarantee reasonable query latency.
> >> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
> >> >>>>>
> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >> >>>>> unified data analytics services for their customers.
> >> >>>>>
> >> >>>>> ------------------------
> >> >>>>> With warm regard
> >> >>>>> Xiaoxiang Yu
> >> >>>>>
> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >>>>>
> >> >>>>>> Hi Xiaoxiang, thank you
> >> >>>>>>
> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
> >> >>>>>> ClickHouse? (In the case of Kylin, my thinking is that the query
> >> >>>>>> execution is done once and stored in a cube to be used many
> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
> >> >>>>>>
> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >>>>>>
> >> >>>>>> > The following text is part of an article
> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >> >>>>>> >
> >> >>>>>> > ===============================================================================
> >> >>>>>> >
> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
> >> >>>>>> > because of its pre-calculation technology, for example when the
> >> >>>>>> > join, group by, and where condition patterns in SQL are
> >> >>>>>> > relatively fixed. The larger the data volume is, the more
> >> >>>>>> > obvious the advantages of using Kylin are. Kylin is
> >> >>>>>> > particularly advantageous for deduplication (count distinct),
> >> >>>>>> > Top N, Percentile and similar scenarios, and it is used in a
> >> >>>>>> > large number of scenarios, such as dashboards, all kinds of
> >> >>>>>> > reports, large-screen displays, traffic statistics, and user
> >> >>>>>> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use
> >> >>>>>> > Kylin to build their data service platforms, serving millions
> >> >>>>>> > to tens of millions of queries per day, and most of the queries
> >> >>>>>> > complete within 2 - 3 seconds. There is no better alternative
> >> >>>>>> > for such a high-concurrency scenario.
> >> >>>>>> >
> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
> >> >>>>>> > power and is more suitable when query requests are more
> >> >>>>>> > flexible, or when there is a need for detailed queries with low
> >> >>>>>> > concurrency. Scenarios include user-label filtering where very
> >> >>>>>> > many columns and where conditions are combined arbitrarily,
> >> >>>>>> > complex ad hoc queries with low concurrency, and so on. If the
> >> >>>>>> > amount of data and access is large, you need to deploy a
> >> >>>>>> > distributed ClickHouse cluster, which is a greater challenge
> >> >>>>>> > for operation and maintenance.
> >> >>>>>> >
> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> >> >>>>>> > resource-efficient to compute them on the fly. Since the number
> >> >>>>>> > of queries is small, even if each query consumes a lot of
> >> >>>>>> > computational resources, it is still cost-effective overall. If
> >> >>>>>> > some queries have a fixed pattern and the query volume is
> >> >>>>>> > large, Kylin is more suitable: large computational resources
> >> >>>>>> > are spent once to save the results, and that upfront cost is
> >> >>>>>> > amortized over every query, so it is the most economical.
> >> >>>>>> >
> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >>>>>> >
> >> >>>>>> > ------------------------
> >> >>>>>> > With warm regard
> >> >>>>>> > Xiaoxiang Yu
> >> >>>>>> >
> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >>>>>> >
> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
> >> >>>>>> >> That's great.
> >> >>>>>> >>
> >> >>>>>> >> This morning there has been a new challenge to my team:
> >> >>>>>> >> ClickHouse offered us the speed of calculating 8 billion rows
> >> >>>>>> >> in milliseconds, which is faster than my demonstration (I used
> >> >>>>>> >> Kylin to calculate 1 billion rows in 2.9 seconds).
> >> >>>>>> >>
> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over
> >> >>>>>> >> ClickHouse so that I can defend my demonstration?
> >> >>>>>> >>
> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >>>>>> >>
> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
> >> >>>>>> >> > reason here is that kylin has lag time due to model update
> >> >>>>>> >> > of new segment build, is that correct?"
> >> >>>>>> >> >
> >> >>>>>> >> > You are correct.
> >> >>>>>> >> >
> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >> >>>>>> >> > combination of ... "
> >> >>>>>> >> >
> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> >> >>>>>> >> > completed but not released), which can reduce the time lag
> >> >>>>>> >> > to about 3 minutes (that is my estimate, but I am quite
> >> >>>>>> >> > certain about it).
> >> >>>>>> >> > NRT stands for 'near real-time'; it will run a job that does
> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The
> >> >>>>>> >> > price is that you need to run and monitor a long-running
> >> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
> >> >>>>>> >> > knowledge of it.
> >> >>>>>> >> >
> >> >>>>>> >> > I am curious: what is the maximum time lag your customers
> >> >>>>>> >> > can tolerate?
> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most
> >> >>>>>> >> > cases.
> >> >>>>>> >> >
> >> >>>>>> >> > ------------------------
> >> >>>>>> >> > With warm regard
> >> >>>>>> >> > Xiaoxiang Yu
> >> >>>>>> >> >
> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >>>>>> >> >
> >> >>>>>> >> > > Druid is better in
> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > >
> >> >>>>>> >> > > ==========================
> >> >>>>>> >> > >
> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >>>>>> >> > >
> >> >>>>>> >> > > In this important scenario of realtime analytics, the
> >> >>>>>> >> > > reason here is that kylin has lag time due to model update
> >> >>>>>> >> > > of new segment build, is that correct?
> >> >>>>>> >> > >
> >> >>>>>> >> > > If that is true, then can you suggest a work-around of
> >> >>>>>> >> > > combination of:
> >> >>>>>> >> > >
> >> >>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
> >> >>>>>> >> > > realtime capability?
> >> >>>>>> >> > >
> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
> >> >>>>>> >> > > and integrate it with (time-lag kylin cube).
> >> >>>>>> >> > >
> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >>>>>> >> > >
> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
> >> >>>>>> >> > > > know too much about how Druid has changed in these two
> >> >>>>>> >> > > > years. New features that I know of are: new UI, fully
> >> >>>>>> >> > > > on K8s, etc.).
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > Here are some cases where you should consider using
> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing Kylin
> >> >>>>>> >> > > > 5.0-beta with the Druid I used two years ago):
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > > > - Most queries are small (based on my test results, I
> >> >>>>>> >> > > > think Druid had better response time for small queries
> >> >>>>>> >> > > > two years ago).
> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use
> >> >>>>>> >> > > > the K8S/public cloud platform as your deployment
> >> >>>>>> >> > > > platform.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
> >> >>>>>> >> > > > could be better, like:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
> >> >>>>>> >> > > > have a more exact-match/fine-grained Index for queries
> >> >>>>>> >> > > > containing different `Group By dimensions`.
> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >>>>>> >> > > > - ODBC driver for different BI tools (its website did
> >> >>>>>> >> > > > not show it supports ODBC well).
> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >>>>>> >> > > > Hope this helps, and feel free to share your opinion.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > ------------------------
> >> >>>>>> >> > > > With warm regard
> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >>>>>> >> > > >> Sirs/Madams,
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> May I post my boss's question:
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
> >> >>>>>> >> > > >> compared to Pinot and Druid?
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Please kindly let me know
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Thank you very much and best regards