A JIRA ticket has been opened and is waiting for INFRA: https://issues.apache.org/jira/browse/INFRA-25238.

------------------------
With warm regard
Xiaoxiang Yu
On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang, please update me when you have changed your default branch. If people are swayed by these numbers, I hope this will turn the situation around.
>
> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> The default branch is for 4.X, which is a maintenance branch; the active branch is kylin5.
>> I will change the default branch to kylin5 later.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>
>>> Hi Xiaoxiang, Sirs / Madams,
>>>
>>> Can you see the attached photo?
>>>
>>> My boss asked why Druid has code committed regularly, while Kylin has had no commits since July.
>>>
>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
>>>
>>>> I think so.
>>>>
>>>> Response time is not the only factor in making a decision. Kylin can be cheaper when the query pattern suits the Kylin model, and Kylin can guarantee reasonable query latency. ClickHouse will be quicker in ad hoc query scenarios.
>>>>
>>>> By the way, Youzan and Kyligence combine the two to provide unified data analytics services for their customers.
>>>>
>>>> ------------------------
>>>> With warm regard
>>>> Xiaoxiang Yu
>>>>
>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>
>>>>> Hi Xiaoxiang, thank you.
>>>>>
>>>>> If my client uses a cloud computing service like GCP or AWS, which will cost more: Kylin's precalculation or ClickHouse? (In Kylin's case, my understanding is that the query computation is done once and stored in the cube to be reused many times, so Kylin uses less cloud compute. Is that true?)
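Whether Kylin's precalculation is cheaper than on-the-fly computation is, at its core, an amortization question: the build cost is paid once, and each later query is cheap. The Python sketch below models that trade-off in made-up cost units. `CostModel`, its fields, and the numbers in the usage are hypothetical, and the model deliberately ignores real factors such as cube storage, rebuilds when data changes, and actual cloud pricing.

```python
import math
from dataclasses import dataclass


@dataclass
class CostModel:
    """Toy cost model; every figure is in hypothetical units of cloud spend."""

    build_cost: float         # one-off cost to precompute results (e.g. build a cube segment)
    precomputed_query: float  # cost of one query answered from stored results
    adhoc_query: float        # cost of one query computed on the fly from raw data

    def precompute_total(self, n_queries: int) -> float:
        # The build is paid once and amortized over every query.
        return self.build_cost + n_queries * self.precomputed_query

    def adhoc_total(self, n_queries: int) -> float:
        # No upfront cost, but every query pays for the full computation.
        return n_queries * self.adhoc_query

    def break_even_queries(self) -> float:
        """Smallest query count at which precomputation is no more expensive."""
        saving_per_query = self.adhoc_query - self.precomputed_query
        if saving_per_query <= 0:
            return math.inf  # precomputation never pays off
        return math.ceil(self.build_cost / saving_per_query)


# Usage with hypothetical numbers.
model = CostModel(build_cost=1000.0, precomputed_query=1.0, adhoc_query=11.0)
print(model.break_even_queries())  # 100
for n in (10, 100, 10_000):
    print(n, model.precompute_total(n), model.adhoc_total(n))
```

With these hypothetical numbers, precalculation only wins once a query pattern is repeated more than 100 times per build, which matches the point in this thread that fixed, high-volume query patterns favor Kylin while rare, flexible queries favor computing on the fly.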
>>>>>
>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>
>>>>> > The following text is part of an article (https://zhuanlan.zhihu.com/p/343394287).
>>>>> >
>>>>> > ===============================================================================
>>>>> >
>>>>> > Because of its pre-calculation technology, Kylin is suitable for aggregation queries with fixed patterns, for example when the join, group by, and where-condition patterns in the SQL are relatively fixed. The larger the data volume, the more obvious Kylin's advantages become; they are especially large in deduplication (count distinct), Top N, and percentile scenarios. Kylin is used in a large number of scenarios, such as dashboards, all kinds of reports, large-screen displays, traffic statistics, and user behavior analysis. Meituan, Aurora, Shell Housing, and others use Kylin to build their data service platforms, serving millions to tens of millions of queries per day, with most queries completing within 2 to 3 seconds. There is no better alternative for such high-concurrency scenarios.
>>>>> >
>>>>> > ClickHouse, because of its MPP architecture, has high computing power and is more suitable when query requests are more flexible, or when detailed queries are needed at low concurrency. Scenarios include very many columns with where conditions combined arbitrarily, as in user-tag filtering, and complex ad hoc queries without high concurrency.
>>>>> > If the data volume and access load are large, you need to deploy a distributed ClickHouse cluster, which is more challenging to operate and maintain.
>>>>> >
>>>>> > If some queries are very flexible but infrequent, it is more resource-efficient to compute them on the fly: since there are few queries, even if each one consumes a lot of computational resources, the overall cost is still acceptable. If some queries have a fixed pattern and the query volume is large, Kylin is more suitable: because the query volume is large, spending significant computational resources up front to save the results lets that cost be amortized over every query, which makes it the most economical option.
>>>>> >
>>>>> > --- Translated with DeepL.com (free version)
>>>>> >
>>>>> > ------------------------
>>>>> > With warm regard
>>>>> > Xiaoxiang Yu
>>>>> >
>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>> >
>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's great.
>>>>> >>
>>>>> >> This morning my team faced a new challenge: ClickHouse offered us calculation over 8 billion rows in milliseconds, which is faster than my demonstration (I used Kylin to calculate over 1 billion rows in 2.9 seconds).
>>>>> >>
>>>>> >> Can you briefly outline the advantages of Kylin over ClickHouse so that I can defend my demonstration?
>>>>> >>
>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>> >>
>>>>> >> > 1. "In this important scenario of realtime analytics, the reason here is that kylin has lag time due to model update of new segment build, is that correct?"
>>>>> >> >
>>>>> >> > You are correct.
>>>>> >> >
>>>>> >> > 2. "If that is true, then can you suggest a work-around of combination of ..."
>>>>> >> >
>>>>> >> > Kylin is planning to introduce NRT streaming (coding is complete but not yet released), which can reduce the time lag to about 3 minutes (that is my estimation, but I am quite certain about it).
>>>>> >> > NRT stands for 'near real-time': it runs a job that periodically performs micro-batch aggregation and persistence. The price is that you need to run and monitor a long-running job. This feature is based on Spark Streaming, so you need knowledge of it.
>>>>> >> >
>>>>> >> > I am curious: what is the maximum time lag your customers can tolerate?
>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>>>> >> >
>>>>> >> > ------------------------
>>>>> >> > With warm regard
>>>>> >> > Xiaoxiang Yu
>>>>> >> >
>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>> >> >
>>>>> >> > > Druid is better in
>>>>> >> > > - Have a real-time datasource like Kafka etc.
>>>>> >> > >
>>>>> >> > > ==========================
>>>>> >> > >
>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>>>>> >> > >
>>>>> >> > > In this important scenario of realtime analytics, the reason here is that kylin has lag time due to model update of new segment build, is that correct?
>>>>> >> > >
>>>>> >> > > If that is true, then can you suggest a work-around that combines:
>>>>> >> > >
>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide realtime capability?
>>>>> >> > >
>>>>> >> > > IMO, the point here is to find that (realtime DB update) and integrate it with (time-lag Kylin cube).
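The combination asked about here, a time-lagged cube plus a realtime store, is commonly called a lambda-style architecture. A minimal Python sketch, assuming the cube has already aggregated every event up to a watermark timestamp (for example, the end of the last built segment) and a realtime store supplies the raw events after it. `merged_sum`, `realtime_tail`, the event shape, and the watermark convention are hypothetical names for illustration, not a Kylin or Druid API.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Mapping, Tuple

# (event_time, dimension_value, measure): a hypothetical raw event shape.
Event = Tuple[datetime, str, int]


def realtime_tail(events: Iterable[Event], watermark: datetime) -> Counter:
    """Aggregate only the events the cube has not covered yet (newer than the watermark)."""
    tail: Counter = Counter()
    for ts, key, value in events:
        if ts > watermark:
            tail[key] += value
    return tail


def merged_sum(batch: Mapping[str, int], events: Iterable[Event], watermark: datetime) -> dict:
    """SUM(measure) GROUP BY dimension = cube results up to watermark + realtime tail."""
    result = Counter(batch)
    result.update(realtime_tail(events, watermark))  # Counter.update adds counts
    return dict(result)


# Usage with hypothetical data.
watermark = datetime(2023, 12, 4, 12, 0)  # end of the last built segment
cube = {"web": 100, "app": 40}  # aggregates already stored in the cube
events = [
    (datetime(2023, 12, 4, 11, 59), "web", 5),  # already counted in the cube, skipped
    (datetime(2023, 12, 4, 12, 1), "web", 3),
    (datetime(2023, 12, 4, 12, 2), "pos", 7),
]
print(merged_sum(cube, events, watermark))  # {'web': 103, 'app': 40, 'pos': 7}
```

Filtering on `ts > watermark` is what prevents double counting at the boundary. This only merges cleanly for additive measures such as SUM or COUNT; count distinct would need mergeable structures (for example HyperLogLog sketches or bitmaps) on both sides, and late events with timestamps before the watermark would be missed until the next segment build.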
>>>>> >> > >
>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>> >> > >
>>>>> >> > > > I researched and tested Druid two years ago (I don't know much about how Druid has changed in these two years; new features I know of include the new UI, running fully on K8s, etc.).
>>>>> >> > > >
>>>>> >> > > > Here are some cases where you should consider using Druid rather than Kylin at the moment (comparing Kylin 5.0-beta with the Druid I used two years ago):
>>>>> >> > > >
>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>>>>> >> > > > - Most queries are small (based on my test results, I think Druid had better response times for small queries two years ago).
>>>>> >> > > > - You don't know how to optimize Spark/Hadoop and want to use K8s or a public cloud platform as your deployment platform.
>>>>> >> > > >
>>>>> >> > > > But I do think there are many scenarios in which Kylin could be better, like:
>>>>> >> > > >
>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more exact-match/fine-grained index for queries containing different `Group By dimensions`.
>>>>> >> > > > - User-friendly UI for modeling.
>>>>> >> > > > - Better support for 'Join'? (Not sure at the moment.)
>>>>> >> > > > - ODBC drivers for different BI tools (Druid's website did not show that it supports ODBC well).
>>>>> >> > > > - Kylin appears to support ANSI SQL better than Druid.
>>>>> >> > > >
>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>>>> >> > > > I hope this helps; feel free to share your opinion.
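The point about a more exact-match, fine-grained index for different `Group By` dimensions can be illustrated with a toy model of pre-aggregated indexes (cuboids). The Python sketch below assumes each index stores SUM(measure) for one fixed set of dimensions, and a query is answered from the smallest index whose dimensions cover its GROUP BY, rolled up if needed. `build_index`, `pick_index`, `answer`, and the sample data are hypothetical; this is a generic illustration, not how Kylin's query engine is actually implemented.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Optional, Tuple

Row = Dict[str, Any]
Index = Dict[Tuple[Any, ...], float]


def build_index(rows: List[Row], dims: FrozenSet[str], measure: str) -> Index:
    """Pre-aggregate SUM(measure) grouped by one fixed set of dimensions."""
    ordered = sorted(dims)  # canonical column order for the key tuples
    agg: Dict[Tuple[Any, ...], float] = defaultdict(float)
    for row in rows:
        agg[tuple(row[d] for d in ordered)] += row[measure]
    return dict(agg)


def pick_index(indexes: Dict[FrozenSet[str], Index], group_by: FrozenSet[str]) -> Optional[FrozenSet[str]]:
    """Return the smallest index whose dimensions are a superset of group_by."""
    candidates = [dims for dims in indexes if group_by <= dims]
    if not candidates:
        return None  # no usable index: a real engine would scan raw data instead
    return min(candidates, key=lambda dims: len(indexes[dims]))


def answer(indexes: Dict[FrozenSet[str], Index], group_by: FrozenSet[str]) -> Optional[Index]:
    """Answer SUM(measure) GROUP BY group_by from the chosen index, rolling up if needed."""
    chosen = pick_index(indexes, group_by)
    if chosen is None:
        return None
    src_order, dst_order = sorted(chosen), sorted(group_by)
    positions = [src_order.index(d) for d in dst_order]
    result: Dict[Tuple[Any, ...], float] = defaultdict(float)
    for key, value in indexes[chosen].items():
        result[tuple(key[p] for p in positions)] += value
    return dict(result)


# Usage with hypothetical sample data.
rows = [
    {"city": "Hanoi", "channel": "web", "day": "d1", "amount": 10.0},
    {"city": "Hanoi", "channel": "app", "day": "d1", "amount": 5.0},
    {"city": "HCMC", "channel": "web", "day": "d2", "amount": 7.0},
]
base = frozenset({"city", "channel", "day"})
by_city = frozenset({"city"})
indexes = {base: build_index(rows, base, "amount"), by_city: build_index(rows, by_city, "amount")}

print(answer(indexes, frozenset({"city"})))     # exact match: served by the {city} index
print(answer(indexes, frozenset({"channel"})))  # no exact match: rolled up from the base index
```

An index that exactly matches the GROUP BY needs no further aggregation at query time, which is why having many fine-grained indexes helps; the trade-off is the storage and build time of each extra index.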
>>>>> >> > > >
>>>>> >> > > > ------------------------
>>>>> >> > > > With warm regard
>>>>> >> > > > Xiaoxiang Yu
>>>>> >> > > >
>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>> >> > > >
>>>>> >> > > >> Dear Xiaoxiang,
>>>>> >> > > >> Sirs/Madams,
>>>>> >> > > >>
>>>>> >> > > >> May I post my boss's question:
>>>>> >> > > >>
>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to Pinot and Druid?
>>>>> >> > > >>
>>>>> >> > > >> Please kindly let me know.
>>>>> >> > > >>
>>>>> >> > > >> Thank you very much and best regards