Druid quick comparison

Xiaoxiang Yu Wed, 06 Dec 2023 18:26:38 -0800

Since 2018 there have been many new features and a lot of code refactoring.
If you like, you can share your PPT with me privately, and I can
give you some comments.


Here are some references on the advantages of Kylin since 2018:
- https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
- https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
- https://kylin.apache.org/5.0/docs/development/roadmap

------------------------
With warm regard
Xiaoxiang Yu



On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <[email protected]> wrote:

> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
> for my team.
>
> I found this article and would like you to update me on the advantages of
> Kylin since 2018 (especially with version 5 about to be released).
>
> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>
> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <[email protected]> wrote:
>
> > Thank you very much for your prompt response. I still have several
> > questions that I will need your help with later.
> >
> > Best regards and have a good day
> >
> >
> >
> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <[email protected]> wrote:
> >
> >> Done. Github branch changed to kylin5.
> >>
> >> ------------------------
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >>
> >>
> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <[email protected]> wrote:
> >>
> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <[email protected]> wrote:
> >> >
> >> >> Thank you Xiaoxiang, please update me when you have changed your default
> >> >> branch. If people are swayed by the numbers, I hope this will turn
> >> >> the situation around.
> >> >>
> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <[email protected]> wrote:
> >> >>
> >> >>> The default branch is for 4.X, which is a maintenance branch; the active
> >> >>> branch is kylin5.
> >> >>> I will change the default branch to kylin5 later.
> >> >>>
> >> >>> ------------------------
> >> >>> With warm regard
> >> >>> Xiaoxiang Yu
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <[email protected]> wrote:
> >> >>>
> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >>>>
> >> >>>> Can you see the attached photo?
> >> >>>>
> >> >>>> My boss asked why Druid commits code regularly while Kylin has had no
> >> >>>> commits since July.
> >> >>>>
> >> >>>>
> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <[email protected]> wrote:
> >> >>>>
> >> >>>>> I think so.
> >> >>>>>
> >> >>>>> Response time is not the only factor in making a decision. Kylin could
> >> >>>>> be cheaper when the query pattern suits the Kylin model, and Kylin can
> >> >>>>> guarantee reasonable query latency. ClickHouse will be quicker in an
> >> >>>>> ad hoc query scenario.
> >> >>>>>
> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >> >>>>> unified data analytics services for their customers.
> >> >>>>>
> >> >>>>> ------------------------
> >> >>>>> With warm regard
> >> >>>>> Xiaoxiang Yu
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <[email protected]> wrote:
> >> >>>>>
> >> >>>>>> Hi Xiaoxiang, thank you
> >> >>>>>>
> >> >>>>>> If my client uses a cloud computing service such as GCP or AWS, which
> >> >>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse?
> >> >>>>>> (For Kylin, my understanding is that the query computation is done once
> >> >>>>>> and stored in the cube to be reused many times, so Kylin uses less
> >> >>>>>> cloud compute. Is that true?)
> >> >>>>>>
> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <[email protected]> wrote:
> >> >>>>>>
> >> >>>>>> > The following text is part of an article
> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >> >>>>>> >
> >> >>>>>> > ===============================================================================
> >> >>>>>> >
> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
> >> >>>>>> > of its pre-calculation technology, for example when the join, group by,
> >> >>>>>> > and where-condition patterns in the SQL are relatively fixed. The larger
> >> >>>>>> > the data volume, the more obvious the advantages of using Kylin. Kylin
> >> >>>>>> > is particularly advantageous for deduplication (count distinct), Top N,
> >> >>>>>> > and Percentile, and it is used in a large number of scenarios, such as
> >> >>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell Housing,
> >> >>>>>> > etc. use Kylin to build their data service platforms, serving millions
> >> >>>>>> > to tens of millions of queries per day, and most of the queries complete
> >> >>>>>> > within 2 - 3 seconds. There is no better alternative for such a
> >> >>>>>> > high-concurrency scenario.
> >> >>>>>> >
> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
> >> >>>>>> > and is more suitable when query requests are more flexible, or when
> >> >>>>>> > detail-level queries with low concurrency are needed. Scenarios include
> >> >>>>>> > user-label filtering, where a very large number of columns and where
> >> >>>>>> > conditions are combined arbitrarily, and complex ad hoc queries with
> >> >>>>>> > low concurrency. If the amount of data and traffic is large, you
> >> >>>>>> > need to deploy a distributed ClickHouse cluster, which is a greater
> >> >>>>>> > challenge for operation and maintenance.
> >> >>>>>> >
> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> >> >>>>>> > resource-efficient to compute them on the fly. Since the number of
> >> >>>>>> > queries is small, even if each query consumes a lot of computational
> >> >>>>>> > resources, it is still cost-effective overall. If some queries have a
> >> >>>>>> > fixed pattern and the query volume is large, Kylin is the better fit:
> >> >>>>>> > the upfront cost of computing and saving the results is large, but it
> >> >>>>>> > is amortized over every query, so it is the most economical option.
> >> >>>>>> >
> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > ------------------------
> >> >>>>>> > With warm regard
> >> >>>>>> > Xiaoxiang Yu
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <[email protected]> wrote:
> >> >>>>>> >
> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's
> >> >>>>>> >> great.
> >> >>>>>> >>
> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse offered us the
> >> >>>>>> >> speed of calculating 8 billion rows in milliseconds, which is faster
> >> >>>>>> >> than my demonstration (I used Kylin to calculate 1 billion rows in
> >> >>>>>> >> 2.9 seconds).
> >> >>>>>> >>
> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
> >> >>>>>> >> that I can defend my demonstration?
> >> >>>>>> >>
> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <[email protected]> wrote:
> >> >>>>>> >>
> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason here
> >> >>>>>> >> > is that kylin has lag time due to model update of new segment build,
> >> >>>>>> >> > is that correct?"
> >> >>>>>> >> >
> >> >>>>>> >> > You are correct.
> >> >>>>>> >> >
> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of combination
> >> >>>>>> >> > of ... "
> >> >>>>>> >> >
> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
> >> >>>>>> >> > not released), which can reduce the time lag to about 3 minutes (that
> >> >>>>>> >> > is my estimate, but I am quite certain about it).
> >> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that periodically does
> >> >>>>>> >> > micro-batch aggregation and persistence. The price is that you need to
> >> >>>>>> >> > run and monitor a long-running job. This feature is based on Spark
> >> >>>>>> >> > Streaming, so you need knowledge of it.
> >> >>>>>> >> >
> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
> >> >>>>>> >> > tolerate?
> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
> >> >>>>>> >> >
> >> >>>>>> >> > ------------------------
> >> >>>>>> >> > With warm regard
> >> >>>>>> >> > Xiaoxiang Yu
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <[email protected]> wrote:
> >> >>>>>> >> >
> >> >>>>>> >> > > Druid is better in
> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > >
> >> >>>>>> >> > > ==========================
> >> >>>>>> >> > >
> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >>>>>> >> > >
> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason here is
> >> >>>>>> >> > > that kylin has lag time due to model update of new segment build,
> >> >>>>>> >> > > is that correct?
> >> >>>>>> >> > >
> >> >>>>>> >> > > If that is true, then can you suggest a work-around of combination of:
> >> >>>>>> >> > >
> >> >>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to provide
> >> >>>>>> >> > > realtime capability?
> >> >>>>>> >> > >
> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
> >> >>>>>> >> > > integrate it with (time - lag kylin cube).
> >> >>>>>> >> > >
> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <[email protected]> wrote:
> >> >>>>>> >> > >
> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much about
> >> >>>>>> >> > > > how Druid has changed in these two years. New features that I know
> >> >>>>>> >> > > > of are: new UI, fully on K8s, etc.).
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather than
> >> >>>>>> >> > > > Kylin at the moment (comparing Kylin 5.0-beta with the Druid which I
> >> >>>>>> >> > > > used two years ago):
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid had
> >> >>>>>> >> > > >   better response time for small queries two years ago).
> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> >> >>>>>> >> > > >   cloud platform as your deployment platform.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> >> >>>>>> >> > > > better, like:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
> >> >>>>>> >> > > >   exact-match/fine-grained Index for queries containing different
> >> >>>>>> >> > > >   `Group By dimensions`.
> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show
> >> >>>>>> >> > > >   that it supports ODBC well).
> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > ------------------------
> >> >>>>>> >> > > > With warm regard
> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <[email protected]> wrote:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >>>>>> >> > > >> Sirs/Madams,
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> May I post my boss's question:
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> >> >>>>>> >> > > >> Pinot and Druid?
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Please kindly let me know
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Thank you very much and best regards
> >> >>>>>> >> > > >>

Re: Pinot/Kylin/Druid quick comparison
