Druid quick comparison

Xiaoxiang Yu Mon, 04 Dec 2023 19:15:08 -0800

A JIRA ticket has been opened and is waiting on INFRA:
https://issues.apache.org/jira/browse/INFRA-25238
------------------------
With warm regard
Xiaoxiang Yu




On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <[email protected]> wrote:

> Thank you Xiaoxiang, please let me know once you have changed the default
> branch. If people are impressed by the numbers, I hope this will turn the
> situation around.
>
> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <[email protected]> wrote:
>
>> The default branch is for 4.X, which is a maintenance branch; the active
>> branch is kylin5.
>> I will change the default branch to kylin5 later.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <[email protected]> wrote:
>>
>>> Hi Xiaoxiang, Sirs / Madams
>>>
>>> Can you see the attached photo?
>>>
>>> My boss asked why Druid commits code regularly but Kylin has had no
>>> commits since July.
>>>
>>>
>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <[email protected]> wrote:
>>>
>>>> I think so.
>>>>
>>>> Response time is not the only factor in making a decision. Kylin could be
>>>> cheaper when the query pattern is suitable for the Kylin model, and Kylin
>>>> can guarantee reasonable query latency. ClickHouse will be quicker in an
>>>> ad hoc query scenario.
>>>>
>>>> By the way, Youzan and Kyligence combine them together to provide
>>>> unified data analytics services for their customers.
>>>>
>>>> ------------------------
>>>> With warm regard
>>>> Xiaoxiang Yu
>>>>
>>>>
>>>>
>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Xiaoxiang, thank you
>>>>>
>>>>> In case my client uses a cloud computing service like GCP or AWS, which
>>>>> will cost more: Kylin's precalculation feature or ClickHouse? (In the
>>>>> case of Kylin, my thinking is that the query execution is done once and
>>>>> stored in a cube to be used many times, so Kylin uses less cloud
>>>>> computation. Is that true?)
>>>>>
>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <[email protected]> wrote:
>>>>>
>>>>> > The following text is part of an article
>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>>>> >
>>>>> >
>>>>> >
>>>>> > ===============================================================================
>>>>> >
>>>>> > Kylin is suitable for aggregation queries with fixed patterns because of
>>>>> > its pre-calculation technology, for example when the join, group by, and
>>>>> > where-condition patterns in the SQL are relatively fixed. The larger the
>>>>> > data volume is, the more obvious the advantages of using Kylin are. Kylin
>>>>> > is particularly advantageous for deduplication (count distinct), Top N,
>>>>> > and Percentile, and it is used in a large number of scenarios, such as
>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell Housing,
>>>>> > etc. use Kylin to build their data service platforms, serving millions to
>>>>> > tens of millions of queries per day, and most of the queries complete
>>>>> > within 2 - 3 seconds. There is no better alternative for such a
>>>>> > high-concurrency scenario.
>>>>> >
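The pre-calculation idea in the paragraph above can be illustrated with a minimal sketch. This is not Kylin code: the rows, the column names, and the functions build_index and query_by_city are hypothetical, and a plain Python set stands in for whatever count-distinct structure a real engine would store.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, device, user_id, amount).
ROWS = [
    ("hanoi", "mobile", "u1", 10.0),
    ("hanoi", "mobile", "u2", 5.0),
    ("hanoi", "web", "u1", 7.5),
    ("hue", "mobile", "u3", 2.5),
]


def build_index(rows):
    """Scan the rows once and keep one aggregate per (city, device) group."""
    index = defaultdict(lambda: {"sum_amount": 0.0, "users": set()})
    for city, device, user_id, amount in rows:
        cell = index[(city, device)]
        cell["sum_amount"] += amount
        cell["users"].add(user_id)
    return dict(index)


def query_by_city(index, city):
    """Answer SUM(amount) and COUNT(DISTINCT user_id) for one city
    by merging pre-aggregated cells, without touching the raw rows."""
    total, users = 0.0, set()
    for (cell_city, _device), cell in index.items():
        if cell_city == city:
            total += cell["sum_amount"]
            users |= cell["users"]
    return total, len(users)


INDEX = build_index(ROWS)      # paid once, at build time
RESULT = query_by_city(INDEX, "hanoi")  # cheap, repeated many times
```

The query only works because its group-by columns are covered by the index; a query on a column that was not pre-aggregated would have to fall back to scanning the raw rows, which is the flexible case described for ClickHouse.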
>>>>> > ClickHouse, because of its MPP architecture, has high computing power and
>>>>> > is more suitable when query requests are more flexible, or when there is a
>>>>> > need for detail-level queries with low concurrency. Scenarios include user
>>>>> > label filtering where very many columns and where conditions are combined
>>>>> > arbitrarily, and complex ad hoc queries without much concurrency. If the
>>>>> > amount of data and access is large, you need to deploy a distributed
>>>>> > ClickHouse cluster, which is a greater challenge for operation and
>>>>> > maintenance.
>>>>> >
>>>>> > If some queries are very flexible but infrequent, it is more
>>>>> > resource-efficient to compute them on the fly. Since the number of queries
>>>>> > is small, even if each query consumes a lot of computational resources, it
>>>>> > is still cost-effective overall. If some queries have a fixed pattern and
>>>>> > the query volume is large, they are better suited to Kylin: large
>>>>> > computational resources are spent once to compute and save the results,
>>>>> > and that upfront cost is amortized over every query, so it is the most
>>>>> > economical.
>>>>> >
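The amortization argument above comes down to simple arithmetic, sketched here under stated assumptions. All figures are made up for illustration, the function names are hypothetical, and the model ignores storage and rebuild costs, so it is a rough guide rather than a pricing formula.

```python
import math


def total_cost_on_the_fly(num_queries, cost_per_query):
    """Every query pays the full computation cost."""
    return num_queries * cost_per_query


def total_cost_precomputed(num_queries, build_cost, cost_per_lookup):
    """One upfront build, then a cheap lookup per query."""
    return build_cost + num_queries * cost_per_lookup


def break_even_queries(build_cost, cost_per_query, cost_per_lookup):
    """Smallest query count at which precomputation is strictly cheaper,
    or None if a lookup is not cheaper than computing on the fly."""
    saving_per_query = cost_per_query - cost_per_lookup
    if saving_per_query <= 0:
        return None
    return math.floor(build_cost / saving_per_query) + 1


# Assumed cost units: build = 500, on-the-fly query = 0.5, lookup = 0.25.
N = break_even_queries(500.0, 0.5, 0.25)
```

With these assumed numbers the precomputed approach only wins once more than 2000 queries hit the same pattern; below that, computing on the fly is cheaper, which matches the "flexible but infrequent" case in the article.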
>>>>> > --- Translated with DeepL.com (free version)
>>>>> >
>>>>> >
>>>>> > ------------------------
>>>>> > With warm regard
>>>>> > Xiaoxiang Yu
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> >> Thank you Xiaoxiang for the near real time streaming feature. That's
>>>>> >> great.
>>>>> >>
>>>>> >> This morning there has been a new challenge to my team: ClickHouse offered
>>>>> >> us the speed of calculating 8 billion rows in milliseconds, which is
>>>>> >> faster than my demonstration (I used Kylin to calculate 1 billion rows in
>>>>> >> 2.9 seconds).
>>>>> >>
>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so that I
>>>>> >> can defend my demonstration?
>>>>> >>
>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <[email protected]>
>>>>> wrote:
>>>>> >>
>>>>> >> > 1. "In this important scenario of realtime analytics, the reason here is
>>>>> >> > that Kylin has lag time due to model update of new segment build, is
>>>>> >> > that correct?"
>>>>> >> >
>>>>> >> > You are correct.
>>>>> >> >
>>>>> >> > 2. "If that is true, then can you suggest a work-around of combination
>>>>> >> > of ... "
>>>>> >> >
>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but not
>>>>> >> > released), which can reduce the time lag to about 3 minutes (that is my
>>>>> >> > estimate, but I am quite certain about it).
>>>>> >> > NRT stands for 'near real-time'; it will run a job that periodically does
>>>>> >> > micro-batch aggregation and persistence. The price is that you need to
>>>>> >> > run and monitor a long-running job. This feature is based on Spark
>>>>> >> > Streaming, so you need knowledge of it.
>>>>> >> >
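As a rough illustration of what micro-batch aggregation means in the reply above, here is a minimal sketch in plain Python. It is neither Spark Streaming nor the unreleased Kylin feature: the event tuples, the function name micro_batch_aggregate, and the 180-second interval (chosen only to mirror the "about 3 minutes" estimate) are assumptions.

```python
from collections import defaultdict


def micro_batch_aggregate(events, interval_seconds):
    """Group (timestamp, key, value) events into fixed windows and sum the
    values per key inside each window. Each window stands for one
    micro-batch whose result would be persisted when the window closes."""
    batches = defaultdict(lambda: defaultdict(float))
    for timestamp, key, value in events:
        window_start = (timestamp // interval_seconds) * interval_seconds
        batches[window_start][key] += value
    return {start: dict(sums) for start, sums in sorted(batches.items())}


# Hypothetical events: (seconds since start, page, hit count).
EVENTS = [
    (5, "page_a", 1.0),
    (170, "page_a", 2.0),
    (185, "page_b", 4.0),
    (359, "page_a", 1.0),
]
BATCHES = micro_batch_aggregate(EVENTS, 180)
```

In this model an event becomes queryable only after its window has closed and been persisted, so the lag is bounded by roughly one interval plus processing time. That is the trade-off against a fully real-time source.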
>>>>> >> > I am curious: what is the maximum time lag your customers can tolerate?
>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>>>> >> >
>>>>> >> > ------------------------
>>>>> >> > With warm regard
>>>>> >> > Xiaoxiang Yu
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <[email protected]
>>>>> >
>>>>> >> wrote:
>>>>> >> >
>>>>> >> > > Druid is better in
>>>>> >> > > - Have a real-time datasource like Kafka etc.
>>>>> >> > >
>>>>> >> > > ==========================
>>>>> >> > >
>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>>>>> >> > >
>>>>> >> > > In this important scenario of realtime analytics, the reason here is
>>>>> >> > > that Kylin has lag time due to model update of new segment build, is
>>>>> >> > > that correct?
>>>>> >> > >
>>>>> >> > > If that is true, then can you suggest a work-around of combination of:
>>>>> >> > >
>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to provide
>>>>> >> > > realtime capability ?
>>>>> >> > >
>>>>> >> > > IMO, the point here is to find that (realtime DB update) and integrate
>>>>> >> > > it with (time - lag kylin cube).
>>>>> >> > >
>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <[email protected]>
>>>>> wrote:
>>>>> >> > >
>>>>> >> > > > I researched and tested Druid two years ago (I don't know much about
>>>>> >> > > > how Druid has changed in these two years. New features that I know of
>>>>> >> > > > are: new UI, fully on K8s, etc.).
>>>>> >> > > >
>>>>> >> > > > Here are some cases in which you should consider using Druid rather
>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>>>>> >> > > > which I used two years ago):
>>>>> >> > > >
>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>>>>> >> > > > - Most queries are small (based on my test results, I think Druid had
>>>>> >> > > >   better response time for small queries two years ago).
>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
>>>>> >> > > >   cloud platform as your deployment platform.
>>>>> >> > > >
>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>>>> >> > > > better, like:
>>>>> >> > > >
>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
>>>>> >> > > >   exact-match/fine-grained Index for queries containing different
>>>>> >> > > >   `Group By dimensions`.
>>>>> >> > > > - User-friendly UI for modeling.
>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show that
>>>>> >> > > >   it supports ODBC well).
>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>>>> >> > > >
>>>>> >> > > >
>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>>>> >> > > > I hope this helps; feel free to share your opinion.
>>>>> >> > > >
>>>>> >> > > > ------------------------
>>>>> >> > > > With warm regard
>>>>> >> > > > Xiaoxiang Yu
>>>>> >> > > >
>>>>> >> > > >
>>>>> >> > > >
>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>>>>> <[email protected]>
>>>>> >> > > wrote:
>>>>> >> > > >
>>>>> >> > > >> Dear Xiaoxiang,
>>>>> >> > > >> Sirs/Madams,
>>>>> >> > > >>
>>>>> >> > > >> May I post my boss's question:
>>>>> >> > > >>
>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>>>>> >> > > >> Pinot and Druid?
>>>>> >> > > >>
>>>>> >> > > >> Please kindly let me know
>>>>> >> > > >>
>>>>> >> > > >> Thank you very much and best regards
>>>>> >> > > >>
>>>>> >> > > >
>>>>> >> > >
>>>>> >> >
>>>>> >>
>>>>> >
>>>>>
>>>>

Re: Pinot/Kylin/Druid quick comparison
