Druid quick comparison

Xiaoxiang Yu Tue, 05 Dec 2023 18:11:14 -0800

Done. Github branch changed to kylin5.

------------------------
With warm regard
Xiaoxiang Yu




On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <[email protected]> wrote:

> A JIRA ticket has been opened and is waiting on INFRA:
> https://issues.apache.org/jira/browse/INFRA-25238
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <[email protected]> wrote:
>
>> Thank you Xiaoxiang, please let me know once you have changed the default
>> branch. If people are swayed by the numbers, then I hope this will turn the
>> situation around.
>>
>> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <[email protected]> wrote:
>>
>>> The default branch is for 4.X, which is a maintenance branch; the active
>>> branch is kylin5.
>>> I will change the default branch to kylin5 later.
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>>
>>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <[email protected]>
>>> wrote:
>>>
>>>> Hi Xiaoxiang, Sirs / Madams
>>>>
>>>> Can you see the attached photo?
>>>>
>>>> My boss asked why Druid receives commits regularly while Kylin has had
>>>> no commits since July.
>>>>
>>>>
>>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <[email protected]> wrote:
>>>>
>>>>> I think so.
>>>>>
>>>>> Response time is not the only factor in making a decision. Kylin can be
>>>>> cheaper when the query pattern suits the Kylin model, and Kylin can
>>>>> guarantee reasonable query latency. ClickHouse will be quicker in an
>>>>> ad hoc query scenario.
>>>>>
>>>>> By the way, Youzan and Kyligence combine the two to provide
>>>>> unified data analytics services for their customers.
>>>>>
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Xiaoxiang, thank you
>>>>>>
>>>>>> If my client uses a cloud computing service like GCP or AWS, which
>>>>>> will cost more: Kylin's precalculation or ClickHouse? In Kylin's case,
>>>>>> my understanding is that the query computation is done once and stored
>>>>>> in the cube to be reused many times, so Kylin consumes less cloud
>>>>>> compute. Is that true?
>>>>>>
>>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <[email protected]> wrote:
>>>>>>
>>>>>> > The following text is part of an article
>>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>>>>> >
>>>>>> > ===============================================================================
>>>>>> >
>>>>>> > Kylin is suitable for aggregation queries with fixed patterns because of
>>>>>> > its precomputation technology, for example when the join, group by, and
>>>>>> > where conditions in the SQL are relatively fixed. The larger the data
>>>>>> > volume, the more obvious Kylin's advantages become. They are especially
>>>>>> > large for deduplication (count distinct), Top N, and Percentile. Kylin is
>>>>>> > used in a large number of scenarios, such as dashboards, all kinds of
>>>>>> > reports, large-screen displays, traffic statistics, and user behavior
>>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build their
>>>>>> > data service platforms, serving millions to tens of millions of queries
>>>>>> > per day, and most of the queries complete within 2 - 3 seconds. There is
>>>>>> > no better alternative for such a high-concurrency scenario.
>>>>>> >
>>>>>> > ClickHouse, because of its MPP architecture, has high computing power and
>>>>>> > is more suitable when query requests are more flexible, or when there is
>>>>>> > a need for detail-level queries with low concurrency. Scenarios include:
>>>>>> > user-label filtering where very many columns and where conditions are
>>>>>> > combined arbitrarily, complex ad hoc queries with low concurrency, and
>>>>>> > so on. If the data volume and access volume are large, you need to
>>>>>> > deploy a distributed ClickHouse cluster, which poses a greater challenge
>>>>>> > for operation and maintenance.
>>>>>> >
>>>>>> > If some queries are very flexible but infrequent, it is more
>>>>>> > resource-efficient to compute them on the fly. Since the number of
>>>>>> > queries is small, even if each query consumes a lot of computational
>>>>>> > resources, it is still cost-effective overall. If some queries have a
>>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
>>>>>> > large computational resources are spent once to compute and save the
>>>>>> > results, and that upfront cost is amortized over every query, so it is
>>>>>> > the most economical.
>>>>>> >
>>>>>> > --- Translated with DeepL.com (free version)
>>>>>> >
>>>>>> >
>>>>>> > ------------------------
>>>>>> > With warm regard
>>>>>> > Xiaoxiang Yu
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <[email protected]>
>>>>>> wrote:
>>>>>> >
>>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's
>>>>>> >> great.
>>>>>> >>
>>>>>> >> This morning my team faced a new challenge: ClickHouse offered us
>>>>>> >> calculation over 8 billion rows in milliseconds, which is faster than
>>>>>> >> my demonstration (I used Kylin to calculate over 1 billion rows in
>>>>>> >> 2.9 seconds).
>>>>>> >>
>>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>>>>> >> that I can defend my demonstration?
>>>>>> >>
>>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <[email protected]>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> > 1. "In this important scenario of realtime analytics, the reason here
>>>>>> >> > is that kylin has lag time due to model update of new segment build,
>>>>>> >> > is that correct?"
>>>>>> >> >
>>>>>> >> > You are correct.
>>>>>> >> >
>>>>>> >> > 2. "If that is true, then can you suggest a work-around of combination
>>>>>> >> > of ... "
>>>>>> >> >
>>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
>>>>>> >> > not released), which can reduce the time lag to about 3 minutes (that
>>>>>> >> > is my estimate, but I am quite certain about it).
>>>>>> >> > NRT stands for 'near real-time'; it runs a job that periodically does
>>>>>> >> > micro-batch aggregation and persistence. The price is that you need
>>>>>> >> > to run and monitor a long-running job. This feature is based on Spark
>>>>>> >> > Streaming, so you need knowledge of it.
>>>>>> >> >
>>>>>> >> > I am curious: what is the maximum time lag your customers can
>>>>>> >> > tolerate?
>>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>>>>> >> >
>>>>>> >> > ------------------------
>>>>>> >> > With warm regard
>>>>>> >> > Xiaoxiang Yu
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
>>>>>> <[email protected]>
>>>>>> >> wrote:
>>>>>> >> >
>>>>>> >> > > Druid is better in
>>>>>> >> > > - Have a real-time datasource like Kafka etc.
>>>>>> >> > >
>>>>>> >> > > ==========================
>>>>>> >> > >
>>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>>>>>> >> > >
>>>>>> >> > > In this important scenario of realtime analytics, the reason here is
>>>>>> >> > > that kylin has lag time due to model update of new segment build, is
>>>>>> >> > > that correct?
>>>>>> >> > >
>>>>>> >> > > If that is true, then can you suggest a work-around of combination
>>>>>> >> > > of:
>>>>>> >> > >
>>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to provide
>>>>>> >> > > realtime capability ?
>>>>>> >> > >
>>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>>>>> >> > > integrate it with (time - lag kylin cube).
>>>>>> >> > >
>>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <[email protected]>
>>>>>> wrote:
>>>>>> >> > >
>>>>>> >> > > > I researched and tested Druid two years ago (I don't know much about
>>>>>> >> > > > how Druid has changed in these two years. New features that I know
>>>>>> >> > > > of are: new UI, fully on K8s, etc).
>>>>>> >> > > >
>>>>>> >> > > > Here are some cases where you should consider using Druid rather
>>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>>>>>> >> > > > I used two years ago):
>>>>>> >> > > >
>>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
>>>>>> >> > > >   had better response time for small queries two years ago).
>>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use K8S or a
>>>>>> >> > > >   public cloud platform as your deployment platform.
>>>>>> >> > > >
>>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>>>>> >> > > > better, like:
>>>>>> >> > > >
>>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
>>>>>> >> > > >   exact-match/fine-grained Index for queries containing different
>>>>>> >> > > >   `Group By dimensions`.
>>>>>> >> > > > - User-friendly UI for modeling.
>>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show
>>>>>> >> > > >   that it supports ODBC well).
>>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>>>>> >> > > >
>>>>>> >> > > >
>>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>>>>> >> > > > I hope this helps; feel free to share your opinion.
>>>>>> >> > > >
>>>>>> >> > > > ------------------------
>>>>>> >> > > > With warm regard
>>>>>> >> > > > Xiaoxiang Yu
>>>>>> >> > > >
>>>>>> >> > > >
>>>>>> >> > > >
>>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>>>>>> <[email protected]>
>>>>>> >> > > wrote:
>>>>>> >> > > >
>>>>>> >> > > >> Dear Xiaoxiang,
>>>>>> >> > > >> Sirs/Madams,
>>>>>> >> > > >>
>>>>>> >> > > >> May I post my boss's question:
>>>>>> >> > > >>
>>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>>>>>> >> > > >> Pinot and Druid?
>>>>>> >> > > >>
>>>>>> >> > > >> Please kindly let me know
>>>>>> >> > > >>
>>>>>> >> > > >> Thank you very much and best regards
>>>>>> >> > > >>
>>>>>> >> > > >
>>>>>> >> > >
>>>>>> >> >
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
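
The cost argument in the quoted article (precomputation pays off for fixed-pattern, high-volume queries, while on-the-fly computation wins for rare ones) can be made concrete with a rough break-even sketch. Everything below is an illustrative assumption: the function names and the cost units are hypothetical, storage cost is ignored, and none of the numbers are measurements of Kylin or ClickHouse.

```python
# Rough break-even sketch: precompute once (Kylin-style) vs. compute on every query.
# All cost figures are hypothetical units, not measurements of any real system.

def total_cost_precomputed(build_cost, cost_per_lookup, num_queries):
    """One upfront build plus a cheap lookup for each query."""
    return build_cost + cost_per_lookup * num_queries


def total_cost_on_the_fly(cost_per_scan, num_queries):
    """No upfront work, but every query pays for a full computation."""
    return cost_per_scan * num_queries


def break_even_queries(build_cost, cost_per_lookup, cost_per_scan):
    """Smallest query count at which precomputation is strictly cheaper.

    Returns None when a lookup is not cheaper than a scan, because the
    upfront build cost can then never be recovered.
    """
    saving_per_query = cost_per_scan - cost_per_lookup
    if saving_per_query <= 0:
        return None
    # Precomputation wins once num_queries > build_cost / saving_per_query.
    return int(build_cost // saving_per_query) + 1


if __name__ == "__main__":
    n = break_even_queries(build_cost=1000, cost_per_lookup=1, cost_per_scan=51)
    print("precomputation becomes cheaper from query number", n)
```

With these made-up numbers, the two approaches cost the same at 20 queries and precomputation is cheaper from the 21st query onward; a workload of only a handful of queries would favor computing on the fly.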

Re: Pinot/Kylin/Druid quick comparison
