I think so.

Response time is not the only factor in the decision. Kylin could be
cheaper when the query pattern fits the Kylin model, and in that case
Kylin can still guarantee reasonable query latency. ClickHouse will be
quicker in ad hoc query scenarios.
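
To make that concrete, here is a rough, made-up sketch (plain Python,
nothing Kylin-specific; the table, dimensions, and numbers are invented
for illustration) of why a fixed query pattern is cheap once the
aggregates are precomputed: the expensive pass over the raw rows happens
once, and every later query with the same group-by shape is a lookup.

    # Toy illustration of cube-style pre-aggregation. Not Kylin code;
    # the data and dimensions below are made up.
    from collections import defaultdict

    raw_rows = [
        {"dt": "2023-12-01", "city": "Hanoi", "amount": 10.0},
        {"dt": "2023-12-01", "city": "Hanoi", "amount": 5.0},
        {"dt": "2023-12-01", "city": "Saigon", "amount": 7.0},
    ]

    # One-time "build": aggregate by the fixed dimensions (dt, city).
    cube = defaultdict(float)
    for row in raw_rows:
        cube[(row["dt"], row["city"])] += row["amount"]

    # Every later query with the same GROUP BY pattern is a cheap lookup,
    # so repeated queries do not pay the scan cost again.
    def sum_amount(dt, city):
        return cube.get((dt, city), 0.0)

    print(sum_amount("2023-12-01", "Hanoi"))  # 15.0

An ad hoc query that groups by a dimension not covered by the cube would
still need a scan of the raw rows, which is where an MPP engine like
ClickHouse shines.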

By the way, Youzan and Kyligence combine the two to provide unified
data analytics services for their customers.

------------------------
With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, thank you.
>
> In case my client uses a cloud computing service like GCP or AWS, which
> will cost more: the precalculation feature of Kylin, or ClickHouse? (In
> the case of Kylin, my understanding is that the query computation is done
> once and stored in the cube to be reused many times, so Kylin uses less
> cloud computation. Is that true?)
>
> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Following text is part of an article(
> > https://zhuanlan.zhihu.com/p/343394287) .
> >
> >
> >
> ===============================================================================
> >
> > Kylin is suitable for aggregation queries with fixed patterns because of
> > its pre-calculation technology, i.e. queries whose join, group by, and
> > where conditions are relatively fixed. The larger the data volume, the
> > more obvious Kylin's advantage; it is especially strong in deduplication
> > (count distinct), Top N, and percentile scenarios. Typical use cases
> > include dashboards, all kinds of reports, large-screen displays, traffic
> > statistics, and user behavior analysis. Meituan, Aurora, Shell Housing,
> > and others use Kylin to build their data service platforms, serving
> > millions to tens of millions of queries per day, with most queries
> > completed within 2-3 seconds. There is no better alternative for such
> > high-concurrency scenarios.
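> >
> > (A toy sketch of why count distinct benefits so much from
> > pre-calculation: per-segment distinct structures can be merged at query
> > time without rescanning detail rows. Kylin uses bitmap or HyperLogLog
> > structures for this; plain Python sets stand in for them below, and the
> > user ids are made up.)
> >
> >     # Each segment pre-aggregates the distinct user ids it has seen.
> >     # Kylin stores bitmaps/HLL sketches instead of raw sets.
> >     segment_a_users = {"u1", "u2", "u3"}
> >     segment_b_users = {"u2", "u4"}
> >
> >     # COUNT(DISTINCT user_id) across both segments is just a merge of
> >     # the pre-built structures; no detail rows are scanned again.
> >     print(len(segment_a_users | segment_b_users))  # 4, not 3 + 2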
> >
> > ClickHouse, thanks to its MPP architecture, has high computing power and
> > is better suited when queries are more flexible, or when low-concurrency
> > detail-level queries are needed. Typical scenarios include user-label
> > filtering over very wide tables with arbitrarily combined where
> > conditions, and low-concurrency complex ad hoc queries. If the data
> > volume and access volume are large, you need to deploy a distributed
> > ClickHouse cluster, which poses a bigger operations and maintenance
> > challenge.
> >
> > If some queries are very flexible but infrequent, it is more
> > resource-efficient to compute them on the fly. Since the number of such
> > queries is small, even if each one consumes a lot of computing resources,
> > it is still cost-effective overall. If queries follow a fixed pattern and
> > the query volume is large, Kylin is the better fit: by spending computing
> > resources once to save the results, the upfront computation cost is
> > amortized over every query, which makes it the most economical option.
> >
> > --- Translated with DeepL.com (free version)
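> >
> > A back-of-envelope way to see that trade-off (the costs below are
> > invented, not real cloud prices): pre-calculation pays a fixed build
> > cost once, so it wins once the query volume is large enough to
> > amortize it.
> >
> >     # Hypothetical cost model; every number here is made up.
> >     build_cost = 100.0        # one-time cube build per day
> >     cube_query_cost = 0.001   # per query answered from the cube
> >     adhoc_query_cost = 0.05   # per query computed on the fly
> >
> >     def precompute_is_cheaper(queries_per_day):
> >         precomputed = build_cost + queries_per_day * cube_query_cost
> >         on_the_fly = queries_per_day * adhoc_query_cost
> >         return precomputed < on_the_fly
> >
> >     print(precompute_is_cheaper(100))    # False: too few queries to amortize the build
> >     print(precompute_is_cheaper(10000))  # True: the build cost is spread over many queries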
> >
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> wrote:
> >
> >> Thank you Xiaoxiang for the near real time streaming feature. That's
> >> great.
> >>
> >> This morning a new challenge came to my team: ClickHouse offered us
> >> the ability to calculate 8 billion rows in milliseconds, which is
> >> faster than my demonstration (I used Kylin to calculate 1 billion rows
> >> in 2.9 seconds).
> >>
> >> Can you briefly suggest the advantages of Kylin over ClickHouse, so
> >> that I can defend my demonstration?
> >>
> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >>
> >> > 1. "In this important scenario of realtime analytics, the reason here
> is
> >> > that
> >> > kylin has lag time due to model update of new segment build, is that
> >> > correct?"
> >> >
> >> > You are correct.
> >> >
> >> > 2. "If that is true, then can you suggest a work-around of combination
> >> of
> >> > ... "
> >> >
> >> > Kylin is planning to introduce NRT streaming (coding is completed but
> >> > not released yet), which can reduce the time lag to about 3 minutes
> >> > (that is my estimate, but I am quite certain about it).
> >> > NRT stands for 'near real-time'; it runs a job that performs
> >> > micro-batch aggregation and persistence periodically. The price is
> >> > that you need to run and monitor a long-running job. This feature is
> >> > based on Spark Streaming, so you will need some knowledge of it.
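> >> >
> >> > To give a feel for what such a micro-batch job involves, here is a
> >> > generic Spark Structured Streaming sketch (this is not the Kylin NRT
> >> > implementation; the Kafka topic, servers, columns, and output paths
> >> > are placeholders) that aggregates a stream and persists the results
> >> > every few minutes:
> >> >
> >> >     # Generic micro-batch aggregation with Spark Structured Streaming.
> >> >     # Illustration only, not Kylin's NRT code; topic/paths are made up.
> >> >     from pyspark.sql import SparkSession
> >> >     from pyspark.sql import functions as F
> >> >
> >> >     spark = SparkSession.builder.appName("nrt-demo").getOrCreate()
> >> >
> >> >     events = (spark.readStream
> >> >               .format("kafka")
> >> >               .option("kafka.bootstrap.servers", "localhost:9092")
> >> >               .option("subscribe", "events")
> >> >               .load())
> >> >
> >> >     # Treat the Kafka value as a simple string key and aggregate it
> >> >     # per one-minute event-time window.
> >> >     parsed = events.selectExpr("CAST(value AS STRING) AS city",
> >> >                                "timestamp")
> >> >     agg = (parsed
> >> >            .withWatermark("timestamp", "1 minute")
> >> >            .groupBy(F.window("timestamp", "1 minute"), "city")
> >> >            .count())
> >> >
> >> >     # A long-running job that has to be monitored; it persists the
> >> >     # aggregates roughly every 3 minutes.
> >> >     (agg.writeStream
> >> >         .outputMode("append")
> >> >         .format("parquet")
> >> >         .option("path", "/tmp/nrt_agg")
> >> >         .option("checkpointLocation", "/tmp/nrt_ckpt")
> >> >         .trigger(processingTime="3 minutes")
> >> >         .start()
> >> >         .awaitTermination())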
> >> >
> >> > I am curious: what is the maximum time lag your customers can
> >> > tolerate? Personally, I guess a minute-level time lag is OK for
> >> > most cases.
> >> >
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> wrote:
> >> >
> >> > > Druid is better in
> >> > > - Have a real-time datasource like Kafka etc.
> >> > >
> >> > > ==========================
> >> > >
> >> > > Hi Xiaoxiang, thank you for your response.
> >> > >
> >> > > In this important scenario of realtime analytics, the reason is
> >> > > that Kylin has a lag time due to the model update when a new
> >> > > segment is built, is that correct?
> >> > >
> >> > > If that is true, then can you suggest a workaround combining:
> >> > >
> >> > > (time-lagged Kylin cube) + (realtime DB update) to provide
> >> > > realtime capability?
> >> > >
> >> > > IMO, the point here is to find that (realtime DB update) and
> >> > > integrate it with the (time-lagged Kylin cube).
> >> > >
> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org>
> wrote:
> >> > >
> >> > > > I researched and tested Druid two years ago (I don't know much
> >> > > > about how Druid has changed in these two years; the new features
> >> > > > I know of are a new UI, running fully on K8s, etc.).
> >> > > >
> >> > > > Here are some cases where you should consider using Druid rather
> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
> >> > > > I used two years ago):
> >> > > >
> >> > > > - You have a real-time data source like Kafka.
> >> > > > - Most queries are small (based on my test results, Druid had
> >> > > >   better response times for small queries two years ago).
> >> > > > - You don't know how to optimize Spark/Hadoop and want to use
> >> > > >   K8s or a public cloud platform as your deployment platform.
> >> > > >
> >> > > > But I do think there are many scenarios in which Kylin could be
> >> > > > better, like:
> >> > > >
> >> > > > - Better performance for complex/big queries: Kylin can build a
> >> > > >   more exact-match/fine-grained index for queries with different
> >> > > >   `Group By` dimensions.
> >> > > > - A user-friendly UI for modeling.
> >> > > > - Possibly better 'Join' support (not sure at the moment).
> >> > > > - ODBC drivers for different BI tools (Druid's website does not
> >> > > >   show that it supports ODBC well).
> >> > > > - Kylin appears to support ANSI SQL better than Druid.
> >> > > >
> >> > > >
> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> > > > I hope this helps; feel free to share your opinion.
> >> > > >
> >> > > > ------------------------
> >> > > > With warm regard
> >> > > > Xiaoxiang Yu
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid
> >
> >> > > wrote:
> >> > > >
> >> > > >> Dear Xiaoxiang,
> >> > > >> Sirs/Madams,
> >> > > >>
> >> > > >> May I post my boss's question:
> >> > > >>
> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> >> > Pinot
> >> > > >> and
> >> > > >> Druid?
> >> > > >>
> >> > > >> Please kindly let me know
> >> > > >>
> >> > > >> Thank you very much and best regards
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
>
