Re: IndexR, a new storage plugin for Drill

Robin Moffatt Wed, 04 Jan 2017 04:52:55 -0800

Hi Flow Wei,
This looks pretty interesting. How does IndexR compare with
Apache Kudu?


thanks, Robin.

On 4 January 2017 at 11:20, WeiWan <wei...@sunteng.com> wrote:

> Hi Nicolas,
>
> > 1) Do both Drill and Hive support predicate pushdown with IndexR? I mean
> > using the indexes rather than scanning the table.
>
> Yes, we support predicate pushdown.
> IndexR implements a special index called the Rough Set Index, which is well
> suited to statistical queries. It effectively filters out irrelevant data
> chunks and costs very little compared with other index forms.
> The idea originally comes from Infobright (ICE). I'm sure you can find
> many useful links by searching for “infobright rough set”. In some respects
> you can think of IndexR as another Infobright that is open source,
> distributed, runs on Hadoop, and supports realtime ingestion.
>
>
> > 2) Does it support join pushdown, sort, etc.?
>
> It does not. Those jobs should be done by the query layer, i.e. Drill.
> But we do hope Drill can support aggregation pushdown, which could really
> speed up queries such as “select count(*), sum(a), max(b) from
> table”.
>
> > 3) Can you elaborate on why your team chose Drill over the equivalents (Impala,
> > Presto…)?
>
> We are not very familiar with Impala or Presto, but we did try Spark. We
> didn’t choose Spark because at that time, early 2016, Spark’s scanner API
> was not stable enough, and we needed the processes to run on local
> machines instead of on YARN. Most of all, we love Drill for
> its stability, efficiency, simplicity, and the nice interface for storage
> plugins.
>
> Regards
> Flow Wei
>
>
>
> > On Jan 4, 2017, at 16:32, Nicolas Paris <nipari...@gmail.com> wrote:
> >
> > Hi Weiwan,
> >
> > 1) Do both Drill and Hive support predicate pushdown with IndexR? I mean
> > using the indexes rather than scanning the table.
> > 2) Does it support join pushdown, sort, etc.?
> > 3) Can you elaborate on why your team chose Drill over the equivalents (Impala,
> > Presto...)?
> >
> > Thanks !
> >
> >
> >
> > 2017-01-04 2:59 GMT+01:00 WeiWan <wei...@sunteng.com>:
> >
> >> Hi,
> >>
> >> It will take some time for the IndexR plugin to be merged into Drill, but you
> >> can try it out already by following these documents.
> >>
> >> Compilation: https://github.com/shunfei/indexr/wiki/Compilation
> >> Deployment: https://github.com/shunfei/indexr/wiki/Deployment
> >> User Guide: https://github.com/shunfei/indexr/wiki/User-Guide
> >> Regards
> >> Flow Wei
> >>
> >>
> >>
> >>> On Jan 4, 2017, at 00:22, Jinfeng Ni <j...@apache.org> wrote:
> >>>
> >>> IndexR looks like a very interesting storage plugin. Although I have
> >>> not looked into the details, I'm looking forward to seeing the PR and
> >>> hopefully getting this into Drill!
> >>>
> >>> Thanks,
> >>>
> >>> Jinfeng
> >>>
> >>>
> >>> On Tue, Jan 3, 2017 at 7:30 AM, WeiWan <wei...@sunteng.com> wrote:
> >>>> Hi Charles,
> >>>>
> >>>> It would be great if the IndexR plugin could be merged into the official Drill
> >>>> project. I will run some more tests against the latest Drill version and
> >>>> submit a PR.
> >>>>
> >>>> Regards
> >>>> Flow Wei
> >>>>
> >>>>
> >>>>
> >>>>> On Jan 3, 2017, at 23:18, Charles Givre <cgi...@gmail.com> wrote:
> >>>>>
> >>>>> This sounds really interesting. Will you be submitting a PR to
> >>>>> integrate this into the main Drill codebase?
> >>>>> — C
> >>>>>
> >>>>>> On Jan 3, 2017, at 03:35, WeiWan <wei...@sunteng.com> wrote:
> >>>>>>
> >>>>>> IndexR is a distributed, columnar storage system based on HDFS
> >>>>>> that focuses on fast analysis of both massive static (historical) data
> >>>>>> and rapidly ingested realtime data. IndexR is designed for OLAP.
> >>>>>>
> >>>>>> Fast analysis on large datasets
> >>>>>> Realtime ingestion with zero delay for queries
> >>>>>> Deep integration with the Hadoop ecosystem
> >>>>>> Hardware efficiency
> >>>>>> Highly available, scalable, manageable and simple
> >>>>>> Adapted to popular query engines such as Apache Drill, Apache Hive, etc.
> >>>>>>
> >>>>>> And now it is open source.
> >>>>>>
> >>>>>> Project: https://github.com/shunfei/indexr
> >>>>>> Wiki: https://github.com/shunfei/indexr/wiki
> >>>>>>
> >>>>>> IndexR was originally developed by Sunteng Tech. The project started a
> >>>>>> year ago and has now been deployed to several production systems in our
> >>>>>> company. The whole cluster consumes over 30 billion events each day in
> >>>>>> realtime from Kafka. The largest table contains over 10 billion rows
> >>>>>> (after rollup) and is growing rapidly. Most statistical/analytical
> >>>>>> queries have a latency of less than 3 seconds in the real-world
> >>>>>> production environment.
> >>>>>>
> >>>>>> Currently it is mainly used as a Drill and Hive storage plugin. It
> >>>>>> should be quite easy to master.
> >>>>>>
> >>>>>> We hope IndexR is helpful to you, and that you can help make it better.
> >>>>>>
> >>>>>> Regards
> >>>>>> Flow Wei
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
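
To illustrate the chunk-filtering idea behind the Rough Set Index discussed in this thread, here is a minimal Python sketch. It is not IndexR's or Infobright's actual implementation: it assumes each data chunk keeps only a min and a max for one numeric column, and the names Chunk, classify, and count_between are hypothetical. A range predicate is first checked against the per-chunk metadata, so chunks that cannot match are skipped without being scanned.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical data chunk holding the values of one numeric column."""
    values: List[int]  # assumed non-empty

    def __post_init__(self) -> None:
        # Coarse per-chunk metadata, computed once when the chunk is built.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def classify(chunk: Chunk, low: int, high: int) -> str:
    """Classify a chunk against the predicate low <= value <= high
    using only its min/max metadata."""
    if chunk.max_value < low or chunk.min_value > high:
        return "irrelevant"  # no row can match; skip the chunk
    if low <= chunk.min_value and chunk.max_value <= high:
        return "relevant"    # every row matches; no scan needed
    return "suspect"         # some rows may match; must scan


def count_between(chunks: List[Chunk], low: int, high: int) -> int:
    """Count rows with low <= value <= high, scanning only suspect chunks."""
    total = 0
    for chunk in chunks:
        kind = classify(chunk, low, high)
        if kind == "irrelevant":
            continue
        if kind == "relevant":
            # Answered from metadata alone.
            total += len(chunk.values)
        else:
            total += sum(1 for v in chunk.values if low <= v <= high)
    return total
```

For example, with chunks [1, 2, 3], [10, 20, 30] and [100, 200] and the predicate 5 <= value <= 25, only the middle chunk is scanned. The "relevant" case also hints at why aggregation pushdown can help: a count over a fully matching chunk needs no row access at all.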
