Hi Flow Wei,

This looks pretty interesting. Do you have any comments on how IndexR compares with Apache Kudu?
Thanks,
Robin.

On 4 January 2017 at 11:20, WeiWan <wei...@sunteng.com> wrote:

> Hi Nicolas,
>
> > 1) Do both Drill and Hive support predicate pushdown with IndexR? I mean
> > using the indexes and not scanning the table.
>
> Of course, we support predicate pushdown.
> IndexR implements a special index called the Rough Set Index, which is very
> suitable for statistical queries. It can effectively filter out
> irrelevant data chunks and costs very little compared to other forms of index.
> The idea originally comes from Infobright (ICE). I'm sure you can find
> many useful links by googling "infobright rough set". In some respects
> you can think of IndexR as another Infobright that is open source,
> distributed, runs on Hadoop, and supports realtime ingestion.
>
> > 2) Does it support join pushdown, sort, etc.?
>
> It does not. Those jobs should be done by the query layer, i.e. Drill.
> But we do hope Drill can support aggregation pushdown, which can really
> speed up queries in cases like "select count(*), sum(a), max(b) from
> table".
>
> > 3) Can you elaborate on why your team chose Drill over equivalents (Impala,
> > Presto...)?
>
> We are not very familiar with Impala or Presto. But we did try Spark. We
> didn't choose Spark because at that time, early 2016, Spark's scanner API
> was not stable enough, and we needed the processes to run on local
> machines instead of on YARN. Most of all, we love Drill for
> its stability, efficiency, simplicity, and the nice interface for storage
> plugins.
>
> Regards
> Flow Wei
>
> On Jan 4, 2017, at 16:32, Nicolas Paris <nipari...@gmail.com> wrote:
> >
> > Hi Weiwan,
> >
> > 1) Do both Drill and Hive support predicate pushdown with IndexR? I mean
> > using the indexes and not scanning the table.
> > 2) Does it support join pushdown, sort, etc.?
> > 3) Can you elaborate on why your team chose Drill over equivalents (Impala,
> > Presto...)?
> >
> > Thanks!
> >
> > 2017-01-04 2:59 GMT+01:00 WeiWan <wei...@sunteng.com>:
> >
> >> Hi,
> >>
> >> It will take some time for the IndexR plugin to be merged into Drill. But you
> >> can already try it out by following these documents.
> >>
> >> Compilation: https://github.com/shunfei/indexr/wiki/Compilation
> >> Deployment: https://github.com/shunfei/indexr/wiki/Deployment
> >> User Guide: https://github.com/shunfei/indexr/wiki/User-Guide
> >>
> >> Regards
> >> Flow Wei
> >>
> >>> On Jan 4, 2017, at 00:22, Jinfeng Ni <j...@apache.org> wrote:
> >>>
> >>> Looks like IndexR is a very interesting storage plugin. Although I have
> >>> not looked into the details, I'm looking forward to seeing the PR and
> >>> hopefully getting this into Drill!
> >>>
> >>> Thanks,
> >>>
> >>> Jinfeng
> >>>
> >>> On Tue, Jan 3, 2017 at 7:30 AM, WeiWan <wei...@sunteng.com> wrote:
> >>>> Hi Charles,
> >>>>
> >>>> It would be great if the IndexR plugin could be merged into the official
> >>>> Drill project. I will do some more tests against the latest Drill version
> >>>> and submit a PR.
> >>>>
> >>>> Regards
> >>>> Flow Wei
> >>>>
> >>>>> On Jan 3, 2017, at 23:18, Charles Givre <cgi...@gmail.com> wrote:
> >>>>>
> >>>>> This sounds really interesting. Will you be submitting a PR to
> >>>>> integrate this into the main Drill codebase?
> >>>>> - C
> >>>>>
> >>>>>> On Jan 3, 2017, at 03:35, WeiWan <wei...@sunteng.com> wrote:
> >>>>>>
> >>>>>> IndexR is a distributed, columnar storage system based on HDFS, which
> >>>>>> focuses on fast analysis of both massive static (historical) data and
> >>>>>> rapidly ingested realtime data. IndexR is designed for OLAP.
> >>>>>>
> >>>>>> Fast analysis on large datasets
> >>>>>> Realtime ingestion with zero delay for queries
> >>>>>> Deep integration with the Hadoop ecosystem
> >>>>>> Hardware efficiency
> >>>>>> Highly available, scalable, manageable and simple
> >>>>>> Adapted to popular query engines like Apache Drill, Apache Hive, etc.
> >>>>>>
> >>>>>> And now it is open source.
> >>>>>>
> >>>>>> Project: https://github.com/shunfei/indexr
> >>>>>> Wiki: https://github.com/shunfei/indexr/wiki
> >>>>>>
> >>>>>> IndexR was originally developed by Sunteng Tech. The project started a
> >>>>>> year ago and has now been deployed to several production systems in our
> >>>>>> company. The whole cluster consumes over 30 billion events each day in
> >>>>>> realtime from Kafka. The largest table contains over 10 billion rows
> >>>>>> (after rollup) and is growing rapidly. Most statistical/analytical
> >>>>>> queries have a latency of less than 3 seconds in a real-world production
> >>>>>> environment.
> >>>>>>
> >>>>>> Currently it is mainly used as a Drill and Hive storage plugin. It
> >>>>>> should be quite easy to master.
> >>>>>>
> >>>>>> We hope IndexR will be useful to you, and that you will help make it
> >>>>>> better.
> >>>>>>
> >>>>>> Regards
> >>>>>> Flow Wei
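
For readers unfamiliar with the "rough set" idea mentioned in the thread (filtering out irrelevant data chunks before scanning), here is a minimal sketch in Python. It is not IndexR's or Infobright's actual code. It assumes only that each chunk keeps a min and max for a column, and the names Chunk, Match, classify_greater_than and count_greater_than are hypothetical, made up for illustration. Each chunk is classified as irrelevant, relevant, or suspect for a predicate, and only suspect chunks are actually scanned.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class Match(Enum):
    IRRELEVANT = "irrelevant"  # no row in the chunk can satisfy the predicate
    RELEVANT = "relevant"      # every row in the chunk satisfies the predicate
    SUSPECT = "suspect"        # some rows might match; the chunk must be scanned


@dataclass
class Chunk:
    """Hypothetical data chunk holding one integer column (assumed non-empty)."""
    values: Sequence[int]

    def __post_init__(self) -> None:
        # Per-chunk statistics, computed once when the chunk is built.
        self.min = min(self.values)
        self.max = max(self.values)


def classify_greater_than(chunk: Chunk, threshold: int) -> Match:
    """Classify a chunk for the predicate `value > threshold` using min/max only."""
    if chunk.max <= threshold:
        return Match.IRRELEVANT
    if chunk.min > threshold:
        return Match.RELEVANT
    return Match.SUSPECT


def count_greater_than(chunks: List[Chunk], threshold: int) -> int:
    """Count rows with value > threshold, scanning only suspect chunks."""
    total = 0
    for chunk in chunks:
        match = classify_greater_than(chunk, threshold)
        if match is Match.RELEVANT:
            # All rows qualify, so the row count alone answers the query.
            total += len(chunk.values)
        elif match is Match.SUSPECT:
            total += sum(1 for v in chunk.values if v > threshold)
        # IRRELEVANT chunks are skipped without reading their rows.
    return total


if __name__ == "__main__":
    chunks = [Chunk([1, 2, 3]), Chunk([10, 11, 12]), Chunk([4, 9, 6])]
    print(count_greater_than(chunks, 5))
```

With a threshold of 5, the first chunk is skipped, the second is counted from its size alone, and only the third is scanned row by row. A real rough set index would likely keep richer per-chunk summaries than a min/max pair; this sketch only shows the pruning principle.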