That's awesome, Yan. I was considering Phoenix for SQL access to HBase, since Cassandra supports CQL but HBase's SQL support was lacking. I will get back to you once I start using it on our workloads.
I am assuming the latencies won't be much different from accessing HBase through tsdb asynchbase, which is another option I am looking into.

On Mon, Jul 27, 2015 at 10:12 PM, Yan Zhou.sc <yan.zhou...@huawei.com> wrote:

> HBase in this case is no different from any other Spark SQL data source,
> so yes, you should be able to access HBase data through Astro from Spark
> SQL’s JDBC interface.
>
> Graphically, the access path is as follows:
>
> Spark SQL JDBC Interface -> Spark SQL Parser/Analyzer/Optimizer -> Astro
> Optimizer -> HBase Scans/Gets -> … -> HBase Region Server
>
> Regards,
>
> Yan
>
> *From:* Debasish Das [mailto:debasish.da...@gmail.com]
> *Sent:* Monday, July 27, 2015 10:02 PM
> *To:* Yan Zhou.sc
> *Cc:* Bing Xiao (Bing); dev; user
> *Subject:* RE: Package Release Announcement: Spark SQL on HBase "Astro"
>
> Hi Yan,
>
> Is it possible to access the HBase table through the Spark SQL JDBC layer?
>
> Thanks.
> Deb
>
> On Jul 22, 2015 9:03 PM, "Yan Zhou.sc" <yan.zhou...@huawei.com> wrote:
>
> Yes, but not all SQL-standard insert variants.
>
> *From:* Debasish Das [mailto:debasish.da...@gmail.com]
> *Sent:* Wednesday, July 22, 2015 7:36 PM
> *To:* Bing Xiao (Bing)
> *Cc:* user; dev; Yan Zhou.sc
> *Subject:* Re: Package Release Announcement: Spark SQL on HBase "Astro"
>
> Does it also support insert operations?
>
> On Jul 22, 2015 4:53 PM, "Bing Xiao (Bing)" <bing.x...@huawei.com> wrote:
>
> We are happy to announce the availability of the Spark SQL on HBase 1.0.0
> release.
> http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
>
> The main features in this package, dubbed “Astro”, include:
>
> · Systematic and powerful handling of data pruning and intelligent scan,
> based on a partial evaluation technique
>
> · HBase pushdown capabilities, such as custom filters and coprocessors,
> to support ultra-low-latency processing
>
> · SQL and DataFrame support
>
> · More SQL capabilities made possible (secondary index, Bloom filter,
> primary key, bulk load, update)
>
> · Joins with data from other sources
>
> · Python/Java/Scala support
>
> · Support for the latest Spark 1.4.0 release
>
> The tests by the Huawei team and community contributors covered these
> areas: bulk load; projection pruning; partition pruning; partial
> evaluation; code generation; coprocessors; custom filtering; DML; complex
> filtering on keys and non-keys; join/union with non-HBase data; DataFrames;
> and multi-column-family tests. We will post the test results, including
> performance tests, in the middle of August.
>
> You are very welcome to try out or deploy the package, and to help improve
> the integration tests with various combinations of settings, extensive
> DataFrame tests, complex join/union tests, and extensive performance tests.
> Please use the “Issues” and “Pull Requests” links at the package homepage
> to report bugs, improvements, or feature requests.
>
> Special thanks to project owner and technical leader Yan Zhou, the Huawei
> global team, community contributors, and Databricks. Databricks has
> provided great assistance from the design phase through the release.
>
> “Astro”, the Spark SQL on HBase package, will be useful for
> ultra-low-latency query and analytics of large-scale data sets in vertical
> enterprises. We will continue to work with the community to develop new
> features and improve the code base. Your comments and suggestions are
> greatly appreciated.
>
> Yan Zhou / Bing Xiao
>
> Huawei Big Data team
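For anyone trying the package, here is a rough sketch of what a session could look like. The table name, key columns, and column-family/qualifier mapping below are hypothetical, and the exact DDL grammar is Astro-specific, so treat this as illustrative only and check the package homepage for the authoritative syntax:

```sql
-- Hypothetical DDL mapping a SQL table onto an existing HBase table.
-- All names (sensor_readings, hbase_sensor_readings, cf1.temp) are made up.
CREATE TABLE sensor_readings (
  device_id STRING,
  ts BIGINT,
  temperature DOUBLE,
  PRIMARY KEY (device_id, ts)
) MAPPED BY (hbase_sensor_readings, COLS=[temperature=cf1.temp]);

-- A predicate on the leading key columns is the kind of query the
-- partial-evaluation machinery can turn into narrow HBase scans/gets
-- instead of a full table scan.
SELECT device_id, ts, temperature
FROM sensor_readings
WHERE device_id = 'dev-42' AND ts > 1437000000;
```

Since the thread above confirms JDBC access works end to end, the same SELECT should also be issuable from any JDBC client pointed at the Spark SQL JDBC interface.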