I am a developer at Qubole and I want to introduce an open source project, Quark: https://github.com/qubole/quark. If you use Apache Hive alongside a data warehouse like Vertica, Greenplum or Redshift, Quark will simplify access to data for your data analysts. Two concrete examples where Quark is useful are: 1. Hot data is stored in a data warehouse (Redshift, Vertica, etc.) and cold data is stored in HDFS and accessed through Apache Hive. 2. Cubes are stored in Redshift and the base tables are stored in HDFS.
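
To make the hot/cold case concrete, here is a toy Python sketch of the kind of routing decision involved. It is only an illustration, not Quark's implementation or API: the function name, the cutoff date, and the datastore labels are all hypothetical, and it assumes Hive holds the full history while the warehouse holds only recent data.

```python
from datetime import date

# Hypothetical cutoff: rows dated on or after this day are assumed to be
# available in the warehouse (hot); Hive is assumed to hold the full history.
HOT_DATA_START = date(2016, 1, 1)


def choose_datastore(query_start: date, query_end: date) -> str:
    """Pick the single datastore that can answer a query over a date range."""
    if query_start > query_end:
        raise ValueError("query_start must not be after query_end")
    # The warehouse can answer the query only if the whole range is hot.
    if query_start >= HOT_DATA_START:
        return "redshift"
    # Any range that reaches into cold data runs entirely in Hive,
    # so no join across datastores is needed.
    return "hive"
```

With these assumptions, a query over February 2016 would be sent to the warehouse, while a query spanning 2015 and 2016 would run entirely in Hive.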
Data analysts submit queries to Quark through a JDBC application like Apache Zeppelin or SQLLine. Quark reroutes each query to the optimal dataset. Note that Quark is *not* a federation engine: it does not join data across databases. It can integrate with Presto or Hive for federation, but the preferred option is to run a query in a single datastore. Here is an example of Quark with Hive on EMR and Redshift: https://github.com/qubole/quark/blob/master/examples/EMR.md . If this sounds interesting, we want to hear from you. We are available at [email protected] and https://gitter.im/qubole/quark
