I am a developer at Qubole and I want to introduce an open source project -
Quark - https://github.com/qubole/quark. If you are using Apache Hive with
data warehouses like Vertica, Greenplum or Redshift, Quark will simplify
access to data for data analysts. Two concrete examples where Quark is
useful are:
1. Hot data is stored in a data warehouse (Redshift, Vertica, etc.) and
cold data is stored in HDFS and accessed through Apache Hive.
2. Cubes are stored in Redshift and the base tables are stored in HDFS.

Data analysts submit queries to Quark through a JDBC application like
Apache Zeppelin or sqlLine. Quark reroutes each query to the optimal dataset.
Note that Quark is *not* a federation engine. It does not join data across
databases. It can integrate with Presto or Hive for federation but the
preferred option is to run a query in a single datastore.
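
To illustrate the routing idea, here is a minimal Python sketch. It is not
Quark's implementation or API: the Dataset class, the route function, the
catalog entries and the cost numbers are all hypothetical, made up only to
show what "pick one optimal dataset in a single datastore" could mean for
the cube example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog entry; not a Quark type."""
    name: str
    datastore: str
    columns: frozenset
    cost: int  # assumed relative cost of scanning this dataset; lower is cheaper


def route(needed_columns, datasets):
    """Return the cheapest single dataset that covers every needed column.

    The query is never split across datastores: if no single dataset
    covers it, we raise instead of joining across databases.
    """
    needed = frozenset(needed_columns)
    candidates = [d for d in datasets if needed <= d.columns]
    if not candidates:
        raise LookupError("no single dataset covers the query")
    return min(candidates, key=lambda d: d.cost)


# Illustrative catalog: a small cube in Redshift, the base table in Hive.
CATALOG = [
    Dataset("sales_cube", "redshift",
            frozenset({"region", "month", "revenue"}), cost=1),
    Dataset("sales_base", "hive",
            frozenset({"region", "month", "revenue",
                       "order_id", "customer_id"}), cost=10),
]

# A query touching only cube columns is routed to Redshift.
cube_choice = route(["region", "revenue"], CATALOG)

# A query needing a base-table column has to run on Hive.
base_choice = route(["order_id", "revenue"], CATALOG)
```

In this sketch the analyst always names the same logical table, and the
router decides whether the cube or the base table answers the query.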

Here is an example of Quark with Hive on EMR & Redshift:
https://github.com/qubole/quark/blob/master/examples/EMR.md .

If this sounds interesting, we want to hear from you. We are available at
[email protected] and https://gitter.im/qubole/quark
