Hi, I saw similar issues when I was working on Oracle with Tableau as the dashboard.
Currently I have a batch layer that gets streaming data from the source -> Kafka -> Flume -> HDFS. The data is stored on HDFS as text files, and a cron process syncs a Hive table with an external table built on that directory.

I tried both ORC and Parquet, but I don't think the query itself is the issue. Meaning, it does not matter how clever your execution engine is; the fact that you still have to do a considerable amount of physical IO (PIO), as opposed to logical IO (LIO), to get the data to Zeppelin is on the critical path.

One option is to limit the amount of data shown in Zeppelin to a certain number of rows or something similar. However, you cannot tell a user he/she cannot see the full data.

We resolved this with Oracle by using Oracle TimesTen IMDB <http://www.oracle.com/technetwork/database/database-technologies/timesten/overview/index.html> to cache certain tables in memory and have them refreshed (depending on the refresh frequency) from the underlying table in Oracle when the data is updated. That is done through cache fusion.

I was looking around and came across Alluxio <http://www.alluxio.org/>. Ideally I would like to utilise a concept like TimesTen's: can one distribute Hive table data (or any table data) cached across the nodes? In that case we would be doing logical IO, which is roughly 20 times or more lightweight compared to physical IO.

Anyway, this is the concept.

Thanks

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
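P.S. To make the refresh idea concrete, here is a toy sketch in Python of the TimesTen-style pattern described above: rows live in memory and are re-read from the underlying store only when a refresh interval has elapsed. All names (`RefreshingCache`, `load_from_hive`) are hypothetical stand-ins, not any real TimesTen or Hive API.

```python
import time

class RefreshingCache:
    """Toy analogue of a refresh-driven in-memory cache: serve rows
    from memory (logical IO) and hit the backing store (physical IO)
    only when the refresh interval has elapsed."""

    def __init__(self, loader, refresh_seconds):
        self._loader = loader            # callable that does the physical IO
        self._refresh = refresh_seconds  # how stale the cache may get
        self._rows = None
        self._loaded_at = None

    def rows(self):
        now = time.monotonic()
        if self._rows is None or now - self._loaded_at >= self._refresh:
            self._rows = self._loader()  # physical IO happens here
            self._loaded_at = now
        return self._rows                # otherwise: memory only

# Hypothetical usage: count how often the backing store is actually hit.
calls = []
def load_from_hive():                    # stand-in for a real Hive table scan
    calls.append(1)
    return [("row", 1), ("row", 2)]

cache = RefreshingCache(load_from_hive, refresh_seconds=60)
cache.rows(); cache.rows(); cache.rows() # one physical read, two cached reads
print(len(calls))                        # prints 1
```

Whether a real implementation polls the source on a schedule (as above) or is pushed changes when the underlying table is updated is the key design choice; TimesTen supports both styles, and that is what I would hope to find in something like Alluxio.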