Thanks, Jorn, for the input. Our users want to run queries that perform large
aggregations of data from different tables, as well as simple ad hoc queries
over a single table. The tables are all in ORC format. They're currently using
the Hive plus Tez architecture you mentioned, but are experiencing …
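To make the two workload shapes concrete for a benchmark, here is a small sketch. The table and column names (`customers`, `orders`, `region`, `amount`) are hypothetical. Python's built-in `sqlite3` stands in for Hive/Spark SQL only so the queries can run locally. In a real POC, equivalent SQL would be pointed at the ORC tables instead.

```python
import sqlite3

# Hypothetical schema standing in for the users' ORC tables; sqlite3 is used
# only so the two query shapes can run locally, not as a Hive/Spark substitute.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id),
                         amount REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "EMEA"), (2, "APAC"), (3, "EMEA")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0), (13, 3, 25.0)])

# Shape 1: aggregation joining data from different tables.
AGGREGATION_QUERY = """
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""

# Shape 2: simple ad hoc query over a single table (the ? placeholder is sqlite3 syntax).
AD_HOC_QUERY = "SELECT order_id, amount FROM orders WHERE amount > ? ORDER BY order_id"

aggregation_rows = conn.execute(AGGREGATION_QUERY).fetchall()
ad_hoc_rows = conn.execute(AD_HOC_QUERY, (60.0,)).fetchall()
print(aggregation_rows)  # [('APAC', 1, 75.0), ('EMEA', 3, 175.0)]
print(ad_hoc_rows)       # [(10, 100.0), (12, 75.0)]
```

Keeping both shapes in the benchmark matters because they stress different things: the first is dominated by shuffle/join cost, the second mostly by scan and start-up latency.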

I think this is a rather simplistic view. All the tools do their computation
in memory in the end. For certain types of computation and usage patterns, it
makes sense to keep the data in memory. For example, most machine learning
approaches require including the same data in several iterative …
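A toy illustration of the iterative case, in plain Python rather than Spark. A gradient-descent loop scans the same dataset on every iteration. Re-reading it from the source therefore costs one load per iteration. Keeping it in memory costs one load in total; here this is done with `functools.lru_cache`, loosely analogous to calling `cache()` on a Spark DataFrame. The loader `load_points` and its data are hypothetical.

```python
import functools

# Counts how often the (hypothetical) expensive source is actually read.
load_count = 0

def load_points():
    """Stand-in for reading a table from storage; returns (x, y) pairs with y = 2x."""
    global load_count
    load_count += 1
    return [(float(x), 2.0 * x) for x in range(1, 6)]

@functools.lru_cache(maxsize=None)
def load_points_cached():
    # Keep the loaded data in memory so later iterations reuse it.
    return tuple(load_points())

def fit_slope(loader, iterations=50, lr=0.01):
    """Gradient descent on w for y ~ w * x; every iteration scans the full dataset."""
    w = 0.0
    for _ in range(iterations):
        points = loader()
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w

w_uncached = fit_slope(load_points)
uncached_loads = load_count   # one read per iteration

load_count = 0
w_cached = fit_slope(load_points_cached)
cached_loads = load_count     # read once, then served from memory

print(uncached_loads, cached_loads)  # 50 1
```

Both runs produce the same slope (close to 2.0). Only the number of reads from the source differs, which is the cost that in-memory reuse avoids.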

Folks,
I'm embarking on a project to build a POC around Spark SQL. I was wondering if
anyone has experience comparing Spark SQL with Hive or interactive Hive, and
any data points on the types of queries suited to each. I am naively assuming
that Spark SQL will beat Hive in all queries given …
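For the POC, one simple way to gather such data points is to run each query shape several times per engine and compare medians rather than single runs. The following is a minimal, engine-agnostic sketch. `engine_a` and `engine_b` are hypothetical placeholders for functions that would submit the same query to Spark SQL and to Hive. The warm-up and repeat counts are arbitrary.

```python
import statistics
import time
from typing import Callable, Dict, List

def benchmark(run_query: Callable[[], object], warmup: int = 1, repeats: int = 5) -> Dict[str, float]:
    """Time a zero-argument callable that executes one query end to end.

    Warm-up runs are discarded so one-off costs (session start-up, cold
    caches) do not dominate; the median of the timed runs is reported.
    """
    for _ in range(warmup):
        run_query()
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }

# Hypothetical stand-ins; in the POC each would run the same query on one engine.
def engine_a():
    return sum(range(100_000))

def engine_b():
    return sum(i for i in range(100_000))

results = {name: benchmark(fn) for name, fn in [("engine_a", engine_a), ("engine_b", engine_b)]}
for name, stats in results.items():
    print(f"{name}: median {stats['median_s']:.4f}s")
```

Reporting min and max alongside the median makes it easier to spot runs where one engine is fast on average but has high variance, which matters for ad hoc, interactive use.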