Re: Spark Improvement Proposals

2017-02-11 Thread Xiao Li
During the summit, I also had many discussions on similar topics with multiple committers and active users. I heard many fantastic ideas. I believe Spark improvement proposals are a good channel for collecting requirements and designs. IMO, we also need to consider the priority when working on

Re: Spark Improvement Proposals

2017-02-11 Thread Cody Koeninger
At the Spark Summit this week, everyone from PMC members to users I had never met before was asking me about the Spark improvement proposals idea. It's clear that it's a real community need. But it's been almost half a year, and nothing visible has been done. Reynold, are you going to do this?

Re: spark sql versus interactive hive versus hive

2017-02-11 Thread Saikat Kanjilal
Thanks Jörn for the input. Our users want to run queries that perform large aggregations of data from different tables, as well as simple ad hoc queries over one table. The tables are all in ORC format. They're currently using the Hive plus Tez architecture that you mention but experiencing

Re: spark sql versus interactive hive versus hive

2017-02-11 Thread Jörn Franke
I think this is a rather simplistic view. All the tools do computation in memory in the end. For certain types of computation and usage patterns it makes sense to keep the data in memory. For example, most machine learning approaches require including the same data in several iterative