Hi, I am using Hive 1.1.0 and Spark 1.5.1, and I am creating a HiveContext in spark-shell.
I am seeing the opposite of the performance I expected: Spark SQL is much slower than Hive. With default settings, Hive returns the result of a plain SELECT * query on a 1 GB dataset containing 3,623,203 records in 27 seconds. Spark SQL takes 2 minutes for the same query when I call collect on the result.

Cluster configuration:
Hive: 6 nodes, 16 GB memory and 4 cores each
Spark: 4 nodes, 16 GB memory and 4 cores each

My dataset has 45 partitions, and Spark SQL creates 82 jobs. I have tried all the memory and garbage-collection optimizations suggested on the official website, but none of them improved performance. It is also worth mentioning that I sometimes get an OOM error when I allocate less than 10G of executor memory. Can somebody tell me what's actually going on?
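For reference, here are a few back-of-envelope numbers derived from the figures above. This is a sketch that uses only values stated in the question. It assumes "1 GB" means 1024^3 bytes of on-disk data. The wave count also assumes each Spark core runs one task at a time and all 16 cores are available to executors. Both are assumptions, not measurements.

```scala
// Back-of-envelope figures computed only from the numbers in the question.
object DatasetFigures {
  // Assumption: "1 GB" is interpreted as 1024^3 bytes of on-disk data.
  val datasetBytes: Long = 1024L * 1024 * 1024
  val records: Long = 3623203L
  val partitions: Int = 45

  // Total resources per cluster: node count times the per-node figure.
  val hiveCores: Int = 6 * 4
  val hiveMemGb: Int = 6 * 16
  val sparkCores: Int = 4 * 4
  val sparkMemGb: Int = 4 * 16

  // Average on-disk bytes per record (integer division, rounds down).
  def bytesPerRecord: Long = datasetBytes / records

  // Average records per partition (integer division, rounds down).
  def recordsPerPartition: Long = records / partitions

  // Assumption: one task per core at a time, so tasks run in
  // ceil(partitions / sparkCores) successive waves.
  def taskWaves: Int = (partitions + sparkCores - 1) / sparkCores

  def main(args: Array[String]): Unit = {
    println(s"Hive cluster:  $hiveCores cores, $hiveMemGb GB total")
    println(s"Spark cluster: $sparkCores cores, $sparkMemGb GB total")
    println(s"Avg bytes per record:      $bytesPerRecord")
    println(s"Avg records per partition: $recordsPerPartition")
    println(s"Task waves (1 task/core):  $taskWaves")
  }
}
```

With these inputs, the sketch gives about 296 bytes per record, about 80,515 records per partition, and 3 task waves. The Hive cluster has 24 cores and 96 GB in total, compared with 16 cores and 64 GB for Spark.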