Thomas Graves created SPARK-6904:
------------------------------------

             Summary: SparkSql - HiveContext - optimize reading partition data from metastore
                 Key: SPARK-6904
                 URL: https://issues.apache.org/jira/browse/SPARK-6904
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: Thomas Graves

I was trying out Spark SQL using the HiveContext, running a select on a partitioned table with a large number of partitions (16,000+). It took over 6 minutes before the job even started. During that time it appeared to be querying the Hive metastore and getting a good chunk of data back, which I'm guessing is information about the partitions. Running the same query in Hive takes 45 seconds for the entire job.

It would be nice if we could optimize how partition data is read from the metastore.