In our Hive warehouse there are many tables with a lot of partitions, e.g.:

    scala> hiveContext.sql("use db_external")
    scala> val result = hiveContext.sql("show partitions et_fullorders").count
    result: Long = 5879
I noticed that this part of the code:

https://github.com/apache/spark/blob/9d006c97371ddf357e0b821d5c6d1535d9b6fe41/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L55-L56

reads the whole partition list at the beginning of the planning phase. I added a logInfo around this `val partitions = ...`, and it shows:

    scala> val result = hiveContext.sql("select * from db_external.et_fullorders limit 5")
    14/09/02 16:15:56 INFO ParseDriver: Parsing command: select * from db_external.et_fullorders limit 5
    14/09/02 16:15:56 INFO ParseDriver: Parse Completed
    14/09/02 16:15:56 INFO HiveContext$$anon$1: getAllPartitionsForPruner started
    14/09/02 16:17:35 INFO HiveContext$$anon$1: getAllPartitionsForPruner finished

It took about two minutes just to get all the partitions. Is there any way to avoid this operation, such as fetching only the requested partitions somehow?

Thanks

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/hive-client-getAllPartitions-in-lookupRelation-can-take-a-very-long-time-tp8186.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
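P.S. To make the "fetch only the requested partitions" idea concrete, here is a minimal, hypothetical sketch in plain Scala. The `MetastoreClient` trait and both method names are made up for illustration and are not the real Hive or Spark APIs; the point is just the contrast between fetching all partitions and pushing the partition predicate down so pruning happens before the full list crosses the wire:

```scala
// Hypothetical sketch: getAllPartitions fetches everything and prunes later;
// getPartitionsByFilter lets the "metastore" apply the predicate itself.
// Both names are illustrative, not the actual Hive/Spark client API.

// A partition identified by its key/value spec, e.g. dt=2014-09-01
case class Partition(spec: Map[String, String])

trait MetastoreClient {
  // What the current code path effectively does: fetch the full list.
  def getAllPartitions(table: String): Seq[Partition]
  // What we would like: evaluate the filter on the metastore side.
  def getPartitionsByFilter(table: String, filter: Partition => Boolean): Seq[Partition]
}

// A toy in-memory "metastore" holding many partitions per table.
class InMemoryMetastore(tables: Map[String, Seq[Partition]]) extends MetastoreClient {
  def getAllPartitions(table: String): Seq[Partition] =
    tables.getOrElse(table, Seq.empty)
  def getPartitionsByFilter(table: String, filter: Partition => Boolean): Seq[Partition] =
    tables.getOrElse(table, Seq.empty).filter(filter) // pruning happens "server-side"
}

object PruningDemo {
  // 5879 fake partitions, mirroring the table above, keyed by a dt column.
  val parts: Seq[Partition] =
    (1 to 5879).map(i => Partition(Map("dt" -> f"2014-${(i % 12) + 1}%02d-01")))
  val store = new InMemoryMetastore(Map("et_fullorders" -> parts))

  // Current behaviour: all 5879 partition objects are materialized.
  val all: Seq[Partition] = store.getAllPartitions("et_fullorders")
  // Desired behaviour: only partitions matching the query predicate come back.
  val pruned: Seq[Partition] =
    store.getPartitionsByFilter("et_fullorders", _.spec("dt") == "2014-09-01")

  def main(args: Array[String]): Unit =
    println(s"all=${all.size} pruned=${pruned.size}")
}
```

With a real metastore the difference is not just list size but the Thrift round-trip cost, which is where the two minutes above are going.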