I set the related Spark config values as below, but it does not work (it did work in Spark 2.1.1):

spark.hive.mapred.supports.subdirectories=true
spark.hive.supports.subdirectories=true
spark.mapred.input.dir.recursive=true
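For reference, a minimal sketch of setting these properties when the session is built (passing them with --conf on spark-submit or in spark-defaults.conf should be equivalent; the property names are just the ones listed above):

from pyspark.sql import SparkSession

# Sketch: supplying the subdirectory-related properties on the builder.
spark = (
    SparkSession.builder
    .appName("Sub-Directory Test")
    .config("spark.hive.mapred.supports.subdirectories", "true")
    .config("spark.hive.supports.subdirectories", "true")
    .config("spark.mapred.input.dir.recursive", "true")
    .enableHiveSupport()
    .getOrCreate()
)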
When I run the query I also set the related Hive configs, but that does not work either:

mapred.input.dir.recursive=true
hive.mapred.supports.subdirectories=true

I already know that loading a path like '/user/test/warehouse/somedb.db/dt=20200312/*/' directly as a DataFrame in PySpark works (sketched at the end of this mail), but the business logic is complex enough that I need to go through spark.sql(). Please advise. Thanks!

* Code

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Hive-enabled session; the spark.sql() query below is the one that
    # fails to pick up the files in the numbered sub-directories.
    spark = SparkSession \
        .builder \
        .appName("Sub-Directory Test") \
        .enableHiveSupport() \
        .getOrCreate()

    spark.sql("select * from somedb.table where dt = '20200301' limit 10").show()

* Hive table directory path

/user/test/warehouse/somedb.db/dt=20200312/1/000000_0
/user/test/warehouse/somedb.db/dt=20200312/1/000000_1
.
.
/user/test/warehouse/somedb.db/dt=20200312/2/000000_0
/user/test/warehouse/somedb.db/dt=20200312/3/000000_0
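* Direct path load that works

For completeness, a minimal sketch of the direct-path workaround mentioned above. The wildcard makes Spark read the numbered sub-directories, but it bypasses the table definition in the metastore, which is why it is hard to fit into the spark.sql()-based logic. The ORC format here is only an assumption and would have to match whatever the table actually stores; the temp view name is made up for the example.

# Sketch (file format assumed to be ORC): load the partition directory with a
# wildcard so the numbered sub-directories are included, then expose it to SQL.
df = spark.read.format("orc").load("/user/test/warehouse/somedb.db/dt=20200312/*/")
df.createOrReplaceTempView("somedb_table_dt20200312")
spark.sql("select * from somedb_table_dt20200312 limit 10").show()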