I set the related Spark config values below, but they do not work (they did
work in Spark 2.1.1); a sketch of passing them from PySpark follows the list:
spark.hive.mapred.supports.subdirectories=true
spark.hive.supports.subdirectories=true
spark.mapred.input.dir.recursive=true
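
(Below is only a sketch of one way to supply the same keys when constructing the session from PySpark; it is not a confirmed fix, just how these settings could be passed in code.)

    from pyspark.sql import SparkSession

    # sketch: same config keys as listed above, passed at session-build time
    spark = SparkSession.builder \
        .appName("Sub-Directory Test") \
        .config("spark.hive.mapred.supports.subdirectories", "true") \
        .config("spark.hive.supports.subdirectories", "true") \
        .config("spark.mapred.input.dir.recursive", "true") \
        .enableHiveSupport() \
        .getOrCreate()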

When I query, I also set the related Hive config values, but they do not work
either (a sketch of issuing them through spark.sql() follows the list):
mapred.input.dir.recursive=true
hive.mapred.supports.subdirectories=true
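
(These can also be issued in-session as SET statements; the snippet below is just a sketch of that, assuming a Hive-enabled SparkSession named `spark` already exists.)

    # sketch: apply the Hive settings in the session before running the query
    spark.sql("SET mapred.input.dir.recursive=true")
    spark.sql("SET hive.mapred.supports.subdirectories=true")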

I already know that loading a path like
'/user/test/warehouse/somedb.db/dt=20200312/*/' as a DataFrame in PySpark
works. But for complex business logic, I need to use spark.sql().
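
(One workaround that still keeps the logic in spark.sql() is to register the wildcard-path DataFrame as a temp view. The sketch below assumes the partition files can be read directly; the "orc" format and the view name somedb_snapshot are only guesses for illustration.)

    # sketch: read the subdirectory files directly, then query them with spark.sql()
    df = spark.read.format("orc") \
        .load("/user/test/warehouse/somedb.db/dt=20200312/*/")
    df.createOrReplaceTempView("somedb_snapshot")
    spark.sql("select * from somedb_snapshot limit 10").show()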

Please give me some advice.
Thanks!

* Code
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Hive-enabled session so spark.sql() can see the metastore tables
    spark = SparkSession \
        .builder \
        .appName("Sub-Directory Test") \
        .enableHiveSupport() \
        .getOrCreate()

    # Query a partition whose data files sit in subdirectories (see the paths below);
    # this query does not pick up the files in those subdirectories
    spark.sql("select * from somedb.table where dt = '20200301' limit 10").show()

* Hive table directory path
/user/test/warehouse/somedb.db/dt=20200312/1/000000_0
/user/test/warehouse/somedb.db/dt=20200312/1/000000_1
.
.
/user/test/warehouse/somedb.db/dt=20200312/2/000000_0
/user/test/warehouse/somedb.db/dt=20200312/3/000000_0


