Hi All,

I am running Hive queries through spark-sql in YARN client mode. The SQL is pretty simple: it loads dynamic partitions into a target Parquet table.
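For context, here is a minimal sketch of the kind of job I am running, driven through a SparkSession with Hive support. Table, column, and database names are made up for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("load-dynamic-partitions")
      .enableHiveSupport()
      .getOrCreate()

    // Standard Hive settings to allow dynamic partitioning.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Plain INSERT ... SELECT into a partitioned Parquet table;
    // names here are hypothetical.
    spark.sql("""
      INSERT OVERWRITE TABLE target_db.target_parquet_table
      PARTITION (load_date)
      SELECT col1, col2, load_date
      FROM source_db.staging_table
    """)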
I set the Hive merge configuration parameters (set hive.merge.smallfiles.avgsize=256000000; set hive.merge.size.per.task=2560000000;), which in Hive usually merge small files up to a 256 MB block size, but they do not seem to take effect here. Are these parameters supported in spark-sql, and if not, is there another way to merge a large number of small Parquet files into bigger ones? If this were a Scala application I could use the coalesce() or repartition() functions, but we are not writing a Spark/Scala application here, just plain spark-sql.

Thanks,
Sri
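For reference, this is roughly the workaround I have in mind for a Scala application: read the small files back, collapse them into fewer partitions, and rewrite. Paths and partition counts are hypothetical:

    // Read the many small Parquet files under one partition directory.
    val df = spark.read.parquet(
      "/warehouse/target_db/target_parquet_table/load_date=2016-07-01")

    // coalesce() reduces the number of output files without a full
    // shuffle; repartition() would do the same with a shuffle.
    df.coalesce(4)
      .write
      .mode("overwrite")
      .parquet("/tmp/merged_output")

I am looking for an equivalent effect when the whole job is expressed in spark-sql.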