Hi All,

I am running Hive queries through spark-sql in YARN client mode. The SQL is pretty simple: it loads dynamic partitions into a target Parquet table.
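For context, here is a minimal sketch of the kind of job I am running, driven through a SparkSession with Hive support. Table, column, and database names are made up for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("load-dynamic-partitions")
      .enableHiveSupport()
      .getOrCreate()

    // Standard Hive settings to allow dynamic partitioning.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Plain INSERT ... SELECT into a partitioned Parquet table;
    // names here are hypothetical.
    spark.sql("""
      INSERT OVERWRITE TABLE target_db.target_parquet_table
      PARTITION (load_date)
      SELECT col1, col2, load_date
      FROM source_db.staging_table
    """)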
I set the Hive merge configuration parameters (set hive.merge.smallfiles.avgsize=256000000; set hive.merge.size.per.task=2560000000;), which in Hive usually merge small files up to a 256 MB block size, but they do not seem to take effect here. Are these parameters supported in spark-sql, and if not, is there another way to merge a large number of small Parquet files into bigger ones? If this were a Scala application I could use the coalesce() or repartition() functions, but we are not writing a Spark/Scala application here, just plain spark-sql.

Thanks,
Sri
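For reference, this is roughly the workaround I have in mind for a Scala application: read the small files back, collapse them into fewer partitions, and rewrite. Paths and partition counts are hypothetical:

    // Read the many small Parquet files under one partition directory.
    val df = spark.read.parquet(
      "/warehouse/target_db/target_parquet_table/load_date=2016-07-01")

    // coalesce() reduces the number of output files without a full
    // shuffle; repartition() would do the same with a shuffle.
    df.coalesce(4)
      .write
      .mode("overwrite")
      .parquet("/tmp/merged_output")

I am looking for an equivalent effect when the whole job is expressed in spark-sql.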