Hi All,

I want to persist a DataFrame on HDFS. Basically, I am inserting data into
a Hive table using Spark. Currently, at the time of writing to the Hive table
I have set the total shuffle partitions to 400, so 400 files are being
created, without any regard for the HDFS block size. How can I tell Spark
to write files sized according to HDFS blocks?
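
For context, here is a minimal sketch of the current write path. The table
and column names are hypothetical; the relevant setting is
spark.sql.shuffle.partitions:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("HiveInsertExample")
      .enableHiveSupport()
      .getOrCreate()

    // 400 shuffle partitions -> 400 output files, regardless of HDFS block size.
    spark.conf.set("spark.sql.shuffle.partitions", "400")

    // Hypothetical source table and aggregation; any shuffle yields 400 partitions.
    val df = spark.table("staging_db.events")
      .groupBy("event_date")
      .count()

    // Each of the 400 partitions becomes its own (often small) file in the Hive table.
    df.write.mode("append").insertInto("warehouse_db.events_agg")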

In Hive we have something like the following, which solves this problem:

set hive.merge.sparkfiles=true;
set hive.merge.smallfiles.avgsize=2048000000;
set hive.merge.size.per.task=4096000000;

Thanks

-- 
Shivam Sharma
Indian Institute Of Information Technology, Design and Manufacturing
Jabalpur
Mobile No- (+91) 8882114744
Email:- 28shivamsha...@gmail.com
LinkedIn:- https://www.linkedin.com/in/28shivamsharma
