Don't we have any property for it?

One more quick question: if a file created by Spark is smaller than the HDFS
block size, does the rest of the block's space become unavailable and remain
unutilized, or is it shared with other files?
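A rough sketch of the arithmetic behind this question. It assumes a 128 MiB block size and a replication factor of 3 (the usual `dfs.blocksize` and `dfs.replication` defaults; check your cluster's settings). `hdfs_footprint` is a hypothetical helper, not a Hadoop API. HDFS does not pad a file's last block out to the full block size, so a small file uses only its own bytes (times replication) on the DataNodes, and the remainder of the block is not reserved for it. The more commonly cited cost of many small files is NameNode memory, because every file and every block is tracked there as metadata.

```python
import math
from typing import Tuple

# Assumed values: 128 MiB blocks and 3 replicas are the common dfs.blocksize /
# dfs.replication defaults; your cluster's hdfs-site.xml may differ.
BLOCK_BYTES = 128 * 1024 * 1024
REPLICATION = 3


def hdfs_footprint(file_bytes: int, block_bytes: int = BLOCK_BYTES,
                   replication: int = REPLICATION) -> Tuple[int, int]:
    """Simplified model of how much HDFS space a single file occupies.

    Returns (number_of_blocks, raw_bytes_on_datanodes). The last block only
    holds the file's remaining bytes; it is not padded out to block_bytes,
    so raw usage is the file size times the replication factor.
    """
    n_blocks = math.ceil(file_bytes / block_bytes)  # 0 for an empty file
    raw_bytes = file_bytes * replication
    return n_blocks, raw_bytes
```

For example, a 10 MiB file occupies one block and about 30 MiB of raw DataNode disk, not 3 × 128 MiB.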

On Mon, Jan 21, 2019 at 1:30 PM Shivam Sharma <28shivamsha...@gmail.com>
wrote:

> On Sun, Jan 20, 2019 at 12:47 AM Hichame El Khalfi <hich...@elkhalfi.com>
> wrote:
>
>> You can do this in 2 passes (not one):
>> A) Save your dataset into HDFS with what you have.
>> B) Calculate the number of partitions, n = (size of your dataset) / (HDFS
>> block size).
>> Then run a simple Spark job to read the data back and repartition it
>> based on 'n'.
>>
>> Hichame
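A minimal sketch of steps A and B, assuming the step-A output sits at a staging path whose size you read with `hdfs dfs -du -s <path>` (the first column of its output is the size in bytes). `du_bytes`, `target_partitions`, the staging path and the table name `db.events` are hypothetical; rounding up and the floor of 1 partition are choices made here, not part of the suggestion above.

```python
import math

BLOCK_BYTES = 128 * 1024 * 1024  # assumed dfs.blocksize; check your cluster


def du_bytes(du_line: str) -> int:
    """Parse the first column of `hdfs dfs -du -s <path>` output (bytes)."""
    return int(du_line.split()[0])


def target_partitions(dataset_bytes: int, block_bytes: int = BLOCK_BYTES) -> int:
    """Step B: n = dataset size / block size, rounded up and at least 1."""
    return max(1, math.ceil(dataset_bytes / block_bytes))


# Second pass in PySpark (not executed here), assuming Parquet at the staging path:
#   n = target_partitions(du_bytes(du_output))
#   spark.read.parquet(staging_path).repartition(n).write.insertInto("db.events")
```

For 50 GiB of staged data and 128 MiB blocks this yields 400 partitions, i.e. roughly one block-sized file each. `repartition(n)` performs a full shuffle, which evens out file sizes at the cost of moving the data once more.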
>>
>> *From:* felixcheun...@hotmail.com
>> *Sent:* January 19, 2019 2:06 PM
>> *To:* 28shivamsha...@gmail.com; user@spark.apache.org
>> *Subject:* Re: Persist Dataframe to HDFS considering HDFS Block Size.
>>
>> You can call coalesce to combine partitions.
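A hedged PySpark sketch of the coalesce suggestion. `insert_with_file_count` and the table name are hypothetical; the function relies only on the real `DataFrame.coalesce(numPartitions)` and `DataFrameWriter.insertInto(tableName, overwrite)` methods, and `n_files` could come from a dataset-size / block-size calculation like the one proposed in this thread.

```python
def insert_with_file_count(df, table: str, n_files: int, overwrite: bool = False) -> None:
    """Combine a PySpark DataFrame's partitions into n_files, then insert into a Hive table.

    DataFrame.coalesce(n) merges existing partitions without a full shuffle, so
    it is cheaper than repartition(n), but output files can be uneven in size.
    A drastic coalesce also lowers the parallelism of the stage that feeds it.
    """
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    # For a non-partitioned table, each remaining partition typically
    # becomes one output file.
    df.coalesce(n_files).write.insertInto(table, overwrite=overwrite)
```

Usage, e.g. `insert_with_file_count(df, "db.events", 40)` instead of letting all 400 shuffle partitions write their own files.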
>>
>>
>> ------------------------------
>> *From:* Shivam Sharma <28shivamsha...@gmail.com>
>> *Sent:* Saturday, January 19, 2019 7:43 AM
>> *To:* user@spark.apache.org
>> *Subject:* Persist Dataframe to HDFS considering HDFS Block Size.
>>
>> Hi All,
>>
>> I want to persist a DataFrame on HDFS. Basically, I am inserting data
>> into a Hive table using Spark. Currently, when writing to the Hive
>> table, I have set the total number of shuffle partitions to 400, so 400
>> files are created without any regard for the HDFS block size. How can I
>> tell Spark to write files according to the HDFS block size?
>>
>> Hive has settings like these that solve this problem:
>>
>> set hive.merge.sparkfiles=true;
>> set hive.merge.smallfiles.avgsize=2048000000;
>> set hive.merge.size.per.task=4096000000;
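To make these numbers concrete: 2048000000 bytes is about 2 GB and 4096000000 bytes about 4 GB. Per the Hive configuration documentation, when a job's average output file is smaller than `hive.merge.smallfiles.avgsize`, Hive runs an extra merge step, `hive.merge.size.per.task` is the size of the merged files, and `hive.merge.sparkfiles` enables this merge for Hive on Spark. The sketch below is a rough model of that rule with hypothetical names, not Hive's actual implementation; real merged-file counts also depend on the input format and split computation.

```python
import math
from typing import Sequence

# Values from the Hive settings above, in bytes.
AVGSIZE = 2_048_000_000        # hive.merge.smallfiles.avgsize (~2.05 GB)
SIZE_PER_TASK = 4_096_000_000  # hive.merge.size.per.task (~4.1 GB)


def files_after_merge(file_sizes: Sequence[int], avgsize: int = AVGSIZE,
                      size_per_task: int = SIZE_PER_TASK) -> int:
    """Simplified model of Hive's small-file merge step.

    If the average output file is already >= avgsize, nothing is merged.
    Otherwise the files are combined into pieces of about size_per_task bytes.
    """
    if not file_sizes:
        return 0
    total = sum(file_sizes)
    if total / len(file_sizes) >= avgsize:
        return len(file_sizes)
    return max(1, math.ceil(total / size_per_task))
```

For example, 400 files of 50 MB each (20 GB in total) average far below 2 GB, so the model predicts about 5 merged files.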
>>
>> Thanks
>>
>> --
>> Shivam Sharma
>> Indian Institute Of Information Technology, Design and Manufacturing
>> Jabalpur
>> Mobile No.: (+91) 8882114744
>> Email: 28shivamsha...@gmail.com
>> LinkedIn: https://www.linkedin.com/in/28shivamsharma
>>
>
>

