I mentioned parquet as input format.
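For a splittable format like parquet, Spark carves the input into partitions of roughly `spark.sql.files.maxPartitionBytes` (128 MB by default). A back-of-envelope sketch of the resulting partition count for the 35 GB example below; the real count also depends on the file and row-group layout and on `spark.sql.files.openCostInBytes`, so treat this as an estimate only:

```python
# Rough sketch of how Spark sizes input partitions for a splittable
# format like parquet. Assumes the default value of
# spark.sql.files.maxPartitionBytes (128 MB); actual counts also depend
# on file/row-group boundaries, so this is an estimate, not a guarantee.

MAX_PARTITION_BYTES = 128 * 1024 * 1024   # 128 MB, Spark's default
total_size = 35 * 1024 * 1024 * 1024      # 35 GB of parquet data in S3

# Each input split is at most maxPartitionBytes, so take the ceiling:
num_partitions = -(-total_size // MAX_PARTITION_BYTES)
print(num_partitions)  # → 280
```

Each of those ~280 splits becomes one task, and the scheduler hands tasks to executors as slots free up, which is what gives the parallel read.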
On Oct 10, 2016 11:06 PM, "ayan guha" <guha.a...@gmail.com> wrote:

> It really depends on the input format used.
> On 11 Oct 2016 08:46, "Selvam Raman" <sel...@gmail.com> wrote:
>
>> Hi,
>>
>> How does Spark read data from S3 and run tasks in parallel?
>>
>> Assume I have an S3 bucket containing 35 GB of parquet files.
>>
>> How will the SparkSession read and process the data in parallel?
>> How does it split the S3 data and assign it to each executor task?
>>
>> Please share your thoughts.
>>
>> Note:
>> If we have an RDD, we can look at partitions.size or length to
>> check how many partitions a file has. But how is this accomplished
>> for an S3 bucket?
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>
