It really depends on the input format: a splittable format like Parquet can be read in parallel, with one task per split, regardless of whether the files live on S3 or HDFS.
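As a rough sketch of how the partition count comes about for a splittable format like Parquet: Spark packs the files into read splits of at most `spark.sql.files.maxPartitionBytes` (128 MB by default), so a 35 GB bucket yields on the order of a few hundred partitions. The arithmetic below is only an approximation (it ignores per-file open costs and row-group boundaries):

```python
# Approximate partition count for reading 35 GB of Parquet,
# assuming the default spark.sql.files.maxPartitionBytes of 128 MB.
max_partition_bytes = 128 * 1024 * 1024      # default split size
total_size = 35 * 1024 ** 3                  # the 35 GB bucket from the question
num_partitions = -(-total_size // max_partition_bytes)  # ceiling division
print(num_partitions)  # 280
```

After the read, `df.rdd.getNumPartitions()` reports the actual number Spark chose, which plays the same role as `partitions.size` on an RDD.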
On 11 Oct 2016 08:46, "Selvam Raman" <sel...@gmail.com> wrote:

> Hi,
>
> How does Spark read data from S3 and run tasks in parallel?
>
> Assume I have an S3 bucket of 35 GB (Parquet files).
>
> How will the SparkSession read and process the data in parallel? How
> does it split the S3 data and assign it to each executor task?
>
> Please share your thoughts.
>
> Note:
> If we have an RDD, we can look at partitions.size or partitions.length
> to check how many partitions a file has. But how is this accomplished
> for an S3 bucket?
>
> --
> Selvam Raman
> "Shun bribery; hold your head high"
>
