It really depends on the input format used. On 11 Oct 2016 08:46, "Selvam Raman" <sel...@gmail.com> wrote:
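To make that concrete for a splittable format like Parquet: Spark SQL plans roughly one input task per split, and split size is capped by spark.sql.files.maxPartitionBytes (128 MB by default in Spark 2.x). A rough back-of-the-envelope for the 35 GB example — treat this as an estimate only, since the actual count also depends on the number of files and their row-group layout:

```python
# Sketch: estimate how many input tasks Spark SQL would create for a
# 35 GB Parquet dataset, assuming the default 128 MB max split size
# (spark.sql.files.maxPartitionBytes). Real counts vary with file layout.
TOTAL_BYTES = 35 * 1024**3        # 35 GB of Parquet in the bucket
MAX_SPLIT_BYTES = 128 * 1024**2   # 128 MB default split cap

# Ceiling division: every partial split still needs its own task.
estimated_tasks = -(-TOTAL_BYTES // MAX_SPLIT_BYTES)
print(estimated_tasks)  # 280
```

To see what Spark actually decided for a given DataFrame, you can check df.rdd.getNumPartitions() in PySpark, or df.rdd.partitions.size in Scala — the same partition inspection you'd use on a plain RDD.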
> Hi,
>
> How does Spark read data from S3 and run tasks in parallel?
>
> Assume I have an S3 bucket holding a 35 GB Parquet dataset.
>
> How will the SparkSession read and process the data in parallel? How
> does it split the S3 data and assign the splits to executor tasks?
>
> Please share your thoughts.
>
> Note:
> If we have an RDD, we can look at partitions.size or partitions.length
> to check how many partitions a file has. But how is this determined for
> an S3 bucket?
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"