Hi, how does Spark read data from S3 and run tasks in parallel?
Assume I have an S3 bucket holding a 35 GB Parquet dataset. How does the SparkSession read and process that data in parallel? How does it split the S3 data and assign the splits to executor tasks? Please share your thoughts.

Note: with an RDD we can check rdd.partitions.size (or .length) to see how many partitions a file was split into. How is the same thing determined for data read from an S3 bucket?

--
Selvam Raman
"Shun bribery; hold your head high"