Understanding Spark S3 Read Performance

Hi,
I'm trying to set up a Spark pipeline that reads data from S3 and writes
it to Google BigQuery.
Environment Details:
---
Java 8
AWS EMR-6.10.0
Spark v3.3.1
2 m5.xlarge executor nodes
S3 Directory structure:
---
bucket-name:
|---folder1:
|---folder2: