Thanks. My Spark applications run on Docker-based nodes, but in standalone mode (1 driver, n workers). Can we use S3 directly with a consistency add-on such as S3Guard (s3a) or the AWS EMR Consistent View <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html>?
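For reference, here is a minimal sketch of what enabling S3Guard could look like through Spark's Hadoop configuration (spark-defaults.conf). The bucket, DynamoDB table name, and region are placeholders, and the exact keys depend on the Hadoop version bundled with the Spark distribution:

```properties
# Hypothetical spark-defaults.conf fragment enabling S3Guard on the s3a
# connector (Hadoop 3.x). Table/region values are placeholders.
spark.hadoop.fs.s3a.metadatastore.impl        org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
spark.hadoop.fs.s3a.s3guard.ddb.table         my-s3guard-table
spark.hadoop.fs.s3a.s3guard.ddb.region        eu-west-1
spark.hadoop.fs.s3a.s3guard.ddb.table.create  true
```

Note that S3Guard addresses listing/metadata consistency of S3 as seen through s3a; whether that is sufficient for the state store's read-after-write pattern is exactly the question being asked here.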
On Wed, Dec 23, 2020 at 5:48 PM, Lalwani, Jayesh <jlalw...@amazon.com> wrote:

> Yes. It is necessary to have a distributed file system, because all the
> workers need to read/write to the checkpoint. The distributed file system
> has to be immediately consistent: when one node writes to it, the other
> nodes should be able to read it immediately.
>
> The solutions/workarounds depend on where you are hosting your Spark
> application.
>
> From: David Morin <morin.david....@gmail.com>
> Date: Wednesday, December 23, 2020 at 11:08 AM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: [EXTERNAL] Spark 3.0.1 Structured streaming - checkpoints fail
>
> Hello,
>
> I have an issue with my PySpark job related to checkpointing:
>
> Caused by: org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 3 in stage 16997.0 failed 4 times, most recent failure: Lost
> task 3.3 in stage 16997.0 (TID 206609, 10.XXX, executor 4):
> java.lang.IllegalStateException: Error reading delta file
> file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta of
> HDFSStateStoreProvider[id = (op=0,part=3),dir =
> file:/opt/spark/workdir/query6/checkpointlocation/state/0/3]:
> file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta
> does not exist
>
> This job is based on Spark 3.0.1 and Structured Streaming.
> This Spark cluster (1 driver and 6 executors) works without HDFS, and we
> don't want to manage an HDFS cluster if possible.
>
> Is it necessary to have a distributed filesystem? What are the different
> solutions/workarounds?
>
> Thanks in advance,
> David
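The stack trace shows the state store reading delta files from a node-local `file:` path, which only exists on the node that wrote it. A configuration-level sketch of the fix being discussed, assuming a shared s3a:// (or other distributed) location reachable from every executor; the bucket name is a placeholder and `df` stands in for the streaming DataFrame:

```python
# Hypothetical sketch: point the Structured Streaming checkpoint at shared
# storage (here s3a) instead of a node-local file: path, so every executor
# can read the same state-store delta files. Bucket name is a placeholder.
query = (
    df.writeStream
      .format("parquet")
      .option("path", "s3a://my-bucket/output/query6")
      .option("checkpointLocation", "s3a://my-bucket/checkpoints/query6")
      .start()
)
```

Whatever the chosen backend (HDFS, S3 with a consistency layer, NFS, etc.), the key requirement stated above is that a file written by one node is immediately readable by the others.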