guihuawen created SPARK-48309:
---------------------------------

             Summary: Stop am retry, in situations where some errors and retries may not be successful
                 Key: SPARK-48309
                 URL: https://issues.apache.org/jira/browse/SPARK-48309
             Project: Spark
          Issue Type: Improvement
          Components: YARN
    Affects Versions: 4.0.0
            Reporter: guihuawen
             Fix For: 4.0.0
In YARN cluster mode, spark.yarn.maxAppAttempts is configurable. In our production environment it is set to 2, so if the first attempt fails, the AM retries. In some scenarios, however, the second attempt may fail as well.

For example:

org.apache.spark.sql.AnalysisException: Table or view not found: test.testxxxx_xxxxx; line 1 pos 14;
Project
+- UnresolvedRelation [bigdata_qa, testxxxxx_xxxxx], [], false

Another example:

Caused by: org.apache.hadoop.hdfs.protocol.NSQuotaExceededException: The NameSpace quota (directories and files) of directory /tmp/xxx_file/xxxx is exceeded: quota=1000000 file count=1000001

Would it be more appropriate to catch these exceptions and stop retrying?
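To make the proposal concrete, here is a minimal sketch of the decision it describes: check a failure against a list of exceptions that a retry cannot fix, and only allow another attempt otherwise. It is written in Python for brevity and is not Spark's implementation. The names `NON_RETRYABLE_EXCEPTIONS`, `is_retryable`, and `should_attempt_again` are hypothetical, and the list contains only the two exceptions quoted in this issue.

```python
import re

# Hypothetical deny list: exception classes for which a second AM attempt
# is assumed to fail the same way. Only the two examples from this issue.
NON_RETRYABLE_EXCEPTIONS = (
    "org.apache.spark.sql.AnalysisException",
    "org.apache.hadoop.hdfs.protocol.NSQuotaExceededException",
)


def is_retryable(stack_trace: str) -> bool:
    """Return False if the trace names any exception on the deny list."""
    for name in NON_RETRYABLE_EXCEPTIONS:
        # Match the fully qualified class name as a whole token, so that a
        # longer class name sharing the same prefix does not match.
        pattern = r"(?<![\w.])" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, stack_trace):
            return False
    return True


def should_attempt_again(stack_trace: str, attempt: int, max_attempts: int) -> bool:
    """Allow another attempt only if attempts remain and the failure is retryable.

    max_attempts plays the role of spark.yarn.maxAppAttempts (2 in the
    environment described above); attempt is the 1-based attempt that failed.
    """
    return attempt < max_attempts and is_retryable(stack_trace)
```

With `max_attempts=2`, a first attempt that failed with either exception above would not be retried, while an unrelated failure such as a `java.io.IOException` would still get its second attempt. How such a check would be wired into the AM, and whether matching on class names is precise enough, are left open here.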