guihuawen created SPARK-48309:
---------------------------------

             Summary: Stop AM retry in situations where the error is one 
that a retry cannot fix
                 Key: SPARK-48309
                 URL: https://issues.apache.org/jira/browse/SPARK-48309
             Project: Spark
          Issue Type: Improvement
          Components: YARN
    Affects Versions: 4.0.0
            Reporter: guihuawen
             Fix For: 4.0.0


In YARN cluster mode, spark.yarn.maxAppAttempts is configured. In our 
production environment it is set to 2, so if the first attempt fails, the AM 
retries. However, in some scenarios the second attempt is bound to fail as well.

For example:

org.apache.spark.sql.AnalysisException: Table or view not found: 
test.testxxxx_xxxxx; line 1 pos 14;
Project
+- UnresolvedRelation [bigdata_qa, testxxxxx_xxxxx], [], false

Another example:
Caused by: org.apache.hadoop.hdfs.protocol.NSQuotaExceededException: The 
NameSpace quota (directories and files) of directory /tmp/xxx_file/xxxx is 
exceeded: quota=1000000 file count=1000001


Would it be more appropriate to catch exceptions like these and stop the 
retry, since a second attempt would fail in the same way?
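
As a rough illustration of the classification step only, the sketch below 
walks an exception's cause chain and checks each class (and its superclasses) 
against a set of class names treated as non-retryable. The class 
NonRetryableFailures, the method isNonRetryable, and the idea of passing the 
names in as a set are all hypothetical and are not existing Spark or YARN 
APIs. How the AM would act on the result is left out.

```java
import java.util.Set;

// Hypothetical helper, not part of Spark: decides whether a failure looks
// like one that a second application attempt could not fix.
final class NonRetryableFailures {

    // Candidate names taken from the two examples in this issue; a real
    // list would presumably be configurable.
    static final Set<String> EXAMPLES_FROM_THIS_ISSUE = Set.of(
        "org.apache.spark.sql.AnalysisException",
        "org.apache.hadoop.hdfs.protocol.NSQuotaExceededException");

    private static final int MAX_CAUSE_DEPTH = 32; // guard against odd cause chains

    // Returns true if the error, or any of its causes, is an instance of a
    // class whose fully qualified name appears in nonRetryableClassNames.
    static boolean isNonRetryable(Throwable error, Set<String> nonRetryableClassNames) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_CAUSE_DEPTH; t = t.getCause(), depth++) {
            // Compare by name so the helper compiles without Spark or Hadoop
            // on the classpath; superclasses are checked to cover subclasses.
            for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
                if (nonRetryableClassNames.contains(c.getName())) {
                    return true;
                }
            }
        }
        return false;
    }

    private NonRetryableFailures() {}
}
```

A caller could pass NonRetryableFailures.EXAMPLES_FROM_THIS_ISSUE as the 
second argument. Matching by class name is a simplification: an 
AnalysisException can also be raised for conditions that might be transient, 
so a real implementation would probably need a narrower check.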

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
