Any updates on this bug? Why do the Spark terminal logs and the job's final status not match? (One says the job failed, the other that it succeeded.)
Thanks.

On Thu, Jul 23, 2015 at 4:43 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:

> Hi all,
>
> While running the Spark word count Python example with an intentional mistake in
> *Yarn cluster mode*, the Spark terminal states the final status as SUCCEEDED, but
> the log files show the correct result, indicating that the job failed.
>
> Why do the terminal log output and the application log output contradict each other?
>
> If I run the same job in *local mode*, then the terminal logs and application logs
> match: both state that the job failed with the expected error in the Python script.
>
> More details: Scenario
>
> While running the Spark word count Python example in *Yarn cluster mode*, I make
> an intentional error in wordcount.py by changing this line (I'm using Spark 1.4.1,
> but this problem also exists in Spark 1.4.0 and 1.3.0, which I tested):
>
> lines = sc.textFile(sys.argv[1], 1)
>
> into this line:
>
> lines = sc.textFile(*nonExistentVariable*, 1)
>
> where the nonExistentVariable variable was never created or initialized.
>
> Then I run the example with this command (I put README.md into HDFS before
> running it):
>
> *./bin/spark-submit --master yarn-cluster wordcount.py /README.md*
>
> The job runs and finishes successfully according to the log printed in the
> terminal:
>
> *Terminal logs*:
> ...
> 15/07/23 16:19:17 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:18 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:19 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:20 INFO yarn.Client: Application report for application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:21 INFO yarn.Client: Application report for application_1437612288327_0013 (state: FINISHED)
> 15/07/23 16:19:21 INFO yarn.Client:
>      client token: N/A
>      diagnostics: Shutdown hook called before final status was reported.
>      ApplicationMaster host: 10.0.53.59
>      ApplicationMaster RPC port: 0
>      queue: default
>      start time: 1437693551439
>      final status: *SUCCEEDED*
>      tracking URL: http://localhost:8088/proxy/application_1437612288327_0013/history/application_1437612288327_0013/1
>      user: edadashov
> 15/07/23 16:19:21 INFO util.Utils: Shutdown hook called
> 15/07/23 16:19:21 INFO util.Utils: Deleting directory /tmp/spark-eba0a1b5-a216-4afa-9c54-a3cb67b16444
>
> But if I look at the log files generated for this application in HDFS, they
> indicate failure of the job with the correct reason:
>
> *Application log files*:
> ...
> \00 stdout\00 179Traceback (most recent call last):
>   File "wordcount.py", line 32, in <module>
>     lines = sc.textFile(nonExistentVariable,1)
> *NameError: name 'nonExistentVariable' is not defined*
>
> Why is the terminal output (final status: *SUCCEEDED*) not matching the
> application log results, which show failure of the job (NameError: name
> 'nonExistentVariable' is not defined)?
>
> Is this a bug? Is there a Jira ticket related to this issue? (Is someone
> assigned to it?)
>
> If I run this wordcount.py example (with the mistaken line) in local mode,
> then the terminal log states that the job failed as well:
>
> *./bin/spark-submit wordcount.py /README.md*
>
> *Terminal logs*:
> ...
> 15/07/23 16:31:55 INFO scheduler.EventLoggingListener: Logging events to hdfs:///app-logs/local-1437694314943
> Traceback (most recent call last):
>   File "/home/edadashov/tools/myspark/spark/wordcount.py", line 32, in <module>
>     lines = sc.textFile(nonExistentVariable,1)
> NameError: name 'nonExistentVariable' is not defined
> 15/07/23 16:31:55 INFO spark.SparkContext: Invoking stop() from shutdown hook
>
> Thanks.
> --
> Best regards,
> Elkhan Dadashov
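One way to see why local mode does surface the failure: an uncaught Python exception makes the driver process exit with a nonzero status and print a traceback to stderr, which spark-submit passes straight through when the driver runs in the launching process. The sketch below is a minimal, Spark-free stand-in for that behavior (the inline one-liner is a hypothetical substitute for the broken wordcount.py, not the actual Spark code path):

```python
import subprocess
import sys
import textwrap

# Stand-in for the broken line in wordcount.py:
#   lines = sc.textFile(nonExistentVariable, 1)
broken_script = textwrap.dedent("""
    nonExistentVariable  # NameError: this name was never defined
""")

# Run it as a separate process, the way a local-mode driver script runs.
proc = subprocess.run(
    [sys.executable, "-c", broken_script],
    capture_output=True,
    text=True,
)

# The uncaught exception produces a nonzero exit code and a traceback
# on stderr, so the failure is directly visible to whoever launched it.
print(proc.returncode != 0)        # True
print("NameError" in proc.stderr)  # True
```

In yarn-cluster mode, by contrast, the driver runs inside the ApplicationMaster on the cluster, so the client only sees whatever final status the ApplicationMaster reports back to YARN; the "Shutdown hook called before final status was reported" diagnostic in the logs above suggests that report never reflected the Python failure.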