Any updates on this bug?

Why do the Spark log results and the job's final status not match? (One says
the job failed; the other says it succeeded.)

Thanks.


On Thu, Jul 23, 2015 at 4:43 PM, Elkhan Dadashov <elkhan8...@gmail.com>
wrote:

> Hi all,
>
> While running the Spark word count Python example with an intentional
> mistake in *Yarn cluster mode*, the Spark terminal reports the final
> status as SUCCEEDED, but the log files show the correct result,
> indicating that the job failed.
>
> Why do the terminal log output and the application log output contradict
> each other?
>
> If I run the same job in *local mode*, the terminal logs and the
> application logs match: both state that the job failed due to the
> expected error in the Python script.
>
> More details on the scenario:
>
> While running the Spark word count Python example in *Yarn cluster mode*,
> I make an intentional error in wordcount.py by changing this line (I'm
> using Spark 1.4.1, but the problem also exists in Spark 1.4.0 and 1.3.0,
> which I tested):
>
> lines = sc.textFile(sys.argv[1], 1)
>
> into this line:
>
> lines = sc.textFile(*nonExistentVariable*,1)
>
> where the variable nonExistentVariable was never created or initialized.
>
> Then I run that example with this command (I put README.md into HDFS
> before running it):
>
> *./bin/spark-submit --master yarn-cluster wordcount.py /README.md*
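>
> For reference, here is a minimal sketch of the modified script (based on
> the stock wordcount.py example shipped with Spark; the only change is the
> broken textFile line):
>
> from __future__ import print_function
> import sys
> from operator import add
> from pyspark import SparkContext
>
> if __name__ == "__main__":
>     sc = SparkContext(appName="PythonWordCount")
>     # Intentional mistake: nonExistentVariable is never defined,
>     # so this line raises NameError in the driver at runtime.
>     lines = sc.textFile(nonExistentVariable, 1)
>     counts = lines.flatMap(lambda x: x.split(' ')) \
>                   .map(lambda x: (x, 1)) \
>                   .reduceByKey(add)
>     for (word, count) in counts.collect():
>         print("%s: %i" % (word, count))
>     sc.stop()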
>
> The job runs and finishes successfully according to the log printed in
> the terminal:
> *Terminal logs*:
> ...
> 15/07/23 16:19:17 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:18 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:19 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:20 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:21 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: FINISHED)
> 15/07/23 16:19:21 INFO yarn.Client:
>  client token: N/A
>  diagnostics: Shutdown hook called before final status was reported.
>  ApplicationMaster host: 10.0.53.59
>  ApplicationMaster RPC port: 0
>  queue: default
>  start time: 1437693551439
>  final status: *SUCCEEDED*
>  tracking URL:
> http://localhost:8088/proxy/application_1437612288327_0013/history/application_1437612288327_0013/1
>  user: edadashov
> 15/07/23 16:19:21 INFO util.Utils: Shutdown hook called
> 15/07/23 16:19:21 INFO util.Utils: Deleting directory
> /tmp/spark-eba0a1b5-a216-4afa-9c54-a3cb67b16444
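>
> (Since yarn.Client builds this report from the ResourceManager's
> application report, I would expect yarn application -status
> application_1437612288327_0013 to show the same final status.)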
>
> But if I look at the log files generated for this application in HDFS,
> they indicate that the job failed, with the correct reason:
> *Application log files*:
> ...
> stdout: Traceback (most recent call last):
>   File "wordcount.py", line 32, in <module>
>     lines = sc.textFile(nonExistentVariable,1)
> *NameError: name 'nonExistentVariable' is not defined*
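>
> (Assuming log aggregation is enabled, the same output can also be fetched
> with the standard YARN CLI: yarn logs -applicationId
> application_1437612288327_0013)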
>
>
> Why does the terminal output (final status: *SUCCEEDED*) not match the
> application log result, namely the failure of the job (NameError: name
> 'nonExistentVariable' is not defined)?
>
> Is this a bug? Is there a Jira ticket related to this issue? (Is someone
> assigned to it?)
>
> If I run this wordcount.py example (with the same mistaken line) in local
> mode, then the terminal logs also state that the job failed:
>
> *./bin/spark-submit wordcount.py /README.md*
>
> *Terminal logs*:
>
> ...
> 15/07/23 16:31:55 INFO scheduler.EventLoggingListener: Logging events to
> hdfs:///app-logs/local-1437694314943
> Traceback (most recent call last):
>   File "/home/edadashov/tools/myspark/spark/wordcount.py", line 32, in
> <module>
>     lines = sc.textFile(nonExistentVariable,1)
> NameError: name 'nonExistentVariable' is not defined
> 15/07/23 16:31:55 INFO spark.SparkContext: Invoking stop() from shutdown
> hook
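>
> Would catching the exception in the driver and forcing a nonzero exit
> code change what YARN reports? A sketch of what I mean (main_logic is
> just a placeholder name, and I have not verified that this affects the
> final status in cluster mode):
>
> import sys
> import traceback
> from pyspark import SparkContext
>
> def main_logic(sc):
>     # Driver code that may raise, e.g. the broken textFile call.
>     lines = sc.textFile(nonExistentVariable, 1)
>     lines.count()
>
> if __name__ == "__main__":
>     sc = SparkContext(appName="PythonWordCount")
>     try:
>         main_logic(sc)
>     except Exception:
>         traceback.print_exc()
>         sc.stop()
>         sys.exit(1)  # make the failure explicit to the caller
>     sc.stop()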
>
>
> Thanks.
>



-- 

Best regards,
Elkhan Dadashov
