While you wait for a fix on that JIRA ticket, you may be able to add an
intermediary step to your Airflow graph that calls Spark's REST API after
submitting the job, digs into the actual status of the application, and
makes a success/fail decision accordingly. You can poll the REST API in a
loop, with a few seconds' delay between calls, while the execution is in
progress, until the application fails or succeeds.
https://spark.apache.org/docs/latest/monitoring.html#rest-api
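A minimal sketch of that polling step in Python, for illustration. The host/port in SPARK_API, the poll interval, and the helper names are assumptions for the example; the /applications/[app-id]/jobs endpoint and the job status values are from the monitoring REST API linked above.

```python
# Sketch only: poll the Spark REST API until the application's jobs
# have all finished, then report success or failure.
import json
import time
import urllib.request

SPARK_API = "http://spark-driver:4040/api/v1"  # assumed driver UI address


def _get_json(path):
    """Fetch a REST API resource and parse it as JSON."""
    with urllib.request.urlopen(SPARK_API + path) as resp:
        return json.load(resp)


def decide(statuses):
    """Map a set of Spark job statuses to an overall verdict."""
    if "FAILED" in statuses:
        return "failed"
    if statuses and statuses <= {"SUCCEEDED"}:
        return "succeeded"
    return "running"  # still in progress, or no jobs reported yet


def wait_for_app(app_id, delay=5):
    """Poll the application's jobs every `delay` seconds until done."""
    while True:
        jobs = _get_json(f"/applications/{app_id}/jobs")
        verdict = decide({job["status"] for job in jobs})
        if verdict != "running":
            return verdict
        time.sleep(delay)
```

In Airflow, a task (e.g. a PythonOperator) could call wait_for_app right after the spark-submit task and raise an exception on "failed", so downstream tasks are conditioned on the real outcome rather than on spark-submit's exit code.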
Hope this helps.
Masood
__________________
Masood Krohy, Ph.D.
Data Science Advisor | Platform Architect
https://www.analytical.works
On 4/3/20 8:23 AM, Marshall Markham wrote:
Hi Team,
My team recently conducted a POC of Kubernetes/Airflow/Spark with
great success. The major concern we have about this system, after the
completion of our POC, is a behavior of spark-submit. When called with
a Kubernetes API endpoint as master, spark-submit seems to always
return exit status 0. This is obviously a major issue, preventing us
from conditioning job graphs on the success or failure of our Spark
jobs. I found JIRA ticket SPARK-27697 in the Apache issue tracker
covering this bug. The ticket is listed as minor and does not seem to
have had any recent activity. I would like to upvote it and ask if
there is anything I can do to move this forward. This could be the one
thing standing between my team and our preferred batch workload
implementation. Thank you.
*Marshall Markham*
Data Engineer
PrecisionLender, a Q2 Company
NOTE: This communication and any attachments are for the sole use of
the intended recipient(s) and may contain confidential and/or
privileged information. Any unauthorized review, use, disclosure or
distribution is prohibited. If you are not the intended recipient,
please contact the sender by replying to this email, and destroy all
copies of the original message.