Not sure if you are aware of this new feature in Airflow: https://issues.apache.org/jira/browse/AIRFLOW-6542 . It's a way to use Airflow to orchestrate Spark applications that are run using the Spark K8s operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
On Sun, Apr 5, 2020 at 8:25 AM Masood Krohy <masood.krohy@analytical.works> wrote:

> Another, simpler solution that I just thought of: just add an operation at
> the end of your Spark program to write an empty file somewhere, with
> filename SUCCESS for example. Add a stage to your AirFlow graph to check
> the existence of this file after running spark-submit. If the file is
> absent, then the Spark app must have failed.
>
> The above should work if you want to avoid dealing with the REST API for
> monitoring.
>
> Masood
>
> __________________
>
> Masood Krohy, Ph.D.
> Data Science Advisor|Platform Architect
> https://www.analytical.works
>
> On 4/4/20 10:54 AM, Masood Krohy wrote:
>
> I'm not in the Spark dev team, so cannot tell you why that priority was
> chosen for the JIRA issue or if anyone is about to finish the work on that;
> I'll let others jump in if they know.
>
> Just wanted to offer a potential solution so that you can move ahead in
> the meantime.
>
> Masood
>
> __________________
>
> Masood Krohy, Ph.D.
> Data Science Advisor|Platform Architect
> https://www.analytical.works
>
> On 4/4/20 7:49 AM, Marshall Markham wrote:
>
> Thank you very much Masood for your fast response. Last question, is the
> current status in Jira representative of the status of the ticket within
> the project team? This seems like a big deal for the K8s implementation and
> we were surprised to find it marked as priority low. Is there any
> discussion of picking up this work in the near future?
>
> Thanks,
> Marshall
>
> *From:* Masood Krohy <masood.krohy@analytical.works>
> *Sent:* Friday, April 3, 2020 9:34 PM
> *To:* Marshall Markham <mmark...@precisionlender.com>; user <user@spark.apache.org>
> *Subject:* Re: spark-submit exit status on k8s
>
> While you wait for a fix on that JIRA ticket, you may be able to add an
> intermediary step in your AirFlow graph, calling Spark's REST API after
> submitting the job, and dig into the actual status of the application, and
> make a success/fail decision accordingly. You can make repeated calls in a
> loop to the REST API with few seconds delay between each call while the
> execution is in progress until the application fails or succeeds.
>
> https://spark.apache.org/docs/latest/monitoring.html#rest-api
>
> Hope this helps.
>
> Masood
>
> __________________
>
> Masood Krohy, Ph.D.
> Data Science Advisor|Platform Architect
> https://www.analytical.works
>
> On 4/3/20 8:23 AM, Marshall Markham wrote:
>
> Hi Team,
>
> My team recently conducted a POC of Kubernetes/Airflow/Spark with great
> success. The major concern we have about this system, after the completion
> of our POC is a behavior of spark-submit.
> When called with a Kubernetes API
> endpoint as master spark-submit seems to always return exit status 0. This
> is obviously a major issue preventing us from conditioning job graphs on
> the success or failure of our Spark jobs. I found Jira ticket SPARK-27697
> under the Apache issues covering this bug. The ticket is listed as minor
> and does not seem to have any activity recently. I would like to up vote it
> and ask if there is anything I can do to move this forward. This could be
> the one thing standing between my team and our preferred batch workload
> implementation. Thank you.
>
> *Marshall Markham*
> Data Engineer
> PrecisionLender, a Q2 Company
>
> NOTE: This communication and any attachments are for the sole use of the
> intended recipient(s) and may contain confidential and/or privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by replying to this email, and destroy all copies of the original
> message.
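
For anyone who wants to try the REST polling workaround suggested earlier in this thread, here is a minimal Python sketch of the loop. It is only an illustration under several assumptions, not a tested recipe: the function names (decide, rest_probe, wait_for_outcome), the delay, and the poll limit are all made up for this example, and it assumes the monitoring API at /api/v1/applications/[app-id] and /api/v1/applications/[app-id]/jobs is reachable from the Airflow worker. It also assumes that any failed job should fail the task, which may not hold for applications that catch and handle job failures.

```python
import json
import time
import urllib.request
from typing import Callable, List, Tuple

# Job status strings as reported by the REST API's /jobs endpoint.
FAILED, SUCCEEDED = "FAILED", "SUCCEEDED"


def decide(app_completed: bool, job_statuses: List[str]) -> str:
    """Map one REST snapshot to FAILED, SUCCEEDED, or RUNNING."""
    if FAILED in job_statuses:
        return FAILED  # assumption: one failed job fails the whole task
    if not app_completed:
        return "RUNNING"  # the application may still submit more jobs
    # Completed: succeed only if every job succeeded. An application that
    # ran no jobs counts as a success here (an assumption; adjust as needed).
    return SUCCEEDED if all(s == SUCCEEDED for s in job_statuses) else FAILED


def rest_probe(base_url: str, app_id: str) -> Tuple[bool, List[str]]:
    """Fetch (completed flag, job statuses) for one application over HTTP."""
    def get(path: str):
        url = f"{base_url}/api/v1/applications/{app_id}{path}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)

    attempts = get("").get("attempts", [])
    completed = bool(attempts) and all(a.get("completed", False) for a in attempts)
    return completed, [job["status"] for job in get("/jobs")]


def wait_for_outcome(
    probe: Callable[[], Tuple[bool, List[str]]],
    delay_s: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the application fails or succeeds, or give up with TIMEOUT."""
    for _ in range(max_polls):
        outcome = decide(*probe())
        if outcome != "RUNNING":
            return outcome
        sleep(delay_s)
    return "TIMEOUT"
```

An Airflow task could call something like wait_for_outcome(lambda: rest_probe(base_url, app_id)) right after spark-submit returns and exit non-zero unless the result is SUCCEEDED, where base_url and app_id are whatever applies to your cluster. One caveat: the driver's own UI is normally only available while the application is running, so a history server is probably the more reliable endpoint to poll for the final state.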