I have a Docker-based cluster. In this cluster, I am trying to schedule Spark jobs using Airflow. Airflow and Spark run separately in *different containers*. However, I cannot get a Spark job to run through Airflow.
Below is my Airflow script:

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from datetime import datetime, timedelta

args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 7, 31)
}

dag = DAG('spark_example_new', default_args=args, schedule_interval="@once")

operator = SparkSubmitOperator(
    task_id='spark_submit_job',
    conn_id='spark_default',
    java_class='Main',
    application='/SimpleSpark.jar',
    name='airflow-spark-example',
    dag=dag
)

I also configured the spark_default connection in the Airflow UI:

[image: Screenshot from 2018-09-24 12-00-46.png]

However, it produces the following error:

[Errno 2] No such file or directory: 'spark-submit': 'spark-submit'

I think Airflow is trying to run spark-submit locally inside its own container, where the Spark binaries are not installed. How can I configure it so that the job runs on the Spark master?

-- Uğur Sopaoğlu
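P.S. In case the screenshot does not come through on the list, here is a rough Python equivalent of the connection setup I did in the UI. The host name spark-master and port 7077 below are only placeholders; my actual values are the ones in the screenshot.

# Rough sketch: create the spark_default connection from Python instead of the UI.
# "spark://spark-master" and 7077 are placeholders, not my confirmed values.
from airflow import settings
from airflow.models import Connection

session = settings.Session()

spark_conn = Connection(
    conn_id='spark_default',
    conn_type='spark',
    host='spark://spark-master',  # placeholder Spark master URL
    port=7077,                    # placeholder port
)

# Drop any existing spark_default entry before adding this one
session.query(Connection).filter(Connection.conn_id == 'spark_default').delete()
session.add(spark_conn)
session.commit()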