Hi William,

since it is the only job running, I don't think it is resource contention. We have configured the Capacity Scheduler, which means the job goes through YARN queues. As it is the only job, I can't see how it would be waiting in the queue.
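One way to check where the extra time goes is to compare YARN application timestamps. The sketch below is only an illustration under stated assumptions: that the Oozie Spark action appears in YARN as two applications (a launcher and the Spark application itself), and that their records come from the ResourceManager REST API (`/ws/v1/cluster/apps`), which reports `startedTime` and `finishedTime` in epoch milliseconds. The helper name `timing_breakdown` and the sample values are hypothetical, not measurements from this cluster.

```python
from typing import Dict


def timing_breakdown(launcher: Dict[str, int], spark: Dict[str, int]) -> Dict[str, float]:
    """Split the end-to-end time of an Oozie-launched Spark job into phases.

    Both arguments are application records as returned by the YARN
    ResourceManager REST API; only `startedTime` and `finishedTime`
    (epoch milliseconds) are used. Results are in seconds.
    """
    ms_per_s = 1000.0
    return {
        # Time from launcher start until the Spark application appears in YARN.
        "before_spark_s": (spark["startedTime"] - launcher["startedTime"]) / ms_per_s,
        # Runtime of the Spark application itself.
        "spark_runtime_s": (spark["finishedTime"] - spark["startedTime"]) / ms_per_s,
        # Time the launcher keeps running after Spark has finished.
        "after_spark_s": (launcher["finishedTime"] - spark["finishedTime"]) / ms_per_s,
        # End-to-end time as seen from the launcher.
        "total_s": (launcher["finishedTime"] - launcher["startedTime"]) / ms_per_s,
    }


# Made-up example records, for illustration only.
example_launcher = {"startedTime": 1_000_000, "finishedTime": 1_170_000}
example_spark = {"startedTime": 1_100_000, "finishedTime": 1_144_000}

for phase, seconds in timing_breakdown(example_launcher, example_spark).items():
    print(f"{phase}: {seconds:.1f}")
```

If most of the time falls into the phase before the Spark application starts, the overhead would sit in the launcher (allocation and startup) rather than in the Spark job itself.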
Br,
Dennis

Sent from my iPhone

> On 20.07.2019 at 01:48, William Shen <wills...@marinsoftware.com> wrote:
>
> Dennis, do you know what's taking the additional time? Is it the Spark job,
> or Oozie waiting for allocation from YARN? Do you have a resource contention
> issue in YARN?
>
>> On Fri, Jul 19, 2019 at 12:24 AM Bartek Dobija <bartek.dob...@gmail.com>
>> wrote:
>> Hi Dennis,
>>
>> Oozie jobs shouldn't take that long in a well-configured cluster. Oozie
>> allocates its own resources in YARN, which may require fine-tuning. Check
>> whether YARN gives resources to the Oozie job immediately, since a delay
>> there may be one of the reasons, and adjust job priorities in the YARN
>> scheduling configuration if needed.
>>
>> Alternatively, check the Apache Airflow project, which is a good alternative
>> to Oozie.
>>
>> Regards,
>> Bartek
>>
>>> On Fri, Jul 19, 2019, 09:09 Dennis Suhari <d.suh...@icloud.com.invalid>
>>> wrote:
>>>
>>> Dear experts,
>>>
>>> I am using Spark for processing data from HDFS (Hadoop). These Spark
>>> applications are data pipelines, data wrangling and machine learning
>>> applications. Spark submits its jobs via YARN, and this works well.
>>> For scheduling I am now trying to use Apache Oozie, but I am facing
>>> performance impacts. A Spark job which took 44 seconds when submitted
>>> via the CLI now takes nearly 3 minutes.
>>>
>>> Have you had similar experiences using Oozie for scheduling Spark
>>> application jobs? What alternative workflow tools are you using for
>>> scheduling Spark jobs on Hadoop?
>>>
>>> Br,
>>>
>>> Dennis
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org