I am trying to submit a Spark job through Oozie. The job is marked successful, but it does not actually do anything.
I have it working through spark-submit on the command line. The problem may be around creating the SparkContext, because I added logging before/after/finally creating the SparkContext, and I only see "test 1" in the logs:

```scala
try {
  this.log.warning("test 1")
  sc = new SparkContext(conf)
  this.log.warning("test 2")
} finally {
  this.log.warning("test 3")
}
```

Does anyone have a suggestion for debugging this, or for where I might be able to get additional logs? We are using YARN on CDH 5.5.

When I compare the logs, below is where they start to diverge.

==========Broken===================

```
server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SelectChannelConnector@0.0.0.0:33827
util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'SparkUI' on port xxxxx.
ui.SparkUI (Logging.scala:logInfo(59)) - Started SparkUI at http://xxxxxxxx:xxxx
cluster.YarnClusterScheduler (Logging.scala:logInfo(59)) - Created YarnClusterScheduler
metrics.MetricsSystem (Logging.scala:logWarning(71)) - Using default name DAGScheduler for source because spark.app.id is not set.
util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 16611.
netty.NettyBlockTransferService (Logging.scala:logInfo(59)) - Server created on xxxxx
storage.BlockManager (Logging.scala:logInfo(59)) - external shuffle service port = xxxx
storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Trying to register BlockManager
```

==========Working===================

```
AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
Utils: Successfully started service 'SparkUI' on port 4040.
SparkUI: Started SparkUI at http://xxxxxxx:4040
SparkContext: Added JAR ...
SparkContext: Added JAR file:/xxxxxxx.jar at http://xxxxxxx.jar with timestamp 1450320527893
MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
ConfiguredRMFailoverProxyProvider: Failing over to rm238
Client: Requesting a new application from cluster with 25 NodeManagers
Client: Verifying our application has not requested more than the maximum memory capability of the cluster (65536 MB per container)
```
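For anyone looking for the additional logs: since Oozie launches the Spark driver inside a YARN container (unlike the command-line spark-submit run), the driver's stdout/stderr ends up in the aggregated YARN container logs rather than on a console. A sketch of how to pull them, assuming log aggregation is enabled on the cluster; the application ID below is a placeholder to replace with the one shown in the ResourceManager UI for the Oozie launcher job:

```shell
# Pull the aggregated container logs for the Oozie-launched application.
# Requires yarn.log-aggregation-enable=true; the ID here is a placeholder.
yarn logs -applicationId application_0000000000000_0000 > launcher.log

# The Oozie launcher and the actual Spark driver may run as separate YARN
# applications in yarn-cluster mode, so check both IDs listed in the
# ResourceManager UI. Then search for the custom markers from the snippet:
grep -n "test 1\|test 2\|test 3" launcher.log
```

If "test 2" never appears but there is also no stack trace, wrapping the `new SparkContext(conf)` call in a catch block that logs the exception (in addition to the `finally`) can surface a failure that the launcher otherwise swallows.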