Thanks Akhil, that will help a lot!
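
For the archive, the pattern from your example boils down to roughly the
sketch below: an sbt build that pulls in spark-core, plus a main that creates
its own SparkContext with the cluster details. (Just a sketch, not the code
from your repo; the project name, master URL, input path and versions are
placeholders.)

    // build.sbt -- placeholder name/versions, adjust to your setup

    name := "standalone-spark-job"

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.1"

    // LogJob.scala -- a self-contained job you can start with a plain "sbt run"
    import org.apache.spark.{SparkConf, SparkContext}

    object LogJob {
      def main(args: Array[String]): Unit = {
        // The cluster details go on the SparkConf instead of the
        // spark-submit command line; master URL and input path are placeholders.
        val conf = new SparkConf()
          .setAppName("LogJob")
          .setMaster("spark://master-host:7077")
        val sc = new SparkContext(conf)

        val lines = sc.textFile("hdfs:///logs/input.log")
        println("line count: " + lines.count())

        sc.stop()
      }
    }
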
It turned out that spark-jobserver does not work in "development mode", but if you deploy a server it works (it looks like the dependencies are not right when running jobserver from sbt).

On Thu, Jan 1, 2015 at 5:22 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Hi Fernando,
>
> Here's a simple log parser/analyser written in Scala (you can run it
> without spark-shell/submit): https://github.com/sigmoidanalytics/Test
>
> Basically, to run a Spark job without spark-submit or the shell you need a
> build file <https://github.com/sigmoidanalytics/Test/blob/master/build.sbt>,
> which will pull in all the dependencies, and a main program
> <https://github.com/sigmoidanalytics/Test/blob/master/src/main/scala/LogAnalyser.scala#L35>
> in which you specify your cluster details while creating the SparkContext.
>
> Thanks
> Best Regards
>
> On Wed, Dec 31, 2014 at 10:54 PM, Fernando O. <fot...@gmail.com> wrote:
>
>> Before jumping into a sea of dependencies and bash files:
>> Does anyone have an example of how to run a Spark job without using
>> spark-submit or the shell?
>>
>> On Tue, Dec 30, 2014 at 3:23 PM, Fernando O. <fot...@gmail.com> wrote:
>>
>>> Hi all,
>>> I'm investigating Spark for a new project and I'm trying to use
>>> spark-jobserver because I need to reuse and share RDDs, and from what I
>>> read in the forum that's the "standard" :D
>>>
>>> It turns out that spark-jobserver doesn't seem to work on YARN, or at
>>> least it does not on 1.1.1.
>>>
>>> My setup is Spark 1.1.1 (moving to 1.2.0 soon) and Hadoop 2.6 (which
>>> seems compatible with 2.4 from Spark's point of view; at least I was able
>>> to run spark-submit and shell tasks in both yarn-client and yarn-cluster
>>> modes).
>>>
>>> Going back to my original point: I made some changes in spark-jobserver
>>> and now I can submit a job, but I get:
>>>
>>> ....
>>> [2014-12-30 18:20:19,769] INFO e.spark.deploy.yarn.Client []
>>> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
>>> - Max mem capabililty of a single resource in this cluster 15000
>>> [2014-12-30 18:20:19,770] INFO e.spark.deploy.yarn.Client []
>>> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
>>> - Preparing Local resources
>>> [2014-12-30 18:20:20,041] INFO e.spark.deploy.yarn.Client []
>>> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
>>> - Prepared Local resources Map(__spark__.jar -> resource { scheme: "file"
>>> port: -1 file:
>>> "/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar"
>>> } size: 343226 timestamp: 1416429031000 type: FILE visibility: PRIVATE)
>>>
>>> [...]
>>>
>>> [2014-12-30 18:20:20,139] INFO e.spark.deploy.yarn.Client []
>>> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
>>> - Yarn AM launch context:
>>> [2014-12-30 18:20:20,140] INFO e.spark.deploy.yarn.Client []
>>> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
>>> - class: org.apache.spark.deploy.yarn.ExecutorLauncher
>>> [2014-12-30 18:20:20,140] INFO e.spark.deploy.yarn.Client []
>>> [akka://JobServer/user/context-supervisor/f983d86e-spark.jobserver.WordCountExample]
>>> - env: Map(CLASSPATH ->
>>> $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/*,
>>> SPARK_YARN_CACHE_FILES_FILE_SIZES -> 343226, SPARK_YARN_STAGING_DIR ->
>>> .sparkStaging/application_1419963137232_0001/,
>>> SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE, SPARK_USER -> ec2-user,
>>> SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS ->
>>> 1416429031000, SPARK_YARN_CACHE_FILES ->
>>> file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar#__spark__.jar)
>>>
>>> [...]
>>>
>>> [2014-12-30 18:03:04,474] INFO YarnClientSchedulerBackend []
>>> [akka://JobServer/user/context-supervisor/ebac0153-spark.jobserver.WordCountExample]
>>> - Application report from ASM:
>>> appMasterRpcPort: -1
>>> appStartTime: 1419962580444
>>> yarnAppState: FAILED
>>>
>>> [2014-12-30 18:03:04,475] ERROR .jobserver.JobManagerActor []
>>> [akka://JobServer/user/context-supervisor/ebac0153-spark.jobserver.WordCountExample]
>>> - Failed to create context ebac0153-spark.jobserver.WordCountExample,
>>> shutting down actor
>>> org.apache.spark.SparkException: Yarn application already ended,might be
>>> killed or not able to launch application master.
>>> at
>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:117)
>>> at
>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:93)
>>>
>>> In the Hadoop console I can see the detailed cause:
>>>
>>> Diagnostics: File
>>> file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar
>>> does not exist
>>> java.io.FileNotFoundException: File
>>> file:/home/ec2-user/.ivy2/cache/org.apache.spark/spark-yarn_2.10/jars/spark-yarn_2.10-1.1.1.jar
>>> does not exist
>>>
>>> Now it seems that Spark is trying to use, on the other nodes, a jar that
>>> only exists on the machine I launched the task from.
>>>
>>> Can anyone point me in the right direction of where that path might be
>>> getting set?
>>>
>>
>
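
PS, in case anyone else hits the FileNotFoundException above: the path being
shipped as __spark__.jar came from the local ivy cache on the launch machine,
which the YARN nodes cannot read. One way to point YARN at a jar every node
can reach is to set the spark.yarn.jar property (the SPARK_JAR environment
variable in older releases) on the conf used to build the context. Untested
sketch only; whether jobserver honours it is an assumption, and the HDFS path
and app name below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountContext {
      def main(args: Array[String]): Unit = {
        // Point YARN at a Spark assembly readable by every node (e.g. on HDFS)
        // instead of a jar from the local ivy cache. The path is a placeholder.
        val conf = new SparkConf()
          .setAppName("WordCountExample")
          .setMaster("yarn-client") // HADOOP_CONF_DIR must point at the cluster config
          .set("spark.yarn.jar",
               "hdfs:///user/spark/share/spark-assembly-1.1.1-hadoop2.4.0.jar")

        val sc = new SparkContext(conf)
        // ... job code ...
        sc.stop()
      }
    }
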