I have a Mesos cluster which runs Marathon. I am using Marathon to launch a long-running Spark Streaming job which consumes a Kafka input stream.
With one worker node in the cluster, I can successfully launch the driver job in Marathon, which in turn launches a task in Mesos via Spark (Spark is using coarse-grained mode), and that task consumes from Kafka just fine. When I add nodes to the cluster and start the driver job, the first Spark Mesos task runs fine for a few minutes, then exits (exit code 1). The tasks that are launched subsequently just sit there doing nothing, with this being the last log output:

14/03/26 19:06:47 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/03/26 19:06:47 INFO Remoting: Starting remoting
14/03/26 19:06:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@ip-xxx.ec2.internaxl:34488]
14/03/26 19:06:47 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@xxxx.ec2.internal:34488]
14/03/26 19:06:47 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@xxxx.ec2.internal:35332/user/CoarseGrainedScheduler
14/03/26 19:06:47 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
14/03/26 19:06:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/03/26 19:06:48 INFO Remoting: Starting remoting
14/03/26 19:06:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@xxxx.ec2.internal:59070]
14/03/26 19:06:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@xxxx.ec2.internal:59070]
14/03/26 19:06:48 INFO spark.SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@xxxx.ec2.internal:35332/user/BlockManagerMaster
14/03/26 19:06:48 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140326190648-fa77
14/03/26 19:06:48 INFO storage.MemoryStore: MemoryStore started with capacity 294.4 MB.
14/03/26 19:06:48 INFO network.ConnectionManager: Bound socket to port 55018 with id = ConnectionManagerId(xxxx.ec2.internal,55018)
14/03/26 19:06:48 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/03/26 19:06:48 INFO storage.BlockManagerMaster: Registered BlockManager
14/03/26 19:06:48 INFO spark.SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@xxxx.ec2.internal:35332/user/MapOutputTracker
14/03/26 19:06:48 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-15f43b7b-6f7c-48dd-8bd8-5663a00fd314
14/03/26 19:06:48 INFO spark.HttpServer: Starting HTTP Server
14/03/26 19:06:48 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/03/26 19:06:48 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:45286

Any pointers on how to un-stick this?