Thanks Timothy,

Setting these four environment variables as you suggested got Spark running:

LIBPROCESS_ADVERTISE_IP=<host ip>
LIBPROCESS_ADVERTISE_PORT=40286
LIBPROCESS_IP=0.0.0.0
LIBPROCESS_PORT=40286

After that, however, Spark does not seem to accept any offers from Mesos. If I run the same script outside the Docker container, Spark gets resources and the job runs successfully to completion.

Here is the Mesos master log from running the Spark job inside the Docker container:

I1230 14:29:55.710973 9557 master.cpp:2500] Subscribing framework eval.py with checkpointing disabled and capabilities [ GPU_RESOURCES ]
I1230 14:29:55.712379 9567 hierarchical.cpp:271] Added framework 993198d1-7393-4656-9f75-4f22702609d0-0251
I1230 14:29:55.713717 9550 master.cpp:5709] Sending 1 offers to framework 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
I1230 14:29:55.829774 9549 master.cpp:3951] Processing DECLINE call for offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1384 ] for framework 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
I1230 14:30:01.055359 9569 http.cpp:381] HTTP GET for /master/state from 172.16.8.140:49406 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
I1230 14:30:01.457598 9553 master.cpp:5709] Sending 1 offers to framework 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
I1230 14:30:01.463732 9542 master.cpp:3951] Processing DECLINE call for offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1385 ] for framework 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
I1230 14:30:02.300915 9562 http.cpp:381] HTTP GET for /master/state from 172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
I1230 14:30:03.847647 9553 http.cpp:381] HTTP GET for /master/state from 172.16.8.140:49406 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
I1230 14:30:04.431270 9551 http.cpp:381] HTTP GET for /master/state from 172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
I1230 14:30:07.465801 9549 master.cpp:5709] Sending 1 offers to framework 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
I1230 14:30:07.470860 9542 master.cpp:3951] Processing DECLINE call for offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1386 ] for framework 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
I1230 14:30:11.077518 9572 http.cpp:381] HTTP GET for /master/state from 172.16.8.140:59764 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
I1230 14:30:12.387562 9560 http.cpp:381] HTTP GET for /master/state from 172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
I1230 14:30:12.473937 9572 master.cpp:5709] Sending 1 offers to framework 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286

On Fri, Dec 30, 2016 at 1:35 PM, Timothy Chen <tnac...@gmail.com> wrote:

> Hi Ji,
>
> One way to make it fixed is to set the LIBPROCESS_PORT environment variable
> on the executor when it is launched.
>
> Tim
>
>
> On Dec 30, 2016, at 1:23 PM, Ji Yan <ji...@drive.ai> wrote:
>
> Dear Spark Users,
>
> We are trying to launch Spark on Mesos from within a docker container.
> We have found that since the Spark executors need to talk back to the Spark
> driver, a lot of port mapping is needed to make that happen. We believe we
> have mapped the ports we could find on the Spark configuration
> documentation page.
>
> spark-2.1.0-bin-spark-2.1/bin/spark-submit \
>> --conf 'spark.driver.host'=<host server ip> \
>> --conf 'spark.blockManager.port'='40285' \
>> --conf 'spark.driver.bindAddress'='0.0.0.0' \
>> --conf 'spark.driver.port'='40284' \
>> --conf 'spark.mesos.executor.docker.volumes'='spark-2.1.0-bin-spark-2.1:/spark-2.1.0-bin-spark-2.1' \
>> --conf 'spark.mesos.gpus.max'='2' \
>> --conf 'spark.mesos.containerizer'='docker' \
>> --conf 'spark.mesos.executor.docker.image'='docker.drive.ai/spark_gpu_experiment:latest' \
>> --master 'mesos://mesos_master_dev:5050' \
>> -v eval.py
>
>
> When we launched Spark this way, the Mesos master log suggests that the
> Mesos master is trying to send the offer back to the framework at port
> 33978, which turns out to be a dynamic port. The job failed at this point,
> apparently because the offer cannot reach the container. To expose that
> port in the container, we first need to make it fixed. Does anyone know how
> to fix that port in the Spark configuration?
> Any other advice on how to launch Spark on Mesos from within a Docker
> container is greatly appreciated.
>
> I1230 12:53:54.758297 9571 master.cpp:2424] Received SUBSCRIBE call for framework 'eval.py' at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:54.758608 9571 master.cpp:2500] Subscribing framework eval.py with checkpointing disabled and capabilities [ GPU_RESOURCES ]
> I1230 12:53:54.760036 9569 hierarchical.cpp:271] Added framework 993198d1-7393-4656-9f75-4f22702609d0-0233
> I1230 12:53:54.761533 9549 master.cpp:5709] Sending 1 offers to framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@<some ip>:33978
> E1230 12:53:57.757814 9573 process.cpp:2105] Failed to shutdown socket with fd 22: Transport endpoint is not connected
> I1230 12:53:57.758314 9543 master.cpp:1284] Framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978 disconnected
> I1230 12:53:57.758378 9543 master.cpp:2725] Disconnecting framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.758411 9543 master.cpp:2749] Deactivating framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.758582 9548 hierarchical.cpp:382] Deactivated framework 993198d1-7393-4656-9f75-4f22702609d0-0233
> W1230 12:53:57.758915 9543 master.hpp:2113] Master attempted to send message to disconnected framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.759140 9543 master.cpp:1297] Giving framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978 0ns to failover
> I1230 12:53:57.760573 9561 master.cpp:5561] Framework failover timeout, removing framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.760648 9561 master.cpp:6296] Removing framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.761493 9571 hierarchical.cpp:333] Removed framework 993198d1-7393-4656-9f75-4f22702609d0-0233
>
>
> The information in this email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful.
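
For anyone following along, the port setup discussed in this thread can be sketched roughly as below. This is only a minimal Python sketch under assumptions: the three ports are the ones used above (40284 for spark.driver.port, 40285 for spark.blockManager.port, 40286 for libprocess), the helper names libprocess_env and docker_run_args and the image name my-spark-driver:latest are made up for illustration, and 172.16.1.101 is simply the scheduler address that appears in the master log. It only prints a docker run command line that publishes the fixed ports one-to-one and passes the four LIBPROCESS_* variables into the container; it does not address the declined offers.

```python
import shlex

# Ports taken from the thread: spark.driver.port, spark.blockManager.port,
# and the libprocess (Mesos scheduler) port.
DRIVER_PORT = 40284
BLOCK_MANAGER_PORT = 40285
LIBPROCESS_PORT = 40286


def libprocess_env(host_ip, port=LIBPROCESS_PORT):
    """Return the four LIBPROCESS_* variables mentioned in the thread."""
    return {
        "LIBPROCESS_ADVERTISE_IP": host_ip,
        "LIBPROCESS_ADVERTISE_PORT": str(port),
        "LIBPROCESS_IP": "0.0.0.0",
        "LIBPROCESS_PORT": str(port),
    }


def docker_run_args(host_ip, image):
    """Build a `docker run` argument list (hypothetical helper).

    Each fixed port is published to the same port number on the host,
    and the LIBPROCESS_* variables are passed into the container.
    """
    args = ["docker", "run", "--rm"]
    for port in (DRIVER_PORT, BLOCK_MANAGER_PORT, LIBPROCESS_PORT):
        args += ["-p", f"{port}:{port}"]
    for key, value in libprocess_env(host_ip).items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # The IP is the scheduler address seen in the master log; the image
    # name is a placeholder, not a real image.
    print(shlex.join(docker_run_args("172.16.1.101", "my-spark-driver:latest")))
```

Publishing each port to the same number on the host should keep the address that libprocess advertises to the Mesos master consistent with what is reachable from outside the container.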