The master log shows DECLINE calls for the offers, which suggests the
scheduler is receiving the offers and is able to reply.

Can you turn on TRACE logging in Spark for the Mesos coarse-grained
scheduler and check whether it reports processing the offers?
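
A minimal sketch of one way to do that, assuming Spark 2.1's default log4j 1.x
setup and a driver running in client mode. The file path is only an
illustrative choice, and the logger name assumes the Mesos scheduler backend
lives in the org.apache.spark.scheduler.cluster.mesos package:

```shell
# Write a log4j 1.x config that keeps everything at INFO but raises the
# Mesos scheduler backend package to TRACE. The path is an arbitrary choice.
LOG4J_FILE="${LOG4J_FILE:-/tmp/log4j-trace.properties}"

cat > "$LOG4J_FILE" <<'EOF'
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Assumed package of the coarse-grained Mesos scheduler backend in Spark 2.x
log4j.logger.org.apache.spark.scheduler.cluster.mesos=TRACE
EOF

echo "wrote $LOG4J_FILE"
# Then add this option to the existing spark-submit invocation:
#   --driver-java-options "-Dlog4j.configuration=file:$LOG4J_FILE"
```

With that in place, the driver output should show whether the scheduler
backend sees each offer and why it declines it.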

Tim

On Fri, Dec 30, 2016 at 2:35 PM, Ji Yan <ji...@drive.ai> wrote:
> Thanks Timothy,
>
> Setting these four environment variables as you suggested got Spark
> running:
>
> LIBPROCESS_ADVERTISE_IP=<host ip>
> LIBPROCESS_ADVERTISE_PORT=40286
> LIBPROCESS_IP=0.0.0.0
> LIBPROCESS_PORT=40286
>
> After that, it seems that Spark cannot accept any offer from Mesos. If I run
> the same script outside the Docker container, Spark gets resources and the
> job runs successfully to completion.
>
> Here is the Mesos master log for running the Spark job inside the Docker
> container:
>
> I1230 14:29:55.710973  9557 master.cpp:2500] Subscribing framework eval.py
> with checkpointing disabled and capabilities [ GPU_RESOURCES ]
>
> I1230 14:29:55.712379  9567 hierarchical.cpp:271] Added framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251
>
> I1230 14:29:55.713717  9550 master.cpp:5709] Sending 1 offers to framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
> scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
>
> I1230 14:29:55.829774  9549 master.cpp:3951] Processing DECLINE call for
> offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1384 ] for framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
> scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
>
> I1230 14:30:01.055359  9569 http.cpp:381] HTTP GET for /master/state from
> 172.16.8.140:49406 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
> Safari/537.36'
>
> I1230 14:30:01.457598  9553 master.cpp:5709] Sending 1 offers to framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
> scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
>
> I1230 14:30:01.463732  9542 master.cpp:3951] Processing DECLINE call for
> offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1385 ] for framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
> scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
>
> I1230 14:30:02.300915  9562 http.cpp:381] HTTP GET for /master/state from
> 172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
> Safari/537.36'
>
> I1230 14:30:03.847647  9553 http.cpp:381] HTTP GET for /master/state from
> 172.16.8.140:49406 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
> Safari/537.36'
>
> I1230 14:30:04.431270  9551 http.cpp:381] HTTP GET for /master/state from
> 172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
> Safari/537.36'
>
> I1230 14:30:07.465801  9549 master.cpp:5709] Sending 1 offers to framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
> scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
>
> I1230 14:30:07.470860  9542 master.cpp:3951] Processing DECLINE call for
> offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1386 ] for framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
> scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
>
> I1230 14:30:11.077518  9572 http.cpp:381] HTTP GET for /master/state from
> 172.16.8.140:59764 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
> Safari/537.36'
>
> I1230 14:30:12.387562  9560 http.cpp:381] HTTP GET for /master/state from
> 172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
> Safari/537.36'
>
> I1230 14:30:12.473937  9572 master.cpp:5709] Sending 1 offers to framework
> 993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
> scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
>
>
>
> On Fri, Dec 30, 2016 at 1:35 PM, Timothy Chen <tnac...@gmail.com> wrote:
>>
>> Hi Ji,
>>
>> One way to fix the port is to set the LIBPROCESS_PORT environment variable
>> on the executor when it is launched.
>>
>> Tim
>>
>>
>> On Dec 30, 2016, at 1:23 PM, Ji Yan <ji...@drive.ai> wrote:
>>
>> Dear Spark Users,
>>
>> We are trying to launch Spark on Mesos from within a Docker container. We
>> have found that since the Spark executors need to talk back to the Spark
>> driver, a lot of port mapping is needed to make that happen. We believe we
>> have mapped all the ports we could find on the Spark configuration
>> documentation page.
>>
>>> spark-2.1.0-bin-spark-2.1/bin/spark-submit \
>>>   --conf 'spark.driver.host'=<host server ip> \
>>>   --conf 'spark.blockManager.port'='40285' \
>>>   --conf 'spark.driver.bindAddress'='0.0.0.0' \
>>>   --conf 'spark.driver.port'='40284' \
>>>   --conf 'spark.mesos.executor.docker.volumes'='spark-2.1.0-bin-spark-2.1:/spark-2.1.0-bin-spark-2.1' \
>>>   --conf 'spark.mesos.gpus.max'='2' \
>>>   --conf 'spark.mesos.containerizer'='docker' \
>>>   --conf 'spark.mesos.executor.docker.image'='docker.drive.ai/spark_gpu_experiment:latest' \
>>>   --master 'mesos://mesos_master_dev:5050' \
>>>   -v eval.py
>>
>>
>> When we launched Spark this way, the Mesos master log shows the master
>> trying to send the offer back to the framework at port 33978, which turns
>> out to be a dynamic port. The job failed at this point, apparently because
>> the offer cannot reach the container. To expose that port in the container,
>> we first need to make it fixed. Does anyone know how to fix that port in
>> the Spark configuration? Any other advice on launching Spark on Mesos from
>> within a Docker container is greatly appreciated.
>>
>> I1230 12:53:54.758297  9571 master.cpp:2424] Received SUBSCRIBE call for
>> framework 'eval.py' at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
>> I1230 12:53:54.758608  9571 master.cpp:2500] Subscribing framework eval.py
>> with checkpointing disabled and capabilities [ GPU_RESOURCES ]
>> I1230 12:53:54.760036  9569 hierarchical.cpp:271] Added framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233
>> I1230 12:53:54.761533  9549 master.cpp:5709] Sending 1 offers to framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@<some ip>:33978
>> E1230 12:53:57.757814  9573 process.cpp:2105] Failed to shutdown socket
>> with fd 22: Transport endpoint is not connected
>> I1230 12:53:57.758314  9543 master.cpp:1284] Framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
>> disconnected
>> I1230 12:53:57.758378  9543 master.cpp:2725] Disconnecting framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
>> I1230 12:53:57.758411  9543 master.cpp:2749] Deactivating framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
>> I1230 12:53:57.758582  9548 hierarchical.cpp:382] Deactivated framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233
>> W1230 12:53:57.758915  9543 master.hpp:2113] Master attempted to send
>> message to disconnected framework 993198d1-7393-4656-9f75-4f22702609d0-0233
>> (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
>> I1230 12:53:57.759140  9543 master.cpp:1297] Giving framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978 0ns to
>> failover
>> I1230 12:53:57.760573  9561 master.cpp:5561] Framework failover timeout,
>> removing framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
>> I1230 12:53:57.760648  9561 master.cpp:6296] Removing framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
>> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
>> I1230 12:53:57.761493  9571 hierarchical.cpp:333] Removed framework
>> 993198d1-7393-4656-9f75-4f22702609d0-0233
>>
>>
>> The information in this email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this email by
>> anyone else is unauthorized. If you are not the intended recipient, any
>> disclosure, copying, distribution or any action taken or omitted to be taken
>> in reliance on it, is prohibited and may be unlawful.
>
>
>

