Re: Share GPU resources via attributes or as custom resources (INTERNAL)

2016-01-14 Thread haosdent
> Then, if a job is sent to the machine when the 4 GPUs are already busy, the job will fail to start, right?
I'm not sure about this. But if the job fails, Marathon would retry it, as you said.
> a job is sent to the machine, all 4 GPUs will become busy
If you specify that your task only uses 1 gpu in the resources field, I
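
For illustration, a rough sketch of what requesting one unit of a custom "gpus" scalar resource could look like in a task's resources; field names follow the Mesos Resource protobuf JSON convention, and the "gpus" name is only valid if the agent has been configured to advertise such a resource:

    {
      "name": "gpus",
      "type": "SCALAR",
      "scalar": { "value": 1.0 }
    }

The agent has to advertise a matching custom resource for an offer containing it to appear in the first place.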

Re: Powered by mesos list

2016-01-14 Thread o...@magnetic.io
tnx! I would like to hear opinions on how to categorise our solution, and maybe restructure/rephrase this page: Vamp is not so much “built on Mesos” as it “makes use of Mesos”, since it offers higher-level features through our Mesos/Marathon driver. Maybe it’s semantics, but I just wanted to check with

Tasks failing when restarting slave on Mesos 0.23.1

2016-01-14 Thread Matthias Bach
Hi all, We are using Mesos 0.23.1 in combination with Aurora 0.10.0. So far we have been using the JSON format for Mesos' credential files. However, because of MESOS-3695 we decided to switch to the plain text format before updating to 0.24.1. Our understanding is that this should be a NOOP.
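
For reference, a rough sketch of the two credential file formats in question; the principal and secret values are placeholders:

    # JSON format (the one used so far):
    {
      "credentials": [
        { "principal": "aurora", "secret": "example-secret" }
      ]
    }

    # Plain text format (one "principal secret" pair per line):
    aurora example-secret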

Tasks failing when restarting slave on Mesos 0.23.1

2016-01-14 Thread Bach, Matthias
Hi all, We are using Mesos 0.23.1 in combination with Aurora 0.10.0. So far we have been using the JSON format for Mesos' credential files. However, because of MESOS-3695 we decided to switch to the plain text format before updating to 0.24.1. Our understanding is that this should be a NOOP.

Share GPU resources via attributes or as custom resources (INTERNAL)

2016-01-14 Thread Humberto.Castejon
I have a machine with 4 GPUs and want to use Mesos+Marathon to schedule the jobs to be run on the machine. Each job will use at most 1 GPU, and sharing 1 GPU between small jobs would be OK. I know Mesos does not directly support GPUs, but it seems I might use custom resources or attributes to do
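
For illustration, a rough sketch of the two approaches on the agent side; the flags are standard mesos-slave flags, and the values are examples only:

    # Option A: advertise the GPUs as a custom scalar resource
    mesos-slave --master=<master>:5050 --resources="gpus(*):4"

    # Option B: expose them only as an attribute; attributes are opaque
    # to the allocator, so the framework has to interpret them itself
    mesos-slave --master=<master>:5050 --attributes="gpus:4"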

Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
7.0.1 -host-port 53 -container-ip 172.17.0.2 -container-port 53
root      5287  3823  0 13:57 ?        00:00:00 docker-proxy -proto udp -host-ip 172.17.0.1 -host-port 53 -container-ip 172.17.0.2 -container-port 53
root      7119  4967  0 14:00 ?        00:00:01 mesos-docker-executor --container=m

Re: slave nodes are living in two cluster and can not remove correctly.

2016-01-14 Thread X Brick
Sorry, I pasted the wrong API response for cluster A:
{
  "active": true,
  "attributes": {
    "apps": "logstash",
    "colo": "cn5",
    "type": "prod"
  },
  "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
  "id": "20151230-034049-3282655242-5050-1802-S7",
  "pid":
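
For context, a response like the one above can typically be pulled from the master's /master/slaves endpoint; the master address is a placeholder and the default port 5050 is assumed:

    curl http://<master>:5050/master/slaves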

Re: slave nodes are living in two cluster and can not remove correctly.

2016-01-14 Thread Shuai Lin
Based on your description, you have two clusters:
- old cluster B, with mesos 0.25, and the master IP is 10.88.169.195
- new cluster A, with mesos 0.22, and the master IP is 10.90.12.29
Also you have a slave S, 10.90.5.19, which was originally in cluster B, and you have reconfigured it to join

slave nodes are living in two cluster and can not remove correctly.

2016-01-14 Thread X Brick
Hi folks, I ran into a very strange issue when I migrated two nodes from one cluster to another about one week ago. The two nodes: l-bu128g3-10k10.ops.cn2 and l-bu128g5-10k10.ops.cn2. I did not clean the Mesos data dir before they joined the other cluster, and then I found the nodes live in two clusters at the
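
For what it's worth, a rough sketch of how an agent's old identity is usually wiped before pointing it at a new master; the work_dir path is only an example, use the agent's actual --work_dir:

    sudo service mesos-slave stop
    rm -f /var/lib/mesos/meta/slaves/latest    # forget the old SlaveID
    sudo service mesos-slave start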

Re: Help needed (alas, urgently)

2016-01-14 Thread Tim Chen
> root      5279  3823  0 13:57 ?        00:00:00 docker-proxy -proto tcp -host-ip 172.17.0.1 -host-port 53 -container-ip 172.17.0.2 -container-port 53
> root      5287  3823  0 13:57 ?        00:00:00 docker-proxy -proto udp -host-ip 172.17.0.1 -host-port 53 -container-ip 172.17

Re: Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
modifying docker_stop_timeout. Back shortly

Thanks again.

-Paul

PS: what do you make of the "broken pipe" error in the docker.log?

*from /var/log/upstart/docker.log*
INFO[3054] GET /v1.15/images/mongo:2.6.8/json
INFO[3054] GET /v1.21/images/mesos-20160114-153418-1674208327-5050-3
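
For reference, docker_stop_timeout is a mesos-slave flag; a rough sketch of raising it, with the 30secs value being just an example:

    mesos-slave --master=<master>:5050 --docker_stop_timeout=30secs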

Re: Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
st-port 6783 -container-ip 172.17.0.2 -container-port 6783
>> root      5271  3823  0 13:57 ?        00:00:00 docker-proxy -proto udp -host-ip 0.0.0.0 -host-port 6783 -container-ip 172.17.0.2 -container-port 6783
>> root      5279  3823  0 13:57 ?        00:

Re: Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
ing
>> at 100% CPU.
>>
>> I will try modifying docker_stop_timeout. Back shortly
>>
>> Thanks again.
>>
>> -Paul
>>
>> PS: what do you make of the "broken pipe" error in the docker.log?
>>
>> *from /var/log/upstart/docke