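Hi Pradeep, the problem is with the IP your slaves advertise. By default Mesos resolves the node's hostname to pick that address, and in your log every slave registers at slave(1)@127.0.1.1:5051, so each newly registered slave replaces the previous one. There are several solutions (let's say your node IP is 192.168.56.128):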

1) export LIBPROCESS_IP=192.168.56.128

2) set mesos options - ip, hostname

One way to do this is to create files:

    echo "192.168.56.128" > /etc/mesos-slave/ip
    echo "abc.mesos.com" > /etc/mesos-slave/hostname

For more configuration options see
http://mesos.apache.org/documentation/latest/configuration

2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale <pradeepkiruv...@gmail.com>:

> Hi Guangya,
>
> Thanks for reply. I found one interesting log message.
>
> 7410 master.cpp:5977] Removed slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave registered at the same address
>
> Mostly because of this issue, the systems/slave nodes are getting
> registered and de-registered to make a room for the next node. I can even
> see this on the UI interface, for some time one node got added and after
> some time that will be replaced with the new slave node.
>
> The above log is followed by the below log messages.
>
> I1002 10:01:12.753865 7416 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 104089ns
> I1002 10:01:12.753885 7416 replica.cpp:679] Persisted action at 384
> E1002 10:01:12.753891 7417 process.cpp:1912] Failed to shutdown socket with fd 15: Transport endpoint is not connected
> I1002 10:01:12.753988 7413 master.cpp:3930] Registered slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000]
> I1002 10:01:12.754065 7413 master.cpp:1080] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) disconnected
> I1002 10:01:12.754072 7416 hierarchical.hpp:675] Added slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
> I1002 10:01:12.754084 7413 master.cpp:2534] Disconnecting slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
> E1002 10:01:12.754118 7417 process.cpp:1912] Failed to shutdown socket with fd 16: Transport endpoint is not connected
> I1002 10:01:12.754132 7413 master.cpp:2553] Deactivating slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
> I1002 10:01:12.754237 7416 hierarchical.hpp:768] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
> I1002 10:01:12.754240 7413 replica.cpp:658] Replica received learned notice for position 384
> I1002 10:01:12.754360 7413 leveldb.cpp:343] Persisting action (20 bytes) to leveldb took 95171ns
> I1002 10:01:12.754395 7413 leveldb.cpp:401] Deleting ~2 keys from leveldb took 20333ns
> I1002 10:01:12.754406 7413 replica.cpp:679] Persisted action at 384
>
> Thanks,
> Pradeep
>
> On 2 October 2015 at 02:35, Guangya Liu <gyliu...@gmail.com> wrote:
>
>> Hi Pradeep,
>>
>> Please check some of my questions in line.
>>
>> Thanks,
>>
>> Guangya
>>
>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <pradeepkiruv...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and 3
>>> Slaves.
>>>
>>> One slave runs on the Master Node itself and Other slaves run on
>>> different nodes. Here node means the physical boxes.
>>>
>>> I tried running the tasks by configuring one Node cluster. Tested the
>>> task scheduling using mesos-execute, works fine.
>>>
>>> When I configure three Node cluster (1master and 3 slaves) and try to
>>> see the resources on the master (in GUI) only the Master node resources
>>> are visible.
>>> The other nodes resources are not visible. Some times visible but in a
>>> de-actived state.
>>>
>> Can you please append some logs from mesos-slave and mesos-master? There
>> should be some logs in either master or slave telling you what is wrong.
>>
>>> *Please let me know what could be the reason. All the nodes are in the
>>> same network.*
>>>
>>> When I try to schedule a task using
>>>
>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>> --resources="cpus(*):3;mem(*):2560"
>>>
>>> The tasks always get scheduled on the same node. The resources from the
>>> other nodes are not getting used to schedule the tasks.
>>>
>> Based on your previous question, there is only one node in your cluster,
>> that's why other nodes are not available. We need first identify what is
>> wrong with other three nodes first.
>>
>>> *Is it required to register the frameworks from every slave node on the
>>> Master?*
>>>
>> It is not required.
>>
>>> *I have configured this cluster using the git-hub code.*
>>>
>>> Thanks & Regards,
>>> Pradeep
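
P.S. As a footnote to option 2 at the top of this message, here is a minimal shell sketch of the two steps involved: checking whether an address is a loopback one (like the 127.0.1.1 seen in the master log) and writing the ip and hostname option files. The function names write_slave_opts and is_loopback are made up for illustration and are not part of Mesos. It also assumes that the files under /etc/mesos-slave are read by the packaged init scripts rather than by mesos-slave itself; if that holds, a build from source would need the same values passed as the --ip and --hostname options on the command line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Mesos.

# is_loopback ADDR
# Succeeds (exit status 0) when ADDR lies in 127.0.0.0/8, which is what a
# slave ends up advertising when its hostname resolves to a loopback address.
is_loopback() {
    case $1 in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# write_slave_opts DIR IP HOSTNAME
# Writes one option per file, mirroring the two echo commands in the reply.
# DIR would be /etc/mesos-slave on a node set up that way (assumption).
write_slave_opts() {
    opts_dir=$1
    opts_ip=$2
    opts_host=$3
    mkdir -p "$opts_dir" || return 1
    printf '%s\n' "$opts_ip"   > "$opts_dir/ip"
    printf '%s\n' "$opts_host" > "$opts_dir/hostname"
}
```

On a slave node this would be run as root, for example `write_slave_opts /etc/mesos-slave 192.168.56.128 abc.mesos.com`, followed by a restart of the slave. To see what the hostname currently resolves to, `getent hosts "$(hostname)"` prints the address, which can then be passed to is_loopback.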