Hi Ondrej,

Thanks for your reply.
I did solve that issue; you are right, there was a problem with the slave IP address setting. Now I am facing an issue with scheduling the tasks. When I try to schedule a task using

/src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test" --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P" --resources="cpus(*):3;mem(*):2560"

the tasks always get scheduled on the same node; the resources of the other nodes are not used. I just start the mesos slaves like this:

./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos --hostname=slave1

If I submit the task with the above mesos-execute command from one of the slaves, it runs on that system. But when I submit it from a different system, only that system is used and the remaining tasks are queued instead of running on the other slaves. Sometimes I also see the message "Failed to getgid: unknown user".

Do I need to start some process so the tasks are spread across all the slaves equally? Am I missing something here?
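In case it helps to see what I am about to try next, this is roughly how I plan to start each slave, following your earlier advice, with the advertised IP set explicitly instead of relying on hostname resolution (the IP below is just the address of one of my slave boxes; each slave would get its own address and hostname):

export LIBPROCESS_IP=192.168.0.116
./bin/mesos-slave.sh --master=192.168.0.102:5050 --ip=192.168.0.116 --hostname=slave1

And for the "Failed to getgid: unknown user" message, my guess is that the user I run mesos-execute as does not exist on the other slave boxes, so I was also going to create that user on every slave, e.g.

sudo useradd -m pradeep    # run on every slave node; "pradeep" is just an example user name

Please tell me if either of these is the wrong approach.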
Regards,
Pradeep

On 2 October 2015 at 15:07, Ondrej Smola <ondrej.sm...@gmail.com> wrote:

> Hi Pradeep,
>
> the problem is with the IP your slave advertises - Mesos by default resolves your hostname - there are several solutions (let's say your node IP is 192.168.56.128):
>
> 1) export LIBPROCESS_IP=192.168.56.128
> 2) set the mesos options - ip, hostname
>
> One way to do this is to create the files
>
> echo "192.168.56.128" > /etc/mesos-slave/ip
> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>
> For more configuration options see http://mesos.apache.org/documentation/latest/configuration
>
> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale <pradeepkiruv...@gmail.com>:
>
>> Hi Guangya,
>>
>> Thanks for the reply. I found one interesting log message:
>>
>> 7410 master.cpp:5977] Removed slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave registered at the same address
>>
>> Mostly because of this issue, the slave nodes are getting registered and de-registered to make room for the next node. I can even see this in the UI: for some time one node is added, and after some time it is replaced with the new slave node.
>>
>> The above log is followed by the log messages below.
>>
>> I1002 10:01:12.753865 7416 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 104089ns
>> I1002 10:01:12.753885 7416 replica.cpp:679] Persisted action at 384
>> E1002 10:01:12.753891 7417 process.cpp:1912] Failed to shutdown socket with fd 15: Transport endpoint is not connected
>> I1002 10:01:12.753988 7413 master.cpp:3930] Registered slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000]
>> I1002 10:01:12.754065 7413 master.cpp:1080] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116) disconnected
>> I1002 10:01:12.754072 7416 hierarchical.hpp:675] Added slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>> I1002 10:01:12.754084 7413 master.cpp:2534] Disconnecting slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
>> E1002 10:01:12.754118 7417 process.cpp:1912] Failed to shutdown socket with fd 16: Transport endpoint is not connected
>> I1002 10:01:12.754132 7413 master.cpp:2553] Deactivating slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051 (192.168.0.116)
>> I1002 10:01:12.754237 7416 hierarchical.hpp:768] Slave 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>> I1002 10:01:12.754240 7413 replica.cpp:658] Replica received learned notice for position 384
>> I1002 10:01:12.754360 7413 leveldb.cpp:343] Persisting action (20 bytes) to leveldb took 95171ns
>> I1002 10:01:12.754395 7413 leveldb.cpp:401] Deleting ~2 keys from leveldb took 20333ns
>> I1002 10:01:12.754406 7413 replica.cpp:679] Persisted action at 384
>>
>> Thanks,
>> Pradeep
>>
>> On 2 October 2015 at 02:35, Guangya Liu <gyliu...@gmail.com> wrote:
>>
>>> Hi Pradeep,
>>>
>>> Please check some of my questions in line.
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <pradeepkiruv...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am new to Mesos. I have set up a Mesos cluster with 1 master and 3 slaves.
>>>>
>>>> One slave runs on the master node itself and the other slaves run on different nodes. Here "node" means a physical box.
>>>>
>>>> I tried running tasks on a one-node cluster and tested the task scheduling using mesos-execute; that works fine.
>>>>
>>>> When I configure a three-node cluster (1 master and 3 slaves) and look at the resources on the master (in the GUI), only the master node's resources are visible. The other nodes' resources are not visible; sometimes they are visible but in a deactivated state.
>>>>
>>> Can you please append some logs from mesos-slave and mesos-master? There should be some logs in either master or slave telling you what is wrong.
>>>
>>>> Please let me know what could be the reason. All the nodes are in the same network.
>>>>
>>>> When I try to schedule a task using
>>>>
>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test" --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P" --resources="cpus(*):3;mem(*):2560"
>>>>
>>>> the tasks always get scheduled on the same node. The resources from the other nodes are not getting used to schedule the tasks.
>>>>
>>> Based on your previous question, there is only one node in your cluster; that is why the other nodes are not available. We need to first identify what is wrong with the other three nodes.
>>>
>>>> Is it required to register the frameworks from every slave node on the Master?
>>>
>>> It is not required.
>>>
>>>> I have configured this cluster using the GitHub code.
>>>>
>>>> Thanks & Regards,
>>>> Pradeep
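P.S. To double-check which slaves the master actually knows about at any given moment (and whether the other nodes ever register), I was going to query the master's state endpoint from a shell, roughly like this (assuming the state.json endpoint is available in my build; the jq part is optional and only for readability):

curl -s http://192.168.0.102:5050/master/state.json | jq '.slaves[] | {hostname, pid}'

If the other slaves never show up there, I will collect their mesos-slave logs and send those as well.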