Hi Guangya,

Hmm, that is strange; in my case it does not work that way!

If I run mesos-execute on one of the slave/master nodes, the tasks get their
resources and are scheduled fine. But if I start mesos-execute on another
node which is neither a slave nor the master, I have this issue.

I am using an lxc container on the master host as the client to launch the
tasks; it is on the same network as the master/slaves. I launch the task
exactly as you did, but the tasks are not getting scheduled.
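
For reference, this is roughly how I invoke it from the container. The
LIBPROCESS_IP line is only a guess on my side, based on Ondrej's earlier
suggestion, and 192.168.0.150 is just a placeholder for the container's own
address on the 192.168.0.x network:

# On the lxc client container (neither master nor slave).
# Placeholder IP: replace 192.168.0.150 with the container's real address.
export LIBPROCESS_IP=192.168.0.150
./src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test" \
    --command="/bin/sleep 10" --resources="cpus(*):1;mem(*):256"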


On the master the logs are the same as the ones I sent you before:

Deactivating framework 77539063-89ce-4efa-a20b-ca788abbd912-0066

On both of the slaves I can see the logs below:

I1005 13:23:32.547987  4831 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0060 by master@192.168.0.102:5050
W1005 13:23:32.548135  4831 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0060
I1005 13:23:33.697707  4833 slave.cpp:3926] Current disk usage 3.60%. Max
allowed age: 6.047984349521910days
I1005 13:23:34.098599  4829 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0061 by master@192.168.0.102:5050
W1005 13:23:34.098740  4829 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0061
I1005 13:23:35.274569  4831 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0062 by master@192.168.0.102:5050
W1005 13:23:35.274683  4831 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0062
I1005 13:23:36.193964  4829 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0063 by master@192.168.0.102:5050
W1005 13:23:36.194090  4829 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0063
I1005 13:24:01.914788  4827 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0064 by master@192.168.0.102:5050
W1005 13:24:01.914937  4827 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0064
I1005 13:24:03.469974  4833 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0065 by master@192.168.0.102:5050
W1005 13:24:03.470118  4833 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0065
I1005 13:24:04.642654  4826 slave.cpp:1980] Asked to shut down framework
77539063-89ce-4efa-a20b-ca788abbd912-0066 by master@192.168.0.102:5050
W1005 13:24:04.642812  4826 slave.cpp:1995] Cannot shut down unknown
framework 77539063-89ce-4efa-a20b-ca788abbd912-0066
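
One more thing I noticed in the master logs I sent earlier: the framework
registers at an address of the form scheduler-...@127.0.1.1:47259, so maybe
the container's hostname resolves to the loopback address and the master
cannot connect back to the scheduler. This is only a guess, but I will check
it on the client like this:

# On the lxc client container: see what the hostname resolves to.
# If it prints 127.0.1.1, I probably need to export LIBPROCESS_IP
# as Ondrej suggested earlier.
getent hosts $(hostname)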



On 5 October 2015 at 13:09, Guangya Liu <gyliu...@gmail.com> wrote:

> Hi Pradeep,
>
> From your log, it seems that the master process is exiting and this caused
> the framework to fail over to another Mesos master. Can you please share
> more detail on the steps to reproduce your issue?
>
> I did a test by running mesos-execute on a client host which does not run
> any Mesos service, and the task was scheduled fine.
>
> root@mesos008:~/src/mesos/m1/mesos/build# ./src/mesos-execute --master=
> 192.168.0.107:5050 --name="cluster-test" --command="/bin/sleep 10"
> --resources="cpus(*):1;mem(*):256"
> I1005 18:59:47.974123  1233 sched.cpp:164] Version: 0.26.0
> I1005 18:59:47.990890  1248 sched.cpp:262] New master detected at
> master@192.168.0.107:5050
> I1005 18:59:47.993074  1248 sched.cpp:272] No credentials provided.
> Attempting to register without authentication
> I1005 18:59:48.001194  1249 sched.cpp:641] Framework registered with
> 04b9af5e-e9b6-4c59-8734-eba407163922-0002
> Framework registered with 04b9af5e-e9b6-4c59-8734-eba407163922-0002
> task cluster-test submitted to slave
> c0e5fdde-595e-4768-9d04-25901d4523b6-S0
> Received status update TASK_RUNNING for task cluster-test
> Received status update TASK_FINISHED for task cluster-test
> I1005 18:59:58.431144  1249 sched.cpp:1771] Asked to stop the driver
> I1005 18:59:58.431591  1249 sched.cpp:1040] Stopping framework
> '04b9af5e-e9b6-4c59-8734-eba407163922-0002'
> root@mesos008:~/src/mesos/m1/mesos/build# ps -ef | grep mesos
> root      1259  1159  0 19:06 pts/0    00:00:00 grep --color=auto mesos
>
> Thanks,
>
> Guangya
>
>
> On Mon, Oct 5, 2015 at 6:50 PM, Pradeep Kiruvale <
> pradeepkiruv...@gmail.com> wrote:
>
>> Hi Guangya,
>>
>> I am facing one more issue. If I try to schedule tasks from an external
>> client system running the same mesos-execute CLI, the tasks are not getting
>> launched. The requests reach the Master, which just drops them; below are
>> the related logs:
>>
>> I1005 11:33:35.025594 21369 master.cpp:2250] Subscribing framework  with
>> checkpointing disabled and capabilities [  ]
>> E1005 11:33:35.026100 21373 process.cpp:1912] Failed to shutdown socket
>> with fd 14: Transport endpoint is not connected
>> I1005 11:33:35.026129 21372 hierarchical.hpp:515] Added framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055
>> I1005 11:33:35.026298 21369 master.cpp:1119] Framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
>> scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
>> disconnected
>> I1005 11:33:35.026329 21369 master.cpp:2475] Disconnecting framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
>> scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
>> I1005 11:33:35.026340 21369 master.cpp:2499] Deactivating framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
>> scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
>> E1005 11:33:35.026345 21373 process.cpp:1912] Failed to shutdown socket
>> with fd 14: Transport endpoint is not connected
>> I1005 11:33:35.026376 21369 master.cpp:1143] Giving framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
>> scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259 0ns to
>> failover
>> I1005 11:33:35.026743 21372 hierarchical.hpp:599] Deactivated framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055
>> W1005 11:33:35.026757 21368 master.cpp:4828] Master returning resources
>> offered to framework 77539063-89ce-4efa-a20b-ca788abbd912-0055 because the
>> framework has terminated or is inactive
>> I1005 11:33:35.027014 21371 hierarchical.hpp:1103] Recovered cpus(*):8;
>> mem(*):14868; disk(*):218835; ports(*):[31000-32000] (total: cpus(*):8;
>> mem(*):14868; disk(*):218835; ports(*):[31000-32000], allocated: ) on slave
>> 77539063-89ce-4efa-a20b-ca788abbd912-S2 from framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055
>> I1005 11:33:35.027159 21371 hierarchical.hpp:1103] Recovered cpus(*):8;
>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (total: cpus(*):8;
>> mem(*):14930; disk(*):218578; ports(*):[31000-32000], allocated: ) on slave
>> 77539063-89ce-4efa-a20b-ca788abbd912-S1 from framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055
>> I1005 11:33:35.027668 21366 master.cpp:4815] Framework failover timeout,
>> removing framework 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
>> scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
>> I1005 11:33:35.027715 21366 master.cpp:5571] Removing framework
>> 77539063-89ce-4efa-a20b-ca788abbd912-0055 () at
>> scheduler-b1bc0243-b5be-44ae-894c-ca318c24ce6d@127.0.1.1:47259
>>
>>
>> Can you please tell me what the reason is? The client is in the same
>> network as well, but it does not run any master or slave processes.
>>
>> Thanks & Regards,
>> Pradeep
>>
>> On 5 October 2015 at 12:13, Guangya Liu <gyliu...@gmail.com> wrote:
>>
>>> Hi Pradeep,
>>>
>>> Glad it finally works! Not sure if you are using systemd.slice or not;
>>> are you running into this issue:
>>> https://issues.apache.org/jira/browse/MESOS-1195
>>>
>>> Hope Jie Yu can give you some help on this ;-)
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Mon, Oct 5, 2015 at 5:25 PM, Pradeep Kiruvale <
>>> pradeepkiruv...@gmail.com> wrote:
>>>
>>>> Hi Guangya,
>>>>
>>>>
>>>> Thanks for sharing the information.
>>>>
>>>> Now I can launch the tasks. The problem was with permissions: if I start
>>>> all the slaves and the Master as root, it works fine; otherwise I have
>>>> problems launching the tasks.
>>>>
>>>> But on one of the slaves I could not launch the slave as root; I am
>>>> facing the following issue:
>>>>
>>>> Failed to create a containerizer: Could not create MesosContainerizer:
>>>> Failed to create launcher: Failed to create Linux launcher: Failed to mount
>>>> cgroups hierarchy at '/sys/fs/cgroup/freezer': 'freezer' is already
>>>> attached to another hierarchy
>>>>
>>>> I have taken that node out of the cluster for now. The tasks are getting
>>>> scheduled on the other two slave nodes.
>>>>
>>>> Thanks for your timely help
>>>>
>>>> -Pradeep
>>>>
>>>> On 5 October 2015 at 10:54, Guangya Liu <gyliu...@gmail.com> wrote:
>>>>
>>>>> Hi Pradeep,
>>>>>
>>>>> My steps were pretty simple, just as in
>>>>> https://github.com/apache/mesos/blob/master/docs/getting-started.md#examples
>>>>>
>>>>> On the Master node: root@mesos1:~/src/mesos/m1/mesos/build# GLOG_v=1
>>>>> ./bin/mesos-master.sh --ip=192.168.0.107 --work_dir=/var/lib/mesos
>>>>> On the 3 slave nodes: root@mesos007:~/src/mesos/m1/mesos/build# GLOG_v=1
>>>>> ./bin/mesos-slave.sh --master=192.168.0.107:5050
>>>>>
>>>>> Then schedule a task from any of the nodes; here I was using slave node
>>>>> mesos007. You can see that the two tasks were launched on different hosts.
>>>>>
>>>>> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute
>>>>> --master=192.168.0.107:5050 --name="cluster-test"
>>>>> --command="/bin/sleep 100" --resources="cpus(*):1;mem(*):256"
>>>>> I1005 16:49:11.013432  2971 sched.cpp:164] Version: 0.26.0
>>>>> I1005 16:49:11.027802  2992 sched.cpp:262] New master detected at
>>>>> master@192.168.0.107:5050
>>>>> I1005 16:49:11.029579  2992 sched.cpp:272] No credentials provided.
>>>>> Attempting to register without authentication
>>>>> I1005 16:49:11.038182  2985 sched.cpp:641] Framework registered with
>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-0002
>>>>> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0002
>>>>> task cluster-test submitted to slave
>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-S0  <<<<<<<<<<<<<<<<<<
>>>>> Received status update TASK_RUNNING for task cluster-test
>>>>> ^C
>>>>> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute
>>>>> --master=192.168.0.107:5050 --name="cluster-test"
>>>>> --command="/bin/sleep 100" --resources="cpus(*):1;mem(*):256"
>>>>> I1005 16:50:18.346984  3036 sched.cpp:164] Version: 0.26.0
>>>>> I1005 16:50:18.366114  3055 sched.cpp:262] New master detected at
>>>>> master@192.168.0.107:5050
>>>>> I1005 16:50:18.368010  3055 sched.cpp:272] No credentials provided.
>>>>> Attempting to register without authentication
>>>>> I1005 16:50:18.376338  3056 sched.cpp:641] Framework registered with
>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-0003
>>>>> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0003
>>>>> task cluster-test submitted to slave
>>>>> c0e5fdde-595e-4768-9d04-25901d4523b6-S1 <<<<<<<<<<<<<<<<<<<<
>>>>> Received status update TASK_RUNNING for task cluster-test
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Guangya
>>>>>
>>>>> On Mon, Oct 5, 2015 at 4:21 PM, Pradeep Kiruvale <
>>>>> pradeepkiruv...@gmail.com> wrote:
>>>>>
>>>>>> Hi Guangya,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> I just want to know how you launched the tasks.
>>>>>>
>>>>>> 1. What processes have you started on the Master?
>>>>>> 2. What processes have you started on the Slaves?
>>>>>>
>>>>>> I must be missing something here; otherwise all my slaves have enough
>>>>>> memory and CPUs to launch the tasks I mentioned. What I am missing is
>>>>>> probably some configuration step.
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Pradeep
>>>>>>
>>>>>>
>>>>>> On 3 October 2015 at 13:14, Guangya Liu <gyliu...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Pradeep,
>>>>>>>
>>>>>>> I did some tests with your case and found that the task can run
>>>>>>> randomly on any of the three slave hosts; every run may have a different
>>>>>>> result. The logic is here:
>>>>>>> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.hpp#L1263-#L1266
>>>>>>> The allocator randomly shuffles the slaves every time it allocates
>>>>>>> resources for offers.
>>>>>>>
>>>>>>> I see that each of your tasks needs at least
>>>>>>> resources="cpus(*):3;mem(*):2560"; can you check whether all of your
>>>>>>> slaves have enough resources? If you want your tasks to run on the other
>>>>>>> slaves, those slaves need to have at least 3 CPUs and 2560 MB of memory.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Fri, Oct 2, 2015 at 9:26 PM, Pradeep Kiruvale <
>>>>>>> pradeepkiruv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Ondrej,
>>>>>>>>
>>>>>>>> Thanks for your reply
>>>>>>>>
>>>>>>>> I did solve that issue; yes, you are right, there was an issue with
>>>>>>>> the slave IP address setting.
>>>>>>>>
>>>>>>>> Now I am facing an issue with scheduling the tasks. When I try to
>>>>>>>> schedule a task using
>>>>>>>>
>>>>>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>>>>>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>>>>>>> --resources="cpus(*):3;mem(*):2560"
>>>>>>>>
>>>>>>>> The tasks always get scheduled on the same node. The resources from
>>>>>>>> the other nodes are not getting used to schedule the tasks.
>>>>>>>>
>>>>>>>> I just start the Mesos slaves like below:
>>>>>>>>
>>>>>>>> ./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos
>>>>>>>>  --hostname=slave1
>>>>>>>>
>>>>>>>> If I submit the task using the above mesos-execute command from one
>>>>>>>> of the slaves, it runs on that system.
>>>>>>>>
>>>>>>>> But when I submit the task from a different system, it just queues the
>>>>>>>> tasks on that system and does not run them on the other slaves.
>>>>>>>> Sometimes I see the message "Failed to getgid: unknown user".
>>>>>>>>
>>>>>>>> Do I need to start some process to distribute the tasks across all
>>>>>>>> the slaves evenly? Am I missing something here?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Pradeep
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2 October 2015 at 15:07, Ondrej Smola <ondrej.sm...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Pradeep,
>>>>>>>>>
>>>>>>>>> the problem is with the IP your slave advertises - Mesos by default
>>>>>>>>> resolves your hostname - there are several solutions (let's say your
>>>>>>>>> node IP is 192.168.56.128):
>>>>>>>>>
>>>>>>>>> 1)  export LIBPROCESS_IP=192.168.56.128
>>>>>>>>> 2)  set the Mesos options --ip and --hostname
>>>>>>>>>
>>>>>>>>> one way to do this is to create the files:
>>>>>>>>>
>>>>>>>>> echo "192.168.56.128" > /etc/mesos-slave/ip
>>>>>>>>> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>>>>>>>>>
>>>>>>>>> for more configuration options see
>>>>>>>>> http://mesos.apache.org/documentation/latest/configuration
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale <
>>>>>>>>> pradeepkiruv...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi Guangya,
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply. I found one interesting log message:
>>>>>>>>>>
>>>>>>>>>>  7410 master.cpp:5977] Removed slave
>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
>>>>>>>>>> registered at the same address
>>>>>>>>>>
>>>>>>>>>> Mostly because of this issue, the slave nodes keep getting
>>>>>>>>>> registered and de-registered to make room for the next node. I can
>>>>>>>>>> even see this in the UI: for some time one node is added, and after
>>>>>>>>>> some time it is replaced with the new slave node.
>>>>>>>>>>
>>>>>>>>>> The above log is followed by the log messages below:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action
>>>>>>>>>> (18 bytes) to leveldb took 104089ns
>>>>>>>>>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at
>>>>>>>>>> 384
>>>>>>>>>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown
>>>>>>>>>> socket with fd 15: Transport endpoint is not connected
>>>>>>>>>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@
>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116) with cpus(*):8; mem(*):14930;
>>>>>>>>>> disk(*):218578; ports(*):[31000-32000]
>>>>>>>>>> I1002 10:01:12.754065  7413 master.cpp:1080] Slave
>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@
>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116) disconnected
>>>>>>>>>> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with 
>>>>>>>>>> cpus(*):8;
>>>>>>>>>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>>>>>>>>>> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@
>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116)
>>>>>>>>>> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown
>>>>>>>>>> socket with fd 16: Transport endpoint is not connected
>>>>>>>>>> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@
>>>>>>>>>> 127.0.1.1:5051 (192.168.0.116)
>>>>>>>>>> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
>>>>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>>>>>>>>>> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received
>>>>>>>>>> learned notice for position 384
>>>>>>>>>> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action
>>>>>>>>>> (20 bytes) to leveldb took 95171ns
>>>>>>>>>> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys
>>>>>>>>>> from leveldb took 20333ns
>>>>>>>>>> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at
>>>>>>>>>> 384
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Pradeep
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2 October 2015 at 02:35, Guangya Liu <gyliu...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Pradeep,
>>>>>>>>>>>
>>>>>>>>>>> Please check some of my questions in line.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Guangya
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <
>>>>>>>>>>> pradeepkiruv...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and
>>>>>>>>>>>> 3 Slaves.
>>>>>>>>>>>>
>>>>>>>>>>>> One slave runs on the Master node itself and the other slaves run
>>>>>>>>>>>> on different nodes. Here "node" means a physical box.
>>>>>>>>>>>>
>>>>>>>>>>>> I first tried running tasks on a one-node cluster and tested the
>>>>>>>>>>>> task scheduling using mesos-execute; that works fine.
>>>>>>>>>>>>
>>>>>>>>>>>> When I configure the three-node cluster (1 master and 3 slaves) and
>>>>>>>>>>>> look at the resources on the master (in the GUI), only the Master
>>>>>>>>>>>> node's resources are visible. The other nodes' resources are not
>>>>>>>>>>>> visible; sometimes they are visible but in a deactivated state.
>>>>>>>>>>>>
>>>>>>>>>>> Can you please append some logs from mesos-slave and mesos-master?
>>>>>>>>>>> There should be some logs in either master or slave telling you
>>>>>>>>>>> what is wrong.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Please let me know what could be the reason. All the nodes are in
>>>>>>>>>>>> the same network.*
>>>>>>>>>>>>
>>>>>>>>>>>> When I try to schedule a task using
>>>>>>>>>>>>
>>>>>>>>>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>>>>>>>>>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>>>>>>>>>>> --resources="cpus(*):3;mem(*):2560"
>>>>>>>>>>>>
>>>>>>>>>>>> The tasks always get scheduled on the same node. The resources
>>>>>>>>>>>> from the other nodes are not getting used to schedule the tasks.
>>>>>>>>>>>>
>>>>>>>>>>> Based on your previous question, there is only one node in your
>>>>>>>>>>> cluster; that's why the other nodes are not available. We first need
>>>>>>>>>>> to identify what is wrong with the other nodes.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Is it required to register the frameworks from every slave node
>>>>>>>>>>>> on the Master?*
>>>>>>>>>>>>
>>>>>>>>>>> It is not required.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *I have configured this cluster using the GitHub code.*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>> Pradeep
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
