Hi Pradeep,

Glad it finally works! I am not sure whether you are using systemd.slice or not,
but are you running into this issue: https://issues.apache.org/jira/browse/MESOS-1195 ?
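
If so, one quick way to check which hierarchy currently has the freezer
controller attached (just a diagnostic sketch with plain Linux tools, nothing
Mesos-specific) is:

grep freezer /proc/mounts    # shows where a freezer cgroup hierarchy is mounted, if any
grep freezer /proc/cgroups   # shows the hierarchy id and whether the controller is enabled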

Hope Jie Yu can give you some help on this ;-)

Thanks,

Guangya

On Mon, Oct 5, 2015 at 5:25 PM, Pradeep Kiruvale <pradeepkiruv...@gmail.com>
wrote:

> Hi Guangya,
>
>
> Thanks for sharing the information.
>
> Now I can launch the tasks. The problem was with permissions: if I start
> all the slaves and the master as root, it works fine; otherwise I have
> problems launching the tasks.
>
> But on one of the slaves I could not start the slave process as root; I am
> facing the following issue:
>
> Failed to create a containerizer: Could not create MesosContainerizer:
> Failed to create launcher: Failed to create Linux launcher: Failed to mount
> cgroups hierarchy at '/sys/fs/cgroup/freezer': 'freezer' is already
> attached to another hierarchy
>
> I have taken that node out of the cluster for now. The tasks are getting
> scheduled on the other two slave nodes.
>
> Thanks for your timely help
>
> -Pradeep
>
> On 5 October 2015 at 10:54, Guangya Liu <gyliu...@gmail.com> wrote:
>
>> Hi Pradeep,
>>
>> My steps were pretty simple, just as in
>> https://github.com/apache/mesos/blob/master/docs/getting-started.md#examples
>>
>> On the master node: root@mesos1:~/src/mesos/m1/mesos/build# GLOG_v=1
>> ./bin/mesos-master.sh --ip=192.168.0.107 --work_dir=/var/lib/mesos
>> On each of the 3 slave nodes: root@mesos007:~/src/mesos/m1/mesos/build# GLOG_v=1
>> ./bin/mesos-slave.sh --master=192.168.0.107:5050
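>>
>> To confirm that all three slaves actually registered, you can check the
>> master web UI at http://192.168.0.107:5050 or hit the state endpoint; a
>> rough sketch (exact field names may differ between Mesos versions):
>>
>> curl -s http://192.168.0.107:5050/state.json | python -m json.tool | grep -E 'activated_slaves|deactivated_slaves'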
>>
>> Then schedule a task from any of the nodes; here I was using slave node
>> mesos007. You can see that the two tasks were launched on different hosts.
>>
>> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute
>> --master=192.168.0.107:5050 --name="cluster-test"
>> --command="/bin/sleep 100" --resources="cpus(*):1;mem(*):256"
>> I1005 16:49:11.013432  2971 sched.cpp:164] Version: 0.26.0
>> I1005 16:49:11.027802  2992 sched.cpp:262] New master detected at
>> master@192.168.0.107:5050
>> I1005 16:49:11.029579  2992 sched.cpp:272] No credentials provided.
>> Attempting to register without authentication
>> I1005 16:49:11.038182  2985 sched.cpp:641] Framework registered with
>> c0e5fdde-595e-4768-9d04-25901d4523b6-0002
>> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0002
>> task cluster-test submitted to slave
>> c0e5fdde-595e-4768-9d04-25901d4523b6-S0  <<<<<<<<<<<<<<<<<<
>> Received status update TASK_RUNNING for task cluster-test
>> ^C
>> root@mesos007:~/src/mesos/m1/mesos/build# ./src/mesos-execute
>> --master=192.168.0.107:5050 --name="cluster-test"
>> --command="/bin/sleep 100" --resources="cpus(*):1;mem(*):256"
>> I1005 16:50:18.346984  3036 sched.cpp:164] Version: 0.26.0
>> I1005 16:50:18.366114  3055 sched.cpp:262] New master detected at
>> master@192.168.0.107:5050
>> I1005 16:50:18.368010  3055 sched.cpp:272] No credentials provided.
>> Attempting to register without authentication
>> I1005 16:50:18.376338  3056 sched.cpp:641] Framework registered with
>> c0e5fdde-595e-4768-9d04-25901d4523b6-0003
>> Framework registered with c0e5fdde-595e-4768-9d04-25901d4523b6-0003
>> task cluster-test submitted to slave
>> c0e5fdde-595e-4768-9d04-25901d4523b6-S1 <<<<<<<<<<<<<<<<<<<<
>> Received status update TASK_RUNNING for task cluster-test
>>
>> Thanks,
>>
>> Guangya
>>
>> On Mon, Oct 5, 2015 at 4:21 PM, Pradeep Kiruvale <
>> pradeepkiruv...@gmail.com> wrote:
>>
>>> Hi Guangya,
>>>
>>> Thanks for your reply.
>>>
>>> I just want to know how you launched the tasks.
>>>
>>> 1. Which processes did you start on the master?
>>> 2. Which processes did you start on the slaves?
>>>
>>> I must be missing something here; all my slaves have enough memory and
>>> CPUs to launch the tasks I mentioned, so what I am missing is probably
>>> some configuration step.
>>>
>>> Thanks & Regards,
>>> Pradeep
>>>
>>>
>>> On 3 October 2015 at 13:14, Guangya Liu <gyliu...@gmail.com> wrote:
>>>
>>>> Hi Pradeep,
>>>>
>>>> I did some tests with your case and found that the task can run on any of
>>>> the three slave hosts at random; every run may give a different result. The
>>>> logic is here:
>>>> https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.hpp#L1263-#L1266
>>>> The allocator randomly shuffles the slaves each time it allocates resources
>>>> for offers.
>>>>
>>>> I see that each of your tasks asks for the minimum resources
>>>> resources="cpus(*):3;mem(*):2560"; can you check whether all of your slaves
>>>> have enough resources? If you want your tasks to run on the other slaves,
>>>> those slaves need to have at least 3 CPUs and 2560 MB of free memory.
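>>>>
>>>> As a quick local check on each slave host (plain Linux tools, nothing
>>>> Mesos-specific):
>>>>
>>>> nproc     # number of CPUs the kernel sees
>>>> free -m   # total/available memory in MB
>>>>
>>>> You can also compare against the resources the master logged at
>>>> registration time, e.g. the "Registered slave ... with cpus(*):8;
>>>> mem(*):14930; ..." line in your earlier log output.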
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Oct 2, 2015 at 9:26 PM, Pradeep Kiruvale <
>>>> pradeepkiruv...@gmail.com> wrote:
>>>>
>>>>> Hi Ondrej,
>>>>>
>>>>> Thanks for your reply
>>>>>
>>>>> I did solve that issue; you are right, there was a problem with the
>>>>> slave IP address setting.
>>>>>
>>>>> Now I am facing an issue with scheduling the tasks. When I try to
>>>>> schedule a task using
>>>>>
>>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>>>> --resources="cpus(*):3;mem(*):2560"
>>>>>
>>>>> The tasks always get scheduled on the same node. The resources from
>>>>> the other nodes are not getting used to schedule the tasks.
>>>>>
>>>>> I just start the Mesos slaves like below:
>>>>>
>>>>> ./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos
>>>>>  --hostname=slave1
>>>>>
>>>>> If I submit the task using the above mesos-execute command from one of
>>>>> the slaves, it runs on that system.
>>>>>
>>>>> But when I submit the task from a different system, it uses just that
>>>>> system and queues the tasks instead of running them on the other slaves.
>>>>> Sometimes I see the message "Failed to getgid: unknown user".
>>>>>
>>>>> Do I need to start some process to distribute the tasks evenly across all
>>>>> the slaves? Am I missing something here?
>>>>>
>>>>> Regards,
>>>>> Pradeep
>>>>>
>>>>>
>>>>>
>>>>> On 2 October 2015 at 15:07, Ondrej Smola <ondrej.sm...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Pradeep,
>>>>>>
>>>>>> the problem is with the IP your slave advertises - Mesos by default
>>>>>> resolves your hostname - there are several solutions (let's say your node
>>>>>> IP is 192.168.56.128):
>>>>>>
>>>>>> 1)  export LIBPROCESS_IP=192.168.56.128 (see the sketch after the file
>>>>>> example below)
>>>>>> 2)  set the mesos-slave options --ip and --hostname
>>>>>>
>>>>>> one way to do this (option 2) is to create these files:
>>>>>>
>>>>>> echo "192.168.56.128" > /etc/mesos-slave/ip
>>>>>> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>>>>>>
>>>>>> for more configuration options see
>>>>>> http://mesos.apache.org/documentation/latest/configuration
>>>>>>
>>>>>> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale <
>>>>>> pradeepkiruv...@gmail.com>:
>>>>>>
>>>>>>> Hi Guangya,
>>>>>>>
>>>>>>> Thanks for the reply. I found one interesting log message:
>>>>>>>
>>>>>>>  7410 master.cpp:5977] Removed slave
>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
>>>>>>> registered at the same address
>>>>>>>
>>>>>>> Most likely because of this issue, the slave nodes keep getting
>>>>>>> registered and de-registered, each one making room for the next node. I
>>>>>>> can even see this in the UI: for some time one node is shown, and after
>>>>>>> some time it is replaced by a new slave node.
>>>>>>>
>>>>>>> The above log entry is followed by the log messages below.
>>>>>>>
>>>>>>>
>>>>>>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18
>>>>>>> bytes) to leveldb took 104089ns
>>>>>>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>>>>>>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown
>>>>>>> socket with fd 15: Transport endpoint is not connected
>>>>>>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>>>>>>> (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
>>>>>>> ports(*):[31000-32000]
>>>>>>> I1002 10:01:12.754065  7413 master.cpp:1080] Slave
>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>>>>>>> (192.168.0.116) disconnected
>>>>>>> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8;
>>>>>>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>>>>>>> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>>>>>>> (192.168.0.116)
>>>>>>> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown
>>>>>>> socket with fd 16: Transport endpoint is not connected
>>>>>>> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>>>>>>> (192.168.0.116)
>>>>>>> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
>>>>>>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>>>>>>> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received
>>>>>>> learned notice for position 384
>>>>>>> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20
>>>>>>> bytes) to leveldb took 95171ns
>>>>>>> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from
>>>>>>> leveldb took 20333ns
>>>>>>> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pradeep
>>>>>>>
>>>>>>> On 2 October 2015 at 02:35, Guangya Liu <gyliu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Pradeep,
>>>>>>>>
>>>>>>>> Please check some of my questions in line.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Guangya
>>>>>>>>
>>>>>>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <
>>>>>>>> pradeepkiruv...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and
>>>>>>>>> 3 Slaves.
>>>>>>>>>
>>>>>>>>> One slave runs on the master node itself and the other slaves run on
>>>>>>>>> different nodes. Here "node" means a physical box.
>>>>>>>>>
>>>>>>>>> I first tried running tasks with a one-node cluster and tested task
>>>>>>>>> scheduling using mesos-execute; that works fine.
>>>>>>>>>
>>>>>>>>> When I configure the three-node cluster (1 master and 3 slaves) and
>>>>>>>>> look at the resources on the master (in the GUI), only the master
>>>>>>>>> node's resources are visible. The other nodes' resources are not
>>>>>>>>> visible; sometimes they show up, but in a deactivated state.
>>>>>>>>>
>>>>>>>> Can you please attach some logs from mesos-slave and mesos-master?
>>>>>>>> There should be some log messages on either the master or the slave
>>>>>>>> telling you what is wrong.
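>>>>>>>>
>>>>>>>> If you are starting the daemons from the build tree, one way to capture
>>>>>>>> the logs to files (a sketch; --log_dir is the relevant flag, the paths
>>>>>>>> are just examples):
>>>>>>>>
>>>>>>>> GLOG_v=1 ./bin/mesos-master.sh --ip=192.168.0.102 --work_dir=/var/lib/mesos --log_dir=/tmp/mesos-master-logs
>>>>>>>> GLOG_v=1 ./bin/mesos-slave.sh --master=192.168.0.102:5050 --log_dir=/tmp/mesos-slave-logs
>>>>>>>>
>>>>>>>> Without --log_dir the logs should just go to stderr.
>>>>>>>>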
>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Please let me know what the reason could be. All the nodes are on the
>>>>>>>>> same network.*
>>>>>>>>>
>>>>>>>>> When I try to schedule a task using
>>>>>>>>>
>>>>>>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>>>>>>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>>>>>>>> --resources="cpus(*):3;mem(*):2560"
>>>>>>>>>
>>>>>>>>> The tasks always get scheduled on the same node. The resources
>>>>>>>>> from the other nodes are not getting used to schedule the tasks.
>>>>>>>>>
>>>>>>>> Based on what you describe, only one node is actually active in your
>>>>>>>> cluster; that's why the other nodes are not available. We first need to
>>>>>>>> identify what is wrong with the other slave nodes.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Is it required to register the frameworks from every slave node
>>>>>>>>> on the Master?*
>>>>>>>>>
>>>>>>>> It is not required.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> *I have configured this cluster using the GitHub code.*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks & Regards,
>>>>>>>>> Pradeep
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
