Hi Ondrej,

Thanks for your reply.

I did solve that issue. Yes, you were right, there was a problem with the
slave IP address setting.

Now I am facing an issue with scheduling tasks. When I try to schedule a
task using

/src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
--command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
--resources="cpus(*):3;mem(*):2560"

the tasks always get scheduled on the same node; the resources of the other
nodes never get used.
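To double-check which slaves the master actually sees, and what resources
they offer, I can query the master directly, roughly like this (assuming
the default port 5050; jq is only used here for readability):

# list the slaves the master currently knows about, with their
# advertised addresses and resources
curl -s http://192.168.0.102:5050/master/state.json \
    | jq '.slaves[] | {hostname, pid, resources}'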

I start the Mesos slaves as follows:

./bin/mesos-slave.sh --master=192.168.0.102:5050/mesos  --hostname=slave1
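Since the earlier logs showed the slaves registering as
slave(1)@127.0.1.1:5051, would it help to start each slave with an explicit
address instead? A rough sketch, using one slave's address (192.168.0.116,
taken from the logs below) as an example, with the standard --ip/--hostname
agent options and the LIBPROCESS_IP variable you suggested:

# advertise the slave's real address instead of the one resolved
# from the hostname (which ends up as 127.0.1.1 here)
export LIBPROCESS_IP=192.168.0.116
./bin/mesos-slave.sh --master=192.168.0.102:5050 \
    --ip=192.168.0.116 --hostname=192.168.0.116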

If I submit the task with the above mesos-execute command from a machine
that is also one of the slaves, the task runs on that system.

But when I submit the task from a different system, it uses only that system
and queues the remaining tasks instead of running them on the other slaves.
Sometimes I also see the message "Failed to getgid: unknown user".
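My guess (not verified) is that the getgid error appears because the slave
tries to run the task as the user who ran mesos-execute, and that user does
not exist on the other machines. If so, creating the same user on every
slave, or starting the slaves without user switching, might help. A rough
sketch (the user name below is only an example; --no-switch_user is the
standard agent flag):

# option 1: create the submitting user on every slave
sudo useradd -m pradeep

# option 2: let tasks run as the user the slave itself runs as
./bin/mesos-slave.sh --master=192.168.0.102:5050 --no-switch_user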

Do I need to start some process to distribute the tasks evenly across all
the slaves?
Am I missing something here?

Regards,
Pradeep



On 2 October 2015 at 15:07, Ondrej Smola <ondrej.sm...@gmail.com> wrote:

> Hi Pradeep,
>
> the problem is with the IP your slave advertises - by default Mesos
> resolves your hostname - there are several solutions (let's say your node
> IP is 192.168.56.128)
>
> 1)  export LIBPROCESS_IP=192.168.56.128
> 2)  set the Mesos options --ip and --hostname
>
> one way to do this is to create these files:
>
> echo "192.168.56.128" > /etc/mesos-slave/ip
> echo "abc.mesos.com" > /etc/mesos-slave/hostname
>
> for more configuration options see
> http://mesos.apache.org/documentation/latest/configuration
>
> 2015-10-02 10:06 GMT+02:00 Pradeep Kiruvale <pradeepkiruv...@gmail.com>:
>
>> Hi Guangya,
>>
>> Thanks for the reply. I found one interesting log message.
>>
>>  7410 master.cpp:5977] Removed slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S52 (192.168.0.178): a new slave
>> registered at the same address
>>
>> Most likely because of this issue, the slave nodes keep getting registered
>> and de-registered, each one making room for the next. I can even see this
>> in the UI: for a while one node shows up, and after some time it is
>> replaced by the next slave node.
>>
>> The above log entry is followed by the log messages below.
>>
>>
>> I1002 10:01:12.753865  7416 leveldb.cpp:343] Persisting action (18 bytes)
>> to leveldb took 104089ns
>> I1002 10:01:12.753885  7416 replica.cpp:679] Persisted action at 384
>> E1002 10:01:12.753891  7417 process.cpp:1912] Failed to shutdown socket
>> with fd 15: Transport endpoint is not connected
>> I1002 10:01:12.753988  7413 master.cpp:3930] Registered slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116) with cpus(*):8; mem(*):14930; disk(*):218578;
>> ports(*):[31000-32000]
>> I1002 10:01:12.754065  7413 master.cpp:1080] Slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116) disconnected
>> I1002 10:01:12.754072  7416 hierarchical.hpp:675] Added slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 (192.168.0.116) with cpus(*):8;
>> mem(*):14930; disk(*):218578; ports(*):[31000-32000] (allocated: )
>> I1002 10:01:12.754084  7413 master.cpp:2534] Disconnecting slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116)
>> E1002 10:01:12.754118  7417 process.cpp:1912] Failed to shutdown socket
>> with fd 16: Transport endpoint is not connected
>> I1002 10:01:12.754132  7413 master.cpp:2553] Deactivating slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 at slave(1)@127.0.1.1:5051
>> (192.168.0.116)
>> I1002 10:01:12.754237  7416 hierarchical.hpp:768] Slave
>> 6a11063e-b8ff-43bd-86cf-e6eef0de06fd-S62 deactivated
>> I1002 10:01:12.754240  7413 replica.cpp:658] Replica received learned
>> notice for position 384
>> I1002 10:01:12.754360  7413 leveldb.cpp:343] Persisting action (20 bytes)
>> to leveldb took 95171ns
>> I1002 10:01:12.754395  7413 leveldb.cpp:401] Deleting ~2 keys from
>> leveldb took 20333ns
>> I1002 10:01:12.754406  7413 replica.cpp:679] Persisted action at 384
>>
>>
>> Thanks,
>> Pradeep
>>
>> On 2 October 2015 at 02:35, Guangya Liu <gyliu...@gmail.com> wrote:
>>
>>> Hi Pradeep,
>>>
>>> Please check some of my questions in line.
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Fri, Oct 2, 2015 at 12:55 AM, Pradeep Kiruvale <
>>> pradeepkiruv...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am new to Mesos. I have set up a Mesos cluster with 1 Master and 3
>>>> Slaves.
>>>>
>>>> One slave runs on the master node itself and the other slaves run on
>>>> different nodes. Here a node means a physical box.
>>>>
>>>> I first tried running tasks on a one-node cluster. Task scheduling with
>>>> mesos-execute works fine there.
>>>>
>>>> When I configure a three-node cluster (1 master and 3 slaves) and look at
>>>> the resources on the master (in the GUI), only the master node's resources
>>>> are visible. The other nodes' resources are not visible, or are sometimes
>>>> visible but in a deactivated state.
>>>>
>>> Can you please append some logs from mesos-slave and mesos-master? There
>>> should be some logs in either master or slave telling you what is wrong.
>>>
>>>>
>>>> *Please let me know what could be the reason. All the nodes are in the
>>>> same network.*
>>>>
>>>> When I try to schedule a task using
>>>>
>>>> /src/mesos-execute --master=192.168.0.102:5050 --name="cluster-test"
>>>> --command="/usr/bin/hackbench -s 4096 -l 10845760 -g 2 -f 2 -P"
>>>> --resources="cpus(*):3;mem(*):2560"
>>>>
>>>> The tasks always get scheduled on the same node. The resources from the
>>>> other nodes are not getting used to schedule the tasks.
>>>>
>>> Based on your previous question, there is only one node in your cluster;
>>> that's why the other nodes are not available. We first need to identify
>>> what is wrong with the other three nodes.
>>>
>>>>
>>>> *Is it required to register the frameworks from every slave node on the
>>>> master?*
>>>>
>>> It is not required.
>>>
>>>>
>>>> *I have configured this cluster using the code from GitHub.*
>>>>
>>>>
>>>> Thanks & Regards,
>>>> Pradeep
>>>>
>>>>
>>>
>>
>
