Thanks for your ideas.

After changing the scheduler to send SUPPRESS and REVIVE calls, performance has improved.

Now I have to roll the new scheduler out to all users, in order to
check that it works well.

Thanks again.
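For reference, the change can be sketched roughly like this. It is a minimal sketch only: StubDriver is a hypothetical stand-in for the real MesosSchedulerDriver, mimicking its actual declineOffer(), suppressOffers() and reviveOffers() calls, and the task bookkeeping is simplified.

```python
# Sketch of the suppress/revive pattern for a Mesos scheduler.
# StubDriver is a hypothetical stand-in for mesos.native.MesosSchedulerDriver;
# the real driver provides declineOffer(), suppressOffers() and
# reviveOffers() under these names.

class StubDriver:
    def __init__(self):
        self.calls = []          # record calls so the behavior is visible

    def declineOffer(self, offer_id):
        self.calls.append(("decline", offer_id))

    def suppressOffers(self):
        self.calls.append(("suppress",))

    def reviveOffers(self):
        self.calls.append(("revive",))


class SuppressingScheduler:
    def __init__(self, driver):
        self.driver = driver
        self.pending_tasks = []
        self.suppressed = False

    def add_tasks(self, tasks):
        # New work arrived: start receiving offers again.
        self.pending_tasks.extend(tasks)
        if self.suppressed:
            self.driver.reviveOffers()
            self.suppressed = False

    def resourceOffers(self, driver, offers):
        for offer in offers:
            if self.pending_tasks:
                # A real scheduler would call driver.launchTasks(...) here.
                self.pending_tasks.pop(0)
            else:
                driver.declineOffer(offer)
        # Task list drained: tell the master to stop sending us offers,
        # so they go to other frameworks instead.
        if not self.pending_tasks and not self.suppressed:
            driver.suppressOffers()
            self.suppressed = True
```

The key point is the last block of resourceOffers(): once the task list is empty the framework suppresses itself, and add_tasks() revives it when new work arrives.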


On 03/03/2017 02:13, Benjamin Mahler wrote:
> What Gabriel is alluding to is a situation where you have:
>
> * Frameworks with lower shares that do not want additional resources, and
> * Frameworks with higher shares that do want additional resources.
>
> If there are a sufficient number of frameworks, it's possible for the
> decline filters of the low share frameworks to expire before we get a
> chance to offer resources to the high share frameworks. In this case,
> we are stuck offering to the low share frameworks and never get a
> chance to offer to the high share frameworks.
>
> I can't tell yet if this is what is occurring in your setup, but the
> recommendation is to update the scheduler to make a SUPPRESS call to
> tell Mesos it does not want any more resources (and REVIVE later if it
> wants resources). In your case that means that once the task list is
> emptied, you should send a SUPPRESS message.
>
> Ben
>
>
>
> On Thu, Mar 2, 2017 at 4:33 PM, Gabriel Hartmann
> <gabr...@mesosphere.io <mailto:gabr...@mesosphere.io>> wrote:
>
>     Possibly the suppress/revive problem.
>
>     On Thu, Mar 2, 2017 at 4:30 PM Benjamin Mahler <bmah...@apache.org
>     <mailto:bmah...@apache.org>> wrote:
>
>         Can you upload the full logs somewhere and link to them here?
>
>         How many frameworks are you running? Do they all run in the
>         "*" role?
>         Are the tasks short lived or long lived?
>         Can you update your test to not use the --offer_timeout? The
>         intention of that is to mitigate against frameworks that hold
>         on to offers, but it sounds like your frameworks decline.
>
>         On Thu, Mar 2, 2017 at 3:57 PM, Harold Molina-Bulla
>         <h.mol...@tsc.uc3m.es <mailto:h.mol...@tsc.uc3m.es>> wrote:
>
>             Hi,
>
>             Thanks for your reply.
>
>>             Hi there, more clarification is needed: 
>>
>>                 I have close to 800 CPUs, but the system does not
>>                 assign all the available resources to all our tasks.
>>
>>             What do you mean precisely here? Can you describe what
>>             you're seeing?
>>             Also, you have more than 800GB of RAM, right?
>>
>
>             Yes, we have at least 2GBytes per CPU, and typically our
>             resource table looks like:
>
>             In this case, 346 of 788 CPUs are available and not
>             assigned to any task, but we have more than 400 tasks
>             waiting to run.
>
>             Checking the mesos-master log, it does not make offers to
>             all running frameworks all the time, just to a few of them:
>
>> I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at scheduler-52a267e9-30d1-4cc8-847e-fa7acfddf855@192.168.151.147:32899
>> I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at scheduler-ce746b8b-adac-4a0c-8310-5d312c9ed04f@192.168.151.186:44233
>> I0303 00:16:01.968003 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0084 (vatmoutput) at scheduler-078b1978-840a-437e-a23e-5bca8c5e05c8@192.168.151.84:43023
>> I0303 00:16:01.969828 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0081 (vatmoutput) at scheduler-d921e4bb-ee23-4e77-93d9-7742264839e5@192.168.151.84:43067
>> I0303 00:16:01.971613 31791 master.cpp:6517] Sending 6 offers to framework c5299003-e29d-43cb-8ca7-887ab24c8513-0175 (:izanami) at scheduler-e10a1167-62d7-4ded-b932-792b5478ab61@192.168.151.186:38706
>> I0303 00:16:01.973351 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0082 (vatmoutputg) at scheduler-c4db35be-41e1-45cb-8005-f0f7827a23d0@192.168.151.84:33668
>> I0303 00:16:01.975126 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0062 (vatmvalidation) at scheduler-44ed1457-a752-4037-89b6-590221db3de5@192.168.151.84:33148
>> I0303 00:16:01.976877 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0077 (:izanami) at scheduler-c648708f-32f3-44d5-9014-3fd0dbb461f7@192.168.151.186:35345
>> I0303 00:16:01.978590 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0083 (vatmoutputg) at scheduler-fb965e89-5764-4a07-a94a-43de45babc7a@192.168.151.84:39218
>             We have close to twice as many frameworks running at the
>             moment; one of them (not shown) has more than 300 tasks
>             waiting and just 100 CPUs assigned (1 CPU per task).
>
>             The problem, we think, is that the mesos-master does not
>             offer resources to all the frameworks all the time, and
>             the declined resources are not re-offered to the other
>             frameworks. Any idea how to change this behavior, or the
>             rate at which resources are offered?
>
>             FYI, we set --offer_timeout=1sec.
>
>             Thanks in advance.
>
>             Harold Molina-Bulla Ph.D.
>
>             On 02/03/2017 23:28, Benjamin Mahler wrote:
>>
>>             Ben
>>
>>             On Thu, Mar 2, 2017 at 9:00 AM, Harold Molina-Bulla
>>             <h.mol...@tsc.uc3m.es <mailto:h.mol...@tsc.uc3m.es>> wrote:
>>
>>                 Hi Everybody,
>>
>>                 We are trying to develop a scheduler in Python to
>>                 distribute processes in a Mesos cluster.
>>
>>                 I have close to 800 CPUs, but the system does not
>>                 assign all the available resources to all our tasks.
>>
>>                 For testing, we define 1 CPU and 1 GByte of RAM per
>>                 process, so that all the processes fit on our
>>                 machines, and we launch several scripts
>>                 simultaneously so that Nprocs > Ncpus (close to 900
>>                 tasks in total).
>>
>>                 Our script is based on the test_framework.py example
>>                 included in the Mesos src distribution, with changes
>>                 such as sending a decline message when the list of
>>                 tasks to launch is empty.
>>
>>                 We have deployed Mesos 1.1.0.
>>
>>                 Any ideas on how to improve the use of our
>>                 resources?
>>
>>                 Thx in advance!
>>
>>                 Harold Molina-Bulla Ph.D.
>>
>>
>
>
>
>

-- 

/"In a time of universal deceit, telling the truth is a
revolutionary act"/
George Orwell (1984)

Remember: PRISM is watching you!!! X)

*Harold Molina-Bulla*
/h.mol...@tsc.uc3m.es/
GnuPG key: *189D5144*
