Hey,

Sorry for the delayed response. I rebuilt my AWS infrastructure, and now I
install everything on Red Hat Linux. Before, I used Amazon Linux.

I tested with a single master (m4.large). Everything works perfectly. I am not
sure whether the problem was Amazon Linux or my old configuration.

Thanks,
-Kirils

On 18 December 2016 at 14:03, Guillermo Rodriguez <gu...@spritekin.com>
wrote:

> Hi,
> I run my Mesos cluster in AWS, with between 40 and 100 m4.2xlarge instances
> and between 200 and 1500 jobs at any time. Slaves run as spot instances.
>
> So, the only moment I get a TASK_LOST is when I lose a spot instance due
> to being outbid.
>
> I guess you may also lose instances due to an AWS autoscaler scale-in
> procedure. For example, if it decides the cluster is underutilised, it can
> kill any instance in your cluster, not necessarily the least used one.
> That's the reason we decided to develop our own customised autoscaler, which
> detects and kills specific instances based on our own rules.
>
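The thread does not say which rules that custom autoscaler applies. As a minimal sketch only, assuming the rule is "remove the agents running the fewest tasks first", the selection step could look like this. The function name, the agent IDs, and the task counts are all hypothetical.

```python
def pick_scale_in_victims(tasks_per_agent, count):
    """Return `count` agent IDs, least loaded first (ties broken by ID).

    `tasks_per_agent` maps a hypothetical agent ID to the number of tasks
    currently running on it. How those counts are collected is out of scope.
    """
    ranked = sorted(tasks_per_agent.items(), key=lambda kv: (kv[1], kv[0]))
    return [agent_id for agent_id, _ in ranked[:count]]


# Hypothetical cluster state: agent-b is idle, agent-c is nearly idle.
load = {"agent-a": 12, "agent-b": 0, "agent-c": 3}
print(pick_scale_in_victims(load, 2))  # ['agent-b', 'agent-c']
```

Choosing the victims explicitly, rather than letting a generic scale-in policy pick any instance, is what avoids killing a busy agent.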
> So, are you using spot fleets or spot instances? Have you set up your
> scale-in procedures correctly?
>
> Also, if you are running fine-grained tiny jobs (400 jobs in a 10xlarge
> means 0.1 CPUs and 400MB RAM each), I recommend you avoid an m4.10xlarge
> instance and run xlarge instances instead. Same price, and if you lose one
> you just lose 1/10th of your jobs.
>
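The sizing arithmetic above can be checked with a few lines. This is a rough sketch, assuming the figures quoted in the thread (40 CPUs and 160 GB RAM for one m4.10xlarge, 400 tasks on it), tasks spread evenly across instances, and 1 GB counted as 1000 MB. The function names are illustrative.

```python
def per_task_share(cpus, ram_gb, tasks):
    """Return (CPUs, RAM in MB) available to each task, with 1 GB = 1000 MB."""
    return cpus / tasks, ram_gb * 1000 / tasks


def fraction_lost(instances):
    """Fraction of evenly spread tasks lost when one instance disappears."""
    return 1 / instances


cpu_each, ram_mb_each = per_task_share(cpus=40, ram_gb=160, tasks=400)
print(cpu_each, ram_mb_each)        # 0.1 400.0
print(fraction_lost(instances=1))   # 1.0 (one 10xlarge: every task on it is lost)
print(fraction_lost(instances=10))  # 0.1 (ten xlarge: a tenth of the tasks is lost)
```

The total capacity is the same in both layouts; only the share of tasks affected by a single instance failure changes.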
> Good luck!
>
> ------------------------------
> *From*: "haosdent" <haosd...@gmail.com>
> *Sent*: Saturday, December 17, 2016 6:12 PM
> *To*: "user" <user@mesos.apache.org>
> *Subject*: Re: Mesos on AWS
>
> >  sometimes Mesos agent is launched but master doesn’t show them.
> It sounds like the Mesos Master could not connect to your Agents. Would you
> mind pasting your Mesos Master log? Does anything in it show that the Mesos
> agents are disconnected?
>
> On Sat, Dec 17, 2016 at 4:08 AM, Kiril Menshikov <kmenshi...@gmail.com>
> wrote:
>>
>> I have my own framework. Sometimes I get a TASK_LOST status with the message
>> slave lost during health check.
>>
>> Also, I found that sometimes a Mesos agent is launched but the master doesn’t
>> show it. From the agent I can see that it found the master and connected.
>> After an agent restart it starts working.
>>
>> -Kiril
>>
>>
>>
>> On Dec 16, 2016, at 21:58, Zameer Manji <zma...@apache.org> wrote:
>>
>> Hey,
>>
>> Could you elaborate on what you mean by "delays and health check problems"?
>> Are you using your own framework or an existing one? How are you launching
>> the tasks?
>>
>> Could you share logs from Mesos that show timeouts to ZK?
>>
>> For reference, I operate a large Mesos cluster and I have never
>> encountered problems when running 1k tasks concurrently so I think sharing
>> data would help everyone debug this problem.
>>
>> On Fri, Dec 16, 2016 at 6:05 AM, Kiril Menshikov <kmenshi...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> Has anybody tried running Mesos on AWS instances? Can you give me
>>> recommendations?
>>>
>>> I am developing an elastic Mesos cluster (scaling AWS instances on demand).
>>> Currently I have 3 master instances and run about 1000 tasks simultaneously.
>>> I see delays and health check problems.
>>>
>>> ~400 tasks fit in one m4.10xlarge instance (160GB RAM, 40 CPU).
>>>
>>> At the moment I have increased the timeout in the ZooKeeper cluster. What
>>> can I do to reduce the timeouts?
>>>
>>> Also, how can I increase performance? The main bottleneck is that I have a
>>> large number of tasks running simultaneously for an hour, after which I shut
>>> them down or restart them (depending on how well they perform).
>>>
>>> -Kiril
>>>
>>> --
>>> Zameer Manji
>>>
>>
>
> --
> Best Regards,
> Haosdent Huang
>



-- 
Thanks,
-Kiril
Phone +37126409291
Riga, Latvia
Skype perimetr122
