My cluster consists of 9 slave servers, split roughly in half between two primary applications (Spark and Scala microservices):
* Spark (servers 1, 2, 3, 4, 8), attributes: "rack:spark"
* Long-running microservices (servers 5, 6, 7, 9), attributes: "rack:ms"

The Spark jobs run in coarse-grained mode, and most of them are short-lived: they run for about 10-15 minutes via Chronos and then shut down. About 45 jobs start every 15 minutes.

We do lots of deploys daily, mostly to the "rack:ms" nodes, where the services are started via Marathon and run until we need to deploy a new release of code.

Recently I started noticing that services take a very long time to start or restart, as if they're not receiving valid offers. I always have more than enough idle resources available to bring services up or down, yet I've seen one case where a service took almost 10 minutes to restart. The cluster resources are as follows:

| | CPUs | Mem |
|---|---|---|
| Total | 120 | 456.8 GB |
| Used | 53.6 | 140.5 GB |
| Offered | 0 | 0 B |
| Idle | 66.4 | 316.3 GB |

How can I combat this delay? I'm not using roles; could this be the problem? Chronos jobs always seem to run fine, but they require far fewer resources than my long-running Scala services.

Here is a sample job definition in Marathon:

    {
      "id": "production/index-service",
      "cmd": "env && /opt/orchard/production/index-server/bin/run_jar.sh",
      "cpus": 1.0,
      "mem": 4096,
      "disk": 1000,
      "user": "orchard",
      "instances": 2,
      "constraints": [
        [ "hostname", "UNIQUE" ],
        [ "rack", "LIKE", "ms" ]
      ],
      "requirePorts": true,
      "labels": {
        "ENV": "production",
        "HAPROXY_GROUP": "microservice"
      },
      "ports": [ 31703, 31803, 31903 ],
      "maxLaunchDelaySeconds": 3,
      "backoffFactor": 1.20,
      "healthChecks": [
        {
          "gracePeriodSeconds": 3,
          "intervalSeconds": 5,
          "maxConsecutiveFailures": 3,
          "protocol": "TCP",
          "portIndex": 1,
          "timeoutSeconds": 5
        }
      ],
      "upgradeStrategy": {
        "minimumHealthCapacity": 0.5,
        "maximumOverCapacity": 0.2
      }
    }

Any advice appreciated, thanks.
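P.S. To make the question more concrete, here is a rough Python sketch of how I understand a single offer has to line up with the job definition above before an instance can launch. It is not Marathon's actual matching code: the `Offer` class, the `rejection_reasons` helper, and the sample offers are all hypothetical, and I'm assuming `LIKE` behaves as a full regex match on the attribute value. The point is only that cluster-wide idle totals say little about whether any one offer from a "rack:ms" host satisfies every condition at once.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Hypothetical, simplified view of one resource offer from one host."""
    hostname: str
    rack: str
    cpus: float
    mem_mb: float
    disk_mb: float
    port_ranges: list = field(default_factory=list)  # [(low, high), ...]


# Requirements copied from the job definition above.
APP = {
    "cpus": 1.0,
    "mem": 4096,
    "disk": 1000,
    "ports": [31703, 31803, 31903],
    "rack_like": "ms",
}


def rejection_reasons(offer, app, hosts_in_use):
    """Return every reason this offer cannot host a new instance."""
    reasons = []
    if offer.hostname in hosts_in_use:
        reasons.append("hostname UNIQUE: an instance already runs here")
    # Assumption: LIKE is a full regex match against the attribute value.
    if re.fullmatch(app["rack_like"], offer.rack) is None:
        reasons.append("rack LIKE ms: attribute does not match")
    if offer.cpus < app["cpus"]:
        reasons.append("not enough cpus")
    if offer.mem_mb < app["mem"]:
        reasons.append("not enough mem")
    if offer.disk_mb < app["disk"]:
        reasons.append("not enough disk")
    # requirePorts: every listed port must be inside an offered range.
    missing = [
        port for port in app["ports"]
        if not any(low <= port <= high for low, high in offer.port_ranges)
    ]
    if missing:
        reasons.append(f"requirePorts: ports not offered {missing}")
    return reasons


# Made-up offers, only to exercise the checks.
sample_offers = [
    Offer("server5", "ms", 8.0, 32000, 50000, [(31000, 32000)]),
    Offer("server6", "ms", 8.0, 32000, 50000, [(31000, 31800)]),
    Offer("server1", "spark", 8.0, 32000, 50000, [(31000, 32000)]),
    Offer("server7", "ms", 8.0, 2048, 50000, [(31000, 32000)]),
]

for offer in sample_offers:
    reasons = rejection_reasons(offer, APP, hosts_in_use={"server5"})
    print(offer.hostname, "OK" if not reasons else reasons)
```

In this made-up run, every offer is rejected for a different reason (host already in use, a fixed port outside the offered range, wrong rack, too little memory) even though each host has spare CPU, which is the kind of situation I suspect I'm hitting.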