It's not configurable yet, but will be in the upcoming 0.23.0 release.

On Fri, Jul 17, 2015 at 3:46 PM, Nastooh Avessta (navesta) <nave...@cisco.com> wrote:
> Hi
>
> Trying to adjust the current failover time to below 10 seconds and don't
> seem to be able to find the right set of parameters. Currently, it takes
> around a minute and a half for the master to detect that a slave has gone
> offline, which seems to correspond to
> slave_ping_timeout=15 * max_slave_ping_timeouts=5. However, I can't find
> these parameters in mesos-master:
>
> # mesos-master --version
> mesos 0.22.1
> # mesos-master --help
> Usage: mesos-master [...]
>
> Supported options:
>   --acls=VALUE
>       The value could be a JSON formatted string of ACLs
>       or a file path containing the JSON formatted ACLs used
>       for authorization. Path could be of the form 'file:///path/to/file'
>       or '/path/to/file'.
>
>       See the ACLs protobuf in mesos.proto for the expected format.
>
>       Example:
>       {
>         "register_frameworks": [
>           {
>             "principals": { "type": "ANY" },
>             "roles": { "values": ["a"] }
>           }
>         ],
>         "run_tasks": [
>           {
>             "principals": { "values": ["a", "b"] },
>             "users": { "values": ["c"] }
>           }
>         ],
>         "shutdown_frameworks": [
>           {
>             "principals": { "values": ["a", "b"] },
>             "framework_principals": { "values": ["c"] }
>           }
>         ]
>       }
>   --allocation_interval=VALUE
>       Amount of time to wait between performing
>       (batch) allocations (e.g., 500ms, 1sec, etc). (default: 1secs)
>   --[no-]authenticate
>       If authenticate is 'true' only authenticated frameworks are allowed
>       to register. If 'false' unauthenticated frameworks are also
>       allowed to register. (default: false)
>   --[no-]authenticate_slaves
>       If 'true' only authenticated slaves are allowed to register.
>       If 'false' unauthenticated slaves are also allowed to register. (default: false)
>   --authenticators=VALUE
>       Authenticator implementation to use when authenticating frameworks
>       and/or slaves. Use the default 'crammd5', or
>       load an alternate authenticator module using --modules.
>       (default: crammd5)
>   --cluster=VALUE
>       Human readable name for the cluster,
>       displayed in the webui.
>   --credentials=VALUE
>       Either a path to a text file with a list of credentials,
>       each line containing 'principal' and 'secret' separated by whitespace,
>       or, a path to a JSON-formatted file containing credentials.
>       Path could be of the form 'file:///path/to/file' or '/path/to/file'.
>       JSON file Example:
>       {
>         "credentials": [
>           {
>             "principal": "sherman",
>             "secret": "kitesurf",
>           }
>         ]
>       }
>       Text file Example:
>       username secret
>
>   --external_log_file=VALUE
>       Specified the externally managed log file. This file will be
>       exposed in the webui and HTTP api. This is useful when using
>       stderr logging as the log file is otherwise unknown to Mesos.
>   --framework_sorter=VALUE
>       Policy to use for allocating resources
>       between a given user's frameworks. Options
>       are the same as for user_allocator. (default: drf)
>   --[no-]help
>       Prints this help message (default: false)
>   --hooks=VALUE
>       A comma separated list of hook modules to be
>       installed inside master.
>   --hostname=VALUE
>       The hostname the master should advertise in ZooKeeper.
>       If left unset, the hostname is resolved from the IP address
>       that the master binds to.
>   --[no-]initialize_driver_logging
>       Whether to automatically initialize google logging of scheduler
>       and/or executor drivers. (default: true)
>   --ip=VALUE
>       IP address to listen on
>   --[no-]log_auto_initialize
>       Whether to automatically initialize the replicated log used for the
>       registry. If this is set to false, the log has to be manually
>       initialized when used for the very first time. (default: true)
>   --log_dir=VALUE
>       Directory path to put log files (no default, nothing
>       is written to disk unless specified;
>       does not affect logging to stderr).
>       NOTE: 3rd party log messages (e.g. ZooKeeper) are
>       only written to stderr!
>
>   --logbufsecs=VALUE
>       How many seconds to buffer log messages for (default: 0)
>   --logging_level=VALUE
>       Log message at or above this level; possible values:
>       'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this
>       will affect just the logs from log_dir (if specified) (default: INFO)
>   --modules=VALUE
>       List of modules to be loaded and be available to the internal
>       subsystems.
>
>       Use --modules=filepath to specify the list of modules via a
>       file containing a JSON formatted string. 'filepath' can be
>       of the form 'file:///path/to/file' or '/path/to/file'.
>
>       Use --modules="{...}" to specify the list of modules inline.
>
>       Example:
>       {
>         "libraries": [
>           {
>             "file": "/path/to/libfoo.so",
>             "modules": [
>               {
>                 "name": "org_apache_mesos_bar",
>                 "parameters": [
>                   {
>                     "key": "X",
>                     "value": "Y"
>                   }
>                 ]
>               },
>               {
>                 "name": "org_apache_mesos_baz"
>               }
>             ]
>           },
>           {
>             "name": "qux",
>             "modules": [
>               {
>                 "name": "org_apache_mesos_norf"
>               }
>             ]
>           }
>         ]
>       }
>   --offer_timeout=VALUE
>       Duration of time before an offer is rescinded from a framework.
>       This helps fairness when running frameworks that hold on to offers,
>       or frameworks that accidentally drop offers.
>   --port=VALUE
>       Port to listen on (default: 5050)
>   --[no-]quiet
>       Disable logging to stderr (default: false)
>   --quorum=VALUE
>       The size of the quorum of replicas when using 'replicated_log' based
>       registry. It is imperative to set this value to be a majority of
>       masters i.e., quorum > (number of masters)/2.
>   --rate_limits=VALUE
>       The value could be a JSON formatted string of rate limits
>       or a file path containing the JSON formatted rate limits used
>       for framework rate limiting.
>       Path could be of the form 'file:///path/to/file'
>       or '/path/to/file'.
>
>       See the RateLimits protobuf in mesos.proto for the expected format.
>
>       Example:
>       {
>         "limits": [
>           {
>             "principal": "foo",
>             "qps": 55.5
>           },
>           {
>             "principal": "bar"
>           }
>         ],
>         "aggregate_default_qps": 33.3
>       }
>   --recovery_slave_removal_limit=VALUE
>       For failovers, limit on the percentage of slaves that can be removed
>       from the registry *and* shutdown after the re-registration timeout
>       elapses. If the limit is exceeded, the master will fail over rather
>       than remove the slaves.
>       This can be used to provide safety guarantees for production
>       environments. Production environments may expect that across Master
>       failovers, at most a certain percentage of slaves will fail
>       permanently (e.g. due to rack-level failures).
>       Setting this limit would ensure that a human needs to get
>       involved should an unexpected widespread failure of slaves occur
>       in the cluster.
>       Values: [0%-100%] (default: 100%)
>   --registry=VALUE
>       Persistence strategy for the registry;
>       available options are 'replicated_log', 'in_memory' (for testing).
>       (default: replicated_log)
>   --registry_fetch_timeout=VALUE
>       Duration of time to wait in order to fetch data from the registry
>       after which the operation is considered a failure. (default: 1mins)
>   --registry_store_timeout=VALUE
>       Duration of time to wait in order to store data in the registry
>       after which the operation is considered a failure. (default: 5secs)
>   --[no-]registry_strict
>       Whether the Master will take actions based on the persistent
>       information stored in the Registry. Setting this to false means
>       that the Registrar will never reject the admission, readmission,
>       or removal of a slave. Consequently, 'false' can be used to
>       bootstrap the persistent state on a running cluster.
>       NOTE: This flag is *experimental* and should not be used in
>       production yet. (default: false)
>   --roles=VALUE
>       A comma separated list of the allocation
>       roles that frameworks in this cluster may
>       belong to.
>   --[no-]root_submissions
>       Can root submit frameworks? (default: true)
>   --slave_removal_rate_limit=VALUE
>       The maximum rate (e.g., 1/10mins, 2/3hrs, etc) at which slaves will
>       be removed from the master when they fail health checks. By default
>       slaves will be removed as soon as they fail the health checks.
>       The value is of the form <Number of slaves>/<Duration>.
>   --slave_reregister_timeout=VALUE
>       The timeout within which all slaves are expected to re-register
>       when a new master is elected as the leader. Slaves that do not
>       re-register within the timeout will be removed from the registry
>       and will be shutdown if they attempt to communicate with master.
>       NOTE: This value has to be atleast 10mins. (default: 10mins)
>   --user_sorter=VALUE
>       Policy to use for allocating resources
>       between users. May be one of:
>         dominant_resource_fairness (drf) (default: drf)
>   --[no-]version
>       Show version and exit. (default: false)
>   --webui_dir=VALUE
>       Directory path of the webui files/assets (default: /usr/share/mesos/webui)
>   --weights=VALUE
>       A comma separated list of role/weight pairs
>       of the form 'role=weight,role=weight'. Weights
>       are used to indicate forms of priority.
>   --whitelist=VALUE
>       Path to a file with a list of slaves
>       (one per line) to advertise offers for.
>       Path could be of the form 'file:///path/to/file' or '/path/to/file'.
>   --work_dir=VALUE
>       Directory path to store the persistent information stored in the
>       Registry. (example: /var/lib/mesos/master)
>   --zk=VALUE
>       ZooKeeper URL (used for leader election amongst masters)
>       May be one of:
>         zk://host1:port1,host2:port2,.../path
>         zk://username:password@host1:port1,host2:port2,.../path
>         file:///path/to/file (where file contains one of the above)
>   --zk_session_timeout=VALUE
>       ZooKeeper session timeout. (default: 10secs)
>
> Furthermore, setting these parameters either in /etc/mesos-master/ or
> inline generates the following error:
>
> # /usr/sbin/mesos-master --zk=zk://10.40.50.228:2181/mesos --port=5050 \
>     --log_dir=/var/log/mesos --hostname=10.40.50.228 --ip=10.40.50.228 \
>     --quorum=1 --work_dir=/var/lib/mesos --max_slave_ping_timeouts=2
> Failed to load unknown flag 'max_slave_ping_timeouts'
> Usage: mesos-master [...]
>
> Supported options:
>   --acls=VALUE The valu
> …
>
> Any thoughts?
>
> Cheers,
>
> Nastooh Avessta
> ENGINEER.SOFTWARE ENGINEERING
> nave...@cisco.com
> Cisco Systems Limited
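To put rough numbers on the question above: if the master declares a slave lost only after `max_slave_ping_timeouts` consecutive pings each go unanswered for `slave_ping_timeout`, detection time is roughly the product of the two. The sketch below is a back-of-the-envelope helper under that assumption; the function names are hypothetical and not part of Mesos. The message itself observes about 90 seconds against the 75 this predicts for the defaults, so treat the result as an estimate rather than a guarantee.

```python
from datetime import timedelta


def detection_time(ping_timeout: timedelta, max_timeouts: int) -> timedelta:
    """Estimate how long the master takes to declare a slave lost.

    Assumption: the slave is removed after `max_timeouts` consecutive
    unanswered pings, each of which waits `ping_timeout`.
    """
    if max_timeouts < 1:
        raise ValueError("max_timeouts must be at least 1")
    return ping_timeout * max_timeouts


def max_ping_timeout(target: timedelta, max_timeouts: int) -> timedelta:
    """Largest per-ping timeout that keeps detection_time() within `target`."""
    if max_timeouts < 1:
        raise ValueError("max_timeouts must be at least 1")
    return target / max_timeouts


# Values quoted in the message: 15 s per ping, 5 missed pings.
default = detection_time(timedelta(seconds=15), 5)
print(default.total_seconds())  # 75.0

# A candidate setting for the "below 10 seconds" goal: 2 s x 4 pings.
candidate = detection_time(timedelta(seconds=2), 4)
print(candidate.total_seconds(), candidate < timedelta(seconds=10))  # 8.0 True
```

Once 0.23.0 is available, a setting like the candidate above would presumably be passed to the master as `--slave_ping_timeout=2secs --max_slave_ping_timeouts=4`; confirm the exact flag names and duration syntax with `mesos-master --help` on that release. Very short timeouts also make it more likely that a slave on a briefly congested network gets removed.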