Hi Stefano,

The agent saves its resource information on disk so that it can recover in case of failure. When the agent starts up, it checks its work directory, and if it finds information left behind by a previous instance of the agent, it loads that information and attempts to re-register with the master as the same agent. To change the agent's resources, you must delete the `work_dir` (`/tmp/mesos` in your case, according to the `--work_dir` flag in your log) so that the agent believes it is the first agent to run on the machine.
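Here is a minimal shell sketch of that reset. It assumes the agent has already been stopped and uses the `/tmp/mesos` work directory shown in your log. Instead of deleting the whole work directory, it removes only the `meta/slaves/latest` checkpoint link, which is what the agent's own recovery error message suggests. `reset_agent_checkpoint` is a hypothetical helper, not a Mesos command. Note also that the CPU resource is spelled `cpus`: with `cpu:1`, your log shows the agent advertising an extra custom `cpu(*):1` resource next to the default `cpus(*):1`.

```shell
#!/usr/bin/env bash
# Sketch only: clear the Mesos agent's checkpointed agent info so that the
# next start registers as a new agent and picks up new --resources values.
# Assumptions: the agent process is already stopped, and its work directory
# uses the layout seen in the log (<work_dir>/meta/slaves/latest).
# reset_agent_checkpoint is a hypothetical helper, not a Mesos command.
reset_agent_checkpoint() {
  local work_dir="${1:-/tmp/mesos}"
  local latest="$work_dir/meta/slaves/latest"

  # 'latest' is expected to be a symlink, so test for a link as well as a path.
  if [ -L "$latest" ] || [ -e "$latest" ]; then
    rm -f "$latest"   # removes the link itself, not the directory it points to
    echo "removed $latest"
  else
    echo "nothing to remove at $latest"
  fi
}

# Example usage (commented out so the sketch is safe to source):
#   reset_agent_checkpoint /tmp/mesos
#   mesos-slave --master=MASTER_ADDRESS:5050 --work_dir=/tmp/mesos \
#     --resources='cpus:1;mem:2000;disk:9000'
```

Removing `latest` leaves the old agent's metadata directories in place; deleting the entire work directory is the more thorough option.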
As June and Arjun pointed out, the agent reserves some resources for itself to make sure that the agent and other operating system processes don't starve. The actual memory and disk needed by the agent and OS vary from machine to machine, so it may be OK for you to offer more of the agent's resources to Mesos. You can do this by specifying the resources you would like to offer with the `--resources` flag. However, if the agent has 2GB of memory in total, I would _not_ recommend telling the agent to offer the entire 2GB to Mesos. It does, in fact, need some amount of memory and disk for other processes. If you'd like to figure out how much it needs, you can launch the VM and observe its resource usage under a typical workload. The steps outlined by Arjun should work, but note that offering that much RAM may cause performance issues on your agent if you run tasks on it that attempt to use all 2GB.

Cheers,
Greg

On Fri, Apr 8, 2016 at 2:44 PM, Arkal Arjun Rao <aa...@ucsc.edu> wrote:
> You set it up with 2048MB but you probably don't really get all of it
> (try `free -m` on the slave). Same with disk (look at the output of `df`).
> From the book "Building Applications on Mesos":
> "The slave will reserve 1 GB or 50% of detected memory, whichever is
> smaller, in order to run itself and other operating system services.
> Likewise, it will reserve 5 GB or 50% of detected disk, whichever is
> smaller."
>
> If you want to explicitly set the resources, first make sure each slave
> actually has the resources you want, then run:
>
> <kill the mesos slave process>
> rm -f /tmp/mesos/meta/slaves/latest
> mesos-slave --master=MASTER_ADDRESS:5050 --hostname=slave_public_IP_i_set \
> --resources='cpus(*):1;mem(*):2000;disk(*):9000'
>
> Arjun
>
> On Fri, Apr 8, 2016 at 2:23 PM, Stefano Bianchi <jazzist...@gmail.com>
> wrote:
>
>> What has to be clear is that I'm running virtual machines on OpenStack,
>> so I am not on bare metal.
>> All the VMs are OpenStack images, and each slave has been built with 2048
>> MB of RAM, so since there are 3 slaves I should see something close to
>> 6144 MB in Mesos, but Mesos shows only 2.7 GB.
>> If you look at the command output I posted in previous messages, the
>> current Mesos resource configuration allows 920 MB of RAM and 5112 MB of
>> disk space for each slave. I would like Mesos to see, for instance, 2000 MB
>> of RAM and 9000 MB of disk, and for this reason I ran: mesos-slave
>> --master=MASTER_ADDRESS:5050 --resources='cpu:1;mem:2000;disk:9000'
>>
>> June Taylor, I need to understand:
>> 1) What does the command you suggest do?
>> 2) Should I stop mesos-slave first, and then run your command?
>>
>> Thanks in advance.
>>
>> 2016-04-08 21:28 GMT+02:00 June Taylor <j...@umn.edu>:
>>
>>> How much actual RAM do your slaves contain? You can only make available
>>> up to that amount, minus the bit that the slave reserves.
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Fri, Apr 8, 2016 at 1:29 PM, Stefano Bianchi <jazzist...@gmail.com>
>>> wrote:
>>>
>>>> Hi, I would like to join this mailing list.
>>>> I'm currently doing my Master's thesis on Mesos and Calico.
>>>> I'm working at INFN, the National Institute for Nuclear Physics. The goal
>>>> of the thesis is to build a PaaS where Mesos is the scheduler and Calico
>>>> must allow the interconnection between multiple datacenters linked to CERN.
>>>>
>>>> I'm using an IaaS based on OpenStack, where I have created 6
>>>> virtual machines, 3 masters and 3 slaves; one slave is running Mesos-DNS
>>>> from Marathon.
>>>> Everything is working perfectly: since I am on another network, I changed
>>>> the hostnames so that they are resolvable by Mesos, and I ran a simple
>>>> HTTP server from Marathon that scales across all my machines.
>>>> So all is fine and working.
>>>>
>>>> The only thing I don't like is that each of the 3 slaves has 1 CPU, 10 GB
>>>> of disk and 2 GB of RAM, but Mesos currently shows only 5 GB of disk and
>>>> 900 MB of RAM for each one.
>>>> Checking the documentation, I found the command for managing the
>>>> resources.
>>>> I stopped Slave1, for instance, and ran this command:
>>>>
>>>> mesos-slave --master=MASTER_ADDRESS:5050
>>>> --resources='cpu:1;mem:2000;disk:9000'
>>>>
>>>> where I want to set 2000 MB of RAM and 9000 MB of disk.
>>>> The output is the following:
>>>>
>>>> I0408 15:11:00.915324 7892 main.cpp:215] Build: 2016-03-10 20:32:58 by root
>>>> I0408 15:11:00.915436 7892 main.cpp:217] Version: 0.27.2
>>>> I0408 15:11:00.915448 7892 main.cpp:220] Git tag: 0.27.2
>>>> I0408 15:11:00.915459 7892 main.cpp:224] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
>>>> I0408 15:11:00.923334 7892 systemd.cpp:236] systemd version `219` detected
>>>> I0408 15:11:00.923384 7892 main.cpp:232] Inializing systemd state
>>>> I0408 15:11:00.950050 7892 systemd.cpp:324] Started systemd slice `mesos_executors.slice`
>>>> I0408 15:11:00.951529 7892 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
>>>> I0408 15:11:00.963232 7892 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
>>>> I0408 15:11:00.965541 7892 main.cpp:320] Starting Mesos slave
>>>> I0408 15:11:00.966008 7892 slave.cpp:192] Slave started on 1)@192.168.100.56:5051
>>>> I0408 15:11:00.966023 7892 slave.cpp:193] Flags at startup:
>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5"
>>>> --cgroups_cpu_enable_pids_and_tids_count="false"
>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>>>> --cgroups_limit_swap="false" --cgroups_root="mesos"
>>>> --container_disk_watch_interval="15secs" --containerizers="mesos"
>>>> --default_role="*"
>>>> --disk_watch_interval="1mins" --docker="docker"
>>>> --docker_auth_server="https://auth.docker.io" --docker_kill_orphans="true"
>>>> --docker_puller_timeout="60"
>>>> --docker_registry="https://registry-1.docker.io"
>>>> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
>>>> --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker"
>>>> --enforce_container_disk_quota="false"
>>>> --executor_registration_timeout="1mins"
>>>> --executor_shutdown_grace_period="5secs"
>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>>>> --hadoop_home="" --help="false" --hostname_lookup="true"
>>>> --image_provisioner_backend="copy" --initialize_driver_logging="true"
>>>> --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos"
>>>> --logbufsecs="0" --logging_level="INFO" --master="192.168.100.55:5050"
>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>>>> --registration_backoff_factor="1secs"
>>>> --resources="cpu:1;mem:2000;disk:9000" --revocable_cpu_low_priority="true"
>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>>>> --switch_user="true" --systemd_enable_support="true"
>>>> --systemd_runtime_directory="/run/systemd/system" --version="false"
>>>> --work_dir="/tmp/mesos"
>>>> I0408 15:11:00.967485 7892 slave.cpp:463] Slave resources: cpu(*):1; mem(*):2000; disk(*):9000; cpus(*):1; ports(*):[31000-32000]
>>>> I0408 15:11:00.967547 7892 slave.cpp:471] Slave attributes: [ ]
>>>> I0408 15:11:00.967560 7892 slave.cpp:476] Slave hostname: slave1.openstacklocal
>>>> I0408 15:11:00.971304 7893 state.cpp:58] Recovering state from '/tmp/mesos/meta'
>>>>
>>>> *Failed to perform recovery: Incompatible slave info detected*.
>>>>
>>>> ------------------------------------------------------------
>>>> Old slave info:
>>>> hostname: "*slave_public_IP_i_set*"
>>>> resources {
>>>>   name: "cpus"
>>>>   type: SCALAR
>>>>   scalar {
>>>>     value: 1
>>>>   }
>>>>   role: "*"
>>>> }
>>>> resources {
>>>>   name: "mem"
>>>>   type: SCALAR
>>>>   scalar {
>>>>     value: 920
>>>>   }
>>>>   role: "*"
>>>> }
>>>> resources {
>>>>   name: "disk"
>>>>   type: SCALAR
>>>>   scalar {
>>>>     value: 5112
>>>>   }
>>>>   role: "*"
>>>> }
>>>> resources {
>>>>   name: "ports"
>>>>   type: RANGES
>>>>   ranges {
>>>>     range {
>>>>       begin: 31000
>>>>       end: 32000
>>>>     }
>>>>   }
>>>>   role: "*"
>>>> }
>>>> id {
>>>>   value: "ad490064-1a6e-415c-8536-daef0d8e3572-S7"
>>>> }
>>>> checkpoint: true
>>>> port: 5051
>>>> ------------------------------------------------------------
>>>> New slave info:
>>>> hostname: "slave1.openstacklocal"
>>>> resources {
>>>>   name: "cpu"
>>>>   type: SCALAR
>>>>   scalar {
>>>>     value: 1
>>>>   }
>>>>   role: "*"
>>>> }
>>>> resources {
>>>>   name: "mem"
>>>>   type: SCALAR
>>>>   scalar {
>>>>     value: 2000
>>>>   }
>>>>   role: "*"
>>>> }
>>>> resources {
>>>>   name: "disk"
>>>>   type: SCALAR
>>>>   scalar {
>>>>     value: 9000
>>>>   }
>>>>   role: "*"
>>>> }
>>>> resources {
>>>>   name: "cpus"
>>>>   type: SCALAR
>>>>   scalar {
>>>>     value: 1
>>>>   }
>>>>   role: "*"
>>>> }
>>>> resources {
>>>>   name: "ports"
>>>>   type: RANGES
>>>>   ranges {
>>>>     range {
>>>>       begin: 31000
>>>>       end: 32000
>>>>     }
>>>>   }
>>>>   role: "*"
>>>> }
>>>> id {
>>>>   value: "ad490064-1a6e-415c-8536-daef0d8e3572-S7"
>>>> }
>>>> checkpoint: true
>>>> port: 5051
>>>> ------------------------------------------------------------
>>>> To remedy this do as follows:
>>>> Step 1: rm -f /tmp/mesos/meta/slaves/latest
>>>>         This ensures slave doesn't recover old live executors.
>>>> Step 2: Restart the slave.
>>>>
>>>> I notice two things:
>>>>
>>>> 1) the failure message;
>>>> 2) the hostname has changed; the right one is a public IP I set in
>>>> order to make the hostname resolvable for Mesos.
>>>>
>>>> As a consequence, when I start the slave, the resources are exactly the
>>>> same; nothing has changed.
>>>>
>>>> Can you please help me?
>>>>
>>>> Thanks!
>
> --
> Arjun Arkal Rao
>
> PhD Student,
> Haussler Lab,
> UC Santa Cruz,
> USA
>
> aa...@ucsc.edu
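The reservation rule Arjun quoted above can be checked against the numbers in this thread with a short shell sketch. `offered_mb` is a hypothetical helper that subtracts min(cap, 50% of total) from a detected total in MB, using a 1024 MB cap for memory and a 5120 MB cap for disk. Working backwards, the 920 MB and 5112 MB each slave advertised are consistent with Mesos detecting roughly 1840 MB of memory and 10224 MB of disk. Those totals are inferred, not measured; `free -m` and `df -m` on the VM would confirm or refute them.

```shell
#!/usr/bin/env bash
# Sketch of the default reservation rule as quoted in the thread: the agent
# keeps min(cap, 50% of the detected total) for itself and offers the rest.
# offered_mb is a hypothetical helper; all values are integers in MB.
offered_mb() {
  local total_mb=$1 cap_mb=$2
  local half=$(( total_mb / 2 ))
  local reserved=$(( half < cap_mb ? half : cap_mb ))
  echo $(( total_mb - reserved ))
}

MEM_CAP_MB=1024   # "1 GB or 50% of detected memory, whichever is smaller"
DISK_CAP_MB=5120  # "5 GB or 50% of detected disk, whichever is smaller"

# The totals below are inferred from the 920 MB / 5112 MB seen in this
# thread, not measured; substitute the values reported by `free -m` and `df -m`.
echo "mem offered:  $(offered_mb 1840 "$MEM_CAP_MB") MB"
echo "disk offered: $(offered_mb 10224 "$DISK_CAP_MB") MB"
```

Under this rule, the "total minus 1 GB" case only applies once the agent detects at least 2048 MB of memory. A VM nominally sized at 2048 MB that reports slightly less therefore ends up offering only about half of its memory, and 3 × 920 MB accounts for the roughly 2.7 GB that Mesos showed.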