Hi Stefano,
The agent saves its resource information on the disk, so that it can
recover in case of failure. When the agent starts up, it checks the work
directory, and if it finds information left behind by a previous instance
of the agent, it loads that information and attempts to register with the
master as the same agent. In order to change the agent resources you must
delete the 'work_dir', so that the agent believes it's the first agent to
run on the machine.
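
For example, a minimal sequence (just a sketch) would be: stop the agent, wipe the work directory, and start it again with the resources you want. This assumes the default work directory of /tmp/mesos, which is what your startup log shows for --work_dir, and that you don't mind losing whatever the old agent checkpointed there:

<stop the mesos-slave process however you started it>
# wipe the old agent's checkpointed state, including its saved slave info
sudo rm -rf /tmp/mesos
# start again, offering the resources you choose (the values are placeholders)
mesos-slave --master=MASTER_ADDRESS:5050 \
    --resources='cpus(*):1;mem(*):<MB>;disk(*):<MB>'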

As June and Arjun pointed out, the agent will reserve some resources for
itself to make sure that the agent and other operating system processes
don't starve. The actual memory and disk needed for the agent and OS varies
from machine to machine, so it may be OK for you to offer more of the
agent's resources to Mesos. You can do this by specifying the resources you
would like to offer with the `--resources` flag. However, if the agent has
2GB of memory total, I would _not_ recommend that you tell the agent to
offer the entire 2GB to Mesos. It does, in fact, need some amount of
mem/disk for other processes. If you'd like to figure out how much it
needs, you can launch the VM and observe its resource usage under a typical
workload.
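
For instance (only a sketch; the right numbers depend on your VM), you could check on the slave what will actually be detected and then leave some headroom:

free -m       # memory the OS actually exposes (less than the 2048 MB flavor)
df -h /tmp    # free space on the filesystem that holds the work_dir
top           # watch usage while a typical task is running

and then pass something like --resources='cpus(*):1;mem(*):1500;disk(*):8000', where 1500 and 8000 are placeholders to be replaced with whatever your measurements allow.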

The steps outlined by Arjun should work, but note that offering that much
RAM may cause performance issues on your agent if you run tasks on it which
attempt to use all 2GB.
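
As a rough sanity check (these are estimates from the reservation rule Arjun quotes below, not measurements): a 2048 MB OpenStack flavor typically exposes around 1840 MB to the OS, so the agent reserves min(1 GB, 50%) = 920 MB and offers the remaining ~920 MB; likewise, roughly 10 GB of detected disk splits into about 5112 MB reserved and 5112 MB offered, which matches the numbers you reported.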

Cheers,
Greg

On Fri, Apr 8, 2016 at 2:44 PM, Arkal Arjun Rao <aa...@ucsc.edu> wrote:

> You set it up with 2048 MB but you probably don't really get all of it
> (try `free -m` on the slave). Same with disk (look at the output of `df`).
> From the book "Building Applications on Mesos":
> "The slave will reserve 1 GB or 50% of detected memory, whichever is
> smaller, in order to run itself and other operating system services.
> Likewise, it will reserve 5 GB or 50% of detected disk, whichever is
> smaller."
>
> If you want to explicitly specify the resources to offer, first make sure
> each slave actually has the resources you want, then run this:
> <kill the mesos slave process>
> rm -f /tmp/mesos/meta/slaves/latest
> mesos-slave --master=MASTER_ADDRESS:5050 --hostname=slave_public_IP_i_set
> --resources='cpus(*):1;mem(*):2000;disk(*):9000'
>
> Arjun
>
> On Fri, Apr 8, 2016 at 2:23 PM, Stefano Bianchi <jazzist...@gmail.com>
> wrote:
>
>> What has to be clear is that I'm running virtual machines on OpenStack,
>> so I am not on bare metal.
>> All the VMs are OpenStack images, and each slave was built with 2048 MB
>> of RAM, so with 3 slaves I should see in Mesos something close to
>> 6144 MB, but Mesos shows only 2.7 GB.
>> If you look at the command output I posted in previous messages, the
>> current Mesos resource configuration allows 920 MB of RAM and 5112 MB of
>> disk space for each slave. I would like Mesos to see, for instance,
>> 2000 MB of RAM and 9000 MB of disk, and for this reason I have run:
>> mesos-slave --master=MASTER_ADDRESS:5050 --resources='cpu:1;mem:2000;disk:9000'
>>
>> June Taylor, I need to understand:
>> 1) What does the command you suggest do?
>> 2) Should I stop mesos-slave first and then run your command?
>>
>> Thanks in advance.
>>
>> 2016-04-08 21:28 GMT+02:00 June Taylor <j...@umn.edu>:
>>
>>> How much actual RAM do your slaves contain? You can only make available
>>> up to that amount, minus the bit that the slave reserves.
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Fri, Apr 8, 2016 at 1:29 PM, Stefano Bianchi <jazzist...@gmail.com>
>>> wrote:
>>>
>>>> Hi, I would like to join this mailing list.
>>>> I'm currently doing my Master's thesis on Mesos and Calico.
>>>> I'm working at INFN, the Institute of Nuclear Physics. The goal of the
>>>> thesis is to build a PaaS where Mesos is the scheduler and Calico must
>>>> provide the interconnection between multiple datacenters linked to CERN.
>>>>
>>>> I'm using an IaaS based on OpenStack, where I have created 6 virtual
>>>> machines: 3 masters and 3 slaves; on one slave, Mesos-DNS is running
>>>> from Marathon.
>>>> Everything is working: since I am on another network, I changed the
>>>> hostnames so that they are resolvable by Mesos, and from Marathon I ran
>>>> a simple HTTP server that scales across all my machines. So everything
>>>> is fine and working.
>>>>
>>>> The only thing I don't like is that each of the 3 slaves has 1 CPU,
>>>> 10 GB of disk and 2 GB of RAM, but Mesos currently shows for each one
>>>> only 5 GB of disk and 900 MB of RAM.
>>>> So, checking the documentation, I found the command to manage the
>>>> resources.
>>>> I stopped slave1, for instance, and ran this command:
>>>>
>>>> mesos-slave --master=MASTER_ADDRESS:5050
>>>> --resources='cpu:1;mem:2000;disk:9000'
>>>>
>>>> where I want to set 2000 MB of RAM and 9000 MB of disk.
>>>> The output is the following:
>>>>
>>>> I0408 15:11:00.915324  7892 main.cpp:215] Build: 2016-03-10 20:32:58 by 
>>>> root
>>>>
>>>> I0408 15:11:00.915436  7892 main.cpp:217] Version: 0.27.2
>>>>
>>>> I0408 15:11:00.915448  7892 main.cpp:220] Git tag: 0.27.2
>>>>
>>>> I0408 15:11:00.915459  7892 main.cpp:224] Git SHA: 
>>>> 3c9ec4a0f34420b7803848af597de00fedefe0e2
>>>>
>>>> I0408 15:11:00.923334  7892 systemd.cpp:236] systemd version `219` detected
>>>>
>>>> I0408 15:11:00.923384  7892 main.cpp:232] Inializing systemd state
>>>>
>>>> I0408 15:11:00.950050  7892 systemd.cpp:324] Started systemd slice 
>>>> `mesos_executors.slice`
>>>>
>>>> I0408 15:11:00.951529  7892 containerizer.cpp:143] Using isolation: 
>>>> posix/cpu,posix/mem,filesystem/posix
>>>>
>>>> I0408 15:11:00.963232  7892 linux_launcher.cpp:101] Using 
>>>> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
>>>>
>>>> I0408 15:11:00.965541  7892 main.cpp:320] Starting Mesos slave
>>>>
>>>> I0408 15:11:00.966008  7892 slave.cpp:192] Slave started on 
>>>> 1)@192.168.100.56:5051
>>>>
>>>> I0408 15:11:00.966023  7892 slave.cpp:193] Flags at startup: 
>>>> --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" 
>>>> --cgroups_cpu_enable_pids_and_tids_count="false" 
>>>> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" 
>>>> --cgroups_limit_swap="false" --cgroups_root="mesos" 
>>>> --container_disk_watch_interval="15secs" --containerizers="mesos" 
>>>> --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
>>>> --docker_auth_server="https://auth.docker.io"; --docker_kill_orphans="true" 
>>>> --docker_puller_timeout="60" 
>>>> --docker_registry="https://registry-1.docker.io"; 
>>>> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
>>>> --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" 
>>>> --enforce_container_disk_quota="false" 
>>>> --executor_registration_timeout="1mins" 
>>>> --executor_shutdown_grace_period="5secs" 
>>>> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
>>>> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
>>>> --hadoop_home="" --help="false" --hostname_lookup="true" 
>>>> --image_provisioner_backend="copy" --initialize_driver_logging="true" 
>>>> --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" 
>>>> --logbufsecs="0" --logging_level="INFO" --master="192.168.100.55:5050" 
>>>> --oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
>>>> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
>>>> --quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
>>>> --registration_backoff_factor="1secs" 
>>>> --resources="cpu:1;mem:2000;disk:9000" --revocable_cpu_low_priority="true" 
>>>> --sandbox_directory="/mnt/mesos/sandbox" --strict="true" 
>>>> --switch_user="true" --systemd_enable_support="true" 
>>>> --systemd_runtime_directory="/run/systemd/system" --version="false" 
>>>> --work_dir="/tmp/mesos"
>>>>
>>>> I0408 15:11:00.967485  7892 slave.cpp:463] Slave resources: cpu(*):1; 
>>>> mem(*):2000; disk(*):9000; cpus(*):1; ports(*):[31000-32000]
>>>>
>>>> I0408 15:11:00.967547  7892 slave.cpp:471] Slave attributes: [  ]
>>>>
>>>> I0408 15:11:00.967560  7892 slave.cpp:476] Slave hostname: 
>>>> slave1.openstacklocal
>>>>
>>>> I0408 15:11:00.971304  7893 state.cpp:58] Recovering state from 
>>>> '/tmp/mesos/meta'
>>>>
>>>> *Failed to perform recovery: Incompatible slave info detected*.
>>>>
>>>> ------------------------------------------------------------
>>>>
>>>> Old slave info:
>>>>
>>>> hostname: "*slave_public_IP_i_set*"
>>>> resources { name: "cpus"  type: SCALAR scalar { value: 1 }    role: "*" }
>>>> resources { name: "mem"   type: SCALAR scalar { value: 920 }  role: "*" }
>>>> resources { name: "disk"  type: SCALAR scalar { value: 5112 } role: "*" }
>>>> resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 32000 } } role: "*" }
>>>> id { value: "ad490064-1a6e-415c-8536-daef0d8e3572-S7" }
>>>> checkpoint: true
>>>> port: 5051
>>>>
>>>> ------------------------------------------------------------
>>>>
>>>> New slave info:
>>>>
>>>> hostname: "slave1.openstacklocal"
>>>> resources { name: "cpu"   type: SCALAR scalar { value: 1 }    role: "*" }
>>>> resources { name: "mem"   type: SCALAR scalar { value: 2000 } role: "*" }
>>>> resources { name: "disk"  type: SCALAR scalar { value: 9000 } role: "*" }
>>>> resources { name: "cpus"  type: SCALAR scalar { value: 1 }    role: "*" }
>>>> resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 32000 } } role: "*" }
>>>> id { value: "ad490064-1a6e-415c-8536-daef0d8e3572-S7" }
>>>> checkpoint: true
>>>> port: 5051
>>>>
>>>> ------------------------------------------------------------
>>>>
>>>> To remedy this do as follows:
>>>>
>>>> Step 1: rm -f /tmp/mesos/meta/slaves/latest
>>>>
>>>>         This ensures slave doesn't recover old live executors.
>>>>
>>>> Step 2: Restart the slave.
>>>>
>>>>
>>>>
>>>> I notice two things:
>>>>
>>>> 1) the failure message;
>>>>
>>>> 2) the hostname has changed; the right one is a public IP I set in order
>>>> to make the hostname resolvable for Mesos.
>>>>
>>>> As a consequence, when I restart the slave the resources are exactly the
>>>> same; nothing has changed.
>>>>
>>>> Can you please help me?
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Arjun Arkal Rao
>
> PhD Student,
> Haussler Lab,
> UC Santa Cruz,
> USA
>
> aa...@ucsc.edu
>
>
