Checked with @Srikant via Hangout. It looks like the Linux cgroups memory.stat is incorrect after `chown`ing the cgroup to a normal user. I will continue to follow up and verify whether this is a bug in Mesos cgroups once @Srikant has test results from a new machine. Thanks a lot to @Srikant for the great help!
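For the follow-up verification, a minimal sketch of how the suspected behaviour could be reproduced outside of Mesos on a cgroups v1 machine; the cgroup name `test` and the workload are assumptions for illustration, not steps taken from this thread:

```
# Create a throwaway cgroup under a v1 memory hierarchy mounted at
# /sys/fs/cgroup, then chown it to a normal user as Mesos would.
sudo mkdir /sys/fs/cgroup/memory/test
sudo chown -R "$USER" /sys/fs/cgroup/memory/test

# As the normal user, move a shell into the cgroup and hold ~50 MB of
# data in a shell variable for a minute.
sh -c 'echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs;
       x=$(head -c 50000000 /dev/zero | base64); sleep 60' &

# While the workload runs, total_rss in memory.stat should reflect the
# allocation; the report in this thread is that it stays at 0 after chown.
sleep 5
grep total_rss /sys/fs/cgroup/memory/test/memory.stat
```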
On Thu, Oct 6, 2016 at 8:17 PM, Srikant Kalani <srikant.blackr...@gmail.com> wrote:

> Thanks for the detailed steps.
>
> We are also using the same flags.
>
> Today we ran our task twice. First with the root ID, and it was working
> fine and we were able to implement cgroups. The UI was working as
> expected.
>
> But the second time, when we ran the same task with the application ID,
> cgroups didn't work. The memory.stat file described in your email does
> not have an updated rss value.
>
> Do I need to use any other flags in the agent so that a non-root ID can
> also follow cgroups?
>
> On 5 Oct 2016 10:40 p.m., "haosdent" <haosd...@gmail.com> wrote:
>
>> > These flags are used in agent - cgroups_limits_swap=true
>> > --isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/cgroup
>> > In agent logs I can see updated memory limit to 33MB for container.
>>
>> Not sure if there are typos or not; some of the flag names may be
>> incorrect. And according to
>>
>> > "mem_limit_bytes": 1107296256,
>>
>> I think Mesos allocated 1107296256 bytes of memory (about 1 GB) to your
>> task instead of 33 MB.
>>
>> Since the reported `mem_rss_bytes` is zero, let me describe how I test
>> it on my machine; maybe this is helpful for you to troubleshoot the
>> problem.
>>
>> ```
>> ## Start the master
>> sudo ./bin/mesos-master.sh --ip=111.223.45.25 --hostname=111.223.45.25 \
>>   --work_dir=/tmp/mesos
>> ## Start the agent
>> sudo ./bin/mesos-agent.sh --ip=111.223.45.25 --hostname=111.223.45.25 \
>>   --work_dir=/tmp/mesos --master=111.223.45.25:5050 \
>>   --cgroups_hierarchy=/sys/fs/cgroup --isolation=cgroups/cpu,cgroups/mem \
>>   --cgroups_limit_swap=true
>> ## Start the task
>> ./src/mesos-execute --master=111.223.45.25:5050 --name="test-single-1" \
>>   --command="sleep 2000"
>> ```
>>
>> Then query the `/containers` endpoint to get the container id of the
>> task:
>>
>> ```
>> $ curl 'http://111.223.45.25:5051/containers' 2>/dev/null | jq .
>> [
>>   {
>>     "container_id": "74fea157-100f-4bf8-b0d0-b65c6e17def1",
>>     "executor_id": "test-single-1",
>>     "executor_name": "Command Executor (Task: test-single-1) (Command: sh -c 'sleep 2000')",
>>     "framework_id": "db9f43ce-0361-4c65-b42f-4dbbefa75ff8-0000",
>>     "source": "test-single-1",
>>     "statistics": {
>>       "cpus_limit": 1.1,
>>       "cpus_system_time_secs": 3.69,
>>       "cpus_user_time_secs": 3.1,
>>       "mem_anon_bytes": 9940992,
>>       "mem_cache_bytes": 8192,
>>       "mem_critical_pressure_counter": 0,
>>       "mem_file_bytes": 8192,
>>       "mem_limit_bytes": 167772160,
>>       "mem_low_pressure_counter": 0,
>>       "mem_mapped_file_bytes": 0,
>>       "mem_medium_pressure_counter": 0,
>>       "mem_rss_bytes": 9940992,
>>       "mem_swap_bytes": 0,
>>       "mem_total_bytes": 10076160,
>>       "mem_total_memsw_bytes": 10076160,
>>       "mem_unevictable_bytes": 0,
>>       "timestamp": 1475686847.54635
>>     },
>>     "status": {
>>       "executor_pid": 2775
>>     }
>>   }
>> ]
>> ```
>>
>> As you can see above, the container id is
>> `74fea157-100f-4bf8-b0d0-b65c6e17def1`, so I run
>>
>> ```
>> $ cat /sys/fs/cgroup/memory/mesos/74fea157-100f-4bf8-b0d0-b65c6e17def1/memory.stat
>> ```
>>
>> Mesos gets the memory statistics for the task from this file;
>> `total_rss` is parsed into the `"mem_rss_bytes"` field.
>>
>> ```
>> ...
>> hierarchical_memory_limit 167772160
>> hierarchical_memsw_limit 167772160
>> total_rss 9940992
>> ...
>> ```
>>
>> You could check which of the steps above mismatches on your side and
>> reply to this email for further discussion; the problem seems to be an
>> incorrect configuration or launch flags.
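As a side note on the steps above: if `jq` is available, the relevant field can be pulled straight out of the `/containers` response. A small sketch against the example agent address used above (the filter itself is illustrative):

```
$ curl -s 'http://111.223.45.25:5051/containers' \
    | jq '.[] | {container: .container_id, rss_bytes: .statistics.mem_rss_bytes}'
```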
>>
>> On Wed, Oct 5, 2016 at 8:46 PM, Srikant Kalani <srikant.blackr...@gmail.com> wrote:
>>
>>> What I can see in the http output is that mem_rss_bytes is not coming
>>> on rhel7.
>>>
>>> Here is the http output.
>>>
>>> Output for the agent running on rhel7:
>>>
>>> [{"container_id": "8062e683-204c-40c2-87ae-fcc2c3f71b85",
>>>   "executor_id": "*****",
>>>   "executor_name": "Command Executor (Task: *****) (Command: sh -c '******...')",
>>>   "framework_id": "edbffd6d-b274-4cb1-b386-2362ed2af517-0000",
>>>   "source": "*****",
>>>   "statistics": {"cpus_limit": 1.1,
>>>     "cpus_system_time_secs": 0.01,
>>>     "cpus_user_time_secs": 0.03,
>>>     "mem_anon_bytes": 0,
>>>     "mem_cache_bytes": 0,
>>>     "mem_critical_pressure_counter": 0,
>>>     "mem_file_bytes": 0,
>>>     "mem_limit_bytes": 1107296256,
>>>     "mem_low_pressure_counter": 0,
>>>     "mem_mapped_file_bytes": 0,
>>>     "mem_medium_pressure_counter": 0,
>>>     "mem_rss_bytes": 0,
>>>     "mem_swap_bytes": 0,
>>>     "mem_total_bytes": 0,
>>>     "mem_unevictable_bytes": 0,
>>>     "timestamp": 1475668277.62915},
>>>   "status": {"executor_pid": 14454}}]
>>>
>>> Output for the agent running on rhel6:
>>>
>>> [{"container_id": "359c0944-c089-4d43-983e-1f97134fe799",
>>>   "executor_id": "*****",
>>>   "executor_name": "Command Executor (Task: *****) (Command: sh -c '******...')",
>>>   "framework_id": "edbffd6d-b274-4cb1-b386-2362ed2af517-0001",
>>>   "source": "*****",
>>>   "statistics": {"cpus_limit": 8.1,
>>>     "cpus_system_time_secs": 1.92,
>>>     "cpus_user_time_secs": 6.93,
>>>     "mem_limit_bytes": 1107296256,
>>>     "mem_rss_bytes": 2329763840,
>>>     "timestamp": 1475670762.73402},
>>>   "status": {"executor_pid": 31577}}]
>>>
>>> Attached are the UI screenshots:
>>> Wa002.jpg is for rhel7 and the other one is for rhel6.
>>>
>>> On 5 Oct 2016 4:55 p.m., "haosdent" <haosd...@gmail.com> wrote:
>>>
>>>> Hi, @Srikant. How about the result of
>>>> http://${YOUR_AGENT_IP}:5051/containers?
>>>> It is weird that you saw
>>>>
>>>> ```
>>>> Updated 'memory.limit_in_bytes' to xxx
>>>> ```
>>>>
>>>> in the log as you mentioned, but `limit_in_bytes` is still at its
>>>> initial value as you showed above.
>>>>
>>>> On Wed, Oct 5, 2016 at 2:04 PM, Srikant Kalani <srikant.blackr...@gmail.com> wrote:
>>>>
>>>>> Here are the values:
>>>>> memory.limit_in_bytes = 1107296256
>>>>> memory.soft_limit_in_bytes = 1107296256
>>>>> memory.memsw.limit_in_bytes = 9223372036854775807
>>>>>
>>>>> I have run the same task on Mesos 1.0.1 running on rhel6, and the UI
>>>>> then shows the task memory usage as 2.2G/1.0G, where 2.2G is used and
>>>>> 1.0G is allocated; but since we don't have cgroups there, the tasks
>>>>> are not getting killed.
>>>>>
>>>>> On rhel7 the UI is showing 0B/1.0G for the task memory details.
>>>>>
>>>>> Any idea whether this is a rhel7 fault, or do I need to adjust some
>>>>> configuration?
>>>>>
>>>>> On 4 Oct 2016 21:33, "haosdent" <haosd...@gmail.com> wrote:
>>>>>
>>>>>> Hi, @Srikant
>>>>>>
>>>>>> Usually, your task should be killed when it goes over the cgroup
>>>>>> limit. Would you enter the `/sys/fs/cgroup/memory/mesos` folder on
>>>>>> the agent? Then check the values in
>>>>>> `${YOUR_CONTAINER_ID}/memory.limit_in_bytes`,
>>>>>> `${YOUR_CONTAINER_ID}/memory.soft_limit_in_bytes` and
>>>>>> `${YOUR_CONTAINER_ID}/memory.memsw.limit_in_bytes` and reply to this
>>>>>> email.
>>>>>>
>>>>>> ${YOUR_CONTAINER_ID} is the container id of your task here; you
>>>>>> could find it in the agent log. Or, as you said, you only have this
>>>>>> one task, so it should only have one directory under
>>>>>> `/sys/fs/cgroup/memory/mesos`.
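A minimal shell sketch of the check described above, assuming a single container directory under the v1 memory hierarchy (the variable name and loop are illustrative, not from the thread):

```
# Pick the only container directory under the Mesos memory cgroup
# (assumes exactly one running task, as in this thread).
CID_DIR=$(ls -d /sys/fs/cgroup/memory/mesos/*/ | head -n 1)

# Print the three limit files haosdent asked about.
for f in memory.limit_in_bytes memory.soft_limit_in_bytes memory.memsw.limit_in_bytes; do
  printf '%s: %s\n' "$f" "$(cat "$CID_DIR$f")"
done
```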
>>>>>>
>>>>>> Furthermore, would you show the result of
>>>>>> http://${YOUR_AGENT_IP}:5051/containers? It contains some task
>>>>>> statistics information as well.
>>>>>>
>>>>>> On Tue, Oct 4, 2016 at 9:00 PM, Srikant Kalani <srikant.blackr...@gmail.com> wrote:
>>>>>>
>>>>>>> We have upgraded Linux from rhel6 to rhel7 and Mesos from 0.27 to
>>>>>>> 1.0.1.
>>>>>>> After the upgrade we are not able to see the memory used by a task,
>>>>>>> which was fine in the previous version. Due to this, cgroups are
>>>>>>> not effective.
>>>>>>>
>>>>>>> Answers to your questions below:
>>>>>>>
>>>>>>> There is only 1 task running as an appserver, which is consuming
>>>>>>> approx 20G of memory, but this info is not coming up in the Mesos
>>>>>>> UI.
>>>>>>> Swap is enabled in the agent start command.
>>>>>>> These flags are used in the agent: cgroups_limits_swap=true
>>>>>>> --isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/cgroup
>>>>>>> In the agent logs I can see the memory limit updated to 33MB for
>>>>>>> the container.
>>>>>>>
>>>>>>> The web UI shows the total memory allocated to the framework, but
>>>>>>> it is not showing the memory used by the task. It always shows
>>>>>>> 0B/33MB.
>>>>>>>
>>>>>>> Not sure if this is a rhel7 issue or a Mesos 1.0.1 one.
>>>>>>>
>>>>>>> Any suggestions?
>>>>>>>
>>>>>>> On 26 Sep 2016 21:55, "haosdent" <haosd...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, @Srikant. Could you elaborate on
>>>>>>>>
>>>>>>>> > We have verified using the top command that the framework was
>>>>>>>> > using 2GB of memory while the allocation was just 50 MB.
>>>>>>>>
>>>>>>>> * How many tasks are running in your framework?
>>>>>>>> * Do you enable or disable swap on the agents?
>>>>>>>> * What are the flags that you launch the agents with?
>>>>>>>> * Have you seen something like `Updated 'memory.limit_in_bytes' to`
>>>>>>>>   in the agent log?
>>>>>>>>
>>>>>>>> On Tue, Sep 27, 2016 at 12:14 AM, Srikant Kalani <srikant.blackr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Greg,
>>>>>>>>>
>>>>>>>>> Previously we were running Mesos 0.27 on rhel6, and since we
>>>>>>>>> already have one cgroup hierarchy for cpu and memory for our
>>>>>>>>> production processes, we were not able to merge the two cgroup
>>>>>>>>> hierarchies on rhel6. The slave process was not coming up.
>>>>>>>>> Now we have moved to rhel7, and both the Mesos master and slave
>>>>>>>>> are running on rhel7 with cgroups implemented. But we are seeing
>>>>>>>>> that the Mesos UI is not showing the actual memory used by the
>>>>>>>>> framework.
>>>>>>>>>
>>>>>>>>> Any idea why the framework's usage of cpu and memory is not
>>>>>>>>> coming up in the UI? Due to this, the OS is still not killing
>>>>>>>>> tasks that are consuming more memory than allocated.
>>>>>>>>> We have verified using the top command that the framework was
>>>>>>>>> using 2GB of memory while the allocation was just 50 MB.
>>>>>>>>>
>>>>>>>>> Please suggest.
>>>>>>>>>
>>>>>>>>> On 8 Sep 2016 01:53, "Greg Mann" <g...@mesosphere.io> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Srikant,
>>>>>>>>>> Without using cgroups, it won't be possible to enforce isolation
>>>>>>>>>> of cpu/memory on a Linux agent. Could you elaborate a bit on why
>>>>>>>>>> you aren't able to use cgroups currently? Have you tested the
>>>>>>>>>> existing Mesos cgroup isolators in your system?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Greg
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 6, 2016 at 9:24 PM, Srikant Kalani <srikant.blackr...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Guys,
>>>>>>>>>>>
>>>>>>>>>>> We are running a Mesos cluster in our development environment.
>>>>>>>>>>> We are seeing cases where a framework uses more resources, such
>>>>>>>>>>> as cpu and memory, than it initially requested. When any new
>>>>>>>>>>> framework is registered, Mesos calculates the resources on the
>>>>>>>>>>> basis of the resources already offered to the first framework,
>>>>>>>>>>> and it doesn't consider the actual resources utilised by the
>>>>>>>>>>> previous framework.
>>>>>>>>>>> This is resulting in an incorrect calculation of resources.
>>>>>>>>>>> The Mesos website says that we should implement cgroups, but
>>>>>>>>>>> that is not possible in our case, as we have already implemented
>>>>>>>>>>> cgroups in other projects and due to Linux restrictions we can't
>>>>>>>>>>> merge two cgroup hierarchies.
>>>>>>>>>>>
>>>>>>>>>>> Any idea how we can implement resource isolation in Mesos?
>>>>>>>>>>>
>>>>>>>>>>> We are using Mesos 0.27.1
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Srikant Kalani
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>
>> --
>> Best Regards,
>> Haosdent Huang

--
Best Regards,
Haosdent Huang