Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-03 David Hildenbrand
> For performance reasons during system updates/reboots we do not erase
> memory content. The memory content is erased only on power cycle,
> which we do not do in production.
>
> Once we hot-remove the memory, we convert it back into DAXFS PMEM
> device, format it into EXT4, mount it as DAX file

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Michal Hocko
On Wed 02-09-20 08:53:49, Pavel Tatashin wrote:
> On Wed, Sep 2, 2020 at 7:32 AM Michal Hocko wrote:
> >
> > On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> > > >> > > Thread #2: css killer kthread
> > > >> > > css_killed_work_fn
> > > >> > > cgroup_mutex <- Grab this Mutex
> > > >> >

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Pavel Tatashin
> > This is how we are using it at Microsoft: there is a very large
> > number of small memory machines (8G each) with low downtime
> > requirements (reboot must be under a second). There is also a large
> > state ~2G of memory that we need to transfer during reboot, otherwise
> > it is very

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Michal Hocko
On Wed 02-09-20 08:51:06, Pavel Tatashin wrote:
> > > > Thread #1: memory hot-remove systemd service
> > > > Loops indefinitely, because if there is something still to be migrated
> > > > this loop never terminates. However, this loop can be terminated via
> > > > signal from systemd after

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Michal Hocko
On Wed 02-09-20 08:42:13, Pavel Tatashin wrote:
> > > On 02.09.2020 at 11:53, Vlastimil Babka wrote:
> > >
> > > On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> > >> There appears to be another problem that is related to the
> > >> cgroup_mutex -> mem_hotplug_lock deadlock described above.
> > >> >

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Pavel Tatashin
On Wed, Sep 2, 2020 at 7:32 AM Michal Hocko wrote:
>
> On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> > >> > > Thread #2: css killer kthread
> > >> > > css_killed_work_fn
> > >> > > cgroup_mutex <- Grab this Mutex
> > >> > > mem_cgroup_css_offline
> > >> > >

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Pavel Tatashin
> > > Thread #1: memory hot-remove systemd service
> > > Loops indefinitely, because if there is something still to be migrated
> > > this loop never terminates. However, this loop can be terminated via
> > > signal from systemd after timeout.
> > > __offline_pages()
> > > do {
> > >

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Pavel Tatashin
> > On 02.09.2020 at 11:53, Vlastimil Babka wrote:
> >
> > On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> >> There appears to be another problem that is related to the
> >> cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >>
> >> In the original deadlock that I described, the

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Pavel Tatashin
> I am on an old codebase that already has the fix that you are proposing,
> so I might be seeing some other issue which I will debug further.
>
> So looks like the loop in __offline_pages() had a call to
> drain_all_pages() before it was removed by
>
> c52e75935f8d: mm: remove extra drain pages on

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Michal Hocko
On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> >> > > Thread #2: css killer kthread
> >> > > css_killed_work_fn
> >> > > cgroup_mutex <- Grab this Mutex
> >> > > mem_cgroup_css_offline
> >> > > memcg_offline_kmem.part
> >> > > memcg_deactivate_kmem_caches
> >> >

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Michal Hocko
On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> > There appears to be another problem that is related to the
> > cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >
> > In the original deadlock that I described, the workaround is to
> >

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 David Hildenbrand
> On 02.09.2020 at 11:53, Vlastimil Babka wrote:
>
> On 8/28/20 6:47 PM, Pavel Tatashin wrote:
>> There appears to be another problem that is related to the
>> cgroup_mutex -> mem_hotplug_lock deadlock described above.
>>
>> In the original deadlock that I described, the workaround is to
>>

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Vlastimil Babka
On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> There appears to be another problem that is related to the
> cgroup_mutex -> mem_hotplug_lock deadlock described above.
>
> In the original deadlock that I described, the workaround is to
> replace crash dump from piping to Linux traditional save to

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-02 Bharata B Rao
On Tue, Sep 01, 2020 at 08:52:05AM -0400, Pavel Tatashin wrote:
> On Tue, Sep 1, 2020 at 1:28 AM Bharata B Rao wrote:
> >
> > On Fri, Aug 28, 2020 at 12:47:03PM -0400, Pavel Tatashin wrote:
> > > There appears to be another problem that is related to the
> > > cgroup_mutex -> mem_hotplug_lock

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-09-01 Pavel Tatashin
On Tue, Sep 1, 2020 at 1:28 AM Bharata B Rao wrote:
>
> On Fri, Aug 28, 2020 at 12:47:03PM -0400, Pavel Tatashin wrote:
> > There appears to be another problem that is related to the
> > cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >
> > In the original deadlock that I described,

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-08-31 Bharata B Rao
On Fri, Aug 28, 2020 at 12:47:03PM -0400, Pavel Tatashin wrote:
> There appears to be another problem that is related to the
> cgroup_mutex -> mem_hotplug_lock deadlock described above.
>
> In the original deadlock that I described, the workaround is to
> replace crash dump from piping to Linux

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-08-28 Pavel Tatashin
There appears to be another problem that is related to the cgroup_mutex -> mem_hotplug_lock deadlock described above. In the original deadlock that I described, the workaround is to replace crash dump from piping to Linux traditional save to files method. However, after trying this workaround, I

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-08-12 Pavel Tatashin
On Wed, Aug 12, 2020 at 8:04 PM Roman Gushchin wrote:
>
> On Wed, Aug 12, 2020 at 07:16:08PM -0400, Pavel Tatashin wrote:
> > Guys,
> >
> > There is a convoluted deadlock that I just root caused, and that is
> > fixed by this work (at least based on my code inspection it appears to
> > be fixed);

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-08-12 Roman Gushchin
On Wed, Aug 12, 2020 at 07:16:08PM -0400, Pavel Tatashin wrote:
> Guys,
>
> There is a convoluted deadlock that I just root caused, and that is
> fixed by this work (at least based on my code inspection it appears to
> be fixed); but the deadlock exists in older and stable kernels, and I
> am not

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-08-12 Pavel Tatashin
BTW, I replied to a wrong version of this work. I intended to reply to version 7:
https://lore.kernel.org/lkml/20200623174037.3951353-1-g...@fb.com/
Nevertheless, the problem is the same.
Thank you,
Pasha
On Wed, Aug 12, 2020 at 7:16 PM Pavel Tatashin wrote:
>
> Guys,
>
> There is a convoluted

Re: [PATCH v2 00/28] The new cgroup slab memory controller

2020-08-12 Pavel Tatashin
Guys, There is a convoluted deadlock that I just root caused, and that is fixed by this work (at least based on my code inspection it appears to be fixed); but the deadlock exists in older and stable kernels, and I am not sure whether to create a separate patch for it, or backport this whole