Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-27 Thread Nils Holland
On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote: > Hi, > could you try to run with the following patch on top of the previous > one? I do not think it will make a large change in your workload but > I think we need something like that so some testing under which is known > to make a hi

Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-27 Thread Nils Holland
On Tue, Dec 27, 2016 at 09:08:38AM +0100, Michal Hocko wrote: > On Mon 26-12-16 19:57:03, Nils Holland wrote: > > On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote: > > > On Fri 23-12-16 23:26:00, Nils Holland wrote: > > > > On Fri, Dec 23, 2016 at 03:47:

Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-26 Thread Nils Holland
On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote: > On Fri 23-12-16 23:26:00, Nils Holland wrote: > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote: > > > > > > Nils, even though this is still highly experimental, could you give it a > >

Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-23 Thread Nils Holland
On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote: > > Nils, even though this is still highly experimental, could you give it a > try please? Yes, no problem! So I kept the very first patch you sent but had to revert the latest version of the debugging patch (the one in which you added

Re: OOM: Better, but still there on

2016-12-23 Thread Nils Holland
On Fri, Dec 23, 2016 at 11:51:57AM +0100, Michal Hocko wrote: > TL;DR > drop the last patch, check whether memory cgroup is enabled and retest > with cgroup_disable=memory to see whether this is memcg related and if > it is _not_ then try to test with the patch below Right, it seems we might be lo

Re: OOM: Better, but still there on

2016-12-22 Thread Nils Holland
On Thu, Dec 22, 2016 at 08:17:19PM +0100, Michal Hocko wrote: > TL;DR I still do not see what is going on here and it still smells like > multiple issues. Please apply the patch below on _top_ of what you had. I've run the usual procedure again with the new patch on top and the log is now up at:

Re: OOM: Better, but still there on

2016-12-22 Thread Nils Holland
On Thu, Dec 22, 2016 at 11:27:25AM +0100, Michal Hocko wrote: > On Thu 22-12-16 11:10:29, Nils Holland wrote: > > > However, the log comes from machine #2 again today, as I'm > > unfortunately forced to try this via VPN from work to home today, so I > > have exactly o

Re: OOM: Better, but still there on

2016-12-22 Thread Nils Holland
On Wed, Dec 21, 2016 at 08:36:59AM +0100, Michal Hocko wrote: > TL;DR > there is another version of the debugging patch. Just revert the > previous one and apply this one instead. It's still not clear what > is going on but I suspect either some misaccounting or unexpected > pages on the LRU lists.

Re: OOM: Better, but still there on

2016-12-19 Thread Nils Holland
On Mon, Dec 19, 2016 at 02:45:34PM +0100, Michal Hocko wrote: > Unfortunately shrink_active_list doesn't have any tracepoint so we do > not know whether we managed to rotate those pages. If they are referenced > quickly enough we might just keep refaulting them... Could you try to apply > the fol

Re: OOM: Better, but still there on

2016-12-17 Thread Nils Holland
On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote: > On 2016/12/17 21:59, Nils Holland wrote: > > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote: > >> mount -t tracefs none /debug/trace > >> echo 1 > /debug/trace/events/vmscan/enable >

Re: OOM: Better, but still there on

2016-12-17 Thread Nils Holland
On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote: > On 2016/12/17 21:59, Nils Holland wrote: > > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote: > >> mount -t tracefs none /debug/trace > >> echo 1 > /debug/trace/events/vmscan/enable >

Re: OOM: Better, but still there on

2016-12-17 Thread Nils Holland
On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote: > On Fri 16-12-16 19:47:00, Nils Holland wrote: > > > > Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages > > freed, 10219 pages still pinned. > > Dec 16 18:56:29 boerne.fritz.box k

Re: OOM: Better, but still there on

2016-12-16 Thread Nils Holland
On Fri, Dec 16, 2016 at 04:58:06PM +0100, Michal Hocko wrote: > On Fri 16-12-16 08:39:41, Michal Hocko wrote: > [...] > > That being said, the OOM killer invocation is clearly pointless and > > premature. We normally do not invoke it for GFP_NOFS requests > > exactly for these reasons. Bu