Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage) with check/repair/sync

2016-07-18 Thread Matthias Dahl
Hello again... So I spent all weekend doing further tests, since this issue is really bugging me for obvious reasons. I thought it would be beneficial if I created a bug report that summarized and centralized everything in one place rather than having everything spread across several lists and p

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-15 Thread Tetsuo Handa
On 2016/07/13 22:47, Michal Hocko wrote:
> On Wed 13-07-16 15:18:11, Matthias Dahl wrote:
>> I tried to figure this out myself but
>> couldn't find anything -- what does the number "-3" state? It is the
>> position in some chain or has it a different meaning?
>
> $ git grep "kmem_cache_create.*bio

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage) with check/repair/sync

2016-07-15 Thread Matthias Dahl
Hello... I am rather persistent (stubborn?) when it comes to tracking down bugs, if somehow possible... and it seems it paid off... somewhat. ;-) So I did quite a lot more further tests and came up with something very interesting: As long as the RAID is in sync (as-in: sync_action=idle), I can n

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Matthias Dahl
Hello Ondrej... On 2016-07-13 18:24, Ondrej Kozina wrote: One step after another. Sorry, it was not meant to be rude or anything... more frustration since I cannot be of more help and I really would like to jump in head-first and help fixing it... but lack the necessary insight into the kerne

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Ondrej Kozina
On 07/13/2016 05:32 PM, Matthias Dahl wrote: No matter what, I have no clue how to further diagnose this issue. And given that I already had unsolvable issues with dm-crypt a couple of months ago with my old machine where the system simply hang itself or went OOM when the swap was encrypted and

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Matthias Dahl
Hello... On 2016-07-13 15:47, Michal Hocko wrote: This is getting out of my area of expertise so I am not sure I can help you much more, I am afraid. That's okay. Thank you so much for investing the time. For what it is worth, I did some further tests and here is what I came up with: If I c

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Michal Hocko
On Wed 13-07-16 15:18:11, Matthias Dahl wrote:
> Hello Michal,
>
> many thanks for all your time and help on this issue. It is very much
> appreciated and I hope we can track this down somehow.
>
> On 2016-07-13 14:18, Michal Hocko wrote:
>
> > So it seems we are accumulating bios and 256B objec

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Matthias Dahl
Hello Michal, many thanks for all your time and help on this issue. It is very much appreciated and I hope we can track this down somehow. On 2016-07-13 14:18, Michal Hocko wrote: So it seems we are accumulating bios and 256B objects. Buffer heads as well but so much. Having over 4G worth of b

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Michal Hocko
On Wed 13-07-16 13:21:26, Michal Hocko wrote:
> On Tue 12-07-16 16:56:32, Matthias Dahl wrote:
[...]
> > If that support is baked into the Fedora provided kernel that is. If
> > you could give me a few hints or pointers, how to properly do a allocator
> > trace point and get some decent data out of

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Michal Hocko
On Tue 12-07-16 16:56:32, Matthias Dahl wrote:
> Hello Michal...
>
> On 2016-07-12 16:07, Michal Hocko wrote:
>
> > /proc/slabinfo could at least point on who is eating that memory.
>
> Thanks. I have made another test (and thus again put the RAID10 out of
> sync for the 100th time, sigh) and ma

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Matthias Dahl
Hello Michal... On 2016-07-12 16:07, Michal Hocko wrote: /proc/slabinfo could at least point on who is eating that memory. Thanks. I have made another test (and thus again put the RAID10 out of sync for the 100th time, sigh) and made regular snapshots of slabinfo which I have attached to this
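
For readers who want to compare such slabinfo snapshots themselves, here is a minimal sketch in Python. It is not the method used in this thread. It assumes the snapshots are plain copies of /proc/slabinfo in the 2.1 format, where the third and fourth columns are num_objs and objsize. The function names parse_slabinfo and top_growth are made up for this illustration, and num_objs * objsize is only a rough estimate of the memory a cache holds.

```python
def parse_slabinfo(text):
    """Map each slab cache name to an approximate byte count.

    Assumes the /proc/slabinfo 2.1 column layout:
    name active_objs num_objs objsize objperslab pagesperslab : ...
    """
    usage = {}
    for line in text.splitlines():
        # Skip the version line, the column header and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize  # rough estimate, ignores slab overhead
    return usage


def top_growth(before, after, n=5):
    """Return the n caches that grew the most between two parsed snapshots."""
    growth = {name: size - before.get(name, 0) for name, size in after.items()}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)[:n]
```

To use it, read two snapshot files taken some time apart, pass their contents through parse_slabinfo, and hand both results to top_growth. Caches that keep climbing between snapshots are the first candidates to look at.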

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 14:42:12, Matthias Dahl wrote:
> Hello Michal...
>
> On 2016-07-12 13:49, Michal Hocko wrote:
>
> > I am not a storage expert (not even mention dm-crypt). But what those
> > counters say is that the IO completion doesn't trigger so the
> > PageWriteback flag is still set. Such a p

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 13:49:20, Michal Hocko wrote:
> On Tue 12-07-16 13:28:12, Matthias Dahl wrote:
> > Hello Michal...
> >
> > On 2016-07-12 11:50, Michal Hocko wrote:
> >
> > > This smells like file pages are stuck in the writeback somewhere and the
> > > anon memory is not reclaimable because you d

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 13:28:12, Matthias Dahl wrote:
> Hello Michal...
>
> On 2016-07-12 11:50, Michal Hocko wrote:
>
> > This smells like file pages are stuck in the writeback somewhere and the
> > anon memory is not reclaimable because you do not have any swap device.
>
> Not having a swap device sh

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Matthias Dahl
Hello Michal... On 2016-07-12 13:49, Michal Hocko wrote: I am not a storage expert (not even mention dm-crypt). But what those counters say is that the IO completion doesn't trigger so the PageWriteback flag is still set. Such a page is not reclaimable obviously. So I would check the IO deliver

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 10:27:37, Matthias Dahl wrote:
> Hello,
>
> I posted this issue already on linux-mm, linux-kernel and dm-devel a
> few days ago and after further investigation it seems like that this
> issue is somehow related to the fact that I am using an Intel Rapid
> Storage RAID10, so I am su

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Matthias Dahl
Hello Michal... On 2016-07-12 11:50, Michal Hocko wrote: This smells like file pages are stuck in the writeback somewhere and the anon memory is not reclaimable because you do not have any swap device. Not having a swap device shouldn't be a problem -- and in this case, it would cause even m