It seems we are hitting a similar bug:

  [Thu Jul 20 10:38:14 2017] /home/kernel/COD/linux/mm/pgtable-generic.c:33: bad pmd ffff97c4e4ab6800(00000006dca009e2)
  [Thu Jul 20 10:50:17 2017] BUG: Bad rss-counter state mm:ffff97cb4d11a6c0 idx:1 val:512
  [Thu Jul 20 10:50:17 2017] BUG: non-zero nr_ptes on freeing mm: 1
  [Wed Jul 26 09:21:00 2017] /home/kernel/COD/linux/mm/pgtable-generic.c:33: bad pmd ffff97e4b2a7f008(000000237ba009e2)
  [Wed Jul 26 09:25:51 2017] BUG: Bad rss-counter state mm:ffff97cb537e1f00 idx:1 val:512
  [Wed Jul 26 09:25:51 2017] BUG: non-zero nr_ptes on freeing mm: 1
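To make the "bad pmd" lines easier to read, here is a small Python sketch that decodes the logged value. That value is the number in parentheses; the first number is the address of the pmd entry itself, as printed by pmd_ERROR(). The flag names, and the assumption that pmd_bad() expects exactly PRESENT | RW | ACCESSED | DIRTY once USER is masked out, are my reading of the 4.x x86 headers. They are not taken from this report, so check them against the kernel source you are running. decode_pmd() and scan() are hypothetical helper names.

```python
import re

# Low flag bits of an x86-64 page-table entry, named as in the 4.x kernel's
# arch/x86/include/asm/pgtable_types.h (assumed; verify against your tree).
# Bits 9-11 are software bits whose meaning varies between versions.
PMD_FLAG_BITS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    3: "PWT",
    4: "PCD",
    5: "ACCESSED",
    6: "DIRTY",
    7: "PSE",
    8: "GLOBAL/PROTNONE",
    9: "SOFTW1",
    10: "SOFTW2",
    11: "SOFTW3/SOFT_DIRTY",
}

USER = 1 << 2
# PRESENT | RW | ACCESSED | DIRTY: what a pmd pointing to a page table is
# assumed to look like (_KERNPG_TABLE) in the pmd_bad() check.
KERNPG_TABLE = 0x63
PHYS_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51 hold the physical address

# Matches the "bad pmd <entry address>(<entry value>)" format in the log.
BAD_PMD_RE = re.compile(r"bad pmd ([0-9a-f]+)\(([0-9a-f]+)\)")


def decode_pmd(value):
    """Split a raw pmd value into (physical address, flag names, looks_bad)."""
    flags = value & 0xFFF
    names = [name for bit, name in PMD_FLAG_BITS.items() if flags & (1 << bit)]
    # Approximates pmd_bad(): ignore USER, then compare the remaining flags.
    looks_bad = (flags & ~USER) != KERNPG_TABLE
    return value & PHYS_MASK, names, looks_bad


def scan(lines):
    """Return (entry address, decoded value) for each 'bad pmd' log line."""
    results = []
    for line in lines:
        match = BAD_PMD_RE.search(line)
        if match:
            entry_addr = int(match.group(1), 16)
            results.append((entry_addr, decode_pmd(int(match.group(2), 16))))
    return results


if __name__ == "__main__":
    log = [
        "[Thu Jul 20 10:38:14 2017] /home/kernel/COD/linux/mm/pgtable-generic.c:33: "
        "bad pmd ffff97c4e4ab6800(00000006dca009e2)",
        "[Wed Jul 26 09:21:00 2017] /home/kernel/COD/linux/mm/pgtable-generic.c:33: "
        "bad pmd ffff97e4b2a7f008(000000237ba009e2)",
    ]
    for addr, (phys, names, bad) in scan(log):
        print(f"pmd at {addr:#x}: phys={phys:#x} flags={names} bad={bad}")
```

For both logged values the low 12 bits are 0x9e2. That means PRESENT is clear, while RW, ACCESSED, DIRTY, PSE (a 2 MB huge-page mapping), bit 8 and bit 11 are set. In 4.x kernels bit 8 doubles as _PAGE_PROTNONE for non-present entries. This suggests a huge-page pmd in a protnone state rather than a pointer to a page table, but treat that as a lead to check, not a diagnosis.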
I've tried both the default LTS kernel and a more recent release, and both show the same issue:

  uname -a
  4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  uname -a
  4.10.0-041000-lowlatency #201702191831 SMP PREEMPT Sun Feb 19 23:36:31 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  lsb_release -rd
  Description: Ubuntu 16.04.2 LTS
  Release: 16.04

These servers are Elasticsearch data nodes doing heavy I/O, with 264 GB of RAM, mostly used by the file cache for Lucene data. I've tried upgrading the kernel and several versions of OpenJDK and Oracle JDK. The issue remains: the Elasticsearch process keeps crashing at random on different servers with similar hardware configurations.

I'm attaching the apport crash report (without the base64-encoded core dump).

** Attachment added: "apport crashdump"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1644056/+attachment/4921585/+files/_usr_lib_jvm_java-8-openjdk-amd64_jre_bin_java.113.crash

https://bugs.launchpad.net/bugs/1644056
Title: kernel BUG at /build/linux-lts-xenial-gUF4JR/linux-lts-xenial-4.4.0/mm/huge_memory.c:1931!