Aug 02 13:37:55 ip-172-19-15-116 kernel: ../source/mm/pgtable-generic.c:33: bad pmd ffff8bb32d6d6200(0000000fcda001e0)
Aug 02 13:43:10 ip-172-19-15-116 kernel: BUG: Bad rss-counter state mm:ffff8bbe9ba3dc00 idx:1 val:512
Aug 02 13:43:10 ip-172-19-15-116 kernel: BUG: non-zero nr_ptes on freeing mm: 1

The above bug shows up regularly on some virtualized machines with 4 NUMA nodes (x1.32xlarge in AWS) running CoreOS. These machines have 4 sockets and approximately 2 TiB of RAM. The workload is a combination of Spark and Druid, both isolated with memory and cpu cgroups. We haven't been able to come up with a specific test that reproduces this in an artificial environment, but we see it with alarming regularity under production workloads. We've tried numerous sysctl tunings around the NUMA configuration but cannot seem to avoid this bug. On half-sized machines (x1.16xlarge) running the same workloads, we have not seen it.
$ cat /proc/buddyinfo
Node 0, zone      DMA        1        1        1        0        2        1        1        0        1        1        3
Node 0, zone    DMA32       71       56       53       55       54       24       10        6        6        3      472
Node 0, zone   Normal  5721269  4547407   783814    47549     3829      818      200       95       10        0        0
Node 1, zone   Normal  2793102  4831801  2898624   486040    78417    25929    14492     7374     2941        0        0
Node 2, zone   Normal  3004007  4822788  1463551   143127    27984    12282     4765     1915       48        0        0
Node 3, zone   Normal  6013297  3858190  2328605   228874    22890     7524     3784     2369     1164        1        0

$ cat /proc/meminfo
MemTotal:       2014741740 kB
MemFree:        392410636 kB
MemAvailable:   1334517536 kB
Buffers:           33108 kB
Cached:         941241888 kB
SwapCached:            0 kB
Active:         1436430984 kB
Inactive:       164888628 kB
Active(anon):   660049028 kB
Inactive(anon):     1700 kB
Active(file):   776381956 kB
Inactive(file): 164886928 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:           2773500 kB
Writeback:             0 kB
AnonPages:      660029896 kB
Mapped:         478195376 kB
Shmem:              1880 kB
Slab:           13393720 kB
SReclaimable:   11277508 kB
SUnreclaim:      2116212 kB
KernelStack:      298288 kB
PageTables:      3131392 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    1007370868 kB
Committed_AS:   754887872 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:  182294528 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      280576 kB
DirectMap2M:    12433408 kB
DirectMap1G:    2036334592 kB
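
For anyone comparing snapshots across machines, here is a minimal Python sketch that summarizes /proc/buddyinfo output per node and zone. It is not part of the original report. It assumes 4 KiB base pages, so an order-9 block corresponds to the 2048 kB Hugepagesize shown in meminfo. The function names and the two-line SAMPLE (copied from the output above) are only for illustration.

```python
PAGE_KB = 4  # assumption: 4 KiB base pages (x86_64)

def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.strip().splitlines():
        parts = line.replace(",", "").split()
        # Expected shape: Node <n> zone <name> <count order 0> <count order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        zones[(int(parts[1]), parts[3])] = [int(c) for c in parts[4:]]
    return zones

def free_kb_at_or_above(counts, min_order):
    """Free memory in kB held in blocks of order >= min_order."""
    return sum(n * (2 ** order) * PAGE_KB
               for order, n in enumerate(counts) if order >= min_order)

# Two lines copied from the buddyinfo output in this report.
SAMPLE = """\
Node 0, zone Normal 5721269 4547407 783814 47549 3829 818 200 95 10 0 0
Node 1, zone Normal 2793102 4831801 2898624 486040 78417 25929 14492 7374 2941 0 0
"""

zones = parse_buddyinfo(SAMPLE)
for (node, zone), counts in sorted(zones.items()):
    total = free_kb_at_or_above(counts, 0)
    huge = free_kb_at_or_above(counts, 9)  # order 9 = 512 pages = 2 MiB
    print(f"node {node} {zone}: {total} kB free, {huge} kB in order>=9 blocks")
```

On a live machine the same functions can be fed open("/proc/buddyinfo").read(). In the snapshot above, the Normal zones show almost no free blocks at order 9 or higher (a single order-9 block on node 3), although this sketch only reports the numbers and says nothing about the cause of the bug.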
That's a pretty old kernel. Have you googled for `linux "Bad rss-counter state"' and checked that your kernel has the various fixes which are mentioned there?
We tested 4.11.6-coreos-r1. I thought we had encountered this error on that version as well, but I don't seem to have an explicit log for it. We have not tested any other patches to prevent this error.