For as long as I have owned the current system (over two years), I have seen the following kernel warning in dmesg when the system is under very high CPU / memory load for a long period of time (e.g. compiling LLVM with ThinLTO for an LTO/PGO/BOLT build that takes hours to complete):

[ +2,258700] BUG: Bad page map in process clang++ pte:34999a025 pmd:5abfdb067
[ +0,000011] page:000000009c141376 refcount:68 mapcount:-191 mapping:000000007d9e887b index:0x47ce pfn:0x34999a
[ +0,000005] memcg:ffff90a9145c8000
[ +0,000002] aops:ext4_da_aops [ext4] ino:961599 dentry name:"libLLVM-16git.so"
[ +0,000023] flags: 0xa600000000002056(referenced|uptodate|lru|workingset|private|zone=2)
[ +0,000005] raw: a600000000002056 ffffda8b1eda1908 ffffda8b1c5bb1c8 ffff90a6ba07b0d8
[ +0,000003] raw: 00000000000047ce ffff90a96c06a000 00000044ffffff40 ffff90a9145c8000
[ +0,000001] page dumped because: bad pte
[ +0,000001] addr:00007f71291cf000 vm_flags:80000075 anon_vma:0000000000000000 mapping:ffff90a6ba07b0d8 index:47ce
[ +0,000004] file:libLLVM-16git.so fault:filemap_fault mmap:ext4_file_mmap [ext4] read_folio:ext4_read_folio [ext4]
[ +0,000032] CPU: 25 PID: 67341 Comm: clang++ Tainted: G B O 6.0.8-3.1-cachyos-bore-lto #1 ae22c0a1d3f4e64c18fc975cbc4082d37abd50c5
[ +0,000005] Hardware name: LENOVO GAMING TF/X99-TF Gaming, BIOS CX99DE26 10/10/2020
[ +0,000001] Call Trace:
[ +0,000003] <TASK>
[ +0,000001] ? print_bad_pte+0x1e6/0x280
[ +0,000005] ? unmap_page_range+0xaaf/0x12e0
[ +0,000004] ? balance_dirty_pages_ratelimited_flags+0x233/0x10e0
[ +0,000006] ? unmap_vmas+0xd4/0x1a0
[ +0,000003] ? exit_mmap+0x11b/0x440
[ +0,000004] ? __mmput+0x3b/0x180
[ +0,000005] ? do_exit+0x4d6/0x1140
[ +0,000003] ? do_group_exit+0x4b/0xe0
[ +0,000003] ? __x64_sys_exit_group+0xe/0x20
[ +0,000002] ? do_syscall_64+0x2b/0x60
[ +0,000005] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ +0,000006] </TASK>

I normally don't see this when gaming or under lighter workloads, such as an LLVM build with FullLTO or a normal kernel build.
My kernel config and the patches that I use can be found in my GitHub repo under: https://github.com/ms178/archpkgbuilds/tree/main/packages/linux-cachyos-bore

inxi -f output:

System:
  Host: klx99 Kernel: 6.0.8-3.1-cachyos-bore-lto arch: x86_64 bits: 64
  Desktop: KDE Plasma v: 5.26.80 Distro: EndeavourOS
Machine:
  Type: Desktop System: LENOVO product: GAMING TF v: N/A serial: <superuser required>
  Mobo: Lenovo model: X99-TF Gaming v: G368J V1.1 serial: <superuser required>
  UEFI: American Megatrends v: CX99DE26 date: 10/10/2020
CPU:
  Info: 18-core model: Intel Xeon E5-2696 v3 bits: 64 type: MT MCP cache: L2: 4.5 MiB
  Speed (MHz): avg: 3000 min/max: 1200/2301 cores: 1: 3000 2: 3000 3: 3000 4: 3000 5: 3000 6: 3000 7: 3000 8: 3000 9: 3000 10: 3000 11: 3000 12: 3000 13: 3000 14: 3000 15: 3000 16: 3000 17: 3000 18: 3000 19: 3000 20: 3000 21: 3000 22: 3000 23: 3000 24: 3000 25: 3000 26: 3000 27: 3000 28: 3000 29: 3000 30: 3000 31: 3000 32: 3000 33: 3000 34: 3000 35: 3000 36: 3000
Graphics:
  Device-1: AMD Vega 10 XL/XT [Radeon RX 56/64] driver: amdgpu v: kernel
  Display: x11 server: X.Org v: 21.1.99 with: Xwayland v: 22.1.5 driver: X: loaded: amdgpu unloaded: modesetting dri: radeonsi gpu: amdgpu resolution: 2560x1440
  API: OpenGL v: 4.6 Mesa 23.0.0-devel (git-ae76bba34a) renderer: AMD Radeon RX Vega (vega10 LLVM 16.0.0 DRM 3.48 6.0.8-3.1-cachyos-bore-lto)
Audio:
  Device-1: Intel C610/X99 series HD Audio driver: snd_hda_intel
  Device-2: AMD Vega 10 HDMI Audio [Radeon 56/64] driver: snd_hda_intel
  Sound API: ALSA v: k6.0.8-3.1-cachyos-bore-lto running: yes
  Sound Server-1: PipeWire v: 0.3.60 running: yes
Network:
  Device-1: Intel I350 Gigabit Network driver: igb
  IF: ens1f0 state: down mac: a0:36:9f:a3:72:44
  Device-2: Intel I350 Gigabit Network driver: igb
  IF: ens1f1 state: up speed: 1000 Mbps duplex: full mac: a0:36:9f:09:3f:67
  Device-3: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169
  IF: enp7s0 state: down mac: 00:e0:4c:68:02:1c
Drives:
  Local Storage: total: 1.39 TiB used: 355.29 GiB (25.0%)
  ID-1: /dev/nvme0n1 vendor: Silicon Power model: SPCC M.2 PCIe SSD size: 953.87 GiB
  ID-2: /dev/sda vendor: Samsung model: SSD 860 EVO 500GB size: 465.76 GiB
Partition:
  ID-1: / size: 937.53 GiB used: 355.29 GiB (37.9%) fs: ext4 dev: /dev/nvme0n1p2
  ID-2: /boot/efi size: 299.4 MiB used: 316 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1
Swap:
  ID-1: swap-1 type: zram size: 4 GiB used: 17.5 MiB (0.4%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 50.0 C mobo: N/A gpu: amdgpu temp: 43.0 C
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 66
Info:
  Processes: 572 Uptime: 2h 26m Memory: 62.66 GiB used: 6.27 GiB (10.0%) Shell: Zsh inxi: 3.3.23

I have seen this issue with several different CPUs in the same system, but I cannot rule out a hardware quirk.
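For anyone reading the trace, the pte value in the first line of the warning can be decoded by hand. The following is only a sketch, assuming the standard x86-64 layout for a 4 KiB page-table entry (flag bits in the low 12 bits, frame number in bits 12..51); decode_pte is a made-up helper name, not a kernel or library function.

```python
# Sketch: decode an x86-64 4 KiB page-table entry as printed in the
# "Bad page map" warning. decode_pte is a hypothetical helper, and only
# the architecturally defined flag bits listed here are decoded.

PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disabled",
    5: "accessed",
    6: "dirty",
    8: "global",
    63: "no-execute",
}

# Bits 12..51 hold the physical page frame number.
PFN_MASK = ((1 << 52) - 1) & ~0xFFF


def decode_pte(pte: int) -> tuple[int, list[str]]:
    """Return (page frame number, names of the flag bits that are set)."""
    pfn = (pte & PFN_MASK) >> 12
    flags = [name for bit, name in PTE_FLAGS.items() if pte & (1 << bit)]
    return pfn, flags


if __name__ == "__main__":
    # Value copied from the log line "pte:34999a025".
    pfn, flags = decode_pte(0x34999A025)
    print(hex(pfn), flags)
```

For the value in this report, the decoded frame number is 0x34999a, the same pfn that the page dump prints, with the present, user and accessed bits set.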
Have you ever run memtest86? If not, please do and give it at least 24 hours: https://www.memtest86.com/download.htm
Not yet, but it is ECC memory, and I haven't seen any issues when putting the system under stress in Windows (the event log would show any memory errors).
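On Linux, corrected and uncorrected ECC error counters are exposed by the EDAC subsystem under /sys/devices/system/edac/mc, provided an EDAC driver for the memory controller is loaded. Whether one is loaded on this board is an assumption. A minimal sketch for reading the counters follows; read_edac_counts is a made-up helper name.

```python
# Sketch: read per-memory-controller ECC error counters from EDAC sysfs.
# Assumes an EDAC driver is loaded; if not, the directory is missing or
# empty and an empty dict is returned. read_edac_counts is hypothetical.
from pathlib import Path


def read_edac_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Map each mcN directory to its ce_count / ue_count values."""
    base = Path(root)
    counts = {}
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("no EDAC memory controllers found (driver not loaded?)")
    for mc, entry in result.items():
        print(mc, "corrected:", entry["ce_count"], "uncorrected:", entry["ue_count"])
```

A non-zero ce_count after a long build would point towards memory; all zeros would not prove the memory is fine, but would be consistent with the Windows observation.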
FWIW, this week the same system survived a 3 h 13 min LLVM-BOLT build session with sustained full CPU and high memory load without showing this error.