Bug 216715
Summary: | kernel 6.1-rc5 + KASAN_OUTLINE fails to boot at very early stage when DEBUG_PAGEALLOC_ENABLE_DEFAULT is enabled (PowerMac G4 3,6) | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Erhard F. (erhard_f) |
Component: | PPC-32 | Assignee: | platform_ppc-32 |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | christophe.leroy |
Priority: | P1 | ||
Hardware: | PPC-32 | ||
OS: | Linux | ||
Kernel Version: | 6.1-rc5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
kernel .config (6.1-rc5, PowerMac G4 DP)
kernel .config (6.1-rc6, PowerMac G4 DP) kernel dmesg (6.1-rc6, PowerMac G4 DP) - succesful boot kernel dmesg (6.1-rc6, PowerMac G4 DP) - BUG: KASAN: slab-out-of-bounds 303256: kernel dmesg (6.1-rc6, PowerMac G4 DP) - BUG: KASAN: slab-out-of-bounds, 2nd time dmesg (6.4-rc4, PowerMac G4 DP) - succesful boot kernel .config (6.4-rc3, PowerMac G4 DP) |
I have no problem booting 6.1-rc5 on QEMU with your config. What happens if you remove CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT and instead pass debug_pagealloc=on as parameter to the kernel ? Created attachment 303254 [details]
kernel .config (6.1-rc6, PowerMac G4 DP)
Somehow I had a hard time reproducing this...
Finally it turned out that this is not 100% deterministic. I'll attach my new -rc6 .config which provokes the early boot failure best.
The same kernel can run into this early boot issue, or it can fail boot later with "BUG: KASAN: slab-out-of-bounds in filemap_map_pages+0x2c8/0x5d0" or it can boot up just fine.
I had the impression the early boot issue is provoked easier when CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y is set or debug_pagealloc=on is set as boot parameter as you suggested. Also it seems to be provoked easier when CONFIG_NETCONSOLE=y is used and properly configured (perhaps also linked to my yet-to-be-posted netconsole bug?).
Created attachment 303255 [details]
kernel dmesg (6.1-rc6, PowerMac G4 DP) - succesful boot
Created attachment 303256 [details] kernel dmesg (6.1-rc6, PowerMac G4 DP) - BUG: KASAN: slab-out-of-bounds Rarely but sometimes boot fails due to a KASAN error like this: [...] ================================================================== BUG: KASAN: slab-out-of-bounds in filemap_map_pages+0x2c8/0x5d0 Read of size 4 at addr c336c170 by task systemd/112 CPU: 1 PID: 112 Comm: systemd Tainted: G TN 6.1.0-rc6-PMacG4 #30 Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [f1973c60] [c0d3b790] dump_stack_lvl+0x60/0xa4 (unreliable) [f1973c80] [c033d124] print_report+0x154/0x4f4 [f1973cd0] [c033ce40] kasan_report+0xe8/0x218 [f1973d40] [c0284234] filemap_map_pages+0x2c8/0x5d0 [f1973e00] [c02dfb54] handle_mm_fault+0xb58/0x10bc [f1973ed0] [c002b994] ___do_page_fault+0x228/0x814 [f1973f10] [c002c1d4] do_page_fault+0x28/0x6c [f1973f30] [c00044a8] InstructionAccess_virt+0x114/0x118 --- interrupt: 400 at 0xa7452df4 NIP: a7452df4 LR: a7452974 CTR: 00000000 REGS: f1973f40 TRAP: 0400 Tainted: G TN (6.1.0-rc6-PMacG4) MSR: 4000d032 <EE,PR,ME,IR,DR,RI> CR: 480042e0 XER: 00000000 GPR00: 00000000 afdd8cf0 a7b4a5a0 00000000 a7452974 480042e0 00000000 a7452df4 GPR08: 0000d032 400c13f8 400c1509 400c161a 400c12e7 00b2fc1c 00000000 00000000 GPR16: 00000000 00000000 afdd8f20 afdd8e98 00000000 00000001 00000000 00000000 GPR24: 00000000 02084ffc 00000000 afdd8f1c 02084b80 02084810 a757fff4 00000001 NIP [a7452df4] 0xa7452df4 LR [a7452974] 0xa7452974 --- interrupt: 400 Allocated by task 1: kasan_set_track+0x44/0x94 __kasan_slab_alloc+0xa0/0xe8 kmem_cache_alloc+0x224/0x684 __kernfs_new_node+0xe8/0x354 kernfs_new_node+0x84/0xfc __kernfs_create_file+0x50/0x204 sysfs_add_file_mode_ns+0xf4/0x1f0 internal_create_group+0x1f0/0x620 sysfs_slab_add+0x23c/0x2dc __kmem_cache_create+0x1bc/0x584 kmem_cache_create_usercopy+0x250/0x39c btrfs_init_cachep+0x48/0x1e0 init_btrfs_fs+0x50/0x2b0 do_one_initcall+0xc0/0x34c kernel_init_freeable+0x2c0/0x400 kernel_init+0x28/0x178 ret_from_kernel_thread+0x5c/0x64 The buggy address belongs to the object at c336c118 which belongs to the cache kernfs_node_cache of size 88 The buggy address is located 0 bytes to the right of 88-byte region [c336c118, c336c170) The buggy address belongs to the physical page: page:eee4bb30 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x336c flags: 0x200(slab|zone=0) raw: 00000200 00000100 00000122 c1852720 00000000 001e003c ffffffff 00000001 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: c336c000: fc 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc c336c080: fc fc 00 00 00 00 00 00 00 00 00 00 00 fc fc fc >c336c100: fc fc fc 00 00 00 00 00 00 00 00 00 00 00 fc fc ^ c336c180: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 fc c336c200: fc fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Disabling lock debugging due to kernel taint get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 systemd[1]: Mounting FUSE Control File System... get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 ** 257 printk messages dropped ** get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 get_swap_device: Bad swap file entry 00000010 systemd-journald[106]: /dev/kmsg buffer overrun, some messages lost. get_swap_device: Bad swap file entry 00000010 [...] After the 1st time this happened I checked swap to be sure it's not a disk errors fault. But the swap partition showed no sign of error: # swapoff /dev/sdb6 && mkswap -f -L swap --verbose -c /dev/sdb6 0 beschädigte Seiten mkswap: /dev/sdb6: Warnung: Alte swap-Signatur wird gelöscht. Auslagerungsbereich Version 1 wird angelegt, Größe = 8 GiB (8589930496 Bytes) LABEL=swap, UUID=aa150610-5987-492c-b9db-d3ea3f34a6be Created attachment 303257 [details] 303256: kernel dmesg (6.1-rc6, PowerMac G4 DP) - BUG: KASAN: slab-out-of-bounds, 2nd time Here a slightly different kernel is used with SMP off. KASAN hit happened instantly after trying to build glibc: [...] BUG: KASAN: slab-out-of-bounds in set_pte_at+0x9c/0x16c Read of size 4 at addr c322519c by task emerge/296 CPU: 0 PID: 296 Comm: emerge Tainted: G TN 6.1.0-rc6-PMacG4s #31 Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [f2aeb760] [c0ca8ecc] dump_stack_lvl+0x34/0x74 (unreliable) [f2aeb780] [c02d2800] print_report+0x154/0x4f4 [f2aeb7d0] [c02d2508] kasan_report+0xec/0x230 [f2aeb830] [c002657c] set_pte_at+0x9c/0x16c [f2aeb860] [c02744a8] copy_page_range+0x4ec/0xee0 [f2aeb990] [c005608c] dup_mm+0x70c/0x8ac [f2aebab0] [c0057eec] copy_process+0x19b0/0x2c10 [f2aebba0] [c0059424] kernel_clone+0xd0/0x4c4 [f2aebc40] [c0059efc] sys_clone+0xfc/0x154 [f2aebd10] [c0017c60] system_call_exception+0x104/0x1ac [f2aebf30] [c001f1ac] ret_from_syscall+0x0/0x2c --- interrupt: c00 at 0xa7572df4 NIP: a7572df4 LR: a7572974 CTR: a7747ee8 REGS: f2aebf40 TRAP: 0c00 Tainted: G TN (6.1.0-rc6-PMacG4s) MSR: 0200f932 <VEC,EE,PR,FP,ME,IR,DR,RI> CR: 48882244 XER: 00000000 GPR00: 00000078 af879140 a7c2d5a0 01200011 00000000 00000000 00000000 a7c26088 GPR08: 00000000 00000002 a6e31870 a781c724 48882242 008bfff4 00000000 00a0243c GPR16: 00000000 016b591e 016b5910 af879304 31afa123 00000001 00000000 a7c284fc GPR24: 00000000 00000000 a58ddc88 a7ae3128 a7bc1f08 a6ee0118 a769fff4 00000001 NIP [a7572df4] 0xa7572df4 LR [a7572974] 0xa7572974 --- interrupt: c00 Allocated by task 1: kasan_set_track+0x44/0x94 __kasan_slab_alloc+0xa0/0xe8 kmem_cache_alloc+0x1e8/0x664 __kernfs_new_node+0xe8/0x354 kernfs_new_node+0x84/0xfc __kernfs_create_file+0x50/0x204 sysfs_add_file_mode_ns+0xf4/0x1f0 internal_create_group+0x1f0/0x620 sysfs_slab_add+0x23c/0x2dc __kmem_cache_create+0x14c/0x510 kmem_cache_create_usercopy+0x250/0x39c btrfs_init_cachep+0x48/0x1e8 init_btrfs_fs+0x50/0x2b0 do_one_initcall+0xc0/0x34c kernel_init_freeable+0x2a0/0x3e0 kernel_init+0x28/0x174 ret_from_kernel_thread+0x5c/0x64 The buggy address belongs to the object at c32251a0 which belongs to the cache kernfs_node_cache of size 88 The buggy address is located 4 bytes to the left of 88-byte region [c32251a0, c32251f8) The buggy address belongs to the physical page: page:eee50d34 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x3225 flags: 0x200(slab|zone=0) raw: 00000200 00000100 00000122 c1843d20 00000000 001e003c ffffffff 00000001 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: c3225080: fc fc 00 00 00 00 00 00 00 00 00 00 00 fc fc fc c3225100: fc fc fc 00 00 00 00 00 00 00 00 00 00 00 fc fc >c3225180: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 fc ^ c3225200: fc fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 c3225280: fc fc fc fc fc fc 00 00 00 00 00 00 00 00 00 00 ================================================================== Disabling lock debugging due to kernel taint _swap_info_get: Bad swap file entry 10005281 BUG: Bad page map in process emerge pte:00528124 pmd:03225000 addr:af85a000 vm_flags:00100173 anon_vma:c8fb3bd8 mapping:00000000 index:affdf file:(null) fault:0x0 mmap:0x0 read_folio:0x0 CPU: 0 PID: 300 Comm: emerge Tainted: G B TN 6.1.0-rc6-PMacG4s #31 Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [f2f838c0] [c0ca8ecc] dump_stack_lvl+0x34/0x74 (unreliable) [f2f838e0] [c02717e8] print_bad_pte+0x2e8/0x364 [f2f83970] [c027579c] unmap_page_range+0x900/0xa30 [f2f83a30] [c027607c] unmap_vmas+0x1d8/0x2cc [f2f83b30] [c0283a68] exit_mmap+0x154/0x2f0 [f2f83c50] [c00542e8] mmput+0x98/0x244 [f2f83c80] [c005f784] do_exit+0x434/0xdc0 [f2f83d00] [c0060318] do_group_exit+0x64/0x100 [f2f83d30] [c00603e4] __wake_up_parent+0x0/0x4c [f2f83d50] [c0017c60] system_call_exception+0x104/0x1ac [f2f83f30] [c001f1ac] ret_from_syscall+0x0/0x2c --- interrupt: c00 at 0xa7572ec0 [...] Would be nice to give it a new try with KCSAN enabled. To get KCSAN on powerpc/32, apply following series: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=354731 Created attachment 304310 [details]
dmesg (6.4-rc4, PowerMac G4 DP) - succesful boot
KASAN and KCSAN can't be enabled at the same time as it seems. If I select KASAN, KCSAN is automatically disabled.
But in this case that does not seem to be a problem. Kernel 6.4-rc4 boots fine with KASAN_OUTLINE and DEBUG_PAGEALLOC_ENABLE_DEFAULT enabled. Tested it about a dozen times to be sure.
I'll close here when kernel 6.4.0 is released and this issue is still gone for good.
Created attachment 304311 [details]
kernel .config (6.4-rc3, PowerMac G4 DP)
Cannot reproduce on kernels v6.8.x and current v6.9-rc4, outline + inline KASAN work as intended. So closing here. |
Created attachment 303245 [details] kernel .config (6.1-rc5, PowerMac G4 DP) Kernel 6.1-rc5 fails to boot on my PowerMac G4 3,6 at a very early stage (OpenFirmware console, white background) when built with CONFIG_KASAN_OUTLINE=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y. Same kernel .config without CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT just boots fine, even when Outline KASAN is enabled and CONFIG_DEBUG_PAGEALLOC=y is set. Only CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT seems to provoke the issue.