Bug 204375 - kernel 5.2.4 w. KASAN enabled fails to boot on a PowerMac G4 3,6 at very early stage
Summary: kernel 5.2.4 w. KASAN enabled fails to boot on a PowerMac G4 3,6 at very earl...
Status: CLOSED CODE_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: PPC-32 (show other bugs)
Hardware: PPC-32 Linux
: P1 normal
Assignee: platform_ppc-32
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-29 23:24 UTC by Erhard F.
Modified: 2019-08-09 12:25 UTC (History)
2 users (show)

See Also:
Kernel Version: 5.2.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel .config (PowerMac G4 DP, kernel 5.2.4) (90.65 KB, text/plain)
2019-07-29 23:27 UTC, Erhard F.
Details
early boot picture (PowerMac G4 DP, kernel 5.2.5) (1.95 MB, image/jpeg)
2019-07-31 09:13 UTC, Erhard F.
Details
Screenshot1 (80.16 KB, image/png)
2019-07-31 09:38 UTC, Christophe Leroy
Details
Screenshot2 (76.98 KB, image/png)
2019-07-31 09:39 UTC, Christophe Leroy
Details

Description Erhard F. 2019-07-29 23:24:54 UTC
I wanted to give KASAN a try to uncover even more bugs, but as it turns out a KASAN enabled kernel does not boot at all on my PowerMac G4 MDD. ;)

Tried kernels 5.2.4 and 5.3-rc2, both without success. I can't give a dmesg as the boot process stops/stalls very early, only showing two lines on an OpenFirmware console (white background, black letters):

done
found display   : /pco@f0000000/ATY,AlteracParent@10/ATY,Alterac_B@1, opening...
Comment 1 Erhard F. 2019-07-29 23:27:55 UTC
Created attachment 284039 [details]
kernel .config (PowerMac G4 DP, kernel 5.2.4)

With this .config the G4 DP boots fine.
With this .config + KASAN enabled the G4 DP fails to boot.
Comment 2 Christophe Leroy 2019-07-31 04:49:51 UTC
Looks like a panic happens in kasan_mmu_init() due to page allocation failure.

#0  __delay (loops=0x186a0) at arch/powerpc/kernel/time.c:470
#1  0xc0043b34 in panic (fmt=<optimized out>) at kernel/panic.c:305
#2  0xc0049f1c in do_exit (code=0xb) at kernel/exit.c:789
#3  0xc0012c30 in oops_end (signr=<optimized out>, regs=<optimized out>, flags=<optimized out>) at arch/powerpc/kernel/traps.c:249
#4  die (str=0xc08c78a0 <read_bhrb+8956080> "Kernel access of bad area", regs=0xc0f60000 <reservation_seqcount_class>, err=0xb) at arch/powerpc/kernel/traps.c:294
#5  0xc00194a8 in handle_page_fault () at arch/powerpc/kernel/entry_32.S:632
#6  0xc011ca64 in task_css (task=0xc0da69a0 <init_task>, subsys_id=<optimized out>) at ./include/linux/cgroup.h:486
#7  0xc011ca7c in task_cs (task=<optimized out>) at kernel/cgroup/cpuset.c:199
#8  0xc011fed4 in cpuset_print_current_mems_allowed () at kernel/cgroup/cpuset.c:3444
#9  0xc0210c64 in warn_alloc (gfp_mask=<optimized out>, nodemask=0x0, fmt=0xc093e080 <read_bhrb+9441424> "page allocation failure: order:%u") at mm/page_alloc.c:3765
#10 0xc021151c in __alloc_pages_slowpath (ac=<optimized out>, order=<optimized out>, gfp_mask=<optimized out>) at mm/page_alloc.c:4637
#11 __alloc_pages_nodemask (gfp_mask=0x100, order=0x0, preferred_nid=<optimized out>, nodemask=<optimized out>) at mm/page_alloc.c:4745
#12 0xc0022480 in __alloc_pages (order=<optimized out>, preferred_nid=<optimized out>, gfp_mask=<optimized out>) at ./include/linux/gfp.h:473
#13 __alloc_pages_node (order=<optimized out>, gfp_mask=<optimized out>, nid=<optimized out>) at ./include/linux/gfp.h:486
#14 alloc_pages_node (nid=<optimized out>, order=<optimized out>, gfp_mask=<optimized out>) at ./include/linux/gfp.h:500
#15 __alloc_for_ptecache (mm=<optimized out>, kernel=<optimized out>) at arch/powerpc/mm/pgtable-frag.c:69
#16 pte_fragment_alloc (mm=<optimized out>, kernel=<optimized out>) at arch/powerpc/mm/pgtable-frag.c:106
#17 0xc0024034 in pte_alloc_one_kernel (mm=<optimized out>) at ./arch/powerpc/include/asm/pgalloc.h:27
#18 kasan_init_shadow_page_tables (k_start=0xf8400000, k_end=<optimized out>) at arch/powerpc/mm/kasan/kasan_init_32.c:38
#19 0xc0c92734 in kasan_mmu_init () at arch/powerpc/mm/kasan/kasan_init_32.c:104
#20 0xc0c91560 in MMU_init () at arch/powerpc/mm/init_32.c:178
#21 0xc000354c in start_here () at arch/powerpc/kernel/head_32.S:899
Comment 3 Christophe Leroy 2019-07-31 05:19:52 UTC
Looks like that's due to 4a6d8cf90017 ("powerpc/mm: don't use pte_alloc_kernel() until slab is available on PPC32"), committed just before the KASAN series
Comment 4 Christophe Leroy 2019-07-31 06:04:44 UTC
Proposed fix at https://patchwork.ozlabs.org/patch/1139515/
Comment 5 Erhard F. 2019-07-31 09:13:44 UTC
Created attachment 284057 [details]
early boot picture (PowerMac G4 DP, kernel 5.2.5)

Thanks! Your patch cleanly applied against kernel 5.2.5.

The boot process continues further and KASAN shadow mem is listed in Kernel virtual memory layout (see picture). But it still hangs. It's still OF output, I don't get to the point where radeon DRM kicks in.

I get a reboot after ~2min, most probably due to the soft lockup + hung tasks timeout.
Comment 6 Christophe Leroy 2019-07-31 09:27:35 UTC
>> =============================================================
>> OpenBIOS 1.1 [Oct 19 2017 07:00]
>> Configuration device id QEMU version 1 machine id 1
>> CPUs: 1
>> Memory: 2048M
>> UUID: 00000000-0000-0000-0000-000000000000
>> CPU type PowerPC,G4
milliseconds isn't unique.
Welcome to OpenBIOS v1.1 built on Oct 19 2017 07:00
>> [ppc] Kernel already loaded (0x01000000 + 0x01a1a7c0) (initrd 0x00000000 +
>> 0x00000000)
>> [ppc] Kernel command line: console=ttyS0
OF stdout device is: /pci@f2000000/mac-io@c/escc@13000/ch-a@13020
Preparing to boot Linux version 5.3.0-rc2+ (root@pc17473vm.idsi0.si.c-s.fr) (gcc version 5.4.0 (GCC)) #1881 SMP Wed Jul 31 05:36:00 UTC 2019
Detected machine type: 00000400
command line: 
memory layout at init:
  memory_limit : 00000000 (16 MB aligned)
  alloc_bottom : 02a1f000
  alloc_top    : 30000000
  alloc_top_hi : 80000000
  rmo_top      : 30000000
  ram_top      : 80000000
found display   : /pci@f2000000/QEMU,VGA@e, opening... done
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x02a20000 -> 0x02a1f0a4
Device tree struct  0x02a21000 -> 0x7fde7ec0
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x01000000 ...
Hello World !
Total memory = 2048MB; using 4096kB for hash table
Activating Kernel Userspace Execution Prevention
Activating Kernel Userspace Access Protection
Linux version 5.3.0-rc2+ (root@pc17473vm.idsi0.si.c-s.fr) (gcc version 5.4.0 (GCC)) #1881 SMP Wed Jul 31 05:36:00 UTC 2019
KASAN init done
Found UniNorth memory controller & host bridge @ 0xf8000000 revision: 0x762fb70
Mapped at 0xf77c0000
Found a Keylargo mac-io controller, rev: 0, mapped at 0x(ptrval)
PowerMac motherboard: PowerMac G4 AGP Graphics
boot stdout isn't a display !
Using PowerMac machine description
printk: bootconsole [udbg0] enabled
CPU maps initialized for 1 thread per core
-----------------------------------------------------
phys_mem_size     = 0x80000000
dcache_bsize      = 0x20
icache_bsize      = 0x20
cpu_features      = 0x000000000501a00a
  possible        = 0x000000002f7ff14b
  always          = 0x0000000001000000
cpu_user_features = 0x9c000001 0x00000000
mmu_features      = 0x00000001
Hash_size         = 0x400000
Hash_mask         = 0xffff
-----------------------------------------------------
Found UniNorth PCI host bridge at 0x00000000f2000000. Firmware bus number: 0->0
PCI host bridge /pci@f2000000 (primary) ranges:
  IO 0x00000000f2000000..0x00000000f27fffff -> 0x0000000000000000
 MEM 0x0000000080000000..0x000000008fffffff -> 0x0000000080000000 
WARNING ! Your machine is CUDA-based but your kernel
          wasn't compiled with CONFIG_ADB_CUDA option !
nvram: Checking bank 0...
Invalid signature
Invalid checksum
nvram: gen0=0, gen1=0
nvram: Active bank is: 0
nvram: OF partition at 0xffffffff
nvram: XP partition at 0xffffffff
nvram: NR partition at 0xffffffff
Zone ranges:
  DMA      [mem 0x0000000000000000-0x000000002fffffff]
  Normal   empty
  HighMem  [mem 0x0000000030000000-0x000000007fffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000000000-0x000000007fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000007fffffff]
percpu: Embedded 29 pages/cpu s88872 r8192 d21720 u118784
Built 1 zonelists, mobility grouping on.  Total pages: 522560
Kernel command line: console=ttyS0
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1948456K/2097152K available (8936K kernel code, 1764K rwdata, 3892K rodata, 1128K init, 11001K bss, 148696K reserved, 0K cma-reserved, 1310720K highmem)
Kernel virtual memory layout:
  * 0xf8000000..0x00000000  : kasan shadow mem
  * 0xf7fbf000..0xf7fff000  : fixmap
  * 0xf7800000..0xf7c00000  : highmem PTEs
  * 0xf6f38000..0xf7800000  : early ioremap
  * 0xf1000000..0xf6f38000  : vmalloc & ioremap
random: get_random_u32 called from __kmem_cache_create+0x2c/0x46c with crng_init=0
SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Running RCU self tests
rcu: Hierarchical RCU implementation.
rcu: 	RCU lockdep checking is enabled.
rcu: 	RCU restricting CPUs from NR_CPUS=2 to nr_cpu_ids=1.
rcu: RCU calculated value of scheduler-enlistment delay is 30 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
mpic: Setting up MPIC " MPIC 1   " version 1.2 at 80040000, max 1 CPUs
mpic: ISU size: 64, shift: 6, mask: 3f
mpic: Initializing for 64 sources
GMT Delta read from XPRAM: 0 minutes, DST: on
clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
clocksource: timebase mult[a000000] shift[24] registered
Console: colour dummy device 80x25
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:  8
... MAX_LOCK_DEPTH:          48
... MAX_LOCKDEP_KEYS:        8192
... CLASSHASH_SIZE:          4096
... MAX_LOCKDEP_ENTRIES:     32768
... MAX_LOCKDEP_CHAINS:      65536
... CHAINHASH_SIZE:          32768
 memory used by lock dependency info: 4413 kB
 per task-struct memory footprint: 1536 bytes
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
*** VALIDATE proc ***
*** VALIDATE cgroup1 ***
*** VALIDATE cgroup2 ***
smp_core99_probe
PowerMac SMP probe found 1 cpus
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 1 CPU
Using standard scheduler topology
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
futex hash table entries: 256 (order: 2, 16384 bytes, linear)
xor: measuring software checksum speed
   8regs     :    45.600 MB/sec
   8regs_prefetch:    55.200 MB/sec
   32regs    :    67.200 MB/sec
   32regs_prefetch:    69.600 MB/sec
   altivec   :   146.400 MB/sec
xor: using function: altivec (146.400 MB/sec)
prandom: seed boundary self test passed
prandom: 100 self tests passed
NET: Registered protocol family 16
             
PCI: Probing PCI hardware
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0x7fffff]
pci_bus 0000:00: root bus resource [mem 0x80000000-0x8fffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to ff
pci 0000:00:0b.0: [106b:0020] type 00 class 0x060000
pci 0000:00:0c.0: [106b:0022] type 00 class 0xff0000
pci 0000:00:0c.0: reg 0x10: [mem 0x80000000-0x8007ffff]
pci 0000:00:0d.0: [106b:003f] type 00 class 0x0c0310
pci 0000:00:0d.0: reg 0x10: [mem 0x80080000-0x800800ff]
pci 0000:00:0e.0: [1234:1111] type 00 class 0x030000
pci 0000:00:0e.0: reg 0x10: [mem 0x81000000-0x81ffffff pref]
pci 0000:00:0e.0: reg 0x18: [mem 0x82000000-0x82000fff]
pci 0000:00:0e.0: reg 0x30: [mem 0x82010000-0x8201ffff pref]
pci 0000:00:0f.0: [10ec:8029] type 00 class 0x020000
pci 0000:00:0f.0: reg 0x10: [io  0x1000-0x10ff]
pci 0000:00:0f.0: reg 0x30: [mem 0x82040000-0x8207ffff pref]
pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
pci_bus 0000:00: resource 4 [io  0x0000-0x7fffff]
pci_bus 0000:00: resource 5 [mem 0x80000000-0x8fffffff]
alg: extra crypto tests enabled.  This is intended for developer use only.
raid6: altivecx8 gen()   148 MB/s
raid6: altivecx4 gen()   159 MB/s
raid6: altivecx2 gen()   150 MB/s
raid6: altivecx1 gen()   124 MB/s
raid6: int32x8  gen()    41 MB/s
raid6: int32x8  xor()    20 MB/s
raid6: int32x4  gen()    42 MB/s
raid6: int32x4  xor()    32 MB/s
raid6: int32x2  gen()    48 MB/s
raid6: int32x2  xor()    39 MB/s
raid6: int32x1  gen()    39 MB/s
raid6: int32x1  xor()    35 MB/s
raid6: using algorithm altivecx4 gen() 159 MB/s
raid6: using intx1 recovery algorithm
pci 0000:00:0e.0: vgaarb: setting as boot VGA device
pci 0000:00:0e.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
pci 0000:00:0e.0: vgaarb: bridge control possible
vgaarb: loaded
SCSI subsystem initialized
clocksource: Switched to clocksource timebase
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 512 (order: 2, 20480 bytes, linear)
TCP established hash table entries: 8192 (order: 3, 32768 bytes, linear)
TCP bind hash table entries: 8192 (order: 6, 294912 bytes, linear)
TCP: Hash tables configured (established 8192 bind 8192)
UDP hash table entries: 512 (order: 3, 40960 bytes, linear)
UDP-Lite hash table entries: 512 (order: 3, 40960 bytes, linear)
NET: Registered protocol family 1
PCI: CLS 0 bytes, default 32
Thermal assist unit 
using timers, 
shrink_timer: 600 jiffies
Initialise system trusted keyrings
workingset: timestamp_bits=14 max_order=19 bucket_order=5
WARNING: CPU: 0 PID: 1 at kernel/smp.c:433 smp_call_function_many+0xb0/0x3a4
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc2+ #1881
NIP:  c00fb9cc LR: c00fb9b8 CTR: 00000000
REGS: ee8d9618 TRAP: 0700   Not tainted  (5.3.0-rc2+)
MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 44004204  XER: 00000000

GPR00: c00fb968 ee8d96d0 ee8d69e0 ee8d69e0 00000003 00000000 c00fb9b8 fdd1ad3c 
GPR08: 00000100 00000100 001f0100 ee8d69e0 24004204 00000000 c0da0c60 ee8d9814 
GPR16: ee8d9810 c0017f24 00000122 eedd8ea0 c0f50000 00000000 eedd8efc 00000000 
GPR24: 00000000 c0017f7c 00000000 c0f485f8 c0f65720 00000000 ee8d69e0 c0f485f8 
NIP [c00fb9cc] smp_call_function_many+0xb0/0x3a4
LR [c00fb9b8] smp_call_function_many+0x9c/0x3a4
Call Trace:
[ee8d96d0] [c00fb968] smp_call_function_many+0x4c/0x3a4 (unreliable)
[ee8d9720] [c00fbce8] smp_call_function+0x28/0x38
[ee8d9730] [c00fbd14] on_each_cpu+0x1c/0x5c
[ee8d9750] [c00daf0c] call_timer_fn+0x194/0x354
[ee8d97e0] [c00dbe60] __run_timers.part.32+0x2ac/0x2ec
[ee8d98c0] [c00dc094] run_timer_softirq+0x78/0xe0
[ee8d98f0] [c08b96f0] __do_softirq+0x290/0x58c
[ee8d9960] [c004c778] irq_exit+0x124/0x19c
[ee8d9980] [c00119e8] timer_interrupt+0x3f0/0x4a0
[ee8d99d0] [c0019600] ret_from_except+0x0/0x14
--- interrupt: 901 at __slab_alloc.constprop.57+0x60/0x6c
    LR = __slab_alloc.constprop.57+0x5c/0x6c
[ee8d9ab8] [c0229d3c] kmem_cache_alloc+0x80/0x268
[ee8d9af8] [c02eb5ec] __kernfs_new_node.isra.7+0xb8/0x278
[ee8d9be8] [c02ed478] kernfs_new_node+0x4c/0x74
[ee8d9c18] [c02eda64] kernfs_create_dir_ns+0x40/0xa8
[ee8d9c38] [c02f1424] sysfs_create_dir_ns+0x120/0x184
[ee8d9d08] [c089ca74] kobject_add_internal+0x154/0x350
[ee8d9d38] [c089d00c] kobject_init_and_add+0xe8/0x100
[ee8d9dc8] [c022b060] sysfs_slab_add+0x118/0x2a4
[ee8d9df8] [c0caeb98] slab_sysfs_init+0xb0/0x12c
[ee8d9e18] [c0005860] do_one_initcall+0x134/0x33c
[ee8d9ec8] [c0c8c418] kernel_init_freeable+0x2b4/0x35c
[ee8d9f18] [c0005d38] kernel_init+0x18/0xf8
[ee8d9f38] [c0019348] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
481326c1 893e8544 2f890000 419e0300 7c5e1378 7c431378 48132879 813e0000 
3d40001f 614a0100 7d285039 41a20008 <0fe00000> 7fe5fb78 7f64db78 3860ffff 
irq event stamp: 142739
hardirqs last  enabled at (142738): [<c08b8df4>] _raw_spin_unlock_irqrestore+0x48/0x60
hardirqs last disabled at (142739): [<c0019128>] reenable_mmu+0x1c/0xa8
softirqs last  enabled at (142708): [<c08b9914>] __do_softirq+0x4b4/0x58c
softirqs last disabled at (142733): [<c004c778>] irq_exit+0x124/0x19c
---[ end trace 0aade92aa60bd952 ]---
NET: Registered protocol family 38
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Asymmetric key parser 'pkcs8' registered
bounce: pool size: 64 pages
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler kyber registered
io scheduler bfq registered
Using unsupported 800x600 (null) at 81000000, depth=32, pitch=3200
Console: switching to colour frame buffer device 100x37
fb0: Open Firmware frame buffer device on /pci@f2000000/QEMU,VGA@e
Non-volatile memory driver v1.3
MacIO PCI driver attached to Keylargo chipset
Warning: no ADB interface detected
pata-macio 0.00020000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 0
scsi host0: pata_macio
ata1: PATA max MWDMA2 irq 16
pata-macio 0.00021000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 1
scsi host1: pata_macio
ata2: PATA max MWDMA2 irq 18
rtc-generic rtc-generic: registered as rtc0
i2c /dev entries driver
ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
NET: Registered protocol family 10
Segment Routing with IPv6
NET: Registered protocol family 17
drmem: No dynamic reconfiguration memory found
registered taskstats version 1
Loading compiled-in X.509 certificates
Btrfs loaded, crc32c=crc32c-generic
BTRFS: selftest: sectorsize: 4096  nodesize: 4096
BTRFS: selftest: running btrfs free space cache tests
BTRFS: selftest: running extent only tests
BTRFS: selftest: running bitmap only tests
BTRFS: selftest: running bitmap and extent tests
BTRFS: selftest: running space stealing from bitmap to extent tests
BTRFS: selftest: running extent buffer operation tests
BTRFS: selftest: running btrfs_split_item tests
BTRFS: selftest: running extent I/O tests
BTRFS: selftest: running find delalloc tests
BTRFS: selftest: running find_first_clear_extent_bit test
BTRFS: selftest: running extent buffer bitmap tests
BTRFS: selftest: running inode tests
BTRFS: selftest: running btrfs_get_extent tests
BTRFS: selftest: running hole first btrfs_get_extent test
BTRFS critical (device (efault)): regular/prealloc extent found for non-regular inode 256
BTRFS: selftest: fs/btrfs/tests/inode-tests.c:904 expected a real extent, got 0

============================================
WARNING: possible recursive locking detected
5.3.0-rc2+ #1881 Tainted: G        W        
--------------------------------------------
swapper/0/1 is trying to acquire lock:
(ptrval) (&(&n->list_lock)->rlock){....}, at: ___slab_alloc.constprop.58+0xf0/0x38c

but task is already holding lock:
(ptrval) (&(&n->list_lock)->rlock){....}, at: __kmem_cache_shutdown+0x70/0x20c

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&n->list_lock)->rlock);
  lock(&(&n->list_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by swapper/0/1:
 #0: (ptrval) (cpu_hotplug_lock.rw_sem){++++}, at: kmem_cache_destroy+0x54/0x1f4
 #1: (ptrval) (slab_mutex){+.+.}, at: kmem_cache_destroy+0x60/0x1f4
 #2: (ptrval) (&(&n->list_lock)->rlock){....}, at: __kmem_cache_shutdown+0x70/0x20c

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.3.0-rc2+ #1881
Call Trace:
[ee8d9938] [c0895918] dump_stack+0xb4/0xf8 (unreliable)
[ee8d9968] [c00aea18] __lock_acquire+0x1174/0x18dc
[ee8d9bd8] [c00ad1e0] lock_acquire+0x14c/0x1c0
[ee8d9c18] [c08b8a14] _raw_spin_lock+0x34/0x4c
[ee8d9c38] [c0229560] ___slab_alloc.constprop.58+0xf0/0x38c
[ee8d9cc8] [c022983c] __slab_alloc.constprop.57+0x40/0x6c
[ee8d9ce8] [c0229920] __kmalloc+0xb8/0x1f0
[ee8d9d28] [c022ba20] __kmem_cache_shutdown+0xe4/0x20c
[ee8d9d78] [c01e4e08] shutdown_cache+0x20/0x13c
[ee8d9d98] [c01e5320] kmem_cache_destroy+0x1d8/0x1f4
[ee8d9dd8] [c040bbb4] extent_io_exit+0x24/0x44
[ee8d9df8] [c0cbb93c] init_btrfs_fs+0x118/0x134
[ee8d9e18] [c0005860] do_one_initcall+0x134/0x33c
[ee8d9ec8] [c0c8c418] kernel_init_freeable+0x2b4/0x35c
[ee8d9f18] [c0005d38] kernel_init+0x18/0xf8
[ee8d9f38] [c0019348] ret_from_kernel_thread+0x14/0x1c
=============================================================================
BUG btrfs_extent_state (Tainted: G        W        ): Objects remaining in btrfs_extent_state on __kmem_cache_shutdown()
-----------------------------------------------------------------------------

INFO: Slab 0x(ptrval) objects=14 used=1 fp=0x(ptrval) flags=0x0200
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B   W         5.3.0-rc2+ #1881
Call Trace:
[ee8d9c58] [c0895918] dump_stack+0xb4/0xf8 (unreliable)
[ee8d9c88] [c0227be0] slab_err+0x98/0xa0
[ee8d9d28] [c022ba3c] __kmem_cache_shutdown+0x100/0x20c
[ee8d9d78] [c01e4e08] shutdown_cache+0x20/0x13c
[ee8d9d98] [c01e5320] kmem_cache_destroy+0x1d8/0x1f4
[ee8d9dd8] [c040bbb4] extent_io_exit+0x24/0x44
[ee8d9df8] [c0cbb93c] init_btrfs_fs+0x118/0x134
[ee8d9e18] [c0005860] do_one_initcall+0x134/0x33c
[ee8d9ec8] [c0c8c418] kernel_init_freeable+0x2b4/0x35c
[ee8d9f18] [c0005d38] kernel_init+0x18/0xf8
[ee8d9f38] [c0019348] ret_from_kernel_thread+0x14/0x1c
INFO: Object 0x(ptrval) @offset=848
INFO: Allocated in alloc_extent_state+0x2c/0x1a4 age=1114 cpu=0 pid=1
	__slab_alloc.constprop.57+0x40/0x6c
	kmem_cache_alloc+0x80/0x268
	alloc_extent_state+0x2c/0x1a4
	__set_extent_bit+0x1b8/0x770
	set_extent_bit+0x30/0x40
	btrfs_test_extent_io+0x998/0xc54
	btrfs_run_sanity_tests+0xcc/0x144
	init_btrfs_fs+0xd4/0x134
	do_one_initcall+0x134/0x33c
	kernel_init_freeable+0x2b4/0x35c
	kernel_init+0x18/0xf8
	ret_from_kernel_thread+0x14/0x1c
=============================================================================
BUG btrfs_extent_state (Tainted: G    B   W        ): Objects remaining in btrfs_extent_state on __kmem_cache_shutdown()
-----------------------------------------------------------------------------

INFO: Slab 0x(ptrval) objects=14 used=2 fp=0x(ptrval) flags=0x0200
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B   W         5.3.0-rc2+ #1881
Call Trace:
[ee8d9c58] [c0895918] dump_stack+0xb4/0xf8 (unreliable)
[ee8d9c88] [c0227be0] slab_err+0x98/0xa0
[ee8d9d28] [c022ba3c] __kmem_cache_shutdown+0x100/0x20c
[ee8d9d78] [c01e4e08] shutdown_cache+0x20/0x13c
[ee8d9d98] [c01e5320] kmem_cache_destroy+0x1d8/0x1f4
[ee8d9dd8] [c040bbb4] extent_io_exit+0x24/0x44
[ee8d9df8] [c0cbb93c] init_btrfs_fs+0x118/0x134
[ee8d9e18] [c0005860] do_one_initcall+0x134/0x33c
[ee8d9ec8] [c0c8c418] kernel_init_freeable+0x2b4/0x35c
[ee8d9f18] [c0005d38] kernel_init+0x18/0xf8
[ee8d9f38] [c0019348] ret_from_kernel_thread+0x14/0x1c
INFO: Object 0x(ptrval) @offset=2248
INFO: Allocated in alloc_extent_state+0x2c/0x1a4 age=1114 cpu=0 pid=1
	__slab_alloc.constprop.57+0x40/0x6c
	kmem_cache_alloc+0x80/0x268
	alloc_extent_state+0x2c/0x1a4
	__set_extent_bit+0x1b8/0x770
	set_extent_bit+0x30/0x40
	btrfs_test_extent_io+0x88c/0xc54
	btrfs_run_sanity_tests+0xcc/0x144
	init_btrfs_fs+0xd4/0x134
	do_one_initcall+0x134/0x33c
	kernel_init_freeable+0x2b4/0x35c
	kernel_init+0x18/0xf8
	ret_from_kernel_thread+0x14/0x1c
INFO: Object 0x(ptrval) @offset=2808
INFO: Allocated in alloc_extent_state+0x2c/0x1a4 age=1114 cpu=0 pid=1
	__slab_alloc.constprop.57+0x40/0x6c
	kmem_cache_alloc+0x80/0x268
	alloc_extent_state+0x2c/0x1a4
	__set_extent_bit+0x1b8/0x770
	set_extent_bit+0x30/0x40
	btrfs_test_extent_io+0x7f8/0xc54
	btrfs_run_sanity_tests+0xcc/0x144
	init_btrfs_fs+0xd4/0x134
	do_one_initcall+0x134/0x33c
	kernel_init_freeable+0x2b4/0x35c
	kernel_init+0x18/0xf8
	ret_from_kernel_thread+0x14/0x1c
kmem_cache_destroy btrfs_extent_state: Slab cache still has objects
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B   W         5.3.0-rc2+ #1881
Call Trace:
[ee8d9d98] [c01e5314] kmem_cache_destroy+0x1cc/0x1f4
[ee8d9dd8] [c040bbb4] extent_io_exit+0x24/0x44
[ee8d9df8] [c0cbb93c] init_btrfs_fs+0x118/0x134
[ee8d9e18] [c0005860] do_one_initcall+0x134/0x33c
[ee8d9ec8] [c0c8c418] kernel_init_freeable+0x2b4/0x35c
[ee8d9f18] [c0005d38] kernel_init+0x18/0xf8
[ee8d9f38] [c0019348] ret_from_kernel_thread+0x14/0x1c
Duplicate name in testcase-data, renamed to "duplicate-name#1"
### dt-test ### start of unittest - you will see error messages
OF: /testcase-data/phandle-tests/consumer-a: could not get #phandle-cells-missing for /testcase-data/phandle-tests/provider1
OF: /testcase-data/phandle-tests/consumer-a: could not get #phandle-cells-missing for /testcase-data/phandle-tests/provider1
OF: /testcase-data/phandle-tests/consumer-a: could not find phandle
OF: /testcase-data/phandle-tests/consumer-a: could not find phandle
OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found 0
OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found 0
OF: /testcase-data/phandle-tests/consumer-b: could not get #phandle-missing-cells for /testcase-data/phandle-tests/provider1
OF: /testcase-data/phandle-tests/consumer-b: could not find phandle
OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found 0
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest0/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest1/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest2/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest3/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest5/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest6/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest7/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest8/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/test-unittest8/property-foo
OF: overlay: node_overlaps_later_cs: #6 overlaps with #7 @/testcase-data/overlay-node/test-bus/test-unittest8
OF: overlay: overlay #6 is not topmost
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/i2c-test-bus/test-unittest12/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data/overlay-node/test-bus/i2c-test-bus/test-unittest13/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/fairway-1/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/fairway-1/ride@100/track@30/incline-up
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/fairway-1/ride@100/track@40/incline-up
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/lights@40000/status
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/lights@40000/color
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/lights@40000/rate
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/hvac_2
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/ride_200
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/ride_200_left
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/ride_200_right
OF: overlay: ERROR: multiple fragments add and/or delete node /testcase-data-2/substation@100/motor-1/controller
OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/controller/name
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail
OF: overlay: WARNING: memory leak will occur if overlay removed, property: /testcase-data-2/substation@100/motor-1/rpm_avail
OF: overlay: ERROR: multiple fragments add, update, and/or delete property /testcase-data-2/substation@100/motor-1/rpm_avail
### dt-test ### end of unittest - 223 passed, 0 failed
Warning: unable to open an initial console.
VFS: Cannot open root device "(null)" or unknown-block(8,1): error -6
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B   W         5.3.0-rc2+ #1881
Call Trace:
[ee8d9d58] [c0895918] dump_stack+0xb4/0xf8 (unreliable)
[ee8d9d88] [c0043a08] panic+0x1c0/0x3f0
[ee8d9e48] [c0c8ca3c] mount_block_root+0x348/0x388
[ee8d9ef8] [c0c8cc44] prepare_namespace+0x13c/0x178
[ee8d9f18] [c0005d38] kernel_init+0x18/0xf8
[ee8d9f38] [c0019348] ret_from_kernel_thread+0x14/0x1c
Rebooting in 120 seconds..
Comment 7 Christophe Leroy 2019-07-31 09:38:27 UTC
Created attachment 284059 [details]
Screenshot1

When booting QEMU in graphic mode, it first hangs for some time here.
Comment 8 Christophe Leroy 2019-07-31 09:39:34 UTC
Created attachment 284061 [details]
Screenshot2

Then it reverts to text mode and continue booting
Comment 9 Erhard F. 2019-07-31 10:38:40 UTC
Hmm, nice..

I'll try to get the G4 booting further by prolonging HUNG_TASK_TIMEOUT and PANIC_TIMEOUT to 10min. Probably the btrfs guys also want to know about this btrfs deadlock. I try to get a stacktrace, but if I am not successful I hope it's ok if I borrow your QEMU data from here to open the btrfs bug?
Comment 10 Christophe Leroy 2019-07-31 11:37:52 UTC
Yes feel free to take anything I have attached here.
Comment 11 Erhard F. 2019-07-31 13:11:03 UTC
I opened bug #204397 with your output.

The G4 DP won't boot to a stage where I can get a dmesg, even if waiting for 10 minutes.
Comment 12 Christophe Leroy 2019-08-01 05:36:03 UTC
On my side under QEMU, deactivating KASAN on lib/mpi significantly reduces boot time.
For that, just add the following on top of lib/mpi/Makefile:

KASAN_SANITIZE := n
Comment 13 Erhard F. 2019-08-01 11:53:22 UTC
Thanks for the hint! But adding KASAN_SANITIZE := n to lib/mpi/Makefile did not make a change in this case.

However I had the idea to disable btrfs (at all) and boot from my other ext4 partition. This worked out and I got further. After other stacktraces flying by the boot process now stalls at offb switching over to radeondrmfb. Again reboot after timer kicks in after 10min.

Seems btrfs is not in the best condition on ppc32 with 4K pagesize..

I would test this on the G5 or Talos II too, but there is no kernel option to enable 'KASAN: runtime memory debugging' on ppc64.
Comment 14 Erhard F. 2019-08-05 00:42:10 UTC
With radeon.ko module removed the G4 DP continues and finishes booting with KASAN. So this one seems fixed. Thanks!
Comment 15 Erhard F. 2019-08-08 22:04:53 UTC
The fix landed succesfully in 5.2.7 and I can confirm it works on my G4 now. Thanks!

Note You need to log in before you can comment on or make changes to this bug.