Bug 14148

Summary: kernel panic: do_wp_page assert_pte_locked failed when DEBUG_VM
Product: Platform Specific/Hardware
Reporter: Wang, Baojun (wangbj)
Component: PPC-32
Assignee: platform_ppc-32
Status: CLOSED OBSOLETE
Severity: normal
CC: alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31-rc3, 2.6.31-rc9-git2
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: problematic config file for mpc8548cds

Description Wang, Baojun 2009-09-09 15:08:57 UTC
Created attachment 23049
problematic config file for mpc8548cds

A powerpc mpc8548cds board (the only board I have on hand) panics during boot if DEBUG_VM (under kernel hacking) is enabled, because an assertion fails on the do_wp_page() path (assert_pte_locked(), according to the trace below). I think it is highly likely that other ppc boards such as 44x have the same problem, but I don't have one to test.

Here is the full log from power-up (after U-Boot); the attachment is the corresponding .config. Note that the kernel boots successfully if CONFIG_DEBUG_VM is not enabled.

The host system is Gentoo. The cross toolchain (powerpc-unknown-linux-gnu-gcc) was built with Gentoo crossdev: gcc 4.4.1, glibc 2.9, binutils 2.19.1, kernel headers 2.6.30. The target (mpc8548cds) root filesystem is also Gentoo (200907xx, extracted from a stage3 tarball).

I have run a similar test on x86 using qemu (0.10.6, +kvm) and the result seems OK; in particular, x86 passes the whole locking API test suite.

Best Regards,
Wang Baojun

   Uncompressing Kernel Image ... OK
   Loading Device Tree to 007fa000, end 007ff618 ... OK
[    0.000000] Using MPC85xx CDS machine description
[    0.000000] Memory CAM mapping: 256/0/0 Mb, residual: 0Mb
[    0.000000] Linux version 2.6.31-rc3 (XXX@localhost) (gcc version 4.4.1 (Gentoo 4.4.1 p1.0) ) #1 PREEMPT Wed Sep 9 05:04:39 CST 2009
[    0.000000] console [udbg0] enabled
setup_arch: bootmem
mpc85xx_cds_setup_arch()
CDS Version = 0x13 in slot 1

[    0.000000] Found FSL PCI host bridge at 0x00000000e0008000. Firmware bus number: 0->2
[    0.000000] PCI host bridge /pci@e0008000 (primary) ranges:
[    0.000000]  MEM 0x0000000080000000..0x000000008fffffff -> 0x0000000080000000
[    0.000000]   IO 0x00000000e2000000..0x00000000e27fffff -> 0x0000000000000000
[    0.000000] /pci@e0008000: PCICSRBAR @ 0xfff00000
[    0.000000] Found FSL PCI host bridge at 0x00000000e0009000. Firmware bus number: 0->0
[    0.000000] PCI host bridge /pci@e0009000  ranges:
[    0.000000]  MEM 0x0000000090000000..0x000000009fffffff -> 0x0000000090000000
[    0.000000]   IO 0x00000000e2800000..0x00000000e2ffffff -> 0x0000000000000000
[    0.000000] /pci@e0009000: PCICSRBAR @ 0x0
[    0.000000] /pci@e0009000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
[    0.000000] /pci@e0009000: DMA window size is 0x0
[    0.000000] Found FSL PCI host bridge at 0x00000000e000a000. Firmware bus number: 0->255
[    0.000000] PCI host bridge /pcie@e000a000  ranges:
[    0.000000]  MEM 0x00000000a0000000..0x00000000bfffffff -> 0x00000000a0000000
[    0.000000]   IO 0x00000000e3000000..0x00000000e30fffff -> 0x0000000000000000
[    0.000000] /pcie@e000a000: PCICSRBAR @ 0xfff00000
arch: exit
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00010000
[    0.000000]   Normal   0x00010000 -> 0x00010000
[    0.000000]   HighMem  0x00010000 -> 0x00010000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[1] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00010000
[    0.000000] MMU: Allocated 1088 bytes of context maps for 255 contexts
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 65024
[    0.000000] Kernel command line: root=/dev/nfs rw nfsroot=xxx:/exports/nfs/diskless/gentoo-ppc ip=whatever:whatever:whatever:whatever:mpc8548cds:eth1:off console=ttyS1,115200
[    0.000000] PID hash table entries: 1024 (order: 10, 4096 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Memory: 253568k/262144k available (5608k kernel code, 8324k reserved, 220k data, 127k bss, 184k init)
[    0.000000] Kernel virtual memory layout:
[    0.000000]   * 0xfffef000..0xfffff000  : fixmap
[    0.000000]   * 0xff800000..0xffc00000  : highmem PTEs
[    0.000000]   * 0xfe6f7000..0xff800000  : early ioremap
[    0.000000]   * 0xd1000000..0xfe6f7000  : vmalloc & ioremap
[    0.000000] SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:512
[    0.000000] mpic: Setting up MPIC " OpenPIC  " version 1.2 at e0040000, max 1 CPUs
[    0.000000] mpic: ISU size: 80, shift: 7, mask: 7f
[    0.000000] mpic: Initializing for 80 sources
[    0.000000] clocksource: timebase mult[3c9b26d] shift[22] registered
[10607.274019] Console: colour dummy device 80x25
[10607.278419] ------------------------
[10607.281952] | Locking API testsuite:
[10607.285509] ----------------------------------------------------------------------------
[10607.293568]                                  | spin |wlock |rlock |mutex | wsem | rsem |
[10607.301628]   --------------------------------------------------------------------------
[10607.309689]                      A-A deadlock:failed|failed|  ok  |failed|failed|failed|
[10607.317745]                  A-B-B-A deadlock:failed|failed|  ok  |failed|failed|failed|
[10607.325805]              A-B-B-C-C-A deadlock:failed|failed|  ok  |failed|failed|failed|
[10607.333865]              A-B-C-A-B-C deadlock:failed|failed|  ok  |failed|failed|failed|
[10607.341925]          A-B-B-C-C-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
[10607.349985]          A-B-C-D-B-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
[10607.358045]          A-B-C-D-B-C-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
[10607.366105]                     double unlock:failed|failed|failed|failed|failed|failed|
[10607.374165]                   initialize held:failed|failed|failed|failed|failed|failed|
[10607.382225]                  bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[10607.390285]   --------------------------------------------------------------------------
[10607.398348]               recursive read-lock:             |  ok  |             |failed|
[10607.406405]            recursive read-lock #2:             |  ok  |             |failed|
[10607.414465]             mixed read-write-lock:             |failed|             |failed|
[10607.422525]             mixed write-read-lock:             |failed|             |failed|
[10607.430585]   --------------------------------------------------------------------------
[10607.438648]      hard-irqs-on + irq-safe-A/12:failed|failed|  ok  |
[10607.444885]      soft-irqs-on + irq-safe-A/12:failed|failed|  ok  |
[10607.451125]      hard-irqs-on + irq-safe-A/21:failed|failed|  ok  |
[10607.457365]      soft-irqs-on + irq-safe-A/21:failed|failed|  ok  |
[10607.463605]        sirq-safe-A => hirqs-on/12:failed|failed|  ok  |
[10607.469845]        sirq-safe-A => hirqs-on/21:failed|failed|  ok  |
[10607.476085]          hard-safe-A + irqs-on/12:failed|failed|  ok  |
[10607.482325]          soft-safe-A + irqs-on/12:failed|failed|  ok  |
[10607.488565]          hard-safe-A + irqs-on/21:failed|failed|  ok  |
[10607.494805]          soft-safe-A + irqs-on/21:failed|failed|  ok  |
[10607.501045]     hard-safe-A + unsafe-B #1/123:failed|failed|  ok  |
[10607.507285]     soft-safe-A + unsafe-B #1/123:failed|failed|  ok  |
[10607.513525]     hard-safe-A + unsafe-B #1/132:failed|failed|  ok  |
[10607.519765]     soft-safe-A + unsafe-B #1/132:failed|failed|  ok  |
[10607.526005]     hard-safe-A + unsafe-B #1/213:failed|failed|  ok  |
[10607.532245]     soft-safe-A + unsafe-B #1/213:failed|failed|  ok  |
[10607.538485]     hard-safe-A + unsafe-B #1/231:failed|failed|  ok  |
[10607.544725]     soft-safe-A + unsafe-B #1/231:failed|failed|  ok  |
[10607.550965]     hard-safe-A + unsafe-B #1/312:failed|failed|  ok  |
[10607.557205]     soft-safe-A + unsafe-B #1/312:failed|failed|  ok  |
[10607.563445]     hard-safe-A + unsafe-B #1/321:failed|failed|  ok  |
[10607.569685]     soft-safe-A + unsafe-B #1/321:failed|failed|  ok  |
[10607.575925]     hard-safe-A + unsafe-B #2/123:failed|failed|  ok  |
[10607.582165]     soft-safe-A + unsafe-B #2/123:failed|failed|  ok  |
[10607.588405]     hard-safe-A + unsafe-B #2/132:failed|failed|  ok  |
[10607.594645]     soft-safe-A + unsafe-B #2/132:failed|failed|  ok  |
[10607.600885]     hard-safe-A + unsafe-B #2/213:failed|failed|  ok  |
[10607.607125]     soft-safe-A + unsafe-B #2/213:failed|failed|  ok  |
[10607.613365]     hard-safe-A + unsafe-B #2/231:failed|failed|  ok  |
[10607.619605]     soft-safe-A + unsafe-B #2/231:failed|failed|  ok  |
[10607.625845]     hard-safe-A + unsafe-B #2/312:failed|failed|  ok  |
[10607.632085]     soft-safe-A + unsafe-B #2/312:failed|failed|  ok  |
[10607.638325]     hard-safe-A + unsafe-B #2/321:failed|failed|  ok  |
[10607.644565]     soft-safe-A + unsafe-B #2/321:failed|failed|  ok  |
[10607.650805]       hard-irq lock-inversion/123:failed|failed|  ok  |
[10607.657045]       soft-irq lock-inversion/123:failed|failed|  ok  |
[10607.663285]       hard-irq lock-inversion/132:failed|failed|  ok  |
[10607.669525]       soft-irq lock-inversion/132:failed|failed|  ok  |
[10607.675765]       hard-irq lock-inversion/213:failed|failed|  ok  |
[10607.682005]       soft-irq lock-inversion/213:failed|failed|  ok  |
[10607.688245]       hard-irq lock-inversion/231:failed|failed|  ok  |
[10607.694485]       soft-irq lock-inversion/231:failed|failed|  ok  |
[10607.700725]       hard-irq lock-inversion/312:failed|failed|  ok  |
[10607.706965]       soft-irq lock-inversion/312:failed|failed|  ok  |
[10607.713205]       hard-irq lock-inversion/321:failed|failed|  ok  |
[10607.719445]       soft-irq lock-inversion/321:failed|failed|  ok  |
[10607.725685]       hard-irq read-recursion/123:  ok  |
[10607.730712]       soft-irq read-recursion/123:  ok  |
[10607.735739]       hard-irq read-recursion/132:  ok  |
[10607.740765]       soft-irq read-recursion/132:  ok  |
[10607.745792]       hard-irq read-recursion/213:  ok  |
[10607.750819]       soft-irq read-recursion/213:  ok  |
[10607.755845]       hard-irq read-recursion/231:  ok  |
[10607.760872]       soft-irq read-recursion/231:  ok  |
[10607.765898]       hard-irq read-recursion/312:  ok  |
[10607.770925]       soft-irq read-recursion/312:  ok  |
[10607.775952]       hard-irq read-recursion/321:  ok  |
[10607.780979]       soft-irq read-recursion/321:  ok  |
[10607.786005] --------------------------------------------------------
[10607.792335] 145 out of 218 testcases failed, as expected. |
[10607.797881] ----------------------------------------------------
[10607.803885] Mount-cache hash table entries: 512
[10607.809422] NET: Registered protocol family 16

[10607.815925] PCI: Probing PCI hardware
[10607.819754] pci 0000:01:02.0: unknown header type 7f, ignoring device
[10607.828103] pci 0002:03:00.0: ignoring class b20 (doesn't match header type 01)
[10607.835331] pci 0002:03:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[10607.841894] pci 0002:03:00.0: PME# disabled
[10607.846344] pci 0000:00:1c.0: PCI bridge, secondary bus 0000:01
[10607.852198] pci 0000:00:1c.0:   IO window: 0x00-0x1fff
[10607.857276] pci 0000:00:1c.0:   MEM window: 0x80000000-0x800fffff
[10607.863341] pci 0000:00:1c.0:   PREFETCH window: 0x80100000-0x801fffff
[10607.869846] pci 0002:03:00.0: PCI bridge, secondary bus 0002:04
[10607.875733] pci 0002:03:00.0:   IO window: 0x00-0xfffff
[10607.880933] pci 0002:03:00.0:   MEM window: 0xa0000000-0xbfffffff
[10607.886999] pci 0002:03:00.0:   PREFETCH window: disabled
[10607.892374] pci 0002:03:00.0: enabling device (0106 -> 0107)
[10607.904727] bio: create slab <bio-0> at 0
[10607.909389] SCSI subsystem initialized
[10607.913416] usbcore: registered new interface driver usbfs
[10607.918882] usbcore: registered new interface driver hub
[10607.924188] usbcore: registered new device driver usb
[10607.929447] Freescale Elo / Elo Plus DMA driver
[10607.934288] Slow work thread pool: Starting up
[10607.938699] Slow work thread pool: Ready
[10607.942535] FS-Cache: Loaded
[10607.945483] CacheFiles: Loaded
[10607.949181] NET: Registered protocol family 2
[10607.953486] IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
[10607.960380] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[10607.967432] TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
[10607.973816] TCP: Hash tables configured (established 8192 bind 8192)
[10607.980097] TCP reno registered
[10607.983262] NET: Registered protocol family 1
[10607.988554] fsl-elo-dma e0021300.dma: Probe the Freescale DMA driver for fsl,eloplus-dma controller at 0xe0021300...
[10607.999050] fsl-elo-dma e0021300.dma: #0 (fsl,eloplus-dma-channel), irq 20
[10608.005854] fsl-elo-dma e0021300.dma: #1 (fsl,eloplus-dma-channel), irq 21
[10608.012701] fsl-elo-dma e0021300.dma: #2 (fsl,eloplus-dma-channel), irq 22
[10608.019548] fsl-elo-dma e0021300.dma: #3 (fsl,eloplus-dma-channel), irq 23
[10608.027547] i8259 legacy interrupt controller initialized
[10608.033372] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[10608.039300] fuse init (API version 7.12)
[10608.043358] msgmni has been set to 495
[10608.047230] alg: No test for stdrng (krng)
[10608.051255] io scheduler noop registered
[10608.055133] io scheduler anticipatory registered (default)
[10608.060587] io scheduler deadline registered
[10608.064838] io scheduler cfq registered
[10608.069051] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[10608.074585] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[10608.086264] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[10608.093032] Platform driver 'serial8250' needs updating - please use dev_pm_ops
[10608.100363] serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
[10608.107233] serial8250.0: ttyS1 at MMIO 0xe0004600 (irq = 42) is a 16550A
[10608.113932] console handover: boot [udbg0] -> real [ttyS1]
[10608.122360] brd: module loaded
[10608.126962] loop: module loaded
[10608.130092] Loading iSCSI transport class v2.0-870.
[10608.135343] iscsi: registered transport (tcp)
[10608.140220] scsi0 : pata_via
[10608.143313] scsi1 : pata_via
[10608.146337] ata1: PATA max UDMA/100 cmd 0x1ff8 ctl 0x1ff4 bmdma 0x1fd0 irq 14
[10608.153476] ata2: PATA max UDMA/100 cmd 0x1fe8 ctl 0x1fe4 bmdma 0x1fd8 irq 14
[10608.161149] eth0: Gianfar Ethernet Controller Version 1.2, 00:04:9f:00:85:4b
[10608.168202] eth0: Running with NAPI enabled
[10608.172376] eth0: 256/256 RX/TX BD ring size
[10608.176866] eth1: Gianfar Ethernet Controller Version 1.2, 00:04:9f:00:85:4c
[10608.183912] eth1: Running with NAPI enabled
[10608.188084] eth1: 256/256 RX/TX BD ring size
[10608.192582] Freescale PowerQUICC MII Bus: probed
[10608.198252] Freescale PowerQUICC MII Bus: probed
[10608.203145] 8139too Fast Ethernet driver 0.9.28
[10608.208362] eth2: RealTek RTL8139 at 0xd1040000, ff:3f:ff:3f:ff:3f, IRQ 18
[10608.215299] tun: Universal TUN/TAP device driver, 1.6
[10608.220347] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[10608.226967] usbmon: debugfs is not available
[10608.231236] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[10608.237951] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[10608.244282] uhci_hcd: USB Universal Host Controller Interface driver
[10608.250671] uhci_hcd 0000:01:04.2: UHCI Host Controller
[10608.255896] uhci_hcd 0000:01:04.2: new USB bus registered, assigned bus number 1
[10608.263310] uhci_hcd 0000:01:04.2: irq 10, io base 0x00001040
[10608.269117] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
[10608.275895] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[10608.283102] usb usb1: Product: UHCI Host Controller
[10608.287967] usb usb1: Manufacturer: Linux 2.6.31-rc3 uhci_hcd
[10608.293699] usb usb1: SerialNumber: 0000:01:04.2
[10608.298448] usb usb1: configuration #1 chosen from 1 choice
[10608.304116] hub 1-0:1.0: USB hub found
[10608.307875] hub 1-0:1.0: 2 ports detected
[10608.311956] uhci_hcd 0000:01:04.3: UHCI Host Controller
[10608.317183] uhci_hcd 0000:01:04.3: new USB bus registered, assigned bus number 2
[10608.324592] uhci_hcd 0000:01:04.3: irq 11, io base 0x00001f80
[10608.330395] usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
[10608.337174] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[10608.344380] usb usb2: Product: UHCI Host Controller
[10608.349247] usb usb2: Manufacturer: Linux 2.6.31-rc3 uhci_hcd
[10608.354979] usb usb2: SerialNumber: 0000:01:04.3
[10608.371440] usb usb2: configuration #1 chosen from 1 choice
[10608.377094] hub 2-0:1.0: USB hub found
[10608.380852] hub 2-0:1.0: 2 ports detected
[10608.384978] Initializing USB Mass Storage driver...
[10608.390087] usbcore: registered new interface driver usb-storage
[10608.396091] USB Mass Storage support registered.
[10608.400779] usbcore: registered new interface driver libusual
[10608.406862] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[10608.413598] i2c /dev entries driver
[10608.417713] PowerPC Book-E Watchdog Timer Loaded
[10608.422416] EDAC MC: Ver: 2.1.0 Sep  9 2009
[10608.426783] Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
[10608.433772] mpc85xx_mc_err_probe: No ECC DIMMs discovered
[10608.439573] EDAC DEVICE0: Giving out device to module 'MPC85xx_edac' controller 'mpc85xx_l2_err': DEV 'mpc85xx_l2_err' (INTERRUPT)
[10608.451328] MPC85xx_edac acquired irq 25 for L2 Err
[10608.456196] MPC85xx_edac L2 err registered
[10608.460654] talitos e0030000.crypto: hwrng
[10608.464789] alg: No test for authenc(hmac(sha1),cbc(aes)) (authenc-hmac-sha1-cbc-aes-talitos)
[10608.473320] talitos e0030000.crypto: authenc-hmac-sha1-cbc-aes-talitos
[10608.479870] alg: No test for authenc(hmac(sha1),cbc(des3_ede)) (authenc-hmac-sha1-cbc-3des-talitos)
[10608.488949] talitos e0030000.crypto: authenc-hmac-sha1-cbc-3des-talitos
[10608.495579] alg: No test for authenc(hmac(sha256),cbc(aes)) (authenc-hmac-sha256-cbc-aes-talitos)
[10608.504453] talitos e0030000.crypto: authenc-hmac-sha256-cbc-aes-talitos
[10608.511183] alg: No test for authenc(hmac(sha256),cbc(des3_ede)) (authenc-hmac-sha256-cbc-3des-talitos)
[10608.520589] talitos e0030000.crypto: authenc-hmac-sha256-cbc-3des-talitos
[10608.527389] alg: No test for authenc(hmac(md5),cbc(aes)) (authenc-hmac-md5-cbc-aes-talitos)
[10608.535742] talitos e0030000.crypto: authenc-hmac-md5-cbc-aes-talitos
[10608.542195] alg: No test for authenc(hmac(md5),cbc(des3_ede)) (authenc-hmac-md5-cbc-3des-talitos)
[10608.551152] talitos e0030000.crypto: authenc-hmac-md5-cbc-3des-talitos
[10608.557704] talitos e0030000.crypto: cbc-aes-talitos
[10608.562695] talitos e0030000.crypto: cbc-3des-talitos
[10608.568027] usbcore: registered new interface driver usbhid
[10608.573596] usbhid: v2.6:USB HID core driver
[10608.577971] TCP cubic registered
[10608.581193] Initializing XFRM netlink socket
[10608.585467] NET: Registered protocol family 10
[10608.590174] IPv6 over IPv4 tunneling driver
[10608.594599] NET: Registered protocol family 17
[10608.599149] RPC: Registered udp transport module.
[10608.603851] RPC: Registered tcp transport module.
[10608.608545] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[10608.615319] All bugs added by David S. Miller <davem@redhat.com>
[10608.621716] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[10609.131199] ADDRCONF(NETDEV_UP): eth1: link is not ready
[10610.138790] IP-Config: Complete:
[10610.141831]      device=eth1, addr=10.9.130.212, mask=255.255.252.0, gw=10.9.128.1,
[10610.149413]      host=mpc8548cds, domain=, nis-domain=(none),
[10610.155149]      bootserver=10.9.128.106, rootserver=10.9.128.106, rootpath=
[10610.162538] Looking up port of RPC 100003/2 on 10.9.128.106
[10611.131063] PHY: mdio@e0024520:01 - Link is Up - 1000/Full
[10611.136579] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[10611.170949] Looking up port of RPC 100005/1 on 10.9.128.106
[10611.179067] VFS: Mounted root (nfs filesystem) on device 0:12.
[10611.184938] Freeing unused kernel memory: 184k init
[10611.192802] ------------[ cut here ]------------
[10611.197409] Kernel BUG at c0014d70 [verbose debug info unavailable]
[10611.203660] Oops: Exception in kernel mode, sig: 5 [#1]
[10611.208866] PREEMPT MPC85xx CDS
[10611.211997] Modules linked in:
[10611.215040] NIP: c0014d70 LR: c0014eb4 CTR: 00000002
[10611.219988] REGS: cf82db40 TRAP: 0700   Not tainted  (2.6.31-rc3)
[10611.226061] MSR: 00029000 <EE,ME,CE>  CR: 88448044  XER: 20000000
[10611.232162] TASK = cf828000[1] 'init' THREAD: cf82c000
[10611.237108] GPR00: 00000001 cf82dbf0 cf828000 cf9781c0 bf8031d8 cf9f400c 0057902f 00000001
[10611.245471] GPR08: cf978200 cf9f4000 00000002 00000000 28448042 1001b0b0 00000001 cf88ee00
[10611.253833] GPR16: c05c0000 bf8031d8 00000002 10000000 48000000 00000001 00000008 c05ecf20
[10611.262196] GPR24: 0057902b 0057902f cf82c000 00000000 cf9f400c 00000001 bf8031d8 cf98b000
[10611.270749] NIP [c0014d70] assert_pte_locked+0x3c/0x44
[10611.275872] LR [c0014eb4] ptep_set_access_flags+0xa8/0xf4
[10611.281252] Call Trace:
[10611.283687] [cf82dbf0] [bf8031d8] 0xbf8031d8 (unreliable)
[10611.289079] [cf82dc10] [c008e87c] do_wp_page+0xf8/0x82c
[10611.294292] [cf82dc60] [c0014770] do_page_fault+0x2c0/0x480
[10611.299851] [cf82dd10] [c0011078] handle_page_fault+0xc/0x80
[10611.305504] [cf82ddd0] [c00f2b4c] load_elf_binary+0x8a8/0x121c
[10611.311325] [cf82de50] [c00af418] search_binary_handler+0x144/0x37c
[10611.317578] [cf82dea0] [c00b0bc8] do_execve+0x270/0x2c8
[10611.322794] [cf82dee0] [c0008754] sys_execve+0x68/0xa4
[10611.327919] [cf82df00] [c0010c38] ret_from_syscall+0x0/0x3c
[10611.333482] [cf82dfc0] [c00b9350] sys_dup+0x38/0x78
[10611.338349] [cf82dfd0] [c0002030] init_post+0x94/0x108
[10611.343478] [cf82dfe0] [c054c234] kernel_init+0x114/0x130
[10611.348865] [cf82dff0] [c00109b8] kernel_thread+0x4c/0x68
[10611.354249] Instruction dump:
[10611.357206] 4d9e0020 38000000 0f000000 0f000000 81230024 5480653a 7c09002e 54090027
[10611.364959] 7c000026 54001ffe 0f000000 38000001 <0f000000> 4e800020 7c0802a6 9421fff0
[10611.372887] ---[ end trace 0cda2392272f221a ]---
[10611.377488] note: init[1] exited with preempt_count 2
[10611.382625] BUG: scheduling while atomic: init/1/0x10000003
[10611.388191] Modules linked in:
[10611.391241] Call Trace:
[10611.393678] [cf82d900] [c0007a70] show_stack+0x78/0x1a8 (unreliable)
[10611.400029] [cf82d930] [c0032194] __schedule_bug+0x68/0x6c
[10611.405511] [cf82d940] [c0460470] schedule+0x36c/0x3fc
[10611.410646] [cf82d990] [c0032988] __cond_resched+0x24/0x4c
[10611.416126] [cf82d9a0] [c0460604] _cond_resched+0x40/0x48
[10611.421520] [cf82d9b0] [c003c944] put_files_struct+0x10c/0x124
[10611.427345] [cf82d9d0] [c003e824] do_exit+0x598/0x62c
[10611.432390] [cf82da20] [c000e134] die+0x100/0x1a0
[10611.437088] [cf82da40] [c000e3f0] _exception+0xd8/0x1c8
[10611.442306] [cf82db30] [c0011240] ret_from_except_full+0x0/0x4c
[10611.448217] [cf82dbf0] [bf8031d8] 0xbf8031d8
[10611.452482] [cf82dc10] [c008e87c] do_wp_page+0xf8/0x82c
[10611.457700] [cf82dc60] [c0014770] do_page_fault+0x2c0/0x480
[10611.463266] [cf82dd10] [c0011078] handle_page_fault+0xc/0x80
[10611.468918] [cf82ddd0] [c00f2b4c] load_elf_binary+0x8a8/0x121c
[10611.474747] [cf82de50] [c00af418] search_binary_handler+0x144/0x37c
[10611.481004] [cf82dea0] [c00b0bc8] do_execve+0x270/0x2c8
[10611.486222] [cf82dee0] [c0008754] sys_execve+0x68/0xa4
[10611.491353] [cf82df00] [c0010c38] ret_from_syscall+0x0/0x3c
[10611.496918] [cf82dfc0] [c00b9350] sys_dup+0x38/0x78
[10611.501789] [cf82dfd0] [c0002030] init_post+0x94/0x108
[10611.506922] [cf82dfe0] [c054c234] kernel_init+0x114/0x130
[10611.512312] [cf82dff0] [c00109b8] kernel_thread+0x4c/0x68
[10611.517781] Kernel panic - not syncing: Attempted to kill init!
[10611.523693] Rebooting in 180 seconds..
[10621.378634] eth1: no IPv6 routers present

Comment 1 Andrew Morton 2009-09-11 20:09:55 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 9 Sep 2009 15:09:15 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14148
> 
>            Summary: kernel panic: do_wp_page assert_pte_locked failed when
>                     DEBUG_VM
>            Product: Platform Specific/Hardware
>            Version: 2.5
>     Kernel Version: 2.6.31-rc3, 2.6.31-rc9-git2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PPC-32
>         AssignedTo: platform_ppc-32@kernel-bugs.osdl.org
>         ReportedBy: wangbj@lzu.edu.cn
>         Regression: Yes
> 
> 
> Created an attachment (id=23049)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=23049)
> problematic config file for mpc8548cds
> 
> powerpc mpc8548cds (I only have this board on hand) will kernel panic if
> DEBUG_VM (kernel hacking) is enabled due to assertion failed in function
> do_wp_page(). I think it highly possible for other ppc boards like 44x have
> the same problem too, but I don't have the board.
> 
> here is the full log from power up (after u-boot). and the attachment is
> related .config, NOTE the kernel boot successfully if CONFIG_DEBUG_VM is not
> enabled.
> 
> host system is gentoo, the gcc (powerpc-unknown-linux-gnu-gcc) is build by
> gentoo crossdev, version 4.4.1, (cross) glibc is 2.9, (cross) binutils is
> 2.19.1, (cross) kernel headers is 2.6.30. target (mpc8548cds) root filesystem
> is also gentoo (200907xx, extracted from stage3 tarball).
> 
> I have running similar test on x86 using qemu (0.10.6, +kvm), the result
> seems OK, especially x86 pass all lock api test suite.

First question:

> [10611.192802] ------------[ cut here ]------------
> [10611.197409] Kernel BUG at c0014d70 [verbose debug info unavailable]

Why did we not get the file-n-line?  That's irritating.

Oh, CONFIG_DEBUG_BUGVERBOSE=n.  Don't do that.  We should make that thing
harder to get at, to stop people shooting our feet off.

> [10611.203660] Oops: Exception in kernel mode, sig: 5 [#1]
> [10611.208866] PREEMPT MPC85xx CDS
> [10611.211997] Modules linked in:
> [10611.215040] NIP: c0014d70 LR: c0014eb4 CTR: 00000002
> [10611.219988] REGS: cf82db40 TRAP: 0700   Not tainted  (2.6.31-rc3)
> [10611.226061] MSR: 00029000 <EE,ME,CE>  CR: 88448044  XER: 20000000
> [10611.232162] TASK = cf828000[1] 'init' THREAD: cf82c000
> [10611.237108] GPR00: 00000001 cf82dbf0 cf828000 cf9781c0 bf8031d8 cf9f400c 0057902f 00000001
> [10611.245471] GPR08: cf978200 cf9f4000 00000002 00000000 28448042 1001b0b0 00000001 cf88ee00
> [10611.253833] GPR16: c05c0000 bf8031d8 00000002 10000000 48000000 00000001 00000008 c05ecf20
> [10611.262196] GPR24: 0057902b 0057902f cf82c000 00000000 cf9f400c 00000001 bf8031d8 cf98b000
> [10611.270749] NIP [c0014d70] assert_pte_locked+0x3c/0x44
> [10611.275872] LR [c0014eb4] ptep_set_access_flags+0xa8/0xf4
> [10611.281252] Call Trace:
> [10611.283687] [cf82dbf0] [bf8031d8] 0xbf8031d8 (unreliable)
> [10611.289079] [cf82dc10] [c008e87c] do_wp_page+0xf8/0x82c
> [10611.294292] [cf82dc60] [c0014770] do_page_fault+0x2c0/0x480
> [10611.299851] [cf82dd10] [c0011078] handle_page_fault+0xc/0x80
> [10611.305504] [cf82ddd0] [c00f2b4c] load_elf_binary+0x8a8/0x121c
> [10611.311325] [cf82de50] [c00af418] search_binary_handler+0x144/0x37c
> [10611.317578] [cf82dea0] [c00b0bc8] do_execve+0x270/0x2c8
> [10611.322794] [cf82dee0] [c0008754] sys_execve+0x68/0xa4
> [10611.327919] [cf82df00] [c0010c38] ret_from_syscall+0x0/0x3c
> [10611.333482] [cf82dfc0] [c00b9350] sys_dup+0x38/0x78
> [10611.338349] [cf82dfd0] [c0002030] init_post+0x94/0x108
> [10611.343478] [cf82dfe0] [c054c234] kernel_init+0x114/0x130
> [10611.348865] [cf82dff0] [c00109b8] kernel_thread+0x4c/0x68
> [10611.354249] Instruction dump:
> [10611.357206] 4d9e0020 38000000 0f000000 0f000000 81230024 5480653a 7c09002e 54090027
> [10611.364959] 7c000026 54001ffe 0f000000 38000001 <0f000000> 4e800020 7c0802a6 9421fff0
> [10611.372887] ---[ end trace 0cda2392272f221a ]---

So do_wp_page() called ptep_set_access_flags().  If CONFIG_DEBUG_VM=y,
powerpc's ptep_set_access_flags() will call
arch/powerpc/mm/pgtable.c:assert_pte_locked().  Because of the lack of
file-n-line info it is unclear which of those many assertions
triggered.  It looks like BUG_ON(!pmd_present(*pmd)).  Perhaps.


Please set CONFIG_DEBUG_BUGVERBOSE=y in your .config and then tell us
(via emailed reply-to-all) which line in arch/powerpc/mm/pgtable.c
triggered the BUG.  Please actually quote that line, or tell us exactly
which kernel version you're using so we can see which line it was in
the source code.

Thanks.

Comment 2 Anonymous Emailer 2009-09-11 21:37:25 UTC
Reply-To: galak@kernel.crashing.org

On Sep 11, 2009, at 3:09 PM, Andrew Morton wrote:

> So do_wp_page() called ptep_set_access_flags().  If CONFIG_DEBUG_VM=y,
> powerpc's ptep_set_access_flags() will call
> arch/powerpc/mm/pgtable.c:assert_pte_locked().  Because of the lack of
> file-n-line info it is unclear which of those many assertions
> triggered.  It looks like BUG_ON(!pmd_present(*pmd)).  Perhaps.
>
>
> Please set CONFIG_DEBUG_BUGVERBOSE=y in your .config and then tell us
> (via emailed reply-to-all) which line in arch/powerpc/mm/pgtable.c
> triggered the BUG.  Please actually quote that line, or tell us exactly
> which kernel version you're using so we can see which line it was in
> the source code.
>
> Thanks.

I think I fixed this:

commit 797a747a82e23530ee45d2927bf84f3571c1acb2
Author: Kumar Gala <galak@kernel.crashing.org>
Date:   Tue Aug 18 15:21:40 2009 +0000

     powerpc/mm: Fix assert_pte_locked to work properly on uniprocessor

     Since the pte_lockptr is a spinlock it gets optimized away on
     uniprocessor builds so using spin_is_locked is not correct.  We can
     use assert_spin_locked instead and get the proper behavior between
     UP and SMP builds.

     Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
     Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

But the patch was queued up for .32, not .31.

- k

Comment 3 Michael Ellerman 2009-09-15 06:48:18 UTC
On Fri, 2009-09-11 at 20:09 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> First question:
> 
> > [10611.192802] ------------[ cut here ]------------
> > [10611.197409] Kernel BUG at c0014d70 [verbose debug info unavailable]
> 
> Why did we not get the file-n-line?  That's irritating.
> 
> Oh, CONFIG_DEBUG_BUGVERBOSE=n.  Don't do that.  We should make that thing
> harder to get at, to stop people shooting our feet off.

Actually it's disabled in the defconfig (mpc85xx_defconfig I assume). 

It does add ~8K to the kernel, but that defconfig also has DEBUG_INFO=y
which surely makes any saving from non-verbose BUG() insignificant. So
it should just be enabled in the defconfig, right Kumar?

cheers