Bug 62421 - Kernel oops, data corruption after creating many XFS volumes on LVM enabled RAID array.
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: ARM Linux
Importance: P1 blocking
Assignee: XFS Guru
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-10-02 15:49 UTC by Dallas Clement
Modified: 2013-10-17 13:48 UTC

See Also:
Kernel Version: 3.3.4
Tree: Mainline
Regression: No



Description Dallas Clement 2013-10-02 15:49:10 UTC
I'm consistently seeing a kernel oops after creating a couple hundred 1GB volumes and formatting them as XFS, with quota checking turned on.  This is with LVM enabled on a RAID array.  The RAID level does not seem to matter.
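A minimal sketch of the reproduction loop described above, for anyone trying to reproduce this on another machine (for example on x86, as asked in Comment 1). Several details are assumptions, not taken from the report: the volume group name `vg0`, the mount points `/mnt/volN`, and the choice of user quota (`uquota`); the report only says "quota checking turned on". By default it only prints the commands (dry run), since running it creates and mounts a few hundred filesystems and needs roughly 200GB free in the volume group.

```python
import shlex
import subprocess

# Hypothetical values: the report names none of these.
VG = "vg0"             # assumed LVM volume group on top of the md RAID array
COUNT = 200            # "a couple hundred" volumes
SIZE = "1G"            # 1GB volumes, per the report
MOUNT_OPTS = "uquota"  # assumption: user quota accounting enables quotacheck


def repro_commands(vg=VG, count=COUNT, size=SIZE, opts=MOUNT_OPTS):
    """Yield argv lists: create, format, and mount one XFS LV at a time."""
    for i in range(count):
        name = f"vol{i}"
        dev = f"/dev/{vg}/{name}"
        mnt = f"/mnt/{name}"
        yield ["lvcreate", "-L", size, "-n", name, vg]
        yield ["mkfs.xfs", "-f", dev]
        yield ["mkdir", "-p", mnt]
        yield ["mount", "-t", "xfs", "-o", opts, dev, mnt]


def main(dry_run=True):
    for argv in repro_commands():
        if dry_run:
            print(shlex.join(argv))  # show what would run
        else:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    main(dry_run=True)
```

Mounting each volume right after creating it mirrors the log in this report, where every `dm-N` device is mounted and quotachecked in turn until the mount of `dm-156` oopses.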

The test system is running the vanilla Linux 3.3.4 XFS code on an ARMv7 processor (Armada XP MV78260, PJ4Bv7 Processor rev 2 v7l).

It looks like the oops may be happening during quota checking.

After this oops occurs, the filesystem is corrupted.  Subsequent reboots will typically result in a hard lockup in the Linux kernel when it tries to mount the filesystem.

XFS (md10): Mounting Filesystem
XFS (md10): Ending clean mount
XFS (md10): Quotacheck needed: Please wait.
XFS (md10): Quotacheck: Done.

XFS (dm-0): Mounting Filesystem
XFS (dm-0): Ending clean mount
XFS (dm-0): Quotacheck needed: Please wait.
XFS (dm-0): Quotacheck: Done.

...

XFS (dm-155): Mounting Filesystem
XFS (dm-155): Ending clean mount
XFS (dm-155): Quotacheck needed: Please wait.
XFS (dm-155): Quotacheck: Done.
XFS (dm-156): Mounting Filesystem
Unable to handle kernel paging request at virtual address b0111000
pgd = c98dc000
[b0111000] *pgd=00000000
Internal error: Oops: 15 [#1] SMP
Modules linked in: tcm_loop(O) iscsi_target_mod(O) target_core_pscsi(O) target_core_file(O) target_core_iblock(O) target_core_mod(PO) configfs usbhid usblp usb_storage usb_libusual uhci_hcd ohci_hcd ehci_hcd xhci_hcd usbcore usb_common
CPU: 0    Tainted: P           O  (3.3.4 #1)
PC is at vmap_page_range_noflush+0xe0/0x1b0
LR is at vmap_page_range_noflush+0x158/0x1b0
pc : [<c00cc7b8>]    lr : [<c00cc830>]    psr: 80000013
sp : d2559b20  ip : b0111000  fp : d2559b6c
r10: 1276165f  r9 : b0111000  r8 : fac01000
r7 : c06f66b4  r6 : 00000400  r5 : b0111004  r4 : fac00000
r3 : c09b6c40  r2 : e93a1000  r1 : 00000000  r0 : d2763ffc
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 098dc06a  DAC: 00000015
Process mount (pid: 25053, stack limit = 0xd25582f0)
Stack: (0xd2559b20 to 0xd255a000)
9b20: d2559b6c c0007eb0 fac00fff e93a1000 e93a0000 0000065f fac01000 00000000
9b40: d2559bec fa800000 00000003 c0727328 00000401 e93a0000 00000000 00000000
9b60: d2559bcc d2559b70 c00cccec c00cc6e4 000000d0 000002d0 d2559bb4 0000065f
9b80: e93a0000 c00d5c84 000202d0 00000401 00000000 00401000 00001004 00000000
9ba0: d2559d58 cda93780 00000003 c0727328 00000401 e93a0000 00000000 00000000
9bc0: d2559bec d2559bd0 c02564d4 c00cc918 00100000 cda93780 00000401 00001000
9be0: d2559c2c d2559bf0 c0256a50 c0256484 00401000 00000000 7fffffff ce6db000
9c00: cda93388 d1e7c200 c02985ac d1e7c200 00000001 00002000 d2559d58 00001000
9c20: d2559c4c d2559c30 c0296828 c025698c 00100100 00200200 00000008 00000008
9c40: d2559cb4 d2559c50 c02985ac c02967cc 00000000 00000001 d2559c74 d2559c68
9c60: c04ed328 c04ed17c 00000000 d2559c78 c0257658 c04ed31c 00000000 00001008
9c80: 00000008 d2559c90 c0296908 d1e7c200 00000008 00000000 00000001 00000001
9ca0: d2559d58 00001000 d2559ce4 d2559cb8 c0298874 c0298554 00000001 00000000
9cc0: 00000000 d1e7c200 00000008 00000000 d1e7c2c0 00000001 d2559d4c d2559ce8
9ce0: c0298ca4 c0298750 00000001 cda93300 d2559d1c c0071d1c cbbcc000 00000001
9d00: 00000000 d2559d60 d1e7c2a0 cda93300 00000008 00000000 d1c553c0 d258b200
9d20: cbbcc000 00000400 00000000 d1e7c200 00000000 00428100 00000000 0000000f
9d40: d2559d84 d2559d50 c029a418 c029889c d2559d84 d2559d60 00000008 00000000
9d60: 00000008 00000000 ce6db000 00000400 00000000 d1f4d000 d2559db4 d2559d88
9d80: c02a1728 c029a404 00005000 00000000 00005000 d1f4d000 00000003 00000007
9da0: 0001fe00 00000000 d2559e14 d2559db8 c029d044 c02a1650 00005000 d2559dc8
9dc0: c026a5c0 c026a5d8 c98439c0 c026538c d2559e04 d2559de0 c0263394 c005a378
9de0: 00000000 00000000 00000000 d1f4d000 d1f4ec00 00000000 00000040 d25a0000
9e00: c026522c d25a0000 d2559e3c d2559e18 c026538c c029cd28 0000000c f2bb70e0
9e20: 00000083 00008000 f2bb7150 d1f4ec00 d2559e94 d2559e40 c00df3bc c0265238
9e40: 00000000 312d6d64 d2003635 d2559e58 00000004 d2559e60 000050ec c00d5c84
9e60: cd017c22 271aee18 00000005 f4a093c0 c0711678 00008000 d25a0000 c0711678
9e80: d2558000 cc4983c0 d2559eac d2559e98 c0263710 c00df294 c026522c c00be250
9ea0: d2559ed4 d2559eb0 c00dfff4 c0263700 c0711678 cd017c00 00008000 f4a093c0
9ec0: cd017c00 00008000 d2559efc d2559ed8 c00f7698 c00dffe4 d25a0000 c0711678
9ee0: cc4983c0 d25a0000 cd017c00 00008000 d2559f24 d2559f00 c00f7eb4 c00f7650
9f00: d25a0000 00000008 cd017c00 00008400 00008000 d25a0000 d2559f6c d2559f28
9f20: c00f980c c00f7e84 f4a08290 f2aaf800 d2559f38 d2559f48 c00af0dc c00f9144
9f40: 20000013 c912f000 bef3feff 00008400 00000000 c000dfe8 d2558000 00000000
9f60: d2559fa4 d2559f70 c00f9a2c c00f91fc d25a0000 00000000 524b7722 cc4983c0
9f80: cd017c00 d25a0000 000bc048 000bc048 bef3feff 00000015 00000000 d2559fa8
9fa0: c000de20 c00f99ac 000bc048 000bc048 bef3feff bef3ff21 bef3fede 00008400
9fc0: 000bc048 000bc048 bef3feff 00000015 bef3ff21 00000000 bef3fede 00000000
9fe0: 00008400 bef3fac4 0004fba8 b6e80374 60000010 bef3feff 00000000 00000000
Backtrace: 
[<c00cc6d8>] (vmap_page_range_noflush+0x0/0x1b0) from [<c00cccec>] (vm_map_ram+0x3e0/0x454)
[<c00cc90c>] (vm_map_ram+0x0/0x454) from [<c02564d4>] (_xfs_buf_map_pages+0x5c/0xac)
[<c0256478>] (_xfs_buf_map_pages+0x0/0xac) from [<c0256a50>] (xfs_buf_get_uncached+0xd0/0x14c)
 r6:00001000 r5:00000401 r4:cda93780 r3:00100000
[<c0256980>] (xfs_buf_get_uncached+0x0/0x14c) from [<c0296828>] (xlog_get_bp+0x68/0xbc)
[<c02967c0>] (xlog_get_bp+0x0/0xbc) from [<c02985ac>] (xlog_write_log_records+0x64/0x1fc)
 r5:00000008 r4:00000008
[<c0298548>] (xlog_write_log_records+0x0/0x1fc) from [<c0298874>] (xlog_clear_stale_blocks+0x130/0x14c)
[<c0298744>] (xlog_clear_stale_blocks+0x0/0x14c) from [<c0298ca4>] (xlog_find_tail+0x414/0x484)
[<c0298890>] (xlog_find_tail+0x0/0x484) from [<c029a418>] (xlog_recover+0x20/0xa0)
[<c029a3f8>] (xlog_recover+0x0/0xa0) from [<c02a1728>] (xfs_log_mount+0xe4/0x174)
 r6:d1f4d000 r5:00000000 r4:00000400
[<c02a1644>] (xfs_log_mount+0x0/0x174) from [<c029d044>] (xfs_mountfs+0x328/0x610)
 r9:00000000 r8:0001fe00 r7:00000007 r6:00000003 r5:d1f4d000
r4:00005000
[<c029cd1c>] (xfs_mountfs+0x0/0x610) from [<c026538c>] (xfs_fs_fill_super+0x160/0x258)
[<c026522c>] (xfs_fs_fill_super+0x0/0x258) from [<c00df3bc>] (mount_bdev+0x134/0x1d8)
 r8:d1f4ec00 r7:f2bb7150 r6:00008000 r5:00000083 r4:f2bb70e0
r3:0000000c
[<c00df288>] (mount_bdev+0x0/0x1d8) from [<c0263710>] (xfs_fs_mount+0x1c/0x28)
[<c02636f4>] (xfs_fs_mount+0x0/0x28) from [<c00dfff4>] (mount_fs+0x1c/0xc0)
[<c00dffd8>] (mount_fs+0x0/0xc0) from [<c00f7698>] (vfs_kern_mount+0x54/0xc0)
 r6:00008000 r5:cd017c00 r4:f4a093c0
[<c00f7644>] (vfs_kern_mount+0x0/0xc0) from [<c00f7eb4>] (do_kern_mount+0x3c/0xdc)
 r8:00008000 r7:cd017c00 r6:d25a0000 r5:cc4983c0 r4:c0711678
r3:d25a0000
[<c00f7e78>] (do_kern_mount+0x0/0xdc) from [<c00f980c>] (do_mount+0x61c/0x698)
 r8:d25a0000 r7:00008000 r6:00008400 r5:cd017c00 r4:00000008
r3:d25a0000
[<c00f91f0>] (do_mount+0x0/0x698) from [<c00f9a2c>] (sys_mount+0x8c/0xcc)
[<c00f99a0>] (sys_mount+0x0/0xcc) from [<c000de20>] (ret_fast_syscall+0x0/0x30)
 r7:00000015 r6:bef3feff r5:000bc048 r4:000bc048
Code: e1a09005 e51b1030 e2855004 e7923001 (e5992000) 
---[ end trace 32ec378b8e6c443e ]---
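The word in parentheses in the `Code:` line (`e5992000`) is the instruction at the faulting PC. The small decoder below handles only the ARM (A32) LDR/STR immediate-offset encoding; it is an illustration, not a general disassembler (the kernel's `scripts/decodecode` or `objdump` decode the whole line). It shows the faulting instruction is `ldr r2, [r9]`: a load through r9, and r9 in the register dump is `b0111000`, the same address the oops says it could not handle. The two data-processing words before it (which this decoder skips) are `mov r9, r5` and `add r5, r5, #4`, consistent with r5 (`b0111004`) being r9 + 4 in the dump.

```python
# Register names as printed in the oops dump above (r11 = fp, r12 = ip).
REGS = [f"r{i}" for i in range(11)] + ["fp", "ip", "sp", "lr", "pc"]
# Condition-code suffixes indexed by bits 31..28; 0xE ("always") prints as "".
CONDS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
         "hi", "ls", "ge", "lt", "gt", "le", ""]


def decode_ldr_str_imm(word: int):
    """Decode an A32 LDR/STR(B) with a 12-bit immediate offset.

    Returns None for any other instruction class (data processing,
    register-offset loads, the 0xF unconditional space, ...).
    Post-indexed forms with W=1 (LDRT/STRT) are not distinguished.
    """
    cond = (word >> 28) & 0xF
    if cond == 0xF or ((word >> 26) & 0b11) != 0b01 or (word >> 25) & 1:
        return None
    p = (word >> 24) & 1  # 1 = pre-indexed: offset applied before the access
    u = (word >> 23) & 1  # 1 = add the offset, 0 = subtract it
    b = (word >> 22) & 1  # 1 = byte access
    w = (word >> 21) & 1  # 1 = write the address back to rn
    l = (word >> 20) & 1  # 1 = load, 0 = store
    rn = REGS[(word >> 16) & 0xF]
    rd = REGS[(word >> 12) & 0xF]
    imm = word & 0xFFF
    op = ("ldr" if l else "str") + ("b" if b else "") + CONDS[cond]
    off = f"#{'' if u else '-'}{imm}"
    if p:
        addr = f"[{rn}]" if imm == 0 and u else f"[{rn}, {off}]"
        if w:
            addr += "!"
    else:
        addr = f"[{rn}], {off}"
    return f"{op} {rd}, {addr}"


if __name__ == "__main__":
    # The five words from the Code: line, in order.
    for word in (0xE1A09005, 0xE51B1030, 0xE2855004, 0xE7923001, 0xE5992000):
        print(f"{word:08x}", decode_ldr_str_imm(word))
    # e1a09005 None
    # e51b1030 ldr r1, [fp, #-48]
    # e2855004 None
    # e7923001 None
    # e5992000 ldr r2, [r9]
```

`e7923001` is also a load, but with a register offset (`ldr r3, [r2, r1]`), so it falls outside the one encoding this sketch covers.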
Comment 1 Eric Sandeen 2013-10-07 02:26:43 UTC
Can you reproduce it on x86 on a 3.3.4 kernel?  If so, can you reproduce it on x86 on a 3.11 kernel?  If not, perhaps whatever this is is fixed upstream...

The oops is post-quotacheck, FWIW.
Comment 2 Dallas Clement 2013-10-17 13:48:24 UTC
This issue has been resolved by upgrading the Marvell patches for this processor.
