Bug 9272

Summary: kernel BUG at drivers/md/raid5.c:143
Product: IO/Storage
Reporter: Puzin, Dimitri (bugs)
Component: MD
Assignee: Neil Brown (neilb)
Severity: blocking
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24-rc1-git10
Tree: Mainline
Subsystem:
Regression: ---
Bug Depends on:
Bug Blocks: 9243
Attachments: corrective patch

Description Puzin, Dimitri 2007-11-01 05:03:27 UTC
Most recent kernel where this bug did not occur:

Distribution: Debian Etch 4.0

Hardware Environment:
Intel P-III-800 EB (x86_32)
512 MB RAM
AIC7xxx SCSI controller for the RAID-1 root filesystem
sata_sil controller (3114) with 4 x 300 GB SATA disks as RAID-5

Software Environment:
mdadm version 2.5.6

Problem Description:
Shortly after the RAID-5 array is initialized, the kernel stops with:
kernel BUG at drivers/md/raid5.c:143!
invalid opcode: 0000 [#1]
Modules linked in: xfs tcp_cubic raid456 async_xor async_memcpy async_tx xor deadline_iosched w83781d hwmon_vid hwmon eeprom netconsole configfs e1000 3c59x mii sata_sil button ata_piix uhci_hcd parport_pc i2c_i801 8250_pnp 8250 serial_core parport evdev usbcore libata i2c_core rng_core intel_agp agpgart pcspkr rtc sg st sr_mod cdrom ch dm_mirror dm_snapshot thermal processor fan unix ext3 jbd mbcache raid1 md_mod sd_mod dm_mod aic7xxx scsi_transport_spi scsi_mod

Pid: 1666, comm: md2_raid5 Not tainted (2.6.24-rc1-git10 #1)
EIP: 0060:[<e0b019e9>] EFLAGS: 00010002 CPU: 0
EIP is at __release_stripe+0x9f/0x121 [raid456]
EAX: 00000000 EBX: c1a0d208 ECX: c1a0d210 EDX: c1a0d208
ESI: c1b4bb00 EDI: c28f6fcc EBP: 00000000 ESP: c28f6f48
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process md2_raid5 (pid: 1666, ti=c28f6000 task=c1952530 task.ti=c28f6000)
Stack: 7fffffff 00000282 7fffffff e0b01a79 c1a0d208 e0b06390 7fffffff 7fffffff
       c28f6fcc 00000000 c0235242 c1afbc00 c1b4bb00 00000003 00000001 c02fa6a0
       00000d94 c02fa730 00000000 c02fa640 c1952530 00000008 00000000 c1a0f1e0
Call Trace:
 [<e0b01a79>] release_stripe+0xe/0x12 [raid456]
 [<e0b06390>] raid5d+0x33c/0x341 [raid456]
 [<c0235242>] schedule_timeout+0x13/0x8b
 [<e08bc69d>] md_thread+0xb5/0xcb [md_mod]
 [<c0124a81>] autoremove_wake_function+0x0/0x35
 [<e08bc5e8>] md_thread+0x0/0xcb [md_mod]
 [<c012491d>] kthread+0x36/0x5d
 [<c01248e7>] kthread+0x0/0x5d
 [<c01046b7>] kernel_thread_helper+0x7/0x10
Code: e8 30 dc 68 df e9 8e 00 00 00 0f ba 73 20 08 8d 46 38 8b 48 04 8d 53 08 89 50 04 89 43 08 89 4a 04 89 11 eb 73 83 7a 30 00 74 04 <0f> 0b eb fe 0f ba 72 20 05 19 c0 85 c0 74 17 ff 4e 58 83 7e 58
EIP: [<e0b019e9>] __release_stripe+0x9f/0x121 [raid456] SS:ESP 0068:c28f6f48

Steps to reproduce:
* Build current -rc Kernel
* reboot
* assemble a RAID-5 array
* mount the volume from that array
* try to read/write data
It crashes a second or two after mounting. In some of my test runs it got past that point and died only once data was written.
Comment 1 Neil Brown 2007-11-01 18:30:52 UTC
It looks like a patch was misapplied.
In particular, commit 4ae3f847e49e3787eca91bced31f8fd328d50496
applied a chunk to handle_stripe6 that should have been applied
to handle_stripe5.

I'll attach a patch with plenty of context which fixes it up.

Thanks for the report.
Comment 2 Neil Brown 2007-11-01 18:31:21 UTC
Created attachment 13368
corrective patch
Comment 3 Puzin, Dimitri 2007-11-02 06:33:47 UTC
It works for me. Thanks for the quick resolution!
Comment 4 Rafael J. Wysocki 2007-11-07 16:15:43 UTC
Fixed by:

commit def6ae26a9e69c3e6d0f0054524c76fd32420ecd
Author: Neil Brown <neilb@suse.de>
Commit: Linus Torvalds <torvalds@woody.linux-foundation.org>

    md: fix misapplied patch in raid5.c