Most recent kernel where this bug did not occur: 2.6.23.1
Distribution: Debian Etch 4.0
Hardware Environment:
  Intel P-III-800 EB (x86_32)
  512 MB RAM
  AIC7xxx SCSI Controller for RAID-1 RootFS
  sata_sil Controller (3114) with 4x 300GB SATA Disks as RAID-5
Software Environment: mdadm version 2.5.6

Problem Description:
Shortly after the raid5 array is initialized, the kernel stops with:

kernel BUG at drivers/md/raid5.c:143!
invalid opcode: 0000 [#1]
Modules linked in: xfs tcp_cubic raid456 async_xor async_memcpy async_tx xor
 deadline_iosched w83781d hwmon_vid hwmon eeprom netconsole configfs e1000
 3c59x mii sata_sil button ata_piix uhci_hcd parport_pc i2c_i801 8250_pnp 8250
 serial_core parport evdev usbcore libata i2c_core rng_core intel_agp agpgart
 pcspkr rtc sg st sr_mod cdrom ch dm_mirror dm_snapshot thermal processor fan
 unix ext3 jbd mbcache raid1 md_mod sd_mod dm_mod aic7xxx scsi_transport_spi
 scsi_mod

Pid: 1666, comm: md2_raid5 Not tainted (2.6.24-rc1-git10 #1)
EIP: 0060:[<e0b019e9>] EFLAGS: 00010002 CPU: 0
EIP is at __release_stripe+0x9f/0x121 [raid456]
EAX: 00000000 EBX: c1a0d208 ECX: c1a0d210 EDX: c1a0d208
ESI: c1b4bb00 EDI: c28f6fcc EBP: 00000000 ESP: c28f6f48
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process md2_raid5 (pid: 1666, ti=c28f6000 task=c1952530 task.ti=c28f6000)
Stack: 7fffffff 00000282 7fffffff e0b01a79 c1a0d208 e0b06390 7fffffff 7fffffff
       c28f6fcc 00000000 c0235242 c1afbc00 c1b4bb00 00000003 00000001 c02fa6a0
       00000d94 c02fa730 00000000 c02fa640 c1952530 00000008 00000000 c1a0f1e0
Call Trace:
 [<e0b01a79>] release_stripe+0xe/0x12 [raid456]
 [<e0b06390>] raid5d+0x33c/0x341 [raid456]
 [<c0235242>] schedule_timeout+0x13/0x8b
 [<e08bc69d>] md_thread+0xb5/0xcb [md_mod]
 [<c0124a81>] autoremove_wake_function+0x0/0x35
 [<e08bc5e8>] md_thread+0x0/0xcb [md_mod]
 [<c012491d>] kthread+0x36/0x5d
 [<c01248e7>] kthread+0x0/0x5d
 [<c01046b7>] kernel_thread_helper+0x7/0x10
 =======================
Code: e8 30 dc 68 df e9 8e 00 00 00 0f ba 73 20 08 8d 46 38 8b 48 04 8d 53 08
 89 50 04 89 43 08 89 4a 04 89 11 eb 73 83 7a 30 00 74 04 <0f> 0b eb fe 0f ba
 72 20 05 19 c0 85 c0 74 17 ff 4e 58 83 7e 58
EIP: [<e0b019e9>] __release_stripe+0x9f/0x121 [raid456] SS:ESP 0068:c28f6f48

Steps to reproduce:
* Build current -rc kernel
* Reboot
* Assemble a raid5 array
* Mount the volume from that array
* Try to read/write data

It crashes here a second or two after mounting, but during my tests I've sometimes seen it advance past that point and die when data is written.
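For readers decoding the oops: the byte in angle brackets on the Code: line marks the instruction at EIP. Here it is 0f 0b, the x86 UD2 instruction, which is consistent with the "invalid opcode" line and with a BUG() check firing at raid5.c:143. The following is a minimal sketch of pulling that marker out of the Code: line; the helper name faulting_bytes is made up for illustration and is not part of any kernel tool.

```python
# Code: line copied from the oops above (bytes only).
CODE = (
    "e8 30 dc 68 df e9 8e 00 00 00 0f ba 73 20 08 8d 46 38 8b 48 04 8d 53 "
    "08 89 50 04 89 43 08 89 4a 04 89 11 eb 73 83 7a 30 00 74 04 <0f> 0b "
    "eb fe 0f ba 72 20 05 19 c0 85 c0 74 17 ff 4e 58 83 7e 58"
)


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the <xx> marker, or None if absent.

    Hypothetical helper for illustration only.
    """
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            picked = [tok.strip("<>")] + tokens[i + 1:i + count]
            return [int(t, 16) for t in picked]
    return None


at_eip = faulting_bytes(CODE)
# 0x0f 0x0b is the encoding of UD2 on x86.
is_ud2 = at_eip == [0x0F, 0x0B]
print(at_eip, is_ud2)
```

This only identifies the trapping instruction; it does not say which condition in __release_stripe failed.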
It looks like a patch was misapplied. In particular commit 4ae3f847e49e3787eca91bced31f8fd328d50496 applied a chunk to handle_stripe6 that should have been applied to handle_stripe5. I'll attach a patch with plenty of context which fixes it up. Thanks for the report.
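The comment above does not say how the hunk ended up in the wrong function. One plausible mechanism, and a likely reason for sending the fix "with plenty of context", is that a hunk with few context lines can match more than one place when two functions contain near-identical code. The sketch below illustrates that idea only; the file contents are invented stand-ins, not the real raid5.c, and match_positions is a hypothetical helper.

```python
def match_positions(lines, context):
    """Return every index at which `context` matches `lines` exactly."""
    n = len(context)
    return [i for i in range(len(lines) - n + 1) if lines[i:i + n] == context]


# Invented stand-in for a file with two similar functions.
# These are NOT the real contents of raid5.c.
source = [
    "static void handle_stripe5(void)",
    "{",
    "    prepare();",
    "    finish();",
    "}",
    "static void handle_stripe6(void)",
    "{",
    "    prepare();",
    "    finish();",
    "}",
]

# Two context lines: ambiguous, matches inside both functions.
short_context = ["    prepare();", "    finish();"]

# Context that reaches the function header: matches only one place.
long_context = [
    "static void handle_stripe5(void)",
    "{",
    "    prepare();",
    "    finish();",
]

short_hits = match_positions(source, short_context)
long_hits = match_positions(source, long_context)
print(short_hits, long_hits)
```

With the short context there are two candidate locations, so a tool or a person resolving a conflict can pick the wrong one; with the longer context only the handle_stripe5 location remains.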
Created attachment 13368: corrective patch
It works for me. Thanks for the quick resolution!
Fixed by:

commit def6ae26a9e69c3e6d0f0054524c76fd32420ecd
Author: Neil Brown <neilb@suse.de>
Commit: Linus Torvalds <torvalds@woody.linux-foundation.org>

    md: fix misapplied patch in raid5.c

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=def6ae26a9e69c3e6d0f0054524c76fd32420ecd