This bug occurs when writing images with partimage to device /dev/sdp* Hardware is Intel P4 based, two SATA disks in mirroring configuration. System is a minimal system that runs in an initrd environment. After writing exactly 360 MB (+/- 0.25 MB) the partimage process goes in D state and we find an oops in the log. It looks very similar to a problem reported 10 months ago: https://www.redhat.com/archives/dm-devel/2005-February/msg00039.html ksymoops 2.4.9 on i686 2.6.14.2-bapdev. Options used -V (default) -K (specified) -l lsmod (specified) -o /usr/src/lxboot-swfill/lxboot-swfill-V2.0.10-current/root/lib/modules/2.6.15-rc5-swfill (specified) -m /usr/src/linux-2.6.15-rc5-swfill/System.map (specified) No modules in ksyms, skipping objects No ksyms, skipping lsmod Unable to handle kernel paging request at virtual address f88c6000 c01793e5 *pde = 1fc61067 Oops: 0000 [#1] CPU: 0 EIP: 0060:[<c01793e5>] Not tainted VLI Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010246 (2.6.15-rc5-swfill) eax: ffffffff ebx: f88c6000 ecx: 07fffdfa edx: 00000000 esi: 00020000 edi: f88c6000 ebp: ffffbf27 esp: f7febec0 ds: 007b es: 007b ss: 0068 Stack: 00000000 00000000 f6d83180 f7febefc f6d4158c f7fcfac0 c01cf8c6 f88c2000 0001bf27 00020000 f6d4158c f6d41580 c01d00ad f687a4e0 f7febefc 0001ffff f6d4158c f6d41580 f687a4e0 c01d0137 f6d4158c f6d41580 c01d0363 f6d4158c Call Trace: [<c01cf8c6>] [<c01d00ad>] [<c01d0137>] [<c01d0363>] [<c01d06ff>] [<c01d0749>] [<c011dd81>] [<c01d071a>] [<c010fce9>] [<c0215d5c>] [<c010fce9>] [<c011dc2b>] [<c0120c5e>] [<c0120be5>] [<c01012b1>] Code: f0 eb 49 89 44 24 04 83 c7 04 c7 04 24 00 00 00 00 89 f8 29 d8 c1 e0 03 29 c5 74 26 8d 4d 1f 89 fb c1 e9 05 b8 ff ff ff ff 31 d2 <f3> af 74 09 33 47 fc 83 ef 04 0f bc d0 29 df c1 e7 03 01 fa 89 >>EIP; c01793e5 <find_next_zero_bit+76/a1> <===== >>eax; ffffffff <__kernel_rt_sigreturn+1bbf/????> >>ebx; f88c6000 <pg0+38611000/3fd49400> >>ecx; 07fffdfa <phys_startup_32+7effdfa/c0000000> >>edi; f88c6000 <pg0+38611000/3fd49400> >>ebp; ffffbf27 <pg0+3fd46f27/3fd49400> >>esp; f7febec0 <pg0+37d36ec0/3fd49400> Trace; c01cf8c6 <core_get_resync_work+27/55> Trace; c01d00ad <__rh_recovery_prepare+14/84> Trace; c01d0137 <rh_recovery_prepare+1a/3b> Trace; c01d0363 <do_recovery+13/64> Trace; c01d06ff <do_mirror+4e/69> Trace; c01d0749 <do_work+2f/58> Trace; c011dd81 <worker_thread+156/1ba> Trace; c01d071a <do_work+0/58> Trace; c010fce9 <default_wake_function+0/12> Trace; c0215d5c <schedule+456/4bb> Trace; c010fce9 <default_wake_function+0/12> Trace; c011dc2b <worker_thread+0/1ba> Trace; c0120c5e <kthread+79/a3> Trace; c0120be5 <kthread+0/a3> Trace; c01012b1 <kernel_thread_helper+5/b> This architecture has variable length instructions, decoding before eip is unreliable, take these instructions with a pinch of salt. Code; c01793ba <find_next_zero_bit+4b/a1> 00000000 <_EIP>: Code; c01793ba <find_next_zero_bit+4b/a1> 0: f0 eb 49 lock jmp 4c <_EIP+0x4c> Code; c01793bd <find_next_zero_bit+4e/a1> 3: 89 44 24 04 mov %eax,0x4(%esp) Code; c01793c1 <find_next_zero_bit+52/a1> 7: 83 c7 04 add $0x4,%edi Code; c01793c4 <find_next_zero_bit+55/a1> a: c7 04 24 00 00 00 00 movl $0x0,(%esp) Code; c01793cb <find_next_zero_bit+5c/a1> 11: 89 f8 mov %edi,%eax Code; c01793cd <find_next_zero_bit+5e/a1> 13: 29 d8 sub %ebx,%eax Code; c01793cf <find_next_zero_bit+60/a1> 15: c1 e0 03 shl $0x3,%eax Code; c01793d2 <find_next_zero_bit+63/a1> 18: 29 c5 sub %eax,%ebp Code; c01793d4 <find_next_zero_bit+65/a1> 1a: 74 26 je 42 <_EIP+0x42> Code; c01793d6 <find_next_zero_bit+67/a1> 1c: 8d 4d 1f lea 0x1f(%ebp),%ecx Code; c01793d9 <find_next_zero_bit+6a/a1> 1f: 89 fb mov %edi,%ebx Code; c01793db <find_next_zero_bit+6c/a1> 21: c1 e9 05 shr $0x5,%ecx Code; c01793de <find_next_zero_bit+6f/a1> 24: b8 ff ff ff ff mov $0xffffffff,%eax Code; c01793e3 <find_next_zero_bit+74/a1> 29: 31 d2 xor %edx,%edx This decode from eip onwards should be reliable Code; c01793e5 <find_next_zero_bit+76/a1> 00000000 <_EIP>: Code; c01793e5 <find_next_zero_bit+76/a1> <===== 0: f3 af repz scas %es:(%edi),%eax <===== Code; c01793e7 <find_next_zero_bit+78/a1> 2: 74 09 je d <_EIP+0xd> Code; c01793e9 <find_next_zero_bit+7a/a1> 4: 33 47 fc xor 0xfffffffc(%edi),%eax Code; c01793ec <find_next_zero_bit+7d/a1> 7: 83 ef 04 sub $0x4,%edi Code; c01793ef <find_next_zero_bit+80/a1> a: 0f bc d0 bsf %eax,%edx Code; c01793f2 <find_next_zero_bit+83/a1> d: 29 df sub %ebx,%edi Code; c01793f4 <find_next_zero_bit+85/a1> f: c1 e7 03 shl $0x3,%edi Code; c01793f7 <find_next_zero_bit+88/a1> 12: 01 fa add %edi,%edx Code; c01793f9 <find_next_zero_bit+8a/a1> 14: 89 .byte 0x89 Interrupts: CPU0 0: 78885 IO-APIC-edge timer 1: 554 IO-APIC-edge i8042 9: 0 IO-APIC-level acpi 14: 21 IO-APIC-edge ide0 16: 375024 IO-APIC-level eth0 19: 35 IO-APIC-level libata 20: 0 IO-APIC-level ehci_hcd:usb1, uhci_hcd:usb2 21: 0 IO-APIC-level uhci_hcd:usb3 22: 0 IO-APIC-level uhci_hcd:usb4 23: 0 IO-APIC-level uhci_hcd:usb5 NMI: 0 LOC: 78843 ERR: 0 MIS: 0 /proc/iomem: 00000000-000987ff : System RAM 00098800-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000c7fff : Video ROM 000cb000-000cbfff : Adapter ROM 000cc000-000d11ff : Adapter ROM 000f0000-000fffff : System ROM 00100000-3f6dffff : System RAM 00100000-00217be0 : Kernel code 00217be1-0026fdd3 : Kernel data 3f6e0000-3f6ecfff : ACPI Tables 3f6ed000-3f6fffff : ACPI Non-volatile Storage 3f700000-3fffffff : reserved e0000000-efffffff : 0000:00:02.0 f0000000-f0003fff : 0000:00:1b.0 f0004000-f00043ff : 0000:00:1d.7 f0004000-f00043ff : ehci_hcd f0004400-f00047ff : 0000:00:1f.2 f0004400-f00047ff : ahci f0040000-f007ffff : 0000:00:02.0 f0080000-f00fffff : 0000:00:02.0 f0100000-f01fffff : PCI Bus #05 f0100000-f010ffff : 0000:05:00.0 f0100000-f010ffff : tg3 fec00000-fecfffff : reserved fee00000-feefffff : reserved ffb80000-ffffffff : reserved Module Size Used by Not tainted ahci 5988 2 - Live 0xf885a000 ata_piix 3716 0 - Live 0xf8802000 libata 30796 2 ahci,ata_piix, Live 0xf88a5000 sr_mod 9412 0 - Live 0xf887b000 sd_mod 8240 2 - Live 0xf88a1000 usb_storage 18980 0 - Live 0xf88ae000 scsi_mod 47720 5 ahci,libata,sr_mod,sd_mod,usb_storage, Live 0xf884d000 ide_floppy 10400 0 - Live 0xf8844000 ide_cd 27908 0 - Live 0xf8899000 cdrom 27264 2 sr_mod,ide_cd, Live 0xf8891000 ide_disk 8800 0 - Live 0xf8849000 floppy 40676 0 - Live 0xf8870000 tg3 72772 0 - Live 0xf885d000 crc32 2912 0 - Live 0xf8842000 mii 2528 0 - Live 0xf8840000 uhci_hcd 20976 0 - Live 0xf8812000 ohci_hcd 12036 0 - Live 0xf8828000 ehci_hcd 20520 0 - Live 0xf8821000 usbmouse 2272 0 - Live 0xf880e000 usbkbd 3552 0 - Live 0xf8805000 usbhid 26272 0 - Live 0xf8819000 usbcore 70692 8 usb_storage,uhci_hcd,ohci_hcd,ehci_hcd,usbmouse,usbkbd,usbhid, Live 0xf882d000 lspci -vv: 0000:00:00.0 0600: 8086:2770 (rev 02) Subsystem: 1734:1084 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 0 Capabilities: [e0] #09 [5109] 0000:00:02.0 0300: 8086:2772 (rev 02) Subsystem: 1734:1084 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 11 Region 0: Memory at f0080000 (32-bit, non-prefetchable) [size=512K] Region 1: I/O ports at 2000 [size=8] Region 2: Memory at e0000000 (32-bit, prefetchable) [size=256M] Region 3: Memory at f0040000 (32-bit, non-prefetchable) [size=256K] Capabilities: [90] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Address: 00000000 Data: 0000 Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 0000:00:1b.0 0403: 8086:27d8 (rev 01) Subsystem: 1734:1083 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 0x08 (32 bytes) Interrupt: pin A routed to IRQ 9 Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- Address: 0000000000000000 Data: 0000 Capabilities: [70] #10 [0091] 0000:00:1c.0 0604: 8086:27d0 (rev 01) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 0x08 (32 bytes) Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- Capabilities: [40] #10 [0141] Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Address: 00000000 Data: 0000 Capabilities: [90] #0d [0000] Capabilities: [a0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 0000:00:1c.1 0604: 8086:27d2 (rev 01) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 0x08 (32 bytes) Bus: primary=00, secondary=05, subordinate=05, sec-latency=0 Memory behind bridge: f0100000-f01fffff BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [40] #10 [0141] Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Address: 00000000 Data: 0000 Capabilities: [90] #0d [0000] Capabilities: [a0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 0000:00:1c.2 0604: 8086:27d4 (rev 01) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 0x08 (32 bytes) Bus: primary=00, secondary=07, subordinate=07, sec-latency=0 BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- Capabilities: [40] #10 [0141] Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Address: 00000000 Data: 0000 Capabilities: [90] #0d [0000] Capabilities: [a0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 0000:00:1c.3 0604: 8086:27d6 (rev 01) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 0x08 (32 bytes) Bus: primary=00, secondary=09, subordinate=09, sec-latency=0 BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- Capabilities: [40] #10 [0141] Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Address: 00000000 Data: 0000 Capabilities: [90] #0d [0000] Capabilities: [a0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 0000:00:1d.0 0c03: 8086:27c8 (rev 01) Subsystem: 1734:1085 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 20 Region 4: I/O ports at 2400 [size=32] 0000:00:1d.1 0c03: 8086:27c9 (rev 01) Subsystem: 1734:1085 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin B routed to IRQ 21 Region 4: I/O ports at 2800 [size=32] 0000:00:1d.2 0c03: 8086:27ca (rev 01) Subsystem: 1734:1085 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin C routed to IRQ 22 Region 4: I/O ports at 2c00 [size=32] 0000:00:1d.3 0c03: 8086:27cb (rev 01) Subsystem: 1734:1085 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin D routed to IRQ 23 Region 4: I/O ports at 3000 [size=32] 0000:00:1d.7 0c03: 8086:27cc (rev 01) (prog-if 20) Subsystem: 1734:1085 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 20 Region 0: Memory at f0004000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 0000:00:1e.0 0604: 8086:244e (rev e1) (prog-if 01) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Bus: primary=00, secondary=0b, subordinate=0b, sec-latency=0 BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- Capabilities: [50] #0d [0000] 0000:00:1f.0 0601: 8086:27b8 (rev 01) Subsystem: 1734:1085 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Capabilities: [e0] #09 [100c] 0000:00:1f.1 0101: 8086:27df (rev 01) (prog-if 8a [Master SecP PriP]) Subsystem: 1734:1085 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 18 Region 0: I/O ports at <unassigned> Region 1: I/O ports at <unassigned> Region 2: I/O ports at <unassigned> Region 3: I/O ports at <unassigned> Region 4: I/O ports at 3820 [size=16] 0000:00:1f.2 0104: 8086:27c3 (rev 01) Subsystem: 1734:1085 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin B routed to IRQ 19 Region 0: I/O ports at 3840 [size=8] Region 1: I/O ports at 3834 [size=4] Region 2: I/O ports at 3838 [size=8] Region 3: I/O ports at 3830 [size=4] Region 4: I/O ports at 3800 [size=32] Region 5: Memory at f0004400 (32-bit, non-prefetchable) [size=1K] Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Address: 00000000 Data: 0000 Capabilities: [70] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [a8] #12 [0010] 0000:00:1f.3 0c05: 8086:27da (rev 01) Subsystem: 1734:1085 Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin B routed to IRQ 9 Region 4: I/O ports at 3400 [size=32] 0000:05:00.0 0200: 14e4:1677 (rev 01) Subsystem: 1734:1096 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 0x08 (32 bytes) Interrupt: pin A routed to IRQ 16 Region 0: Memory at f0100000 (64-bit, non-prefetchable) [size=64K] Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- Address: 6600080004090000 Data: 0013 Capabilities: [d0] #10 [0001] /proc/partitions: major minor #blocks name 8 0 117220824 sda 8 1 15719571 sda1 8 2 1 sda2 8 5 101496087 sda5 8 16 117220824 sdb 8 17 15719571 sdb1 8 18 1 sdb2 8 21 101496087 sdb5 254 0 117218304 sdp 254 1 15719571 sdp1 254 5 101496087 sdp5 ps: PID Uid VmSize Stat Command 1 root 544 S init 2 root SWN [ksoftirqd/0] 3 root SW< [events/0] 4 root SW< [khelper] 5 root SW< [kthread] 7 root SW< [kblockd/0] 8 root SW< [kacpid] 126 root SW [pdflush] 127 root SW [pdflush] 129 root SW< [aio/0] 128 root SW [kswapd0] 200 root SW< [kseriod] 251 root 432 S syslogd 253 root 428 S klogd -c 1 261 root 1180 S /bin/bash /opt/sbin/run 262 root 1440 R /bin/bash -login 265 root 1400 S /bin/bash -login 266 root 1404 S /bin/bash -login 267 root 1404 S /bin/bash -login 268 root 1396 S /bin/bash -login 310 root SW< [khubd] 437 root 156 S udhcpc -s /etc/udhcpc/default.script -i eth0 441 root 880 S linuxvnc 1 520 root SW< [ata/0] 535 root SW< [scsi_eh_0] 536 root SW< [scsi_eh_1] 537 root SW< [scsi_eh_2] 538 root SW< [scsi_eh_3] 541 root 424 S dropbear 715 root 1328 S /bin/sh /opt/sbin/swfill 737 root 1008 S smbmount //172.16.1.115/runin /mnt -o username Admini 739 root SW [smbiod] 794 root SW< [kcopyd] 799 root 752 S /bin/sh /opt/sbin/swfill 802 root 1620 D partimage restore -f3 -b -g1 /dev/sdp1 /mnt/images/a5 803 root 1620 S partimage restore -f3 -b -g1 /dev/sdp1 /mnt/images/a5 804 root 1620 S partimage restore -f3 -b -g1 /dev/sdp1 /mnt/images/a5 980 root 348 S sleep 1 981 root 612 R ps Linux Client 2.6.15-rc5 #1 Mon Dec 12 10:02:35 CET 2005 i686 unknown
Based on the stack trace, this is DM, not MD. Reassigning....
What's your dm configuration? Run # dmsetup info -c # dmsetup table # dmsetup status (might be dmsetup.static in your environment)
Here's the requested dmsetup info. I've also attached a better lspci output. dmsetup info -c Name Maj Min Stat Open Targ Event UUID isw_cfdbggdhji_Volume05 254 5 L--w 0 1 0 isw_cfdbggdhji_Volume0 254 0 L--w 2 1 0 isw_cfdbggdhji_Volume01 254 1 L--w 1 1 0 dmsetup table isw_cfdbggdhji_Volume05: 0 202992174 linear 254:0 31439268 isw_cfdbggdhji_Volume0: 0 234436608 mirror core 2 2048 nosync 2 8:0 0 8:16 0 isw_cfdbggdhji_Volume01: 0 31439142 linear 254:0 63 dmsetup status isw_cfdbggdhji_Volume05: 0 202992174 linear isw_cfdbggdhji_Volume0: 0 234436608 mirror 2 8:0 8:16 114471/114471 isw_cfdbggdhji_Volume01: 0 31439142 linear
Created attachment 6809 [details] Output of lspci -vv
Created attachment 6863 [details] Stacktrace of partimage processes This is a stack trace of the partimage processes, the first of which is stuck in 'D' state. Obtained with sysrq.
I have some additional info. The oops occurs earlier than I thought; perhaps during definition of the DM RAID or shortly after. Also, I see "Access beyond end of device" of both /dev/sda and /dev/sdb in the log (couldn't save that log yet). I added kdb to the kernel, but the kernel didn't break into kdb on the Oops. I thought it should. What could have gone wrong?
I did some debugging today. The loop in core_get_resync_work is started with lc->sync_bits = 0xf88db000 (the value contained in %edi) lc->region_count = 114471 lc->sync_search = 0 After the first call of find_next_zero_bit, *region becomes 114496 and hence, lc->sync_search becomes 114497. Since this is not equal to lc->region_count the loop continues. But it is already larger than lc->region_count and out of range. The following patch fixes the problem for me: --- linux-2.6.14.2/drivers/md/dm-log.c 2005-10-28 02:02:08.000000000 +0200 +++ linux-2.6.15-rc5/drivers/md/dm-log.c 2005-12-23 15:36:39.000000000 +0100 @@ -573,7 +573,7 @@ lc->sync_search); lc->sync_search = *region + 1; - if (*region == lc->region_count) + if (*region >= lc->region_count) return 0; } while (log_test_bit(lc->recovering_bits, *region)); I wonder why one would only test for equality. Is find_next_zero bit supposed to only return a result within the range? Or was it once, but has been changed recently? Or is the real problem elsewhere, perhaps in incorrect drive geometry?
Hans-Joachim's patch fixes the problem for me as well.
The patch attached here is in (atleast) 2.6.16-rc3. Please close this report. Thanks.