Distribution: Fedora Core 3

Hardware Environment:
ASUS A7N8X-E (important: onboard SATA controller Sil 3112A)
1 GB RAM
4 Maxtor 200GB drives in RAID 0+1 (software)

Software Environment:
mdadm RAID 0+1 with LVM on top (the bug has also been reproduced without this RAID setup)

Problem Description:
After copying about 200GB of data, there is either a kernel panic or a complete freeze with the message ata1: command timeout. Enabling the onboard Sil3112A in hardware (and loading its module) and moving 2 drives over to that controller makes the crash happen earlier: after about 20GB of data copied. No crashes happen when using only the Sil3112A controller with sata_sil; they happen only with sata_promise, and sooner when both modules are loaded.

Once, when we got the kernel panic, it said:
-not syncing: fs/jbd/transaction.c:978:spin_lock(fs/jbd/journal.c) already locked by fs/jbd/transaction.c:1553
and __journal_try_to_free_buffer was on top of the stack trace.

Steps to reproduce:
cp -ax from a large drive to another large drive.

Different scenarios that reproduced the bug:
+ 2 drives in raid0 on the Sil3112A copying over to 2 drives in raid0 on the SATA150 TX4: ata1: command timeout after 20GB (independently of which port was used on the Promise card)
+ 2 drives in raid0 on connectors 1 and 3 of the SATA150 TX4 and 2 other drives in raid0 on connectors 2 and 4, WITH the sata_sil module loaded in memory but with the Sil3112A not present (disabled in hardware): ata1: command timeout after 60GB
+ same as above with only sata_promise loaded instead of both SATA modules: ata1: command timeout after about 200GB

The message is always ata1: command timeout in all cases (except for a single kernel panic in the last scenario).

We used the same hardware, minus the Promise card, for several months with the drives on the Sil3112A under RH9 (latest updates: possibly 2.4.31) without any issues.

Here is what lspci -v returns (with the Sil3112A disabled and sata_sil not loaded):

00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev c1)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
    Flags: bus master, 66Mhz, fast devsel, latency 0
    Memory at e0000000 (32-bit, prefetchable) [size=64M]
    Capabilities: [40] AGP version 2.0
    Capabilities: [60] HyperTransport: Host or Secondary Interface

00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev c1)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
    Flags: 66Mhz, fast devsel

00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev c1)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
    Flags: 66Mhz, fast devsel

00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev c1)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
    Flags: 66Mhz, fast devsel

00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev c1)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
    Flags: 66Mhz, fast devsel

00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev c1)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
    Flags: 66Mhz, fast devsel

00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a4)
    Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
    Flags: bus master, 66Mhz, fast devsel, latency 0
    Capabilities: [48] HyperTransport: Slave or Primary Interface

00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 0c11
    Flags: 66Mhz, fast devsel, IRQ 5
    I/O ports at c800 [size=32]
    Capabilities: [44] Power Management version 2

00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4) (prog-if 10 [OHCI])
    Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
    Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 10
    Memory at ea002000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [44] Power Management version 2

00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4) (prog-if 10 [OHCI])
    Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
    Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 9
    Memory at ea003000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: [44] Power Management version 2

00:02.2 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4) (prog-if 20 [EHCI])
    Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
    Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 5
    Memory at ea004000 (32-bit, non-prefetchable) [size=256]
    Capabilities: [44] Debug port
    Capabilities: [80] Power Management version 2

00:04.0 Ethernet controller: nVidia Corporation nForce2 Ethernet Controller (rev a1)
    Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard onboard nForce2 Ethernet
    Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 5
    Memory at ea000000 (32-bit, non-prefetchable) [size=4K]
    I/O ports at cc00 [size=8]
    Capabilities: [44] Power Management version 2

00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3) (prog-if 00 [Normal decode])
    Flags: bus master, 66Mhz, fast devsel, latency 0
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
    I/O behind bridge: 00009000-0000afff
    Memory behind bridge: e6000000-e8ffffff
    Prefetchable memory behind bridge: e4000000-e5ffffff

00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2) (prog-if 8a [Master SecP PriP])
    Subsystem: ASUSTeK Computer Inc.: Unknown device 0c11
    Flags: bus master, 66Mhz, fast devsel, latency 0
    I/O ports at f000 [size=16]
    Capabilities: [44] Power Management version 2

00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1) (prog-if 00 [Normal decode])
    Flags: bus master, 66Mhz, medium devsel, latency 32
    Bus: primary=00, secondary=03, subordinate=03, sec-latency=32

01:04.0 Ethernet controller: Marvell Technology Group Ltd. Yukon Gigabit Ethernet 10/100/1000Base-T Adapter (rev 13)
    Subsystem: ASUSTeK Computer Inc.: Unknown device 811a
    Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 5
    Memory at e8020000 (32-bit, non-prefetchable) [size=16K]
    I/O ports at 9000 [size=256]
    Capabilities: [48] Power Management version 2
    Capabilities: [50] Vital Product Data

01:06.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model 64/Model 64 Pro] (rev 15) (prog-if 00 [VGA])
    Subsystem: nVidia Corporation: Unknown device 0006
    Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 9
    Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e4000000 (32-bit, prefetchable) [size=32M]
    Capabilities: [60] Power Management version 1

01:07.0 Modem: Intel Corp.: Unknown device 1080 (rev 03) (prog-if 00 [Generic])
    Subsystem: Intel Corp.: Unknown device 100a
    Flags: bus master, stepping, medium devsel, latency 32, IRQ 10
    Memory at e8025000 (32-bit, non-prefetchable) [size=4K]
    I/O ports at 9400 [size=256]
    Capabilities: [80] Power Management version 2

01:09.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
    Subsystem: Promise Technology, Inc. PDC20318 (SATA150 TX4)
    Flags: bus master, 66Mhz, medium devsel, latency 96, IRQ 5
    I/O ports at 9800 [size=64]
    I/O ports at 9c00 [size=16]
    I/O ports at a000 [size=128]
    Memory at e8024000 (32-bit, non-prefetchable) [size=4K]
    Memory at e8000000 (32-bit, non-prefetchable) [size=128K]
    Capabilities: [60] Power Management version 2
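For anyone comparing the scenarios in the report, it may help to count how often each ATA port logs the timeout. The following is only a rough sketch: it assumes the kernel log has been saved as text (for example from dmesg) and that the lines look like the "ata1: command timeout" message quoted in this report. The function name count_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "ata1: command timeout", tolerating stray spaces
# around the colon (the report also shows "ata1 :command timeout").
TIMEOUT_RE = re.compile(r"\bata(\d+)\s*:\s*command timeout")


def count_timeouts(log_text):
    """Return a Counter mapping ATA port number -> number of timeout lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feeding it the saved log text and printing the resulting counter per port should show whether the timeouts really are confined to ata1.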
It also seems that the timeout is related to a semaphore or locking problem: the freeze (or kernel panic) happens much faster if more than one process is accessing the disk. It happened twice within a very short time when another machine was mounting the drive via NFS.
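If concurrent access really does shorten the time to failure, a parallel copy might reproduce the problem faster than a single cp -ax. Below is a minimal sketch of that idea, not the procedure actually used here: threads stand in for the separate processes, shutil.copytree is not an exact equivalent of cp -ax (it does not stay on one filesystem), and the name parallel_copy and its parameters are hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(sources, dest_root, workers=4):
    """Copy each source tree into dest_root concurrently; return the new paths."""
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)

    def copy_one(src):
        src = Path(src)
        target = dest_root / src.name
        # copytree fails if the target already exists, so reruns need a
        # fresh destination directory.
        shutil.copytree(src, target, symlinks=True)
        return target

    # Several copies run at once so that the disks see overlapping requests.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, sources))
```

Pointing the sources at directories on one array and dest_root at the other array would approximate the "more than one process" case described above.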
The spin_lock kernel panic has been addressed in 2.6.11. The ata1 command timeout turns out to be related only to RAID arrays composed of different drives; in particular, a RAID 0 pair composed of one 6Y200M0 and one 6B200S0. Using arrays of identical disks alleviates this problem; the older 6Y200M0 seems to be the source of it.
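To check whether an existing setup mixes drive models inside one array, something along these lines could be used. It is a sketch under assumptions: the model string is read from /sys/block/<dev>/device/model, the array membership is passed in by hand rather than queried from mdadm, and the names read_model and mixed_model_arrays as well as the device names in the usage note are hypothetical.

```python
from pathlib import Path


def read_model(device, sys_root="/sys/block"):
    """Return the model string the kernel reports for a disk such as 'sda'."""
    return (Path(sys_root) / device / "device" / "model").read_text().strip()


def mixed_model_arrays(arrays, model_of=read_model):
    """Return {array: sorted models} for arrays whose members differ in model.

    arrays maps an array name to the list of its member disks.
    """
    mixed = {}
    for name, members in arrays.items():
        models = {model_of(dev) for dev in members}
        if len(models) > 1:
            mixed[name] = sorted(models)
    return mixed
```

For example, mixed_model_arrays({"md0": ["sda", "sdb"], "md1": ["sdc", "sdd"]}) would list only the arrays that pair different models, such as a 6Y200M0 with a 6B200S0.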