Bug 4106

Summary: PDC20318 ata1: command timeout (promise SATA150 TX4 not FASTRACK S150 TX4)
Product: IO/Storage
Reporter: Francois Payette (francoisp)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: REJECTED DOCUMENTED    
Severity: blocking    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.10-1.741_FC3
Subsystem:
Regression: ---
Bisected commit-id:

Description Francois Payette 2005-01-26 09:45:09 UTC
Distribution: Fedora Core 3
Hardware Environment: ASUS A7N8X-E (important: onboard SATA controller Sil 3112A),
1 GB RAM, 4 Maxtor 200 GB drives in RAID 0+1 (software)
Software Environment: mdadm RAID 0+1 with LVM on top (the bug has also been
reproduced without this RAID setup)

Problem Description: After copying about 200GB of data, there is either a kernel
panic or a complete freeze with the message ata1: command timeout. Enabling the
onboard Sil3112A in hardware (and loading its module) and moving 2 drives over
to that controller makes the crash happen earlier: after about 20GB of data
copied. No crashes happen when using only the Sil3112A controller with
sata_sil; they happen only with sata_promise, and sooner when both modules are
loaded.


Once when we got the kernel panic it said:
-not syncing: fs/jbd/transaction.c:978:spin_lock(fs/jbd/journal.c) already
locked by fs/jbd/transaction.c:1553

and __journal_try_to_free_buffer was on top of the stack trace. 

Steps to reproduce: cp -ax from a large drive to another large drive.

Different scenarios that reproduced the bug:

+ 2 drives in raid0 on the Sil3112A copying over to 2 drives in raid0 on the
SATA150 TX4: ata1: command timeout after 20GB (independently of which port was
used on the Promise card)

+ 2 drives in raid0 on connectors 1 and 3 of the SATA150 TX4 and 2 other drives
in raid0 on connectors 2 and 4, WITH the sata_sil module loaded in memory but
with the Sil3112A not present (disabled in hardware): ata1: command timeout
after 60GB

+ same as above with only sata_promise loaded instead of both SATA modules:
after about 200GB. The message is always ata1: command timeout in all cases
(except for one kernel panic in this last case)
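
For anyone comparing the scenarios above, a minimal Python sketch for counting the timeout messages in a saved kernel log may help. It is not part of the original report: the function name count_timeouts is hypothetical, and the pattern is only assumed from the message as quoted here ("ataN: command timeout", with loose spacing around the colon).

```python
import re

# Matches messages such as "ata1: command timeout". Spacing around the colon
# is kept loose because the report quotes the message in several forms.
TIMEOUT_RE = re.compile(r"\bata(\d+)\s*:\s*command timeout")


def count_timeouts(log_lines):
    """Return {port_number: count} for every ataN port that logged a timeout."""
    counts = {}
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            port = int(match.group(1))
            counts[port] = counts.get(port, 0) + 1
    return counts
```

It could be fed the lines of a saved dmesg capture, for example count_timeouts(open("dmesg.txt")), where dmesg.txt is a placeholder file name.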

We used the same hardware minus the Promise card for several months with drives
on the Sil3112A on RH9 (latest updates: possibly 2.4.31) without any issues.

Here's what lspci -v returns (with the Sil3112A disabled, and sata_sil not loaded):

00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev c1)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
        Flags: bus master, 66Mhz, fast devsel, latency 0
        Memory at e0000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [40] AGP version 2.0
        Capabilities: [60] HyperTransport: Host or Secondary Interface

00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev c1)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
        Flags: 66Mhz, fast devsel

00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev c1)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
        Flags: 66Mhz, fast devsel

00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev c1)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
        Flags: 66Mhz, fast devsel

00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev c1)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
        Flags: 66Mhz, fast devsel

00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev c1)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 80ac
        Flags: 66Mhz, fast devsel

00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a4)
        Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
        Flags: bus master, 66Mhz, fast devsel, latency 0
        Capabilities: [48] HyperTransport: Slave or Primary Interface

00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 0c11
        Flags: 66Mhz, fast devsel, IRQ 5
        I/O ports at c800 [size=32]
        Capabilities: [44] Power Management version 2

00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
(prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
        Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 10
        Memory at ea002000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2

00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
(prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
        Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 9
        Memory at ea003000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2

00:02.2 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
(prog-if 20 [EHCI])
        Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard
        Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 5
        Memory at ea004000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [44] Debug port
        Capabilities: [80] Power Management version 2

00:04.0 Ethernet controller: nVidia Corporation nForce2 Ethernet Controller (rev a1)
        Subsystem: ASUSTeK Computer Inc. A7N8X Mainboard onboard nForce2 Ethernet
        Flags: bus master, 66Mhz, fast devsel, latency 0, IRQ 5
        Memory at ea000000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at cc00 [size=8]
        Capabilities: [44] Power Management version 2

00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
(prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
        I/O behind bridge: 00009000-0000afff
        Memory behind bridge: e6000000-e8ffffff
        Prefetchable memory behind bridge: e4000000-e5ffffff

00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2) (prog-if 8a
[Master SecP PriP])
        Subsystem: ASUSTeK Computer Inc.: Unknown device 0c11
        Flags: bus master, 66Mhz, fast devsel, latency 0
        I/O ports at f000 [size=16]
        Capabilities: [44] Power Management version 2

00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1) (prog-if 00 [Normal
decode])
        Flags: bus master, 66Mhz, medium devsel, latency 32
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=32

01:04.0 Ethernet controller: Marvell Technology Group Ltd. Yukon Gigabit
Ethernet 10/100/1000Base-T Adapter (rev 13)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 811a
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 5
        Memory at e8020000 (32-bit, non-prefetchable) [size=16K]
        I/O ports at 9000 [size=256]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data

01:06.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model
64/Model 64 Pro] (rev 15) (prog-if 00 [VGA])
        Subsystem: nVidia Corporation: Unknown device 0006
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 9
        Memory at e7000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e4000000 (32-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 1

01:07.0 Modem: Intel Corp.: Unknown device 1080 (rev 03) (prog-if 00 [Generic])
        Subsystem: Intel Corp.: Unknown device 100a
        Flags: bus master, stepping, medium devsel, latency 32, IRQ 10
        Memory at e8025000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 9400 [size=256]
        Capabilities: [80] Power Management version 2

01:09.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318
(SATA150 TX4) (rev 02)
        Subsystem: Promise Technology, Inc. PDC20318 (SATA150 TX4)
        Flags: bus master, 66Mhz, medium devsel, latency 96, IRQ 5
        I/O ports at 9800 [size=64]
        I/O ports at 9c00 [size=16]
        I/O ports at a000 [size=128]
        Memory at e8024000 (32-bit, non-prefetchable) [size=4K]
        Memory at e8000000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [60] Power Management version 2

Comment 1 Francois Payette 2005-01-27 08:22:03 UTC
It also seems that the timeout is related to a semaphore or locking problem: the
freeze (or kernel panic) happens much faster if more than one process is
accessing the disk; it happened twice in a very short time when another machine
was mounting the drive via NFS.

Comment 2 Francois Payette 2005-02-18 07:55:03 UTC
The spin_lock kernel panic has been addressed in 2.6.11.

The ata1 command timeout turns out to be related only to RAID arrays composed of
different drives; in particular a raid 0 pair composed of one 6Y200M0 and one
6B200S0. Using arrays of identical disks alleviates this problem; the older
6Y200M0 seems to be the source of it.
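
A minimal Python sketch for checking whether the members of an array report different drive models, as in the mixed 6Y200M0 / 6B200S0 pair described above. It is not from the original report: the function names are hypothetical, and reading the model from /sys/block/<dev>/device/model is an assumption about how the running kernel exposes the drive.

```python
from pathlib import Path


def read_model(dev, sysfs_root="/sys/block"):
    """Return the model string sysfs reports for a block device such as 'sda'.

    Assumes the kernel exposes it at <sysfs_root>/<dev>/device/model.
    """
    return (Path(sysfs_root) / dev / "device" / "model").read_text().strip()


def mixed_models(models):
    """Given {device: model}, return the distinct models if there is more
    than one, otherwise an empty set (all members are the same model)."""
    distinct = set(models.values())
    return distinct if len(distinct) > 1 else set()
```

For example, mixed_models({d: read_model(d) for d in ("sda", "sdb")}) returns a non-empty set when the two members are different models; the device names are placeholders.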