Bug 15173
Summary: | sata_via VT6421 softRAID | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Pawel Piatek (xj) |
Component: | Serial ATA | Assignee: | Tejun Heo (tj) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | fercerpav, kernelbugtracker, matej.zary, napperley, peter, q, sjorrit, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32.7, 2.6.33-rc6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg output; /var/log/messages; /var/log/syslog; dmesg output after boot; dmesg after running hdparm -tT; lspci output; format of the drive; sata_via-crc-fix.patch; lspci -nnv output; lspci output; smartctl drive output; dmesg from before the lockup; lspci -nnv on my A7V600 (KT600 chipset) |
Description
Pawel Piatek
2010-01-30 05:33:39 UTC
Can you please attach the full dmesg output after such an incident? Thanks.

Created attachment 24921 [details]
dmesg output
Both the controller and drive are reporting communication problems. They can't talk to each other properly at the link level. How often does this happen? Can you post log w/ timestamps?

Can you please try to use a shorter cable?

Created attachment 24967 [details]
/var/log/messages
Log with timestamp /var/log/messages.
Created attachment 24968 [details]
/var/log/syslog
Log with timestamps
Hi, thanks for the reply. I changed cables twice; this did not help :(. Some more hardware information:

# lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
00:0f.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
00:11.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 24)
01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP 1X/2X (rev 5c)
02:0b.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID Controller (rev 50)

# lsscsi
[0:0:0:0] disk ATA SAMSUNG SV0511D MJ20 /dev/sda
[0:0:1:0] disk ATA ST310210A 3.21 /dev/sdb
[1:0:0:0] disk ATA SAMSUNG SP0842N BH90 /dev/sdc
[1:0:1:0] disk ATA ST340016A 3.05 /dev/sdd
[2:0:0:0] disk ATA WDC WD10EADS-00M 01.0 /dev/sde
[3:0:0:0] disk ATA WDC WD10EADS-00M 01.0 /dev/sdf
[4:0:0:0] disk ATA ST340016A 3.05 /dev/sdg
[4:0:1:0] disk ATA IBM-DTLA-305020 TW2O /dev/sdh

This very much looks like a hardware problem. Can you power up a separate power supply and connect half of the SATA hard drives there? You can power up a PSU by doing the following. http://modtown.co.uk/mt/article2.php?id=psumod

OK. I connected the SATA drives to a separate power supply, but this doesn't change anything. The problem seems to appear when copying data between SATA drives. So when I create a RAID level 1 array, the "sync" process causes these errors. Maybe this is a problem with DMA support in the sata_via module?

Thanks for testing it. The failures you're seeing are between the host controller (the VIA chip) and the hard drive.
Both the controller and the hard drive are reporting that they can't hear each other very well. Host side issues (between the controller and the components on the mainboard) usually don't manifest as ATA bus issues. I'm afraid there isn't much the driver can do with these failures. There could be some PHY level knobs in the controller, but given that this is the first report of this type of issue with the controller, I'm much more inclined toward a faulty add-in board (i.e. signal trace lengths not matched properly, faulty connector kind of things). Can you try it on a different operating system? Thanks.

Created attachment 26469 [details]
dmesg output after boot
Hello,

The problem Pawel Piatek describes also manifests itself for me and another guy who bought a VT6421-based controller at the same time. The drives in this case are WDC WD15EARS-00Z5B1. It's known to me that these drives need to be aligned, but the fdisk in util-linux-ng 2.17.2 defaults to a start sector of 2048 anyway. On an SB700 motherboard controller, the drive works fine (sequential output of hdparm -tT around 100MB/s)... with the VT6421 it's in the range of 1-30 MB/s. We see a lot of 'hard resetting link' messages too.

Attached are my dmesg directly after booting, and a dmesg full of error messages when I run hdparm -tT on the drive, along with some other files. I cannot verify whether the controller works on another OS, but I've ruled out cable, connection issues etc. as the drives work fine with the SB700 controller and the same problem persists for the other guy.

Note that I experience the problems on a 2.6.32.5 kernel. Because I saw some recent fixes, the other guy tried a 2.6.34 kernel but the problem still remains. Any clues, hints, ideas? In any case a big thanks to the (driver/libata) developers anyway, your work is greatly appreciated.

With kind regards,
Jorrit Tijben

Created attachment 26470 [details]
dmesg after running hdparm -tT
Created attachment 26471 [details]
lspci output
Created attachment 26472 [details]
format of the drive
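The alignment remark in the comment above can be checked with a little arithmetic. This is only a sketch: the 512-byte logical sector size and the 4096-byte physical sector size are assumptions about the WD15EARS, and the function names are made up for illustration.

```python
LOGICAL_SECTOR_BYTES = 512    # assumption: logical sector size used by fdisk
PHYSICAL_SECTOR_BYTES = 4096  # assumption: physical sector size of the WD15EARS


def start_offset_bytes(start_sector: int) -> int:
    """Byte offset of a partition that starts at the given logical sector."""
    return start_sector * LOGICAL_SECTOR_BYTES


def is_aligned(start_sector: int) -> bool:
    """True if the partition start falls on a physical sector boundary."""
    return start_offset_bytes(start_sector) % PHYSICAL_SECTOR_BYTES == 0


if __name__ == "__main__":
    # 63 is the traditional DOS-style start sector, 2048 the newer fdisk default.
    for sector in (63, 2048):
        print(sector, start_offset_bytes(sector), is_aligned(sector))
```

Under these assumptions a start sector of 2048 lands exactly 1 MiB into the disk and is aligned, while the older default of 63 is not, so misalignment is an unlikely explanation for the slow reads reported here.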
Hello,

Hmm... I just tested my vt6421 add-on card with several different recent WD drives and I'm seeing the same problem w/ all of them while other drives work just fine. Bus trace doesn't show anything particular except the host claiming bad reception every now and then. There seems to be a PHY compatibility issue here. I'll test a bit more and contact related parties. Thanks.

(In reply to comment #15)
> There seems to be a PHY compatibility issue
> here. I'll test a bit more and contact related parties.

Thanks for replying so quickly. I'm curious to see whether it's fixable in the driver.

Jorrit Tijben

Created attachment 26589 [details]
sata_via-crc-fix.patch
Okay, this should fix the problem although I don't know how it does it. Can you please try it?
According to the docs I have for PCI ID 0x3249, register 0x52 is "Transport Miscellaneous Control", with bits:

7 Reserved ........ always reads 0
6 Transport Issue Early Request to Link to improve Performance ........ default = 0
5 Reserved ........ default = 0
4 Single Data FIS Transmission ........ default = 0
  Allow over 8k bytes.
3 BIST FIS ........ default = 0
  Controller can accept BIST FIS when behaves as a device (Rx53[1:0] are set). This bit is set only for controller to control BIST FIS self-test.
2 SATA Flow Control Water Flag
  1 FFF0 threshold (the value is based on RX43)
  0 32DW ........ default
1 COMRESET Will reset both master / slave device (test mode only) ........ default = 0
0 Reset Shadow Register (test mode only) ........ default = 0

So, that's setting SATA Flow Control Water Flag to FFF0 threshold, which is somehow based on something called RX43. Still sounds like a proper mystery to me. :-( Thanks.

Hello,

Excuse me for the late response, but I can confirm the fix by Joseph Chan works. hdparm -tT now gives around 50MB/s for buffered disk reads and the error messages are gone. The performance is still a bit meager compared to the SB700 on-board controller, but I don't know what the possible bottlenecks are. A very big thanks to all for investigating and fixing this, it's *really* appreciated!

Jorrit Tijben

Patch already in mainline and will be released for -stable too. Resolving as FIXED. Thanks.

Works for me too! I had messages like the following; with 2.6.34.1 they have now disappeared.
ata1.00: exception Emask 0x12 SAct 0x0 SErr 0x1000500 action 0x6
ata1.00: BMDMA stat 0x5
ata1: SError: { UnrecovData Proto TrStaTrns }
ata1.00: cmd c8/00:20:b7:18:38/00:00:00:00:00/e0 tag 0 dma 16384 in
         res 51/40:20:b7:18:38/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/100
ata1: EH complete

akk:~# lspci
00:01.0 Host bridge: Advanced Micro Devices [AMD] CS5536 [Geode companion] Host Bridge (rev 33)
00:01.2 Entertainment encryption device: Advanced Micro Devices [AMD] Geode LX AES Security Block
00:09.0 Ethernet controller: VIA Technologies, Inc. VT6105M [Rhine-III] (rev 96)
00:0b.0 Ethernet controller: VIA Technologies, Inc. VT6105M [Rhine-III] (rev 96)
00:0c.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID Controller (rev 50)
00:0e.0 Network controller: Atheros Communications Inc. Device 0029 (rev 01)
00:0f.0 ISA bridge: Advanced Micro Devices [AMD] CS5536 [Geode companion] ISA (rev 03)
00:0f.2 IDE interface: Advanced Micro Devices [AMD] CS5536 [Geode companion] IDE (rev 01)
00:0f.4 USB Controller: Advanced Micro Devices [AMD] CS5536 [Geode companion] OHC (rev 02)
00:0f.5 USB Controller: Advanced Micro Devices [AMD] CS5536 [Geode companion] EHC (rev 02)
akk:~#

My hard disc is a WD:

[ 1.453025] sata_via 0000:00:0c.0: version 2.6
[ 1.453153] sata_via 0000:00:0c.0: routed to hard irq line 9
[ 1.464542] sata_via 0000:00:0c.0: setting latency timer to 64
[ 1.464717] scsi0 : sata_via
[ 1.471000] scsi1 : sata_via
[ 1.477183] scsi2 : sata_via
[ 1.483323] ata1: SATA max UDMA/133 port i16@0x1800 bmdma 0x1c00 irq 9
[ 1.496424] ata2: SATA max UDMA/133 port i16@0x1840 bmdma 0x1c08 irq 9
[ 1.509496] ata3: PATA max UDMA/133 port i16@0x1880 bmdma 0x1c10 irq 9
...
[ 3.694987] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 3.715340] ata2.00: ATA-8: WDC WD15EADS-00S2B0, 01.00A01, max UDMA/133
[ 3.716602] ata2.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 0/32)
[ 3.720406] ata2.00: configured for UDMA/133
[ 3.721316] scsi 1:0:0:0: Direct-Access ATA WDC WD15EADS-00S 01.0 PQ: 0 ANSI: 5
[ 3.723468] sd 1:0:0:0: [sda] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
[ 3.724202] sd 1:0:0:0: [sda] Write Protect is off
[ 3.725841] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 3.725980] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.727823] sda: sda1
[ 3.741980] sd 1:0:0:0: [sda] Attached SCSI disk

I had similar problems with my
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
and WDC Caviar Green 2TB WD20EARS disks. I thought I'd report that Joseph Chan's <JosephChan@via.com.tw> magic fix also works for this controller. I patched the kernel with the above sata_via-crc-fix.patch (uncommenting the if statement device check) and haven't had problems since.

Can you please attach output of lspci -nnv?

Created attachment 37112 [details]
lspci -nnv output
lspci output for the VT6420 case described below
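For reference, the register 0x52 layout quoted earlier in this report can be turned into a small decoder. This is a sketch, not the driver code: the bit names are copied from that quoted description, the function names are hypothetical, and the read-modify-write helper only illustrates setting bit 2 without disturbing the other bits (the thread itself does not show how the attached patch writes the register).

```python
from typing import List

# Bit names as quoted in this report for register 0x52
# ("Transport Miscellaneous Control"); bits 7 and 5 are reserved.
RX52_BITS = {
    6: "Transport Issue Early Request to Link",
    4: "Single Data FIS Transmission",
    3: "BIST FIS",
    2: "SATA Flow Control Water Flag (FFF0 threshold)",
    1: "COMRESET resets both master/slave (test mode only)",
    0: "Reset Shadow Register (test mode only)",
}

FLOW_CONTROL_FLAG = 1 << 2  # the bit the workaround turns on


def decode_rx52(value: int) -> List[str]:
    """Return the names of the bits set in a register 0x52 value, high bit first."""
    return [name for bit, name in sorted(RX52_BITS.items(), reverse=True)
            if value & (1 << bit)]


def set_flow_control_flag(value: int) -> int:
    """Read-modify-write: turn on bit 2 and leave every other bit unchanged."""
    return (value | FLOW_CONTROL_FLAG) & 0xFF


if __name__ == "__main__":
    print(decode_rx52(0x04))
    print(hex(set_flow_control_flag(0x40)))
```

Writing the whole byte, as `setpci ... 52=4` does, is equivalent only when all other bits of the register are already zero; the OR-based helper avoids that assumption.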
Possibly more about the cause and the magic patch can be found on http://lxr.free-electrons.com/source/drivers/ata/sata_via.c#L579

I've also got the same problem (WD Caviar HD doesn't properly communicate with VIA disk controller) with a cheap SATA/IDE disk controller that uses the VIA VT6421 chipset. Linux kernel 2.6.38-8 is used. It seems that the problem still hasn't been resolved. The typical symptoms are that an application is launched (e.g. Libre Office), or any other action is performed on the desktop, and the disk stops working after it has been running for a few seconds. About half a minute later any action that was started suddenly completes when the disk (WD Caviar) resumes running. These issues occur frequently. Below is some output relating to the problem via dmesg:

----------------------------------------------
[ 1.839815] scsi2 : sata_via
[ 1.842185] scsi3 : sata_via
[ 1.843810] scsi4 : sata_via
[ 1.843964] ata3: SATA max UDMA/133 port i16@0x1460 bmdma 0x1440 irq 16
[ 1.843973] ata4: SATA max UDMA/133 port i16@0x1470 bmdma 0x1448 irq 16
[ 1.843979] ata5: PATA max UDMA/133 port i16@0x1480 bmdma 0x1450 irq 16
[ 1.844854] e100 0000:05:08.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 1.982048] e100 0000:05:08.0: PME# disabled
[ 1.983255] e100 0000:05:08.0: eth0: addr 0xfc500000, irq 20, MAC addr 00:0b:cd:a3:39:2a
[ 2.244068] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2.253311] ata3.00: ATA-8: WDC WD5000AAKX-001CA0, 15.01H15, max UDMA/133
[ 2.253319] ata3.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 2.269316] ata3.00: configured for UDMA/133
[ 2.269567] scsi 2:0:0:0: Direct-Access ATA WDC WD5000AAKX-0 15.0 PQ: 0 ANSI: 5
[ 2.269981] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 2.270535] sd 2:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 2.270655] sd 2:0:0:0: [sda] Write Protect is off
[ 2.270664] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.270715] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.313705] sda: sda1 sda2 < sda5 >
[ 2.314696] sd 2:0:0:0: [sda] Attached SCSI disk
[ 2.599041] ata4: SATA link down (SStatus 0 SControl 310)
[ 3.006774] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[ 3.006784] EXT4-fs (sda1): write access will be enabled during recovery
[ 3.063666] ata3.00: exception Emask 0x12 SAct 0x0 SErr 0x1380500 action 0x6
[ 3.063674] ata3.00: BMDMA stat 0x5
[ 3.063681] ata3: SError: { UnrecovData Proto 10B8B Dispar BadCRC TrStaTrns }
[ 3.063689] ata3.00: failed command: READ DMA EXT
[ 3.063702] ata3.00: cmd 25/00:00:88:bb:05/00:01:1d:00:00/e0 tag 0 dma 131072 in
[ 3.063705] res 51/84:7f:88:bb:05/84:00:1d:00:00/e0 Emask 0x12 (ATA bus error)
[ 3.063711] ata3.00: status: { DRDY ERR }
[ 3.063716] ata3.00: error: { ICRC ABRT }
[ 3.063731] ata3: hard resetting link
[ 3.380055] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
----------------------------------------------

As you can see, the issue still hasn't been resolved even though it is currently marked as RESOLVED CODE_FIX.

Does VIA really have a VT6420 chipset? Sounds too similar/close to VT6421.

napperley, the problem discussed in this report was a quite specific incompatibility between some WDC drives and vt6420/1, and it manifests as an almost constant stream of ATA bus errors with a specific SError value (0x1000500). The workaround has been applied to all vt6420/1 controllers. The problem you're seeing seems different. Can you please try debugging the hardware first? I.e. try a different cable, port, hard drive and power supply, and observe and report how the pattern of failures changes. Thanks.

OK, this all seems to make sense and hopefully will fix my very similar problem too. The only problem is that I'm running a VT8237, and according to the last post at http://forum.sources.ru/index.php?showtopic=328955&hl= the config registers seem to be the opposite for the VT8237. Is there any way you could release a similar patch for this chip too?
I'm also seeing soft bus resets at high transfer rates or dmraid resync (fixed by limiting the max resync speed to 50000). Or is there any other way to limit the max SATA transfer speed without a sata_via patch? This is what happens when you mix an old chipset with brand new WD drives. Thanks in advance.

*** Bug 50661 has been marked as a duplicate of this bug. ***

OK, I misread dmesg; sata_via is actually running on
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
on an Asus A7V880. I realize that kernel 2.6.32.7 won't ever make it to CentOS 4.9, so I'll try to work around this by adding setpci -s 00:0f.0 52=4 to rc.local.

Hi there, it seems that there are 2 problems with the committed patch for this bug (commit 8b27ff4cf6d15964aa2987aeb58db4dfb1f87a19) on the VT6421 IDE RAID Controller:

1. It causes a major performance regression in disk transfer speed with a SAMSUNG HD502IJ HDD.

without commit 8b27ff4cf6d15964aa2987aeb58db4dfb1f87a19:
hdparm -t --direct /dev/sdb
/dev/sdb:
 Timing O_DIRECT disk reads: 310 MB in 3.00 seconds = 103.28 MB/sec

with commit 8b27ff4cf6d15964aa2987aeb58db4dfb1f87a19:
hdparm -t --direct /dev/sdb
/dev/sdb:
 Timing O_DIRECT disk reads: 184 MB in 3.02 seconds = 60.83 MB/sec

2. A suspend/resume cycle clears the PCI register value which had been set up by the patch (so it looks like the affected WD drives will start to behave badly again after resume; in my case the suspend/resume cycle cures the transfer speed regression).

lspci -xxx can be used to verify the register values before and after suspend/resume. The suspend/resume cycle can be "emulated" with the setpci -s 02:01.0 0x52.B=4 and setpci -s 02:01.0 0x52.B=0 commands (in my case).

lspci and smartctl output are in the attachments.

Created attachment 120341 [details]
lspci output
Created attachment 120351 [details]
smartctl drive output
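The register check suggested in the comment above (comparing lspci -xxx output before and after suspend/resume) could be scripted roughly as follows. This is a sketch with hypothetical function names; the sample rows are copied from the VT6420 dump posted later in this report, not from the reporter's VT6421.

```python
from typing import Dict

# Two rows of `lspci -xxx -s 00:0f.0` output, copied from a later comment.
SAMPLE_DUMP = """\
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
40: 13 03 f1 44 06 af 00 00 10 82 45 03 00 00 00 00
50: 00 00 04 00 00 00 04 04 00 10 10 00 05 00 20 00
"""


def parse_lspci_hex(dump: str) -> Dict[int, int]:
    """Map config-space offsets to byte values from `lspci -xxx` text."""
    config: Dict[int, int] = {}
    for line in dump.splitlines():
        head, sep, rest = line.strip().partition(": ")
        if not sep:
            continue
        try:
            base = int(head, 16)
            row = [int(token, 16) for token in rest.split()]
        except ValueError:
            continue  # not a hex row, e.g. the device description line
        for index, byte in enumerate(row):
            config[base + index] = byte
    return config


def flow_control_flag_set(dump: str) -> bool:
    """True if bit 2 of register 0x52 is set; raises KeyError if 0x52 is absent."""
    return bool(parse_lspci_hex(dump)[0x52] & 0x04)


if __name__ == "__main__":
    print(hex(parse_lspci_hex(SAMPLE_DUMP)[0x52]), flow_control_flag_set(SAMPLE_DUMP))
```

Running the same check on a dump captured after resume would show whether the flag was lost, which is the behaviour the comment describes.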
I can confirm the speed regression, it's almost 2x slower now. Please reopen the bug; this issue needs to be dealt with in a more elegant way.

Testing a Seagate ST1000DM003-1ER1 with an add-on VT6421A PCI card (Gembird SIDE-1); motherboard chipset is KT600, Linux version 3.0.0.

# /sbin/setpci -s 00:0c.0 0x52.B=4; for i in `seq 5`; do /sbin/hdparm -t --direct /dev/sda; done; /sbin/setpci -s 00:0c.0 0x52.B=0; for i in `seq 5`; do /sbin/hdparm -t --direct /dev/sda; done
/dev/sda:
 Timing O_DIRECT disk reads: 178 MB in 3.02 seconds = 59.00 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 190 MB in 3.02 seconds = 62.86 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 190 MB in 3.02 seconds = 63.01 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 190 MB in 3.02 seconds = 62.94 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 190 MB in 3.03 seconds = 62.79 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 348 MB in 3.01 seconds = 115.49 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 348 MB in 3.01 seconds = 115.61 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 348 MB in 3.01 seconds = 115.45 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 348 MB in 3.01 seconds = 115.45 MB/sec
/dev/sda:
 Timing O_DIRECT disk reads: 348 MB in 3.01 seconds = 115.67 MB/sec

When reading two HDs at once, the lowered high-water-mark PCI register setting (which is applied by default) still isn't enough to prevent some kernel messages. Is that fixable (maybe with another tunable)?

I was going to set up an old Athlon XP2500+ on an A7V600 (w/ VT6420 RAID controller onboard) to test some stuff with grub / md before changing anything on the machine I normally use. HDs are
* WDC WD10EADS-65L5B1 (1TB green power, 90MB/s sequential read)
* WD1600JD-00HBB0 (160GB, 57MB/s sequential read)

I can run sudo dd if=/dev/sda2 of=/dev/null bs=1024k iflag=direct, or the same for the other drive, with no trouble. But if I dd from both drives at once (or from /dev/md/g2-root (RAID10, f2 layout, 64k chunk size)), then I get some SATA command error messages on the port of the faster HD (the WD10EADS). (And btw, the 64k chunk size is to make sure the files GRUB needs aren't contiguous with the f2 layout. I plan to use 512k for real.)

Xubuntu's installer crashed most of the way into an install onto a RAID10,f2 partitioned md device, with segfaults in several commands that it ran after chrooting into the xfs mount that it copied files to. (I tested my RAM and my USB stick; I don't think the corruption came from them. Everything went fine when installing into a plain partition on the 1TB drive, not touching the md device.)

I'll dd my partitions some more, and see if I get a crash or a change in the crc of either blockdev. (Not sure the CPU is fast enough to md5sum both disks at full speed...)

The system was totally idle when I ran the two dd processes on tty1 and tty2. (The X server was running, but I was logged out.) I was ssh'ed in in case the system locked up, like I saw happen once while dding an md device from the live CD. So there were a few interrupts from the network card.

I killed one of the dd processes very soon after seeing some errors. When I tried again later (after it had already limited speed to "UDMA/100" (whatever that means for SATA...)), I still got link resets.
These are the error messages:
```
[ 1299.900044] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 1299.900072] ata3.00: BMDMA stat 0x5
[ 1299.900084] ata3.00: failed command: READ DMA EXT
[ 1299.900102] ata3.00: cmd 25/00:00:00:d0:03/00:04:00:00:00/e0 tag 0 dma 524288 in
         res 51/84:af:51:cc:03/84:03:00:00:00/e0 Emask 0x10 (ATA bus error)
[ 1299.900126] ata3.00: status: { DRDY ERR }
[ 1299.900136] ata3.00: error: { ICRC ABRT }
[ 1299.900155] ata3: soft resetting link
[ 1300.080192] ata3.00: configured for UDMA/133
[ 1300.080211] ata3: EH complete
[ 1304.552049] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 1304.552087] ata3.00: BMDMA stat 0x5
[ 1304.552100] ata3.00: failed command: READ DMA EXT
[ 1304.552117] ata3.00: cmd 25/00:00:00:48:0a/00:04:00:00:00/e0 tag 0 dma 524288 in
         res 51/84:cf:31:44:0a/84:03:00:00:00/e0 Emask 0x10 (ATA bus error)
[ 1304.552141] ata3.00: status: { DRDY ERR }
[ 1304.552151] ata3.00: error: { ICRC ABRT }
[ 1304.552170] ata3: soft resetting link
[ 1304.732180] ata3.00: configured for UDMA/133
[ 1304.732198] ata3: EH complete
[ 1304.784060] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 1304.784083] ata3.00: BMDMA stat 0x5
[ 1304.784094] ata3.00: failed command: READ DMA EXT
[ 1304.784111] ata3.00: cmd 25/00:00:00:58:0a/00:04:00:00:00/e0 tag 0 dma 524288 in
         res 51/84:af:51:54:0a/84:03:00:00:00/e0 Emask 0x10 (ATA bus error)
[ 1304.784134] ata3.00: status: { DRDY ERR }
[ 1304.784144] ata3.00: error: { ICRC ABRT }
[ 1304.784161] ata3: soft resetting link
[ 1304.964179] ata3.00: configured for UDMA/133
[ 1304.964195] ata3: EH complete
[ 1308.064044] ata3.00: limiting speed to UDMA/100:PIO4
[ 1308.064055] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 1308.064077] ata3.00: BMDMA stat 0x5
[ 1308.064090] ata3.00: failed command: READ DMA EXT
[ 1308.064107] ata3.00: cmd 25/00:00:00:d8:0e/00:04:00:00:00/e0 tag 0 dma 524288 in
         res 51/84:df:21:d4:0e/84:03:00:00:00/e0 Emask 0x10 (ATA bus error)
[ 1308.064130] ata3.00: status: { DRDY ERR }
[ 1308.064827] ata3.00: error: { ICRC ABRT }
[ 1308.065731] ata3: soft resetting link
[ 1308.244183] ata3.00: configured for UDMA/100
[ 1308.244204] ata3: EH complete
[11580.572067] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[11580.572754] ata3.00: BMDMA stat 0x5
[11580.573874] ata3.00: failed command: READ DMA
[11580.575236] ata3.00: cmd c8/00:00:00:07:1c/00:00:00:00:00/e0 tag 0 dma 131072 in
         res 51/84:cf:31:06:1c/84:03:00:00:00/e0 Emask 0x10 (ATA bus error)
[11580.578081] ata3.00: status: { DRDY ERR }
[11580.579536] ata3.00: error: { ICRC ABRT }
[11580.581006] ata3: soft resetting link
[11580.744228] ata3.00: configured for UDMA/100
[11580.744256] ata3: EH complete
```
(I attached full lspci -nnv and dmesg output.)

00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237/VX700 PCI Bridge
00:09.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
01:00.0 VGA compatible controller: NVIDIA Corporation NV44A [GeForce 6200] (rev a1)

$ lspci -xxx -s 00:0f.0
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00: 06 11 49 31 07 00 90 02 80 00 04 01 00 20 80 00
10: 01 d4 00 00 01 d0 00 00 01 b8 00 00 01 b4 00 00
20: 01 b0 00 00 01 a8 00 00 00 00 00 00 43 10 ed 80
30: 00 00 00 00 c0 00 00 00 00 00 00 00 00 02 00 00
40: 13 03 f1 44 06 af 00 00 10 82 45 03 00 00 00 00
50: 00 00 04 00 00 00 04 04 00 10 10 00 05 00 20 00
60: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 01 00 01 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 46 36 00 10 46 36
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 80 02 49 31 43 10 ed 80 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

peter@gamma2:~$ cat /proc/interrupts
CPU0
0: 47 IO-APIC-edge timer
1: 7700 IO-APIC-edge i8042
8: 0 IO-APIC-edge rtc0
9: 0 IO-APIC-fasteoi acpi
14: 0 IO-APIC-edge pata_via
15: 14537 IO-APIC-edge pata_via
16: 608 IO-APIC-fasteoi nouveau
18: 45963 IO-APIC-fasteoi eth0
20: 393236 IO-APIC-fasteoi sata_via
21: 55160 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
22: 54 IO-APIC-fasteoi snd_via82xx
NMI: 59 Non-maskable interrupts
LOC: 516256 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 59 Performance monitoring interrupts
IWI: 0 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
MCE: 0 Machine check exceptions
MCP: 50 Machine check polls
THR: 0 Hypervisor callback interrupts
ERR: 0
MIS: 0

Update: after letting it run for a while, running crc32 in parallel on each disk (fast disk going at 42MB/s, slow disk going at 52MB/s), I started to see some errors from the slower HD's port. And then my ssh session locked up. And so did the PS/2 keyboard (not even alt+sysrq+b works). I'm still seeing some messages scroll up the console, including some (typed by hand from the console of the wedged machine): "usb 1-2: device descriptor read/64, error -110", and "INFO: xfsaild/sda4:143 blocked for more than 120 seconds". Oh, that's my root FS, so I guess the whole system goes to crap when / and the swap partitions are blocked.

There are:
"end_request: I/O error, dev sda sector 9578752"
"Buffer I/O error on device sda2 ..."
... and "end_request: I/O error, sdb, sector 5671568"

So I guess if I want to use this old machine for anything, it's going to have to be with only one SATA drive. :( I had been hoping to maybe use it to replace the PIII-450MHz that's been my router / mail server for over 10 years. :P

Created attachment 167181 [details]
dmesg from before the lockup
Created attachment 167191 [details]
lspci -nnv on my A7V600 (KT600 chipset)
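Several comments in this report distinguish failures by their SError value (0x1000500 for the original incompatibility, 0x1380500 in a later report, 0x0 in the last one). A rough decoder is sketched below; the flag names are the ones libata prints in the logs above, while the bit positions are an assumption based on the SATA SError register layout. The two non-zero values from this report decode to exactly the flag lists printed next to them, which gives some confidence in the table.

```python
from typing import List

# Assumed bit positions of the SError flags, labelled as libata prints them.
SERR_NAMES = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def decode_serror(value: int) -> List[str]:
    """Return the flag names set in an SError value, lowest bit first."""
    return [SERR_NAMES[bit] for bit in sorted(SERR_NAMES) if value & (1 << bit)]


if __name__ == "__main__":
    for serr in (0x1000500, 0x1380500, 0x0):
        print(hex(serr), "{ " + " ".join(decode_serror(serr)) + " }")
```

Decoded this way, the last reporter's errors (SErr 0x0 with ICRC ABRT from the drive) look different from the 0x1000500 pattern the workaround was written for, which fits the earlier remark that not every ATA bus error on these controllers is the same problem.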