Bug 32682
Summary: | libata freeze randomly hangs disk I/O | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Robin Rainton (robin) |
Component: | Serial ATA | Assignee: | Jeff Garzik (jgarzik) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | high | CC: | alan, bjoernv, robin, rtguille, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.38.2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Full dmesg output
lspci -vvv fdisk -lu dmesg output with libata.force=1.5Gbps |
Description
Robin Rainton
2011-04-05 06:19:18 UTC
Correction, the "libata.dma=0" argument does not correct this problem with Kernel 2.6.28.10. I will double check the others. Further correction, the "libata.dma=0" argument does not correct this problem with Kernel 2.6.35.9 either. Even further correction, having previously run for quite some time the 2.6.38.2 kernel hung a few times overnight, so the "libata.dma=0" argument does not correct this problem with that Kernel either. Apologise for the initially incorrect title then. This problem is therefore more serious than thought as I cannot find any work around for these hangs. I still stand by what I said about the hardware being reliable though - the 'frozen' messages are spread over all SATA ports and drives which simply cannot all be faulty. First of all, please always attach full dmesg output including the boot and error messages when reporting kernel bugs, preferably with CONFIG_PRINTK enabled. It might look boring and repetitive but it actually contains a lot of valuable information. I can't tell much from the single selected failure but it's most likely a power related issue. Get an extra PSU - just a cheap one, power it up separately and move half of the drives to the extra PSU and see whether the pattern of errors changes. Thanks. Created attachment 53642 [details]
Full dmesg output
There are 3 separate occurrences of the problem here on 3 different ports.
This happened overnight when there was very little load on the system.
As requested have attached dmesg output. I am convinced this is not hardware related as discussed. I have also experimented with disabling NCQ on the drives that support it (the 6 in RAID5 on the GeForce controller) and interestingly the problem has not resurfaced since doing that. Will report back in a few days to confirm if the system stays error free with this. AFAICS, nothing precludes hardware issue at this point. NCQ might affect the issue as would libata.force=1.5Gbps. When you have underlying hardware problem, especially power related one, random tweaks often change the result and aren't too useful for cause elimination. It's far easier to keep the software visible configuration identical and change underlying hardware configuration and see how error pattern changes. Oh, it will also be helpful if there are failures from the silicon image controller. Thanks. I have based the statement that both controllers are affected on the fact that when one greps /var/log/messages for the fault it occurs on all ports. Eg (duplicates removed for brevity - kernel boots included to show versions): Mar 28 17:44:53 plex kernel: Linux version 2.6.28.10 Mar 29 10:59:30 plex kernel: ata6.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x6 frozen Apr 3 04:23:08 plex kernel: ata2.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x6 frozen Apr 3 04:27:27 plex kernel: ata4.00: exception Emask 0x0 SAct 0x3fff SErr 0x0 action 0x6 frozen Apr 3 04:31:46 plex kernel: ata3.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x6 frozen Apr 3 04:32:57 plex kernel: ata1.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x6 frozen Apr 3 04:39:55 plex kernel: ata5.00: exception Emask 0x0 SAct 0x1fff SErr 0x0 action 0x6 frozen Apr 3 07:21:23 plex kernel: Linux version 2.6.35.9 Apr 3 08:27:05 plex kernel: ata7.00: exception Emask 0x0 SAct 0xfff SErr 0x0 action 0x6 frozen Apr 3 09:27:01 plex kernel: ata9.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x6 frozen Apr 3 15:46:16 plex kernel: ata10.00: exception Emask 0x0 SAct 0x1fff SErr 0x0 action 0x6 frozen Apr 3 16:41:54 plex kernel: Linux version 2.6.38.2 Apr 5 23:13:23 plex kernel: ata8.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x6 frozen However, given that all 10 ports are mentioned yet only 8 drives are present in the box is suspicious! I then went back and checked the recovery message that follows each failure (SATA link up) and notice that on each occasion the speed was 3.0Gbps. This indicates the Samsung drives (and therfore GeForce ports) only are the only ones affected (the SiL/Seagate combination runs at 1.5Gbps). It would appear that during testing of the different Kernels, when the system boots sometimes different versions configure the 'ata' numbers on different controllers. This may be to do with the order the modules are loaded in the initrd image. Checking above on the boot logs I have discovered that when the 2.6.28.10 kernel boots the GeForce controller is initialised first. Note that the fault occured on ports 1-6 with that kernel. Then while testing other kernels I discovered that the SiL controller is initialised first. Note that the fault occurs only on numbers 7,8,9,10 (so far) on later kernels. Ports above 4 here would again indicate GeForce controller. Again I apologise for the confusion. My statement that all ports on both controllers now seems incorrect. I will stick to the current kernel and boot image (2.6.38.2) and advise if any more faults occur with NCQ disabled. Try booting with kernel parameter 'pcie_aspm=off' before rebooting: # lspci -vvv > lspci-vvv_2.6.XX_aspm-default.out then reboot with pcie_aspm=off # lspci -vvv > lspci-vvv_2.6.XX_aspm-off.out Try to reproduce the erros. I Hope it helps. ---- In my case (on F13,F14 and only once with F15) the active state power management on the sata controller: * nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2) Generated these type of errors among others, even i/o error. Disabling pcie_aspm seems to fix it for F13/F14, F15 seems to not to suffer the issue, but i saw ONCE a sata error, still i use F15 (F15 seems to selectivelly disable it for that controller) I tested slackware, with kernels compiled by me and found that the same kernel version (for example 2.6.33) downloaded an compiled in slack64 13.1 on the same hw worked fine, and F13 (2.6.33.XX) on the same hw needed pcie_aspm disabled to workaround a bunch of sata errors. And the stragest thing of all is that i compared: *1 The output of lspci -vvv in slackware64 (2.6.33-aspm) with and without aspm *2 The output of lscpi -vvv in F13 2.6.33.??) with and without aspm in *1, diff showed that aspm is disabled or enabled, ok in *2, diff showed that in addition to *1, there were other differences between an aspm enabled and an aspm disabled kernel: > pcie link speed differencies.... ???why??? > + or - differencies in other places. In the end pcie_aspm is an efective workaroud, but i don't know why... Slackware64 worked flawless with pcie_aspm enabled. ------------ The kernel seems to printk some stuff in parallel, it was a hellish experience to red these logs with so many disks. is there a way to serialize at least the sata disk detection messages? Is this your current config? (if wrong, please correct me): Kernel: 2.6.38.2.el5-local Command line: ro root=/dev/r1/plex.root libata.dma=0 Controller #0: sata_sil 0000:01:09.0: version 2.4 [ LOCATION: 0000:01:09.0 ] * ATA1, ATA2, ATA3, ATA4 * DISKS: ATA1, ATA2 * HPA disks: ATA2 Controller #1: libata version 3.00 loaded. [ LOCATION: 0000:00:09.0 ] * ATA5, ATA6, ATA7, ATA8, ATA9, ATA10 * HPA disks: ATA5, ATA7, ATA9 Controller #0: scsi 0:0:0:0 > sda > ATA1 > "ATA ST3200822AS 3.01 PQ: 0 ANSI: 5" > 390721968 512-byte logical blocks scsi 1:0:0:0 > sdb > ATA2 > "ATA ST3200822AS 3.01 PQ: 0 ANSI: 5" > 390719855 512-byte logical blocks scsi 2:0:0:0 EMPTY scsi 3:0:0:0 EMPTY Controller #1: scsi 4:0:0:0 > sdc > ATA5 > "ATA SAMSUNG HD501LJ CR10 PQ: 0 ANSI: 5" > 976771055 512-byte logical blocks scsi 5:0:0:0 > sdd > ATA6 > "ATA SAMSUNG HD502HJ 1AJ1 PQ: 0 ANSI: 5" > 976773168 512-byte logical blocks scsi 6:0:0:0 > sde > ATA7 > "ATA SAMSUNG HD501LJ CR10 PQ: 0 ANSI: 5" > 976771055 512-byte logical blocks scsi 7:0:0:0 > sdf > ATA8 > "ATA SAMSUNG HD502HJ 1AJ1 PQ: 0 ANSI: 5" > 976773168 512-byte logical blocks scsi 8:0:0:0 > sdg > ATA9 > "ATA SAMSUNG HD501LJ CR10 PQ: 0 ANSI: 5" > 976771055 512-byte logical blocks scsi 9:0:0:0 > sdh > ATA10 > "ATA SAMSUNG HD501LJ CR10 PQ: 0 ANSI: 5" > 976773168 512-byte logical blocks HPA: * ata2.00: HPA detected: current 390719855, native 390721968 * ata5.00: HPA detected: current 976771055, native 976773168 * ata7.00: HPA detected: current 976771055, native 976773168 * ata9.00: HPA detected: current 976771055, native 976773168 ------------------------ Can you post the output of: (list all partitions on all disks) # fdisk -lu # lspci -vvv and the content of /var/log/mesages and a current dmesg (with default boot parameters) Not all of your 500GB disks are of the same USABLE size. Not all of your 200GB disks are of the same USABLE size. So i don't know if it is intentional (it was configured this way) or not. Created attachment 53702 [details]
lspci -vvv
Created attachment 53712 [details]
fdisk -lu
The system boots from sda1 or sdb1.
Don't worry about the boot flags on the other first partitions - I had a mirror across all 6 x 500GB drives in the same way when that array was on it's own in another system and used for boot.
To clarify:
RAID1 (boot): sda1, sdb1
RAID1 (root inside LVM + others): sda2, sdb2
RAID1 (unused boot): sdc1 - sdh1
RAID5 (LVM with many volumes): sdc2 - sdh2
Attached lspci & fdisk as requested. It is impractical to attach /var/log/messages as this is full of DHCPD and SNMP requests and is massive. I can try and grep out this noise if dmesg & other attachements aren't enough. dmesg is unchanged since the attachment yesterday. After disabling NCQ the freeze has not happened since: Apr 5 23:15:24 plex kernel: ata5.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x6 frozen I will leave the system running for several more days before trying other kernel arguments suggested here to try and prove (or not) it is stable right now. HPA Sectors: 2113 (1081856 Bytes / ~1 Megabyte) sda > 390721968 sectors > UNPARTITIONED Sectors 5104 > HPA Sectors NONE sdb > 390719855 sectors > UNPARTITIONED Sectors 2991 > HPA Sectors 2113 sdc > 976771055 Sectors > UNPARTITIONED Sectors 2991 > HPA Sectors 2113 sdd > 976773168 Sectors > UNPARTITIONED Sectors 5104 > HPA Sectors NONE sde > 976771055 Sectors > UNPARTITIONED Sectors 2991 > HPA Sectors 2113 sdf > 976773168 Sectors > UNPARTITIONED Sectors 5104 > HPA Sectors NONE sdg > 976771055 Sectors > UNPARTITIONED Sectors 2991 > HPA Sectors 2113 sdh > 976773168 Sectors > UNPARTITIONED Sectors 5104 > HPA Sectors NONE I have googled that some mainboard reserve those 2113 sectors in the HPA when 'hardware raid/fakeraid' has been used on those drives. HW raid controllers also might use the HPA. I don't know if this is the reason of the errors, probaly not, because all partitions fit inside the partitionalble space. If there are no true HPA's in each drive (sdc,sde,sdg and sdb), there is another bug/issue probaly not related. Check that those drive either have a valid hpa (and someone forgot to erase it before partitioning or there is no hpa and it is a different/another bug. ----- YOU: 00:09.0 SATA controller: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2) (prog-if 01 [AHCI 1.0]) Subsystem: ABIT Computer Corp. Unknown device 1c44 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 (750ns min, 250ns max) Interrupt: pin A routed to IRQ 40 Region 0: I/O ports at 09f0 [size=8] Region 1: I/O ports at 0bf0 [size=4] Region 2: I/O ports at 0970 [size=8] Region 3: I/O ports at 0b70 [size=4] Region 4: I/O ports at dc00 [size=16] Region 5: Memory at fe026000 (32-bit, non-prefetchable) [size=8K] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [8c] #12 [0010] Capabilities: [b0] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable+ Address: 00000000fee0200c Data: 4169 Capabilities: [ec] HyperTransport: MSI Mapping ME: 00:09.0 SATA controller: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2) (prog-if 01 [AHCI 1.0]) Subsystem: ASUSTeK Computer Inc. M3N72-D Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 (750ns min, 250ns max) Interrupt: pin A routed to IRQ 40 Region 0: I/O ports at c080 [size=8] Region 1: I/O ports at c000 [size=4] Region 2: I/O ports at bc00 [size=8] Region 3: I/O ports at b880 [size=4] Region 4: I/O ports at b800 [size=16] Region 5: Memory at fbd7a000 (32-bit, non-prefetchable) [size=8K] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [8c] SATA HBA v1.0 InCfgSpace Capabilities: [b0] MSI: Enable+ Count=1/8 Maskable- 64bit+ Address: 00000000fee0100c Data: 4159 Capabilities: [ec] HyperTransport: MSI Mapping Enable+ Fixed+ Kernel driver in use: ahci If it is the same contoller (but different moboshop), then the workaround is booting with pcie_aspm=off It works with F13,F14. I never installed CentOS or RHEL on this machine. I am currently running F15 Alpha w/k 2.6.38.2-9.fc15.x86_64 (without workaround) Also, for me it was easy to reproduce the problem, for example with my two disks: # dd if=/dev/sda of=/dev/null & # dd if=/dev/sdb of=/dev/null & # find /&>/dev/null & Saturating (a read-only test is HIGHLY recommended) single disk (not using the md device, but sdX directly). Be carefull with the raid5. Just in case: Bug 611350 - ATA frozen , random pauses, every disk (workaround: pcie_aspm=off) Sorry improved the post: From: https://bugzilla.redhat.com/ Bug 611350 - ATA frozen , random pauses, every disk (workaround: pcie_aspm=off) After this week I am convinced this is to do with NCQ. After the last hang I disabled NCQ on all drives that supported it (the 6 drive RAID 5 array in this system) by: echo 1 > /sys/block/sdX/device/queue_depth Since then the system has been running all week without the problem. I rebooted this morning and left queue_depth untouched (the value was 31). The freeze again occurred after around 6 hours of uptime. I have now disabled NCQ once again as this appears to be a viable workaround. This was all on kernel 2.6.38.2 Robin, as I wrote before, correlation doesn't indicate causation. It could be a viable workaround for your case but that's as far as it goes. If you're interested, remove the NCQ workaround and boot with libata.force=1.5Gbps and see whether the problem goes away. Thanks. I tried booting with libata.force=1.5Gbps but dmesg reports the drives are connected at 3.0Gbps. Is that what you expect? Attached is a dmesg output for this. Created attachment 55082 [details]
dmesg output with libata.force=1.5Gbps
Use either libata.force=1.5G or libata.force=1.5 That's initrd failing to pass on the libata parameter to the respective module. I don't know why some distro initrds still don't implement it. :-( Anyways, you'll need to edit some module config and regenerate initrd so that initrd specifies the parameter to libata directly. Which distro are you on? Reartes, it is "libata.force=1.5Gbps". This is with Centos 5.6. I don't see why one has to edit the init script to pass an argument to a module - surely that is a bug in it's own right? Anyhow, that aside, what does one need to do to the init script? I assume I have to put something on the end of insmod? For ahci? Please confirm. Yeah, that's a bug of the initrd. Please report to the distro. I don't know about centos but in most cases it comes down to adding a file under /etc/modprobe.d and regenerating the initrd. mkinitrd will pick up the parameter from there and include it inside initrd. I know, but these are more easy to type. http://www.mjmwired.net/kernel/Documentation/kernel-parameters.txt The VAL specifies the configuration to force. As long as there's no ambiguity shortcut notation is allowed. For example, both 1.5 and 1.5G would work for 1.5Gbps. The following configurations can be forced. @Tejun > Robin, as I wrote before, correlation doesn't indicate causation. It could > be > a viable workaround for your case but that's as far as it goes. If you're > interested, remove the NCQ workaround and boot with libata.force=1.5Gbps and > see whether the problem goes away. I do fully agree with that statement. In my case Nvidia [10de:0ad4], pcie_aspm=off works with fedora's kernel. I never experienced the problem with slackware 13.1 with either the slack kernel nor kernel downladed by me and manually enabled pcie_aspm. I noticed that forcing no ncq the messages diminished, then i ended disabling everything!!! PIO MODE!!! and some messages still apeared. until the pcie_aspm=off workaround. I do not think (but i have no prof) that there is a problem with kernel.org kernel. It seems fedora / rhel6beta1 the only rhel tested, at leat in my specific case. The test i performed with slack userspace and my downloaded, configured and compiled kernels showed no issues at all. nothing. The most biz stuff is that i noticed that fedora does strange things with the pci/pcie bus when pcie_aspm=off is set. i am no pci/pcie guru, but it is strange and i really think (with no proof) that pcie_aspm fixes the problem by accident. I still don't know the true cause, but at least with F15 i only experienced 1 message. @ Robin is your nvdida sata controller the same as mine ? mine is [10de:0ad4]. check with # lspci -nn | grep SATA > Yeah, that's a bug of the initrd. Please report to the distro. I don't know > about centos but in most cases it comes down to adding a file under > /etc/modprobe.d and regenerating the initrd. mkinitrd will pick up the > parameter from there and include it inside initrd. It seems probably. Try the sorter forms anyways, they might work. # lspci -nn | grep SATA 00:09.0 SATA controller [0106]: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2) 01:09.0 RAID bus controller [0104]: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02) I'm not sure the shorter forms of the argument will work, looking here: http://lxr.linux.no/linux+v2.6.38/drivers/ata/libata-core.c#L6293 Later I will manually add "force=1.5Gbps" to the libata insmod command in the RD. BTW, this is a stock kernel compiled by myself, not from any distro. I don't see how there is an initrd bug here either. When one specifies arguments on the Grub command line the kernel should take note of them - that's nothing to do with the RD is it? Further update on this: After manually adding "force=1.5Gbps" to the libata insmod command in the RD the system has been stable for almost 2 weeks now. So there now appear to be 2 solutions to this problem: 1. Disable NCQ 2. force=1.5Gbps To me, it definitely looks like link layer problem which gets masked if transfer rate and/or power draw isn't high enough. Most likely power related issue but could be cabling. Please prepare a separate PSU and connect all the hard drives to it and see whether the problem goes away. The two PSUs don't have to share common ground, SATA links don't make direct electrical connections and don't require common ground between host and device. There really isn't much which can be debugged from software at this point. Thanks. |