Bug 14515
Summary: | ATA controller losing interrupt, system stall | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Lloyd Weehuizen (lloyd) |
Component: | IDE | Assignee: | io_ide (io_ide) |
Status: | CLOSED DOCUMENTED | ||
Severity: | normal | CC: | 21cnbao, alan, andrewnz.simpson, hancockrwd, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.31 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Kernel Log, PCI Devices | ||
Created attachment 23603 [details]
PCI Devices
Yes. I've encountered the same "lost interrupt" problem as Lloyd Weehuizen after upgrading the kernel to 2.6.31. I am using an HP laptop. I think recent commits must have broken support for some ATA controllers.

Rolling back to kernel 2.6.30 reduces the occurrence of this issue, but it still occurs.

Ubuntu bug #445852: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/445852
Reported as affecting Acer Aspire One, Asus EEE and Dell Mini 9 (all SSD models). More prevalent with SSDs that have been upgraded by the user to newer models, but also reported on the stock 8GB SSD on the AA1.

This has nothing to do with the hard drive. It's the optical drive choking on the commands (TEST_UNIT_READY) used to poll for media presence events. As it's PATA and the hard drive shares the bus with the optical drive, the hard drive can't access the bus while the optical drive is choking, so hard drive access also stalls. Disabling media presence polling should work around the problem for now. Long term, I think we shouldn't use TUR for media presence polling. Windows doesn't use it, and devices with crappy firmware which choke on repeated TUR aren't too rare. That's in the pipeline but unfortunately I don't think it will be ready for this release cycle. It's gonna be a pretty pervasive change. :-( So, for now, "hal-disable-polling /dev/sr0" seems like the only solution. Thanks.

AFAIK the cdrom code normally uses GET EVENT STATUS NOTIFICATION for this rather than TEST UNIT READY. It would be interesting to know where exactly the TUR is coming from.

TUR is coming from open, and I have patches to implement in-kernel polling so that polling doesn't have to go through open, but they depend on workqueue patches which are yet to be merged. I think I can pull it for 2.6.34 but I might be too optimistic. :-P Thanks.

O.K., it would appear then that this is not related to Ubuntu bug #445852, since the machines in that bug report don't have cdrom drives. Trying "hal-disable-polling --device /dev/sr0" just gives an error. Should I open a different bug report?

Andrew, yes, and please attach a kernel log with printk timestamps turned on, which includes boot and the failures, and the output of "lspci -nn". Thanks.
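To make the TEST UNIT READY versus GET EVENT STATUS NOTIFICATION distinction in the comments above concrete, here is a minimal sketch of the two command descriptor blocks (CDBs). It assumes the standard SCSI/MMC layouts (opcode 0x00 for TUR, opcode 0x4A for GESN with the polled bit and a media event class mask). The function names are hypothetical, and nothing here sends a command to a drive or reflects the actual kernel patches mentioned in this thread.

```python
# Opcodes from the SCSI/MMC command sets.
TEST_UNIT_READY = 0x00
GET_EVENT_STATUS_NOTIFICATION = 0x4A

# Notification class number for media events (class 4).
MEDIA_EVENT_CLASS = 4


def build_tur_cdb():
    """Return a 6-byte TEST UNIT READY CDB (all bytes are zero)."""
    cdb = bytearray(6)
    cdb[0] = TEST_UNIT_READY
    return bytes(cdb)


def build_gesn_cdb(alloc_len=8, event_class=MEDIA_EVENT_CLASS):
    """Return a 10-byte GET EVENT STATUS NOTIFICATION CDB in polled mode."""
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("alloc_len must fit in 16 bits")
    if not 0 <= event_class <= 7:
        raise ValueError("event_class must be in 0..7")
    cdb = bytearray(10)
    cdb[0] = GET_EVENT_STATUS_NOTIFICATION
    cdb[1] = 0x01                      # polled: return immediately
    cdb[4] = 1 << event_class          # notification class request bitmask
    cdb[7] = (alloc_len >> 8) & 0xFF   # allocation length, high byte
    cdb[8] = alloc_len & 0xFF          # allocation length, low byte
    return bytes(cdb)


if __name__ == "__main__":
    print("TUR :", build_tur_cdb().hex(" "))
    print("GESN:", build_gesn_cdb().hex(" "))
```

The all-zero TUR CDB matches the "cdb 00 00 00 ..." line in the reporter's kernel log, which is how the failing command can be identified as a media presence poll.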
Created attachment 23602 [details]
Kernel Log

I've just encountered a problem when moving from Ubuntu 9.04 to 9.10. There are large stalls every now and again and looking at the kernel log, I'm getting lost interrupts on the ATA interface. This did not happen with the previous 2.6.28 kernel. Here's a sample from the log.

[ 86.816503] ata1: lost interrupt (Status 0x58)
[ 86.820070] ata1: drained 32768 bytes to clear DRQ.
[ 86.928979] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 86.929068] ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
[ 86.929070] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 86.929072] res 58/00:01:00:00:00/00:00:00:00:00/b0 Emask 0x2 (HSM violation)
[ 86.929277] ata1.01: status: { DRDY DRQ }
[ 86.929353] ata1: soft resetting link

This is on an Acer Aspire laptop with a PIIX chipset and an Intel Mobile 945 GMA. The hard disk is a PATA WD600UE. See attached kernel log and pci dump.
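For anyone triaging similar logs, the sample above can be decoded mechanically. The following is a rough sketch, assuming the usual libata error format ("cmd XX/...", "cdb XX ...", "res XX/...") and the standard ATA status register bit assignments. The helper names and the small lookup tables are hypothetical and cover only the codes relevant to this report.

```python
import re

# Standard ATA status register bits, highest bit first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC/SERV"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]

# Only the codes that matter for this report; not a complete table.
ATA_COMMANDS = {0xA0: "PACKET (ATAPI)"}
SCSI_OPCODES = {0x00: "TEST UNIT READY", 0x4A: "GET EVENT STATUS NOTIFICATION"}

SAMPLE_LOG = """\
[ 86.928979] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 86.929068] ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
[ 86.929070] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 86.929072] res 58/00:01:00:00:00/00:00:00:00:00/b0 Emask 0x2 (HSM violation)
"""


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def decode_failed_command(log_text):
    """Extract the ATA command, first CDB byte and result status from a log."""
    cmd = re.search(r"cmd ([0-9a-f]{2})/", log_text)
    cdb = re.search(r"cdb ([0-9a-f]{2})", log_text)
    res = re.search(r"res ([0-9a-f]{2})/", log_text)
    if not (cmd and cdb and res):
        return None
    cmd_code = int(cmd.group(1), 16)
    opcode = int(cdb.group(1), 16)
    return {
        "ata_command": ATA_COMMANDS.get(cmd_code, hex(cmd_code)),
        "scsi_opcode": SCSI_OPCODES.get(opcode, hex(opcode)),
        "status": decode_status(int(res.group(1), 16)),
    }


if __name__ == "__main__":
    print(decode_failed_command(SAMPLE_LOG))
```

Run against the sample, this identifies the failed command as an ATAPI PACKET carrying TEST UNIT READY on ata1.01, which fits the diagnosis that the optical drive, not the hard disk, is the device stalling the shared bus. The kernel's own status line lists only DRDY and DRQ; the decoder also reports bit 0x10, which is set in 0x58.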