Created attachment 73168 [details]
Errors from dmesg (kernel 3.2)

When /sys/class/scsi_host/host*/link_power_management_policy is set to min_power, ATA errors (see below) show up in dmesg whenever there is significant disk I/O. The system continues to run, but I/O halts for several seconds each time the error occurs, making it extremely slow.

This is on a ThinkPad T61 with the following SATA controller:

00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 03)

The issue did not occur with the 3.0 kernel from Ubuntu 11.10, but showed up when running 3.2 after upgrading to Ubuntu 12.04. I have tested 3.4-rc5 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc5-precise/, and the error is present there too.

First line of the error message from dmesg (see the attachment for more):

ata1.00: exception Emask 0x10 SAct 0xfe SErr 0x48c0002 action 0xe

Ubuntu bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/993507
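For anyone checking whether they are affected, it helps to first see which policy each SATA host is currently using. The following is a minimal sketch, not part of the original report: the function name show_lpm_policy and its optional directory argument are made up for illustration, and it only assumes the sysfs path quoted in the report.

```shell
#!/bin/sh
# show_lpm_policy [SYSFS_ROOT]
# Hypothetical helper: prints "<host>: <policy>" for every SCSI host that
# exposes link_power_management_policy. SYSFS_ROOT defaults to the path
# from the report; the argument exists only to allow testing on a copy.
show_lpm_policy() {
    root="${1:-/sys/class/scsi_host}"
    for f in "$root"/host*/link_power_management_policy; do
        # Skip hosts without the attribute (and the unmatched glob itself).
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

Calling show_lpm_policy with no argument reads the live sysfs tree. Hosts that report min_power are the ones this bug concerns.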
Created attachment 73169 [details] dmesg errors with 3.4
Created attachment 73170 [details] lspci -vvnn output
Created attachment 73171 [details] /proc/scsi/scsi
Created attachment 73172 [details] /proc/modules
Created attachment 73173 [details] uname -r (3.2 kernel)
I'm seeing this issue on a ThinkPad X1 Carbon with Linux 3.6.5 x86_64.
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
Hello, a similar issue has been fixed this week! If the drive is broken we will fix your issue soon. Please attach the output of
sudo hdparm -C /dev/sd*
and
sudo hdparm -i --Istdout /dev/sda*
where * is your disk drive letter. With that data we will create a patch to work around your issue.
$ sudo hdparm -C /dev/sda: drive state is: active/idle $ sudo hdparm -i --Istdout /dev/sda1: Model=HITACHI HTS541612J9SA00, FwRev=SBDIC7JP, SerialNo=SB2D51EVG4M3UE Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=DualPortCache, BuffSize=7516kB, MaxMultSect=16, MultSect=off CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 AdvancedPM=yes: mode=0x80 (128) WriteCache=enabled Drive conforms to: ATA/ATAPI-7 T13 1532D revision 1: ATA/ATAPI-2,3,4,5,6,7 * signifies the current active mode 045a 3fff c837 0010 0000 0000 003f 0000 0000 0000 2020 2020 2020 5342 3244 3531 4556 4734 4d33 5545 0003 3ab8 0004 5342 4449 4337 4a50 4849 5441 4348 4920 4854 5335 3431 3631 324a 3953 4130 3020 2020 2020 2020 2020 2020 2020 2020 2020 8010 0000 0f00 4000 0200 0200 0007 3fff 0010 003f fc10 00fb 0100 4bb0 0df9 0000 0007 0003 0078 0078 0078 0078 0000 0000 0000 0000 0000 0000 001f 0702 0000 005e 0044 00fc 001a 746b 7f09 6163 7469 3c09 6163 203f 0024 0000 40fe fffe 0000 80fe 0000 0000 0000 0000 0000 4bb0 0df9 0000 0000 0000 0000 0000 8848 5000 cca5 4dc2 1945 0000 0000 0000 0000 0000 0000 0000 4004 4004 0000 0000 0000 0000 0000 0000 0000 0001 000b 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 4005 4000 8000 0000 4449 0000 0000 5858 5858 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0000 0000 0000 0000 72a5
Can you please also post full dmesg output including the errors? Thanks.
Your 5K160 drive is from the 2006 era, so it is in the danger zone for the blacklist. Since Tejun focuses on errors, run sudo smartctl -x /dev/sda after boot, then sudo smartctl -a /dev/sda, and then sudo smartctl -x /dev/sda again. Those will give us an idea of your drive's health.
Created attachment 124201 [details] Full dmesg from boot to error message
Created attachment 124211 [details] Output of requested smartctl commands
See the last two attachments for the requested output. Here's what I did:
1. reboot
2. run requested smartctl commands
3. echo min_power | sudo tee /sys/class/scsi_host/host*/link_power_management_policy
4. start thunderbird to generate disk activity and provoke the requested error message
5. dump dmesg to file, including error messages
Current kernel version (uname -r): 3.8.0-25-generic
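Steps 3 and 5 above can be sketched as two small shell helpers for anyone who wants to repeat the test. This is only an illustration under stated assumptions: the names set_lpm_policy and count_ata_exceptions are hypothetical, the optional directory argument exists only so the helper can be tried on a scratch directory, and the pattern is derived from the single error line quoted in the initial report.

```shell
#!/bin/sh
# set_lpm_policy POLICY [SYSFS_ROOT]
# Hypothetical helper: writes POLICY (e.g. min_power or max_performance)
# to every host's link_power_management_policy file. Writing to the real
# sysfs tree needs root.
set_lpm_policy() {
    policy="$1"
    root="${2:-/sys/class/scsi_host}"
    for f in "$root"/host*/link_power_management_policy; do
        [ -w "$f" ] || continue
        printf '%s\n' "$policy" > "$f"
    done
}

# count_ata_exceptions
# Hypothetical helper: reads kernel log text on stdin and prints how many
# lines look like "ataN.NN: exception Emask ...".
count_ata_exceptions() {
    # grep -c exits non-zero when the count is 0; keep the exit status clean.
    grep -Ec 'ata[0-9]+(\.[0-9]+)?: exception Emask' || true
}
```

A possible session, run as root: set_lpm_policy min_power, generate disk activity, then dmesg | count_ata_exceptions. A non-zero count indicates the error from this report was triggered.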
Hmmm... In the initial report, you said that the problem didn't occur with 3.0 but started appearing with 3.2. I've gone through the changes in that time period but can't spot anything which may affect LPM-related issues. Would it be possible for you to verify with the 3.0 kernel that the errors definitely don't occur there? And if so, would it be possible for you to bisect the kernels between 3.0 and 3.2? If the errors are reliably reproducible, bisecting shouldn't be too difficult, although it is somewhat laborious. Thanks.
I've bisected kernel versions from http://kernel.ubuntu.com/~kernel-ppa/mainline. I assume these are unmodified upstream builds (their wiki says "All of the upstream kernels are published at http://kernel.ubuntu.com/~kernel-ppa/mainline/"). It seems the bug was introduced with the 3.1 kernel. I was not able to reproduce the bug with 3.0.101, but it appeared immediately with 3.1.0. I will proceed with commit bisecting as soon as I find some time for it.
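For the commit bisection mentioned above, each step means building and booting a candidate kernel, reproducing the disk load, and then telling git whether that build shows the error. A rough sketch of the per-step decision follows. The helper name bisect_verdict is hypothetical, the v3.0 and v3.1 tags are an assumption based on the version range found above, and the pattern comes from the error line in the initial report.

```shell
#!/bin/sh
# Assumed starting point, run once inside a mainline kernel checkout:
#   git bisect start
#   git bisect bad v3.1
#   git bisect good v3.0

# bisect_verdict
# Hypothetical helper: reads the booted kernel's log on stdin and prints
# the git bisect command that matches what it sees.
bisect_verdict() {
    if grep -Eq 'ata[0-9]+(\.[0-9]+)?: exception Emask'; then
        echo "git bisect bad"
    else
        echo "git bisect good"
    fi
}
```

After booting each candidate with min_power set and generating disk activity, dmesg | bisect_verdict prints the command to run next in the checkout. A "good" verdict is only meaningful if the workload ran long enough for the error to have appeared.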
This bug relates to a very old kernel. Closing as obsolete.