I am getting "ata1: link is slow to respond, please be patient (ready=0)" and then, after the softreset fails, "ata1.00: link online but device misclassified" on an Aspire 5560G laptop with an SSD. This happens only with the standard Ubuntu kernel config, not with the custom kernel configs that I use on servers I've set up. Once this happens, the laptop has to be power cycled to get the SSD working again.
The SSD is an AR120GBE and the AHCI controller is 1022:7804, subsystem 1025:059f.
At first I thought the problem was with the Ubuntu kernel, but the timeout also occurs on mainline kernels 3.2.0 and 3.2.14 with no added patches but using the Ubuntu kernel config. I am using the following method to test the Ubuntu config:
make mrproper && make menuconfig
then I load the config and save to .config
make && make modules
then I copy arch/x86/boot/bzImage to a network boot option with a custom minimal initrd and boot the laptop from it.
The Ubuntu bug url is https://bugs.launchpad.net/ubuntu/+source/linux/+bug/965863
Created attachment 72803 [details]
This is a dmesg log from one of the Ubuntu kernels with the timeout problem
Created attachment 72804 [details]
standard Desktop Ubuntu kernel config that causes timeouts
Created attachment 72805 [details]
My custom server kernel config for kernel series 3.0 that works without timeouts on the laptop.
Created attachment 72806 [details]
My custom hardened server kernel config for kernel series 3.2 that works without timeouts on the laptop. Even though this is configured for grsecurity I just loaded it on mainline 3.2.14 without any extra patches for this laptop timeout test.
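Since one config works and the other times out, listing the options that differ between them is a way to narrow the search. The following is a minimal sketch, not something used in this report: `parse_config` and `diff_configs` are hypothetical helper names, and options missing from a file are assumed to be equivalent to "not set".

```python
def parse_config(text):
    """Map each option in a kernel .config to its value; unset options map to 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO: n
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(good_text, bad_text):
    """Return {option: (good_value, bad_value)} for every option that differs.

    Assumption: an option absent from one file is treated as 'n'.
    """
    good = parse_config(good_text)
    bad = parse_config(bad_text)
    changed = {}
    for name in sorted(set(good) | set(bad)):
        g = good.get(name, "n")
        b = bad.get(name, "n")
        if g != b:
            changed[name] = (g, b)
    return changed
```

Calling `diff_configs` on the text of the working and failing config files returns only the differing options, which can then be flipped one group at a time.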
Hmmm... this could have been caused by the recent engine start change. Can you please try 3.3.1?
Same timeout with 3.3.1 and the Ubuntu config. It works okay with the custom hardened 3.2 config. I have also just tried mainline 3.0 with the Ubuntu 3.2.0-21-generic config and got the timeout.
Created attachment 72929 [details]
Tried another config which disables EFI and EDD options and still getting the timeout.
On kernel 3.0 x86_64, I think I have narrowed the problem down. Changing CONFIG_HZ from 1000 to 250 causes the timeout if I load the kernel after a power-cycle boot. Sometimes I don't get the timeout if the kernel was loaded from a soft reboot. Is there anything immediately obvious about why changing CONFIG_HZ would cause ahci ports to time out?
I just tried a few more x86_64 kernel builds with various configs. Configs are attached below.
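For reference, CONFIG_HZ sets the timer tick rate, so it changes the granularity of any delay that is measured in ticks. The sketch below only illustrates that arithmetic. It is a simplification that assumes HZ divides 1000 evenly and that delays are rounded up to whole ticks; it is not the kernel's actual conversion code, the function names are made up, and it does not by itself explain the timeout.

```python
MSEC_PER_SEC = 1000


def tick_ms(hz):
    """Length of one timer tick in milliseconds."""
    return MSEC_PER_SEC / hz


def msecs_to_ticks(msecs, hz):
    """Round a millisecond delay up to whole ticks.

    Assumption: hz divides 1000 evenly (true for 100, 250 and 1000).
    """
    per_tick = MSEC_PER_SEC // hz
    return (msecs + per_tick - 1) // per_tick


def effective_ms(msecs, hz):
    """Delay actually represented once it is expressed in whole ticks."""
    return msecs_to_ticks(msecs, hz) * (MSEC_PER_SEC // hz)


if __name__ == "__main__":
    for hz in (100, 250, 1000):
        print(hz, tick_ms(hz), msecs_to_ticks(10, hz), effective_ms(10, hz))
```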
Created attachment 72949 [details]
Tried this config with Mainline 3.3.2 and got the timeout. I also tested the same config with CONFIG_HZ changed to 1000 and 100 and both those HZ configs didn't timeout.
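To repeat this kind of test with only the tick rate changed, the HZ choice can be rewritten in the config text before building. This is a rough sketch under assumptions: `set_config_hz` is a hypothetical helper, it assumes the x86 option names CONFIG_HZ_100, CONFIG_HZ_250, CONFIG_HZ_300 and CONFIG_HZ_1000 plus the derived CONFIG_HZ value, and the result should still be checked with `make oldconfig`.

```python
import re

HZ_CHOICES = (100, 250, 300, 1000)

# Matches CONFIG_HZ=<n> and the CONFIG_HZ_<n> choice lines, set or unset,
# but not unrelated options that merely start with CONFIG_HZ_.
_HZ_LINE = re.compile(
    r"^(?:# )?CONFIG_HZ(?:_(?:100|250|300|1000))?(?:=.*| is not set)$"
)


def set_config_hz(config_text, hz):
    """Return config_text with the tick-rate options switched to hz."""
    if hz not in HZ_CHOICES:
        raise ValueError("unsupported HZ value: %r" % (hz,))
    out = []
    for line in config_text.splitlines():
        if _HZ_LINE.match(line.strip()):
            continue  # drop the old tick-rate settings
        out.append(line)
    for choice in HZ_CHOICES:
        if choice == hz:
            out.append("CONFIG_HZ_%d=y" % choice)
        else:
            out.append("# CONFIG_HZ_%d is not set" % choice)
    out.append("CONFIG_HZ=%d" % hz)
    return "\n".join(out) + "\n"
```

The new lines are appended at the end rather than written in place, which keeps the sketch short; every other option is left untouched so that the tick rate is the only variable between builds.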
Created attachment 72950 [details]
Tried this config with Mainline 3.4-rc3 and got the timeout. I also tested the same config with CONFIG_HZ changed to 1000 and that didn't timeout.
I just tried kernel: 2.6.27 with CONFIG_HZ=250 and got the timeout there as well. Also, trying nohz=off on the command line doesn't help.
Is there any more info I can provide to help solve the problem?
Not that I can think of. Other than classifying your machine as having utterly outweirded us, there is not much obvious to do to progress this.