Bug 43039

Summary: Acer Aspire 5560G Laptop: link online but device misclassified: only with some kernel configs
Product: IO/Storage
Reporter: Matthew Stapleton (matthew4196)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: NEW
Severity: normal
CC: alan, jlee, szg0000, tj
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0.0 - 3.3.1
Tree: Mainline
Regression: No
Attachments: dmesg.txt

Description Matthew Stapleton 2012-04-04 01:48:41 UTC
I am getting "ata1: link is slow to respond, please be patient (ready=0)", and then, after the softreset fails, "ata1.00: link online but device misclassified" on an Acer Aspire 5560G laptop with an SSD. This happens only with the standard Ubuntu kernel config, not with the custom kernel configs I use on servers I have set up. Once this happens, the laptop has to be power cycled to get the SSD working again.

The SSD is an AR120GBE, and the AHCI controller is 1022:7804 (subsystem 1025:059f).

At first I thought the problem was with the Ubuntu kernel, but the timeout also occurs on mainline kernels 3.2.0 and 3.2.14 with no added patches, as long as the Ubuntu kernel config is used. I am using the following method to test the Ubuntu config:
make mrproper && make menuconfig
  then I load the Ubuntu config and save it as .config
make && make modules
  then I copy arch/x86/boot/bzImage to a network boot entry with a custom minimal initrd and boot the laptop from it.

The Ubuntu bug url is https://bugs.launchpad.net/ubuntu/+source/linux/+bug/965863
Comment 1 Matthew Stapleton 2012-04-04 01:50:27 UTC
Created attachment 72803

This is a dmesg log from one of the Ubuntu kernels with the timeout problem.
Comment 2 Matthew Stapleton 2012-04-04 01:52:11 UTC
Created attachment 72804

Standard desktop Ubuntu kernel config that causes the timeouts.
Comment 3 Matthew Stapleton 2012-04-04 01:53:30 UTC
Created attachment 72805

My custom server kernel config for kernel series 3.0 that works without timeouts on the laptop.
Comment 4 Matthew Stapleton 2012-04-04 01:55:20 UTC
Created attachment 72806

My custom hardened server kernel config for the 3.2 kernel series, which works without timeouts on the laptop. Even though it is configured for grsecurity, I loaded it on mainline 3.2.14 without any extra patches for this laptop timeout test.
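Because the failure follows the kernel config rather than the kernel version, a natural way to narrow it down is to diff a failing config (the Ubuntu one) against a working one (the custom server configs) and then test the differing options one at a time. The Python sketch below only illustrates that approach and is not a tool used in this report: `parse_config` and `diff_configs` are hypothetical helper names, and it assumes the usual .config line formats `CONFIG_FOO=value` and `# CONFIG_FOO is not set`. The sample inputs are made-up fragments, not the attached configs.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines in a .config file.
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text: str) -> dict:
    """Map each option name to its value; explicitly unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(failing: str, working: str) -> dict:
    """Return {option: (failing_value, working_value)} for every option that differs.

    An option missing from one file is reported as None on that side.
    """
    a, b = parse_config(failing), parse_config(working)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Illustrative fragments only, not the attached configs.
failing = "CONFIG_HZ_250=y\n# CONFIG_HZ_1000 is not set\nCONFIG_HZ=250\nCONFIG_SATA_AHCI=y\n"
working = "# CONFIG_HZ_250 is not set\nCONFIG_HZ_1000=y\nCONFIG_HZ=1000\nCONFIG_SATA_AHCI=y\n"
print(diff_configs(failing, working))
# {'CONFIG_HZ': ('250', '1000'), 'CONFIG_HZ_1000': ('n', 'y'), 'CONFIG_HZ_250': ('y', 'n')}
```

Real distribution and server configs differ in hundreds of options, so in practice the diff is a list to bisect rather than an answer.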
Comment 5 Tejun Heo 2012-04-04 16:58:48 UTC
Hmmm... this could have been caused by the recent engine start change. Can you please try 3.3.1?

Comment 6 Matthew Stapleton 2012-04-05 00:37:06 UTC
Same timeout with 3.3.1 and the Ubuntu config. It works okay with the custom hardened 3.2 config. I have also just tried 3.0 mainline with the Ubuntu 3.2.0-21-generic config and got the timeout.
Comment 7 Matthew Stapleton 2012-04-16 00:26:17 UTC
Created attachment 72929

Tried another config that disables the EFI and EDD options; still getting the timeout.
Comment 8 Matthew Stapleton 2012-04-18 05:57:16 UTC
On kernel 3.0 x86_64, I think I have narrowed the problem down. Changing CONFIG_HZ from 1000 to 250 causes the timeout when the kernel is loaded after a power-cycle boot. Sometimes I don't get the timeout if the kernel was loaded after a soft reboot. Is there anything immediately obvious about why changing CONFIG_HZ would cause AHCI ports to time out?
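One general reason CONFIG_HZ can change driver timing at all: millisecond delays are converted to whole timer ticks (jiffies), and a tick is 4 ms at HZ=250 but 1 ms at HZ=1000. The Python sketch below models the rounding that the kernel's msecs_to_jiffies() applies when HZ divides 1000 evenly (ignoring its overflow clamping); the function names are reused for illustration only, and this is a model, not kernel code. It shows that the same requested delay can become a noticeably different wait, but it is not a diagnosis of this report: HZ=100 has even coarser ticks and, per comment 10, did not time out.

```python
MSEC_PER_SEC = 1000


def msecs_to_jiffies(ms: int, hz: int) -> int:
    """Round a millisecond delay up to whole timer ticks (jiffies).

    Models the kernel formula for the case where HZ divides 1000 evenly,
    which holds for HZ = 100, 250 and 1000 (but not 300).
    """
    if MSEC_PER_SEC % hz:
        raise ValueError("this sketch only handles HZ values that divide 1000")
    ms_per_tick = MSEC_PER_SEC // hz
    return (ms + ms_per_tick - 1) // ms_per_tick


def effective_delay_ms(ms: int, hz: int) -> int:
    """Rounded-up delay in ms (ticks * tick length), ignoring where in the current tick the wait starts."""
    return msecs_to_jiffies(ms, hz) * (MSEC_PER_SEC // hz)


for hz in (100, 250, 1000):
    print(hz, [effective_delay_ms(ms, hz) for ms in (1, 5, 10)])
# 100 [10, 10, 10]
# 250 [4, 8, 12]
# 1000 [1, 5, 10]
```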
Comment 9 Matthew Stapleton 2012-04-18 07:26:42 UTC
I just tried a few more x86_64 kernel builds with various configs. The configs are attached below.
Comment 10 Matthew Stapleton 2012-04-18 07:30:59 UTC
Created attachment 72949

Tried this config with mainline 3.3.2 and got the timeout. I also tested the same config with CONFIG_HZ changed to 1000 and to 100, and neither of those HZ configs timed out.
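Comment 10 compares builds of the same config that differ only in the timer frequency. The Python sketch below makes that A/B edit repeatable; `set_hz` is a hypothetical helper, and it assumes the timer-frequency choice symbols CONFIG_HZ_100, CONFIG_HZ_250, CONFIG_HZ_300 and CONFIG_HZ_1000 plus the derived CONFIG_HZ value, as defined in kernel/Kconfig.hz for kernels of this era. Selecting the frequency in make menuconfig achieves the same result by hand.

```python
import re

# Timer-frequency choice values assumed from kernel/Kconfig.hz; exactly one should be =y.
HZ_CHOICES = (100, 250, 300, 1000)

# "CONFIG_HZ_250=y" or "# CONFIG_HZ_1000 is not set" (does not match CONFIG_HZ_PERIODIC etc.)
_CHOICE = re.compile(r"^(?:# )?CONFIG_HZ_(\d+)(?:=y| is not set)$")
# The derived integer value, e.g. "CONFIG_HZ=250"
_VALUE = re.compile(r"^CONFIG_HZ=\d+$")


def set_hz(config_text: str, hz: int) -> str:
    """Return a copy of a .config with the timer-frequency choice switched to `hz`."""
    if hz not in HZ_CHOICES:
        raise ValueError(f"unsupported HZ value: {hz}")
    out = []
    for line in config_text.splitlines():
        m = _CHOICE.match(line)
        if m and int(m.group(1)) in HZ_CHOICES:
            choice = int(m.group(1))
            if choice == hz:
                out.append(f"CONFIG_HZ_{choice}=y")
            else:
                out.append(f"# CONFIG_HZ_{choice} is not set")
        elif _VALUE.match(line):
            out.append(f"CONFIG_HZ={hz}")
        else:
            out.append(line)  # every other option is left untouched
    return "\n".join(out) + "\n"
```

After writing the result back to .config, running `make oldconfig` lets Kconfig re-check the edited file (for example, if one of the choice lines was absent) before building.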
Comment 11 Matthew Stapleton 2012-04-18 07:33:53 UTC
Created attachment 72950

Tried this config with mainline 3.4-rc3 and got the timeout. I also tested the same config with CONFIG_HZ changed to 1000, and that did not time out.
Comment 12 Matthew Stapleton 2012-04-19 02:40:24 UTC
I just tried kernel 2.6.27 with CONFIG_HZ=250 and got the timeout there as well. Also, adding nohz=off to the kernel command line doesn't help.
Comment 13 Matthew Stapleton 2012-05-28 00:56:41 UTC
Is there any more info I can provide to help solve the problem?
Comment 14 Alan 2012-09-04 12:45:00 UTC
Not that I can think of. Other than classifying your machine as having utterly out-weirded us, there is not much obvious we can do to progress this.