Bug 62351

Summary: Marvell PCIe SSD controller 0x9183 suspend/resume problem
Product: IO/Storage
Component: Serial ATA
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.12-rc3
Regression: No
Reporter: Patrik Jakobsson (patrik.r.jakobsson)
Assignee: Jeff Garzik (jgarzik)
CC: eugenecormier, fresneda, ilya.kuzmich, linux-ide, llandwerlin, szg00000, tj
Attachments: dmesg after a suspend/resume

Description Patrik Jakobsson 2013-09-30 07:05:58 UTC
Some MacBook Air 6,2 (2013) models come with a Marvell SATA controller (0x9183) for the PCIe-connected SSD. The libahci module loads properly and everything is fine after booting, but after resuming from suspend the kernel starts spitting out:

[  882.692469] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  882.693016] ata1.00: unexpected _GTF length (8)
[  882.693524] ata1.00: unexpected _GTF length (8)
[  882.693575] ata1.00: configured for UDMA/33
[  882.693622] ata1: EH complete
[  882.793794] ata1: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
[  882.793798] ata1: irq_stat 0x00400000, PHY RDY changed
[  882.793799] ata1: SError: { PHYRdyChg }
[  882.793803] ata1: hard resetting link

After some time, performance degrades to unusable levels and a reboot is required. The driver seems to keep lowering the link speed and eventually lands on 1.5 Gbps. I've tried adding the device as a "board_ahci_yes_fbs", but that didn't help. I've also tried "skip_host_reset", but still nothing.

Any help appreciated

Thanks
Patrik Jakobsson
Comment 1 Patrik Jakobsson 2013-09-30 07:07:03 UTC
Created attachment 109991
dmesg after a suspend/resume
Comment 2 Tejun Heo 2013-09-30 19:14:58 UTC
Hello, Patrik.

Can you please do "echo min_power > /sys/class/scsi_host/host0/link_power_management_policy" before suspend and see whether anything changes?
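
Since the host number is not guaranteed to be host0, the same write can be applied to every host that exposes the knob. The following is only a minimal sketch of that idea: the function name set_lpm_policy and its optional second argument (an alternative sysfs root, useful for a dry run) are assumptions made for illustration, not something from this report.

```shell
#!/bin/sh
# Sketch: write one LPM policy to every SCSI host that exposes the knob.
# set_lpm_policy and its optional root argument are illustrative names.
set_lpm_policy() {
    policy=$1
    root=${2:-/sys/class/scsi_host}   # overridable so the loop can be dry-run
    for knob in "$root"/host*/link_power_management_policy; do
        # Skip hosts without a writable knob (and an unexpanded glob).
        [ -w "$knob" ] || continue
        echo "$policy" > "$knob"
    done
}
```

Run as root, "set_lpm_policy min_power" before suspending would be the equivalent of the echo above for all hosts.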

Thanks.
Comment 3 Patrik Jakobsson 2013-09-30 21:38:20 UTC
Hi Tejun

Success!

Setting min_power before suspend makes the error go away. Sadly, pm-utils sets max_performance, so it needs to be reconfigured, and for some reason it doesn't take my /etc/pm/config.d/sata_alpm into account. I ended up copying /usr/lib/pm-utils/power.d/sata_alpm to /etc/pm/power.d and forcing it to min_power. I also added a link to it in /etc/pm/sleep.d/ so that it sets min_power before suspend.

Strangely, the SATA device randomly appears as either /sys/class/scsi_host/host0 or ../host1, but I just set them both (the other is USB) and that works fine.
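
For anyone wanting a standalone hook instead of a modified sata_alpm, a rough sketch could look like the following. It relies on pm-utils passing the event name (suspend, hibernate, resume or thaw) as the first argument to scripts in /etc/pm/sleep.d. The function name lpm_sleep_hook and the optional sysfs root argument are made up for this example, and it is not the script actually used above.

```shell
#!/bin/sh
# Hypothetical /etc/pm/sleep.d hook body; lpm_sleep_hook and the optional
# root argument are illustrative names, not part of pm-utils.
lpm_sleep_hook() {
    event=$1
    root=${2:-/sys/class/scsi_host}
    case "$event" in
        suspend|hibernate)
            # Force min_power on every host that exposes the knob.
            for knob in "$root"/host*/link_power_management_policy; do
                [ -w "$knob" ] || continue
                echo min_power > "$knob"
            done
            ;;
        *)
            # resume/thaw and anything else: leave the policy untouched.
            ;;
    esac
}
# An installed hook would end with: lpm_sleep_hook "$1"
```

Writing to all hosts sidesteps the host0/host1 ordering, since hosts without the attribute are simply skipped.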

Thanks
Patrik
Comment 4 Tejun Heo 2013-09-30 21:47:14 UTC
So, when the machine boots up libata doesn't touch the LPM setting and just considers it "max_performance", which may or may not be true. I think it'd work the same if you just set the lpm knob to max_performance explicitly. The problem is that your BIOS is most likely configuring DIPM on both the device and host sides during boot; however, after coming back from suspend, the device side seems to be configured but the host side isn't, so the device's LPM operations register as link events to the controller leading to the spurious failures.
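
To see what policy each host currently reports, the knob can simply be read back. A minimal sketch, where show_lpm_policy and its optional root argument are illustrative names; as noted above, the reported value reflects what libata assumes and may not match what the hardware is actually doing.

```shell
#!/bin/sh
# Sketch: print "<host>: <policy>" for every host exposing the LPM knob.
# show_lpm_policy and its optional root argument are illustrative names.
show_lpm_policy() {
    root=${1:-/sys/class/scsi_host}
    for knob in "$root"/host*/link_power_management_policy; do
        # Skip hosts without a readable knob (and an unexpanded glob).
        [ -r "$knob" ] || continue
        host=$(basename "$(dirname "$knob")")
        echo "$host: $(cat "$knob")"
    done
}
```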

Maybe we should set max_performance mode explicitly during boot, requiring the user to explicitly set min_power mode if [s]he wants to, but then we might cause power regressions on some setups.

Anyways, can you please verify echoing max_performance also makes the issue go away?
Comment 5 Patrik Jakobsson 2013-09-30 23:02:29 UTC
Here are all the test cases:

After boot: LPM policy reports max_performance, and I can switch between min_power and max_performance back and forth without any errors.

Before suspend: If I don't touch LPM policy after boot (still reporting max_performance) I can successfully suspend and LPM policy still reports max_performance (though probably incorrectly) after resume.

Before suspend: If I explicitly set max_performance after boot and then suspend, the errors appear on resume. If I then set min_power in this state, the errors stop.

Before suspend: If I explicitly set min_power after boot and go into suspend, it succeeds.

So I must either never touch the LPM policy or set it to min_power to avoid the errors on resume. min_power seems to be what the hardware is set to, but it doesn't get reported correctly (as you say) until I've actually set it to something.
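
One way to check each case after a resume is to count the PHYRdyChg lines in the kernel log. This is only a sketch: count_phyrdy_errors is a made-up helper name, and it matches the exact "SError: { PHYRdyChg }" text seen in the dmesg excerpt in the description.

```shell
#!/bin/sh
# Sketch: count PHY-ready-change errors in kernel log text read from stdin.
# count_phyrdy_errors is an illustrative name.
count_phyrdy_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that status.
    grep -cF 'SError: { PHYRdyChg }' || true
}
```

For example, "dmesg | count_phyrdy_errors" run shortly after resume should keep growing in the failing cases and stay constant in the working ones.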
Comment 6 Tejun Heo 2013-10-01 14:44:21 UTC
So, if you set "max_performance" explicitly, it works fine before suspending but after resuming it causes problems? What happens if you set "max_performance" explicitly instead of "min_power" after resume while the errors are happening? Does that resolve the issue too?

Thanks.
Comment 7 Patrik Jakobsson 2013-10-03 01:40:10 UTC
> So, if you set "max_performance" explicitly, it works fine before suspending
> but after resuming it causes problems?

Yes, the only time I can set max_performance without getting errors is before first suspend.

> What happens if you set "max_performance" explicitly instead of "min_power"
> after resume while the errors are happening? Does that resolve the issue too?

Setting max_performance explicitly after resume doesn't help. And if I first set min_power after resume to stop the errors and then try max_performance again, it will start spitting out errors again.
Comment 8 Lionel Landwerlin 2013-12-26 14:29:09 UTC
*** Bug 67721 has been marked as a duplicate of this bug. ***