Bug 60596 - SATA Multiplier Timeout instead of automatic spin-up severely prolongs boot
Summary: SATA Multiplier Timeout instead of automatic spin-up severely prolongs boot
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Jeff Garzik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-21 10:26 UTC by Rene Schickbauer
Modified: 2013-11-12 14:57 UTC
CC List: 2 users

See Also:
Kernel Version: 3.8.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Patch: Disable Soft resets (472 bytes, patch)
2013-07-21 10:26 UTC, Rene Schickbauer

Description Rene Schickbauer 2013-07-21 10:26:49 UTC
Created attachment 106976
Patch: Disable Soft resets

I have an external housing with 15 disks (with 3 SATA port multipliers of 5 disks each, as far as I can tell).

[ 5.544035] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 5.544331] ata4.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[ 5.545460] ata4.00: hard resetting link
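
The detection line above carries the values that identify this multiplier model: vendor ID 0x1095, device ID 0x3726, and the port count. As a rough sketch (not part of the kernel or of the attached patch), the following Python helper pulls those fields out of a dmesg line. The function name parse_pmp_line and the returned dictionary keys are made up for illustration, and the regular expression is only known to match the line format shown in this report.

```python
import re

# Matches lines such as:
#   ata4.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
# The pattern is an assumption based only on the log lines in this report.
PMP_RE = re.compile(
    r"(ata\d+\.\d+): Port Multiplier (\d+\.\d+), "
    r"0x([0-9a-fA-F]{4}):0x([0-9a-fA-F]{4}) r(\d+), (\d+) ports"
)


def parse_pmp_line(line):
    """Return the multiplier identification fields, or None if no match."""
    m = PMP_RE.search(line)
    if m is None:
        return None
    return {
        "link": m.group(1),          # e.g. "ata4.15"
        "pmp_version": m.group(2),   # e.g. "1.1"
        "vendor": int(m.group(3), 16),
        "device": int(m.group(4), 16),
        "revision": m.group(5),      # kept as text, not interpreted
        "ports": int(m.group(6)),
    }


if __name__ == "__main__":
    line = ("[ 5.544331] ata4.15: Port Multiplier 1.1, "
            "0x1095:0x3726 r23, 6 ports, feat 0x1/0x9")
    info = parse_pmp_line(line)
    print(hex(info["vendor"]), hex(info["device"]), info["ports"])
```

Running it over a full dmesg dump would list every multiplier in the housing, which helps confirm that all three report the same IDs.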

This housing is configured to NOT spin up the disks on power on.

Booting up takes extremely long, on the order of 5 minutes or so. The problem is that, instead of spinning up the disks via software, the kernel goes into an error condition after accessing each disk for the first time and starts over from the beginning:

[ 16.428193] ata4.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 16.428225] ata4.02: hard resetting link
[ 26.428023] ata4.02: softreset failed (timeout)
[ 29.428038] ata4.15: qc timeout (cmd 0xe4)
[ 29.428051] ata4.02: failed to read SCR 0 (Emask=0x5)
[ 29.428055] ata4.02: reset failed, giving up
[ 29.428104] ata4.15: hard resetting link
[ 39.436023] ata4.15: softreset failed (timeout)
[ 39.436068] ata4.15: hard resetting link
[ 41.636029] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 41.636260] ata4.00: hard resetting link
[ 41.988175] ata4.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 41.988201] ata4.01: hard resetting link
[ 42.340177] ata4.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 42.340203] ata4.02: hard resetting link
[ 42.692175] ata4.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 42.692200] ata4.03: hard resetting link
[ 52.692023] ata4.03: softreset failed (timeout)
[ 55.692028] ata4.15: qc timeout (cmd 0xe4)
....
and so on.
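
To put numbers on the stalls in the log above, here is a small Python sketch that reads dmesg-style lines and reports, for each "softreset failed" message, how many seconds passed since the previous ata line. The helper name softreset_stalls and the SAMPLE excerpt are illustrative only; the timestamps in SAMPLE are copied from the log in this report.

```python
import re

# "[ 26.428023] ata4.02: softreset failed (timeout)" -> timestamp, link, message
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+(?:\.\d+)?): (.*)$")


def softreset_stalls(log_text):
    """Return (link, seconds) for each 'softreset failed' line.

    seconds is the gap between that line and the previous parsed ata line,
    i.e. roughly how long the kernel waited before giving up.
    """
    stalls = []
    prev_ts = None
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines that are not ata messages
        ts, link, msg = float(m.group(1)), m.group(2), m.group(3)
        if prev_ts is not None and msg.startswith("softreset failed"):
            stalls.append((link, round(ts - prev_ts, 3)))
        prev_ts = ts
    return stalls


SAMPLE = """\
[ 16.428225] ata4.02: hard resetting link
[ 26.428023] ata4.02: softreset failed (timeout)
[ 29.428038] ata4.15: qc timeout (cmd 0xe4)
[ 29.428104] ata4.15: hard resetting link
[ 39.436023] ata4.15: softreset failed (timeout)
"""

if __name__ == "__main__":
    for link, seconds in softreset_stalls(SAMPLE):
        print(f"{link}: waited {seconds:.3f} s before softreset timed out")
```

On the excerpt, each softreset timeout costs about 10 seconds, and the cycle repeats for every disk that has not spun up yet.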

Looking at the storage housing, I can see that the kernel tries to access the first disk (and I hear it spinning up) and the screen immediately shows a timeout. Then nothing happens for a number of seconds, the kernel accesses the first disk again (succeeding) and times out while the second disk is spinning up, and so on.

After the first five disks are spun up (first multiplier), the kernel accesses all disks once more (after the last multiplier reset) and succeeds, and then gets stuck on the second multiplier. And then on the third...

I can't be certain of this, but I'm pretty sure I hear the already spun-up disks parking/unparking their heads when the multiplier gets reset (which can't be good, either).

I think the correct solution would be to implement a staggered spin-up of the disks: send a spin-up command (or whatever the kernel usually uses to wake up disks) to each port of the multiplier in turn, with a delay of 250 ms or so between ports, before doing the disk detection. This should speed up boot considerably and will most likely reduce wear on the disks.
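
The timing argument behind this proposal can be worked through with a few lines of Python. This is only arithmetic, not kernel code: the function names are invented for illustration, the 250 ms stagger is the value suggested above, and the per-disk spin-up time is a placeholder parameter that would have to be measured on the real hardware.

```python
def staggered_schedule(num_ports, stagger_s=0.25):
    """Offsets (in seconds) at which each port would get its spin-up command."""
    return [round(i * stagger_s, 3) for i in range(num_ports)]


def staggered_ready_time(num_ports, spinup_s, stagger_s=0.25):
    """Time until the last disk is ready when spin-ups overlap."""
    if num_ports <= 0:
        return 0.0
    return (num_ports - 1) * stagger_s + spinup_s


def sequential_ready_time(num_ports, spinup_s):
    """Time until the last disk is ready when disks spin up one after another."""
    return num_ports * spinup_s


if __name__ == "__main__":
    ports = 5          # disks behind one multiplier in this housing
    spinup_s = 10.0    # ASSUMPTION: placeholder spin-up time, not measured
    print("command offsets:", staggered_schedule(ports))
    print("staggered ready:", staggered_ready_time(ports, spinup_s), "s")
    print("sequential ready:", sequential_ready_time(ports, spinup_s), "s")
```

With the placeholder value, overlapping the spin-ups finishes in roughly one spin-up time plus one second per multiplier, instead of five spin-up times, and it avoids the repeated multiplier resets seen in the log.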

I managed to get the boot delays down to a few seconds by disabling soft resets on the multiplier (see attached patch). This is the same workaround that's already in the kernel for some other multipliers.

[ 6.990717] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[ 9.116035] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 9.116322] ata10.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[ 9.117397] ata10.00: hard resetting link
[ 9.436262] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 9.436343] ata10.01: hard resetting link
[ 9.756282] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 9.756369] ata10.02: hard resetting link
[ 9.814423] Adding 4191228k swap on /dev/sdb5. Priority:-1 extents:1 across:4191228k
[ 10.076264] ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.076350] ata10.03: hard resetting link
[ 10.396281] ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.396364] ata10.04: hard resetting link
[ 10.451568] EXT4-fs (sdb1): re-mounted. Opts: errors=remount-ro
[ 10.716264] ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 10.716350] ata10.05: hard resetting link
[ 11.036265] ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[ 12.191655] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[ 12.194589] XFS (sda): Mounting Filesystem


This bug was originally reported to Ubuntu via Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1193809

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-23-generic 3.8.0-23.34
ProcVersionSignature: Ubuntu 3.8.0-23.34-generic 3.8.11
Uname: Linux 3.8.0-23-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.9.2-0ubuntu8.1
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Sun Jun 23 12:38:49 2013
HibernationDevice: RESUME=UUID=40dffe0d-945c-43d2-b196-783d72d41937
InstallationDate: Installed on 2013-01-28 (145 days ago)
InstallationMedia: Xubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.1)
IwConfig:
 eth5 no wireless extensions.

 eth6 no wireless extensions.

 lo no wireless extensions.
Lsusb:
 Bus 002 Device 002: ID 046d:c018 Logitech, Inc. Optical Wheel Mouse
 Bus 002 Device 003: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: empty empty
MarkForUpload: True
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-23-generic root=UUID=9bbfc83a-95f8-413a-b412-5b2091205864 ro
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-23-generic N/A
 linux-backports-modules-3.8.0-23-generic N/A
 linux-firmware 1.106
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to raring on 2013-04-26 (57 days ago)
dmi.bios.date: 10/22/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 'V2.06 '
dmi.board.asset.tag: empty
dmi.board.name: S2932/S2932-E/S2932-SI
dmi.board.vendor: TYAN Computer Corporation
dmi.board.version: empty
dmi.chassis.asset.tag: empty
dmi.chassis.type: 3
dmi.chassis.vendor: empty
dmi.chassis.version: empty
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr'V2.06':bd10/22/2009:svnempty:pnempty:pvrempty:rvnTYANComputerCorporation:rnS2932/S2932-E/S2932-SI:rvrempty:cvnempty:ct3:cvrempty:
dmi.product.name: empty
dmi.product.version: empty
dmi.sys.vendor: empty
Comment 1 Tejun Heo 2013-07-22 19:58:33 UTC
The default behavior for SATA PMP has recently changed such that SRST is not used, so the behavior with NO_SRST is good enough; things should be okay by default soonish.

Thanks.
