Bug 14616 - [2.6.32 regression] sata_nv: commit 6489e3262e6b188a1a009b65e8a94b7aa17645b7 slows down system boot
[2.6.32 regression] sata_nv: commit 6489e3262e6b188a1a009b65e8a94b7aa17645b7 ...
Status: CLOSED INVALID
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
All Linux
: P1 normal
Assigned To: Tejun Heo
:
Depends on: 14329
Blocks: 14230
  Show dependency treegraph
 
Reported: 2009-11-16 19:49 UTC by Artem S. Tashkinov
Modified: 2009-11-20 07:48 UTC (History)
3 users (show)

See Also:
Kernel Version: 2.6.32
Tree: Mainline
Regression: Yes


Attachments
dmesg for 2.6.31.4 (good) and 2.6.31.6 (bad) (17.85 KB, application/octet-stream)
2009-11-16 19:49 UTC, Artem S. Tashkinov
Details
dmesg for 2.6.32-rc7 (not so good even with libata.force=norst) (9.51 KB, application/octet-stream)
2009-11-18 08:07 UTC, Artem S. Tashkinov
Details

Description Artem S. Tashkinov 2009-11-16 19:49:01 UTC
Created attachment 23804 [details]
dmesg for 2.6.31.4 (good) and 2.6.31.6 (bad)

Kernel known to work: <=2.6.31.5
Kernel known to fail: >=2.6.31.6, 2.6.32

Kernel commit 6489e3262e6b188a1a009b65e8a94b7aa17645b7 introduces a 1 second delay at system boot. Please, fix it.

I have attached dmesg output to show the problem.
Comment 1 Tejun Heo 2009-11-17 05:16:29 UTC
The previous code only worked becaues in most cases BIOS brought up the link already which isn't always the case.  Actually trying to bring the link up is what's taking time there, so I'm afraid we can't remove it because it adds 1 second delay.  If you have special configuration where 1 second delay isn't acceptable and you're sure link resetting is not necessary, please give a shot at libata.force=norst parameter.
Comment 2 Tejun Heo 2009-11-17 05:17:12 UTC
Resolving as WILL_NOT_FIX for now.  Please feel free to add further comment or reopen.

Thanks.
Comment 3 Artem S. Tashkinov 2009-11-17 08:47:18 UTC
Is it possible to check out if SATA link is already up, without trying to bring it up no matter what?

Because speaking frankly I hate adding any kernel boot strings as I see them as dirty hacks which in most cases mean the internal kernel logic doesn't work well or doesn't deal with the real world scenarios.
Comment 4 Artem S. Tashkinov 2009-11-17 08:55:58 UTC
Besides, the original bug deals with resuming after any kind of suspend, that means we don't have to introduce this workaround for the initial boot sequence.
Comment 5 Tejun Heo 2009-11-17 13:59:54 UTC
The standard behavior is to bring up link and reset unconditionally.  It was an obvious bug fix and taking a couple of secs there is acceptable behavior which will suit most users and safely cover most corner cases.  If for whatever reason you can't live with one sec delay there, your requirement is quite different from others, so the suggestion of boot parameter.  It's a normal latency necessary to initialize the hardware and I don't think tinkering with it to reduce 1 sec delay is a good idea.

Thanks.
Comment 6 Artem S. Tashkinov 2009-11-17 14:20:37 UTC
Tejun,

There are no bug reports related to finding/initializing storage media _upon boot_, so why there's a need to introduce this "safe" behaviour even though everything worked before?

If you think this fix is a _MUST_ then feel free to close this bug as INVALID and I'll never bother writing anything here again.
Comment 7 Tejun Heo 2009-11-17 14:37:13 UTC
Because that's how all other SATA drivers behave or the BIOS might choose not to bring up certain links if it's not occupied during BIOS detection and we will be missing hotplug events.  Also, there are cases where BIOS might not be the one right before this boot.  Resume is one case, kexec boot is another.  There are many corner cases so the baseline is to fully initialize the hardware to the best of the driver's knowledge.  It's basically how the whole thing is designed and supposed to behave.

As you can see from the other bug report, when we deviate from the baseline, corner cases start breaking.  Please note that when things break during resume, emergency kexec boot or whatnot, it's usually quite difficult to reproduce, painful to report and diagnose, so risking subtle breakages for 1sec delay reduction doesn't make whole lot of sense.

So, resolving as INVALID.

Hmmm.... one thing we can try is to avoid writing to SControl if the link already seems online but it will be a very risky change which might end up with long and thin regression distribution graph.  With parallel probing, I seriously don't think several secs for port probing matters at all tho.

Thanks.
Comment 8 Artem S. Tashkinov 2009-11-18 06:19:10 UTC
libata.force=norst hack didn't help much.

Boot time was reduced by 0.3 seconds, which is still 0.7 seconds more/longer than without this "safe" patch.
Comment 9 Tejun Heo 2009-11-18 06:25:29 UTC
Hey, that's 30% reduction.  :-)

Joke aside, I wonder what the difference is tho.  Ah, right, with hardreset ruled out, prereset thinks it needs to resume the link so link resuming will still happen.  Hmmm... Adding a parameter to inhibit unconditional link resuming isn't difficult at all but I'm still not convinced this matters at all.  Why does a 1 sec parallel delay matter so much?

Thanks.
Comment 10 Artem S. Tashkinov 2009-11-18 08:07:22 UTC
Created attachment 23818 [details]
dmesg for 2.6.32-rc7 (not so good even with libata.force=norst)

Something booted in almost no time, i.e. perfectly, no it's not that perfect.

Do you see my point? :)
Comment 11 Artem S. Tashkinov 2009-11-20 07:44:50 UTC
So, how can I get back "my" 0.7 seconds?
Comment 12 Tejun Heo 2009-11-20 07:48:23 UTC
Frankly, I don't care.  Changing the default behvaior is too risky.  We can introduce a parameter to skip link resume if link already seems online but how many people are gonna use that?  If you need the .7s that much, just patch and roll your own kernel.

Thanks.

Note You need to log in before you can comment on or make changes to this bug.