Bug 201151
| Summary: | illegal qc_active transition prevents optical drive detection | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | tones111 |
| Component: | Serial ATA | Assignee: | Tejun Heo (tj) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | axboe |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | 4.18.* | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Attachments:
Ignore ATA_TAG_INTERNAL and 0 both being set
Use hw tag for ap->qc_active mask
Ignore 0/32 both being set
Mask swap
dmesg output after applying patch "Ignore 0/32 both being set"
lspci output
Mask swap
Mask swap v2
Description
tones111
2018-09-16 14:08:49 UTC
The regression exists in v4.19-rc3. I've managed to bisect the problem to...

commit 28361c403683c2b00d4f5e76045f3ccd299bf99d
Author: Jens Axboe <axboe@kernel.dk>
Date: Fri May 11 12:51:09 2018 -0600

    libata: add extra internal command

Hmm, that's a little odd. The port active mask shows tag 32 active, which is hw tag 0. For some reason, tag 0 is set on the ahci side. Can you try the patch I'm about to attach?

Created attachment 278633
Ignore ATA_TAG_INTERNAL and 0 both being set
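To make the "illegal qc_active transition" in the bug title concrete, here is a minimal standalone sketch of the kind of check involved. It assumes the completion path computes a done mask as the XOR of the port's active mask and the mask read back from hardware, and treats any bit that hardware reports active but the port never issued as illegal. The helper names `tag_bit` and `illegal_transition` are hypothetical and are not kernel functions; `ATA_TAG_INTERNAL` is taken as 32, matching the "tag 32" mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Internal tag as discussed in this report: tag 32, which maps to hw tag 0. */
#define ATA_TAG_INTERNAL 32

/* Hypothetical helper: single-bit mask for a tag number. */
static uint64_t tag_bit(unsigned int tag)
{
	assert(tag < 64);
	return 1ULL << tag;
}

/*
 * Hypothetical helper (not a kernel API): returns true when the hardware
 * mask claims a tag is still active that the port did not consider active.
 * done_mask holds every bit that differs between the two masks; a differing
 * bit that is set on the hardware side cannot be a completion.
 */
static bool illegal_transition(uint64_t port_active, uint64_t hw_active)
{
	uint64_t done_mask = port_active ^ hw_active;

	return (done_mask & hw_active) != 0;
}
```

With the port mask holding only bit 32 and the hardware mask holding only bit 0, as described in this report, the sketch flags the transition as illegal, whereas an ordinary completion (hardware mask a subset of the port mask) passes.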
Can you also provide a full dmesg?

Actually, I think we should just mark the active mask with the real hw tag instead. This should work for both the normal case, as well as yours where we end up seeing the internally mapped tag show up in the done mask.

Created attachment 278635
Use hw tag for ap->qc_active mask
Back to square one, please just try the original patch, patch #2 can't possibly work... I'll add it again, looks like there's no way to mark it not-obsolete.

Created attachment 278637
Ignore 0/32 both being set
Looking at the code, from the various callers of ata_qc_complete_multiple(), only sil24 can actually trigger this.

Since you're going to be testing various patches anyway, I'm going to toss another one into the mix for you to test...

Created attachment 278641
Mask swap
Created attachment 278649
dmesg output after applying patch "Ignore 0/32 both being set"

Created attachment 278651
lspci output
I tried both patches, in isolation, against 4.19-rc3. The "Ignore 0/32 both being set" patch resolves the issue for me. The "Mask swap" patch did not help. I've attached the dmesg output from the boot with the good patch as well as some lspci output in case that helps.

Thanks for the fast response!

After adding some debug printouts, it appears tag 0 is getting set and passed via qc_active to ata_qc_complete_multiple() in libahci.c:

    static void ahci_handle_port_interrupt(...)
    {
        ...
        if (pp->fbs_enabled) {
            if (ap->qc_active) {
                qc_active = readl(port_mmio + PORT_SCR_ACT);
                qc_active |= readl(port_mmio + PORT_CMD_ISSUE);
            }
        } else {
            /* pp->active_link is valid iff any command is in flight */
            if (ap->qc_active && pp->active_link->sactive)
                qc_active = readl(port_mmio + PORT_SCR_ACT);
            else
                qc_active = readl(port_mmio + PORT_CMD_ISSUE); // <<<---- HERE
        }

        rc = ata_qc_complete_multiple(ap, qc_active);
        ...

I'm not familiar with this code, so I'm not sure if that would be expected or not.

I think the best/safest is just to mask the hw invalid tag32 and set tag0, if tag32 is set. Can you confirm this one works too?

Created attachment 278671
Mask swap

Created attachment 278673
Mask swap v2
Added a newer version, sorry for all the various versions... But I believe this one should work, as it just mirrors the right tag between the internal and hardware mask. I'm curious if it works for your case though, as we don't expect bit0 to be set in the hardware mask. It should be cleared, which means that command is done. It'll work for you if it is indeed just a spurious interrupt. If it doesn't work, then we might need to add some more debugging to figure out wtf is going wrong here.

The first "mask swap" patch breaks communication with my hard drive, so it drops out to a prompt.

The second patch ("mask swap v2") is working for me.

Thanks again

Ok great, that makes me feel better. Can I add your tested-by to the patch?
> On Sep 19, 2018, at 7:59 PM, bugzilla-daemon@bugzilla.kernel.org wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201151
>
> --- Comment #19 from tones111@hotmail.com ---
> The first "mask swap" patch breaks communication with my hard drive, so it
> drops out to a prompt.
>
> The second patch ("mask swap v2") is working for me.
>
> Thanks again
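The idea behind "Mask swap v2", as described in the thread, is to mirror the tag between the internal mask and the hardware mask. The following standalone sketch illustrates one way that could look; it is an illustration under assumptions, not the attached patch. It assumes that when the port has the internal tag (bit 32) active, bit 0 of the hardware-reported mask refers to that same command, so bit 0 is moved up to bit 32 before the two masks are compared. The helper names `tag_bit` and `mirror_internal_tag` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Internal tag as discussed in this report: tag 32, which maps to hw tag 0. */
#define ATA_TAG_INTERNAL 32

/* Hypothetical helper: single-bit mask for a tag number. */
static uint64_t tag_bit(unsigned int tag)
{
	assert(tag < 64);
	return 1ULL << tag;
}

/*
 * Hypothetical helper (not a kernel API): if the port's active mask has the
 * internal tag set, move bit 0 of the hardware mask up to bit 32 so both
 * masks name the in-flight internal command by the same bit. All other
 * bits are left untouched.
 */
static uint64_t mirror_internal_tag(uint64_t port_active, uint64_t hw_active)
{
	if (port_active & tag_bit(ATA_TAG_INTERNAL)) {
		hw_active |= (hw_active & 1ULL) << ATA_TAG_INTERNAL;
		hw_active &= ~1ULL;
	}
	return hw_active;
}
```

Under these assumptions, a hardware mask with bit 0 still set while the internal command is in flight becomes a mask with bit 32 set, so it matches the port's mask and no longer looks like a tag that was never issued. When the internal tag is not active on the port, the hardware mask passes through unchanged.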