Running any 4.18.* kernel I see the following messages early in the boot sequence and my system is unable to detect (no /dev/sr0 created) my optical drive. journalctl -b excerpt: Sep 15 19:57:40 mobile kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Sep 15 19:57:40 mobile kernel: ata2: illegal qc_active transition (100000000->00000001) Sep 15 19:57:40 mobile kernel: ata2.00: ATAPI: MATSHITADVD-RAM UJ8C2 S, 1.00, max UDMA/133 Sep 15 19:57:40 mobile kernel: ata2.00: failed to set xfermode (err_mask=0x40) Sep 15 19:57:40 mobile kernel: usb 4-2: New USB device found, idVendor=04f2, idProduct=b40a, bcdDevice=69.64 Sep 15 19:57:40 mobile kernel: usb 4-2: New USB device strings: Mfr=3, Product=1, SerialNumber=2 Sep 15 19:57:40 mobile kernel: usb 4-2: Product: USB2.0 HD UVC WebCam Sep 15 19:57:40 mobile kernel: usb 4-2: Manufacturer: Chicony Electronics Co.,Ltd. Sep 15 19:57:40 mobile kernel: usb 4-2: SerialNumber: 0x0001 Sep 15 19:57:40 mobile kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Sep 15 19:57:40 mobile kernel: ata2: illegal qc_active transition (100000000->00000001) Sep 15 19:57:40 mobile kernel: ata2.00: failed to set xfermode (err_mask=0x40) Sep 15 19:57:40 mobile kernel: ata2.00: limiting speed to UDMA/133:PIO3 Sep 15 19:57:40 mobile kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Sep 15 19:57:40 mobile kernel: ata2: illegal qc_active transition (100000000->00000001) Sep 15 19:57:40 mobile kernel: ata2.00: configured for UDMA/133 Sep 15 19:57:40 mobile kernel: ata2.00: TEST_UNIT_READY failed (err_mask=0x40) Sep 15 19:57:40 mobile kernel: ata2.00: disabled Sep 15 19:57:40 mobile kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Booting a 4.17.* kernel does not exhibit this behavior and my optical drive is detected and works fine. journalctl -b excerpt from a working 4.17.13-arch1-1-ARCH kernel: Sep 16 08:40:28 mobile kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Sep 16 08:40:28 mobile kernel: ata1.00: ATA-8: HGST HTS545050A7E680, GG2OAF10, max UDMA/133 Sep 16 08:40:28 mobile kernel: ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA Sep 16 08:40:28 mobile kernel: ata1.00: configured for UDMA/133 Sep 16 08:40:28 mobile kernel: scsi 0:0:0:0: Direct-Access ATA HGST HTS545050A7 AF10 PQ: 0 ANSI: 5 Sep 16 08:40:28 mobile kernel: ata2.00: ATAPI: MATSHITADVD-RAM UJ8C2 S, 1.00, max UDMA/133 Sep 16 08:40:28 mobile kernel: ata2.00: configured for UDMA/133 Sep 16 08:40:28 mobile kernel: scsi 1:0:0:0: CD-ROM MATSHITA DVD-RAM UJ8C2 S 1.00 PQ: 0 ANSI: 5 Sep 16 08:40:28 mobile kernel: sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB) Sep 16 08:40:28 mobile kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks Sep 16 08:40:28 mobile kernel: sd 0:0:0:0: [sda] Write Protect is off Sep 16 08:40:28 mobile kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 Sep 16 08:40:28 mobile kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Sep 16 08:40:28 mobile kernel: tsc: Refined TSC clocksource calibration: 1497.192 MHz Sep 16 08:40:28 mobile kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1594c60bad9, max_idle_ns: 440795256279 ns Sep 16 08:40:28 mobile kernel: clocksource: Switched to clocksource tsc Sep 16 08:40:28 mobile kernel: sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray Sep 16 08:40:28 mobile kernel: cdrom: Uniform CD-ROM driver Revision: 3.20 Sep 16 08:40:28 mobile kernel: sr 1:0:0:0: Attached scsi CD-ROM sr0 A possibly interesting wrinkle is that I can sometimes boot a 4.18 kernel and early-exit the bootloader countdown, see fewer illegal transition messages, and have a /dev/sr0 once booted. However, after the system wakes from sleep the optical drive is no longer functional.
The regression exists in v4.19-rc3. I've managed to bisect the problem to... commit 28361c403683c2b00d4f5e76045f3ccd299bf99d Author: Jens Axboe <axboe@kernel.dk> Date: Fri May 11 12:51:09 2018 -0600 libata: add extra internal command
Hmm, that's a little odd. The port active mask shows tag 32 active, which is hw tag 0. For some reason, tag 0 is set on the ahci side. Can you try the patch I'm about to attach?
Created attachment 278633 [details] Ignore ATA_TAG_INTERNAL and 0 both being set
Can you also provide a full dmesg?
Actually, I think we should just mark the active mask with the real hw tag instead. This should work for both the normal case, as well as yours where we end up seeing the internally mapped tag show up in the done mask.
Created attachment 278635 [details] Use hw tag for ap->qc_active mask
Back to square one, please just try the original patch, patch #2 can't possibly work... I'll add it again, looks like there's no way to mark it not-obsolete.
Created attachment 278637 [details] Ignore 0/32 both being set
Looking at the code, from the various callers of ata_qc_complete_multiple(), only sil24 can actually trigger this. Since you're going to be testing various patches anyway, I'm going to toss another one into the mix for you to test...
Created attachment 278641 [details] Mask swap
Created attachment 278649 [details] dmesg output after applying patch "Ignore 0/32 both being set"
Created attachment 278651 [details] lspci output
I tried both patches, in isolation, against 4.19-rc3. The "Ignore 0/32 both being set" patch resolves the issue for me. The "Mask swap" patch did not help. I've attached the dmesg output from the boot with the good patch as well as some lspci output in case that helps. thanks for the fast response!
After adding some debug printouts, it appears tag 0 is getting set and passed via qc_active during ata_cq_complete_multiple in libahci.c static void ahci_handle_port_interrupt(...) { ... if (pp->fbs_enabled) { if (ap->qc_active) { qc_active = readl(port_mmio + PORT_SCR_ACT); qc_active |= readl(port_mmio + PORT_CMD_ISSUE); } } else { /* pp->active_link is valid iff any command is in flight */ if (ap->qc_active && pp->active_link->sactive) qc_active = readl(port_mmio + PORT_SCR_ACT); else qc_active = readl(port_mmio + PORT_CMD_ISSUE); // <<<---- HERE } rc = ata_qc_complete_multiple(ap, qc_active); ... I'm not familiar with this code, so I'm not sure if that would be expected or not.
I think the best/safest is just to mask the hw invalid tag32 and set tag0, if tag32 is set. Can you confirm this one works too?
Created attachment 278671 [details] Mask swap
Created attachment 278673 [details] Mask swap v2
Added a newer version, sorry for all the various versions... But I believe this one should work, as it just mirrors the right tag between the internal and hardware mask. I'm curious if it works for your case though, as we don't expect bit0 to be set in the hardware mask. It should be cleared, which means that command is done. It'll work for you if it is indeed just a spurious interrupt. If it doesn't work, then we might need to add some more debugging to figure out wtf is going wrong here.
The first "mask swap" patch breaks communication with my hard drive, so it drops out to a prompt. The second patch ("mask swap v2") is working for me. Thanks again
Ok great, that makes me feel better. Can I add your tested-by to the patch? > On Sep 19, 2018, at 7:59 PM, bugzilla-daemon@bugzilla.kernel.org wrote: > > https://bugzilla.kernel.org/show_bug.cgi?id=201151 > > --- Comment #19 from tones111@hotmail.com --- > The first "mask swap" patch breaks communication with my hard drive, so it > drops out to a prompt. > > The second patch ("mask swap v2") is working for me. > > Thanks again > > -- > You are receiving this mail because: > You are on the CC list for the bug.