Bug 216696 - Linux unusable upon plugging encrypted SanDisk Extreme 55AE USB 3.0 SSD, causes xHCI controller crash and drops USB keyboard/mouse
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: AMD Linux
Importance: P1 blocking
Assignee: linux-scsi@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-17 04:17 UTC by Kamil Kaminski
Modified: 2022-11-18 02:39 UTC
CC List: 3 users

See Also:
Kernel Version: 6.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (10.21 KB, text/plain)
2022-11-17 04:17 UTC, Kamil Kaminski
dmesg B550 mobo (122.21 KB, text/plain)
2022-11-18 01:01 UTC, Kamil Kaminski

Description Kamil Kaminski 2022-11-17 04:17:15 UTC
Created attachment 303189
dmesg

Plugging in an encrypted 2TB SanDisk Extreme 55AE USB 3.0 Type-C SSD causes the xHCI controller to crash, dropping the USB keyboard/mouse and every other USB device connected to the computer. dmesg gets spammed with the following errors:

[    3.359704] sd 1:0:0:0: [sda] Sense Key : Data Protect [current] 
[    3.359706] sd 1:0:0:0: [sda] Add. Sense: Logical unit access not authorized
[    3.378662] sd 1:0:0:0: [sda] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[    3.378664] sd 1:0:0:0: [sda] tag#2 Sense Key : Data Protect [current] 
[    3.378666] sd 1:0:0:0: [sda] tag#2 Add. Sense: Logical unit access not authorized
[    3.378667] sd 1:0:0:0: [sda] tag#2 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[    3.378667] critical target error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0
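The CDB of the failing command can be decoded by hand to see what the kernel was asking the locked drive for. The sketch below assumes the standard READ(10) layout from the SCSI Block Commands (SBC) standard: opcode 0x28, a 4-byte big-endian LBA in bytes 2-5 and a 2-byte big-endian transfer length in bytes 7-8. `Read10` and `parse_read10` are hypothetical helpers, not part of any kernel or sg3_utils tool.

```python
from typing import NamedTuple


class Read10(NamedTuple):
    """Fields of a 10-byte SCSI READ(10) command descriptor block."""
    opcode: int        # byte 0, always 0x28 for READ(10)
    flags: int         # byte 1: RDPROTECT/DPO/FUA bits
    lba: int           # bytes 2-5: big-endian logical block address
    group: int         # byte 6: group number field
    transfer_len: int  # bytes 7-8: big-endian number of logical blocks
    control: int       # byte 9: control byte


def parse_read10(cdb_hex: str) -> Read10:
    """Parse a CDB as printed by the sd driver, e.g. '28 00 00 ...'."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return Read10(
        opcode=cdb[0],
        flags=cdb[1],
        lba=int.from_bytes(cdb[2:6], "big"),
        group=cdb[6],
        transfer_len=int.from_bytes(cdb[7:9], "big"),
        control=cdb[9],
    )


cmd = parse_read10("28 00 00 00 00 00 00 00 08 00")
print(cmd.lba, cmd.transfer_len)  # 0 8
```

For the CDB in the log this yields LBA 0 and a transfer length of 8 blocks, which agrees with `sector 0` in the `critical target error` line: the drive rejects even a read of the very start of the disk (presumably the kernel's partition-table probe) with Data Protect, because it is still locked.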

and finally:
[    8.371890] xhci_hcd 0000:0b:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0xffcf0000 flags=0x0020]
[   13.669065] xhci_hcd 0000:0b:00.3: xHCI host not responding to stop endpoint command.
[   13.669070] xhci_hcd 0000:0b:00.3: USBSTS: 0x00000005 HCHalted HSE
[   13.669074] xhci_hcd 0000:0b:00.3: xHCI host controller not responding, assume dead
[   13.669086] xhci_hcd 0000:0b:00.3: HC died; cleaning up
[   13.669116] usb 4-3: cmd cmplt err -108
[   13.669119] usb 4-3: cmd cmplt err -108
[   13.669124] usb 3-2: USB disconnect, device number 2
[   13.669184] usb 4-3: USB disconnect, device number 2
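The `USBSTS: 0x00000005 HCHalted HSE` line shows the raw xHCI status register followed by the driver's decoding of it. Here is a minimal sketch of that decoding, assuming the USBSTS bit layout from the xHCI specification (section 5.4.2). `decode_usbsts` is a hypothetical helper.

```python
from typing import List

# Bit positions of the xHCI USBSTS register (xHCI specification, 5.4.2).
USBSTS_BITS = {
    0: "HCHalted",  # controller has stopped executing
    2: "HSE",       # Host System Error
    3: "EINT",      # Event Interrupt
    4: "PCD",       # Port Change Detect
    8: "SSS",       # Save State Status
    9: "RSS",       # Restore State Status
    10: "SRE",      # Save/Restore Error
    11: "CNR",      # Controller Not Ready
    12: "HCE",      # Host Controller Error
}


def decode_usbsts(value: int) -> List[str]:
    """Return the names of the status bits set in a USBSTS value, lowest bit first."""
    return [name for bit, name in sorted(USBSTS_BITS.items()) if value & (1 << bit)]


print(decode_usbsts(0x00000005))  # ['HCHalted', 'HSE']
```

HSE means the controller detected a serious error during a host system access, which would be consistent with one of its DMA transfers being rejected by the IOMMU. The subsequent `cmd cmplt err -108` lines are -ESHUTDOWN, which pending commands receive once the host controller has been declared dead.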

Having the drive plugged in while booting Linux, or while booting a live USB distro such as Fedora 37, causes the xHCI controller to crash and drops essential USB devices such as the keyboard and mouse, leaving the user unable to type or log in to a TTY or desktop environment.

The SanDisk Extreme features drive encryption that works out of the box on Windows and macOS. It ships with its own read-only UDF partition containing unlock.exe, which can be launched to unlock (decrypt) the drive and reveal a ~2TB exFAT partition.

Please see the attached dmesg log file. If this is not the right subsystem for this bug, please let me know. I would really like to get my system booting Linux again, since I have the SanDisk Extreme plugged in 24/7. Thanks
Comment 1 Mario Limonciello (AMD) 2022-11-17 23:13:05 UTC
Considering there is a page fault, did you already experiment with iommu=pt?
Comment 2 Kamil Kaminski 2022-11-18 00:40:31 UTC
Hi Mario, it did not cross my mind that this could be AMD IOMMU related. Adding iommu=pt to the kernel boot line did in fact make things better! My USB keyboard and mouse now survive.
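To confirm that a given boot actually picked up the workaround, you can read the running kernel's command line from /proc/cmdline. The sketch below assumes only that the parameter appears as a whitespace-separated `iommu=pt` token. The helper names are hypothetical. It only shows that the parameter was passed, not that the IOMMU driver honored it; the `Default domain type` line printed by the IOMMU core in dmesg is a more direct indication.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` (e.g. "iommu=pt") appears as a whole token."""
    return param in cmdline.split()


def iommu_passthrough_requested(cmdline: str) -> bool:
    # Hypothetical helper: checks only for the literal token "iommu=pt".
    return has_kernel_param(cmdline, "iommu=pt")


if __name__ == "__main__":
    # On Linux, /proc/cmdline holds the command line the kernel was booted with.
    path = Path("/proc/cmdline")
    if path.exists():
        print("iommu=pt requested:", iommu_passthrough_requested(path.read_text()))
```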
Comment 3 Mario Limonciello (AMD) 2022-11-18 00:42:22 UTC
I don't think that proves it to be an AMD IOMMU bug.  It could be an ACPI IVRS table (BIOS) issue or a driver bug.
Can we see the whole dmesg with iommu=pt?  Besides keyboard and mouse, what happens to the other USB device now?
Comment 4 Kamil Kaminski 2022-11-18 01:01:22 UTC
At the moment I'm on my other system (B550 chipset, Ryzen 5800X) with the 2TB SanDisk Extreme 55AE plugged into it. This system was also exhibiting exactly the same problem as the original one (B450 motherboard with Ryzen 5600X), so I can give you a dmesg from the current B550 system (see attachment). And yes, other USB devices are working fine now: I have a USB SD card reader and an Oculus CV1 plugged in, and both work normally.
Comment 5 Kamil Kaminski 2022-11-18 01:01:51 UTC
Created attachment 303203
dmesg B550 mobo
Comment 6 Mario Limonciello (AMD) 2022-11-18 02:39:34 UTC
You can see the IO_PAGE_FAULT in there and decode it using https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf (pages 149/150).

The domain ID 0x18 is selected by the driver; I don't believe it's meaningful in this case. It seems to me that the device, or the driver for the device, is causing the xHCI controller to try to write to address 0xffcf0000.
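The `flags` field of the IO_PAGE_FAULT line can be decoded mechanically. The sketch below uses the four event flag values defined in the Linux AMD IOMMU driver (`EVENT_FLAG_I`, `EVENT_FLAG_PR`, `EVENT_FLAG_RW`, `EVENT_FLAG_TR` in drivers/iommu/amd/amd_iommu_types.h, as I understand them). It assumes that the printed value is the event's flag field already shifted down to bit 0. The remaining bits are described in the spec's IO_PAGE_FAULT event table. The regex and `decode_io_page_fault` are hypothetical helpers written against the log format seen in this report.

```python
import re

# Flag bits as compared against the printed "flags" value by the Linux
# AMD IOMMU driver (assumption: taken from amd_iommu_types.h).
EVENT_FLAG_I = 0x008   # interrupt request rather than a memory transaction
EVENT_FLAG_PR = 0x010  # target page was marked present
EVENT_FLAG_RW = 0x020  # transaction was a write
EVENT_FLAG_TR = 0x100  # translation request rather than a normal transaction

FAULT_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)


def decode_io_page_fault(line: str) -> dict:
    """Split an AMD-Vi IO_PAGE_FAULT dmesg line into its fields and decode the flags."""
    m = FAULT_RE.search(line)
    if m is None:
        raise ValueError("not an AMD-Vi IO_PAGE_FAULT line")
    flags = int(m["flags"], 16)
    return {
        "device": m["dev"],
        "domain": int(m["domain"], 16),
        "address": int(m["address"], 16),
        "kind": "interrupt" if flags & EVENT_FLAG_I else "memory",
        "access": "write" if flags & EVENT_FLAG_RW else "read",
        "page_present": bool(flags & EVENT_FLAG_PR),
        "translation_request": bool(flags & EVENT_FLAG_TR),
    }


line = ("xhci_hcd 0000:0b:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT "
        "domain=0x0018 address=0xffcf0000 flags=0x0020]")
print(decode_io_page_fault(line))
```

Under these assumptions, flags=0x0020 decodes as an ordinary memory write by device 0000:0b:00.3 (the xHCI controller) to a page that was not mapped for it. This is consistent with the reading above that the controller tried to write to 0xffcf0000.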
