Bug 214983
| Summary: | KIOXIA KBG40ZNV256G NVME SSD killed by resume from s2idle | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | pbs3141 |
| Component: | NVMe | Assignee: | IO/NVME Virtual Default Assignee (io_nvme) |
| Status: | NEW | | |
| Severity: | normal | CC: | kbusch |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 5.15.0 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Attachments:

- lspci output
- nvme output
- full dmesg of affected system
- relocate hmb disabling on s2idle
- dmesg
- my crappy printk patch
- dmesg output after my crappy printk patch
- dmesg
- patch
Description
pbs3141
2021-11-10 18:00:10 UTC
Created attachment 299505 [details]
lspci output
Created attachment 299507 [details]
nvme output
Created attachment 299509 [details]
full dmesg of affected system
The dmesg says your controller is in a fatal status, but a controller reset is supposed to clear it. It's not clearing it, though. The ACPI firmware tells the driver to use the "simple" suspend to prepare for D3, and the driver is going to honor that. It also looks like you've disabled APST, so we can rule that out as well. I'm not sure at the moment what else to try; I may come back with a debug patch later.

Created attachment 299515 [details]
relocate hmb disabling on s2idle
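(For readers following the thread: the "fatal status" above refers to the CFS bit in the controller's CSTS register. Below is a rough sketch of how the driver notices it, using register and bit names from the mainline driver; the exact code and log message differ, and reading the 0x3 value reported later in this thread as RDY | CFS is an assumption.)

```c
/*
 * Sketch only, not the driver's exact code: CSTS is the NVMe controller
 * status register and NVME_CSTS_CFS (bit 1) is the Controller Fatal
 * Status bit.  A CSTS value of 0x3 would be RDY | CFS, i.e. the fatal
 * bit is set, and a controller reset is expected to clear it.
 */
u32 csts = readl(dev->bar + NVME_REG_CSTS);

if (csts & NVME_CSTS_CFS)
	dev_warn(dev->ctrl.device,
		 "controller is down; will reset: CSTS=%#x\n", csts);
```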
So, I notice your device has an HMB. The spec recommends that the host disable the HMB prior to shutting down the controller, but it's not required, and this driver doesn't do that. I wonder if this controller requires the recommended sequence...
I've attached an experimental patch that will disable HMB first. Could you see if this is successful?
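(The attached patch is authoritative; the sketch below just shows the shape of the idea, heavily simplified. The function name nvme_suspend_simple is made up for illustration; nvme_set_host_mem(), nvme_disable_prepare_reset() and ndev->hmb are the mainline driver's names, and the ordering matches the printk trace that appears later in this thread.)

```c
/*
 * Simplified sketch of the idea behind the attached patch (not the patch
 * itself): tear down the host memory buffer before the controller is
 * shut down on the "simple" suspend path, so the device has no reason
 * to touch host memory afterwards.
 */
static int nvme_suspend_simple(struct nvme_dev *ndev)
{
	int ret;

	if (ndev->hmb) {
		ret = nvme_set_host_mem(ndev, 0);	/* disable the HMB first */
		if (ret < 0)
			return ret;
	}

	return nvme_disable_prepare_reset(ndev, true);	/* then shut the controller down */
}
```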
I tested your patch, and although the 30 second freeze has now gone, the drive still disconnects after the 30 seconds. I've attached a new dmesg.

Sorry about the slight delay, I'm in UTC+9. I also had some teething problems with the kernel build; subsequent builds should be much faster.

Created attachment 299521 [details]
dmesg
The dmesg actually looks the same as before. There's still a memory access IO fault around the suspend sequence, and the controller still reports fatal status on resume. I was hoping the IO fault was HMB access that the patch could have prevented, but I can't tell what the fault is about just from the dmesg.

It looks like the kernel version is the same as before the patch, though. Are you sure it's applied? If you're building from a git repo, there should be a '+' since there are more commits beyond the tagged kernel. I guess we could add a printk to make absolutely sure it's definitely running the patch.

I suspected so too at first, but I just threw in a printk like you said and it appears in the dmesg, so the patch is definitely running. I've attached my crappy printk patch and dmesg for completeness, though there's nothing interesting in either of them.

My Comment 6 is also a red herring. The freeze disappearing was not due to the patch; it was due to me changing the set of commands I had run before suspending.

Other notes
-----------

- For anyone else who stumbles upon this thread in the future, it is necessary to run mount, dmesg and cat all at least once before suspending. Otherwise you will not be able to use them later to write out the dmesg after the error.

Created attachment 299523 [details]
my crappy printk patch
Created attachment 299525 [details]
dmesg output after my crappy printk patch
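(The attached printk patch is the authoritative version; something as small as the line below, dropped into the suspend path, is enough to prove from dmesg that the patched kernel is the one actually booted. The message text here is made up.)

```c
	/* hypothetical marker line; the attached patch's wording differs */
	dev_info(ndev->ctrl.device, "patched nvme suspend path is running\n");
```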
Thanks, the print was helpful, but we may need more before the return statements. It looks like the IO fault happens after the nvme driver has completed its suspend sequence. If that is the case, the device has no business accessing host memory, especially since we moved the HMB disable ahead of shutdown. I am assuming the IO faults are related to the controller fatal status. If that assumption is correct, I'm not sure what we can do from the driver to help, since it happens after the driver completes shutdown. We may need someone from Kioxia to explain the stuck CSTS.CFS bit.

Ok, I'll stuff it with printks and see what happens :D

I wrapped every line in pre/post printks and logged return values, and the result was this:

    [ 137.679215] Pre: ret = nvme_set_host_mem(ndev, 0);
    [ 137.695890] Post: ret = nvme_set_host_mem(ndev, 0);
    [ 137.695893] 0
    [ 137.695895] Pre: return nvme_disable_prepare_reset(ndev, true);
    [ 137.837283] ACPI: EC: interrupt blocked
    [ 137.933840] ACPI: button: The lid device is not compliant to SW_LID.
    [ 137.943184] ACPI: EC: interrupt unblocked
    [ 137.974130] nvme 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0009 address=0xc8961000 flags=0x0000]
    [ 137.974134] nvme 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0009 address=0xc8961000 flags=0x0000]

So the page fault is still happening after the suspend. The rest of the dmesg, as well as the patch that generated it, are attached.

Created attachment 299533 [details]
dmesg
Created attachment 299535 [details]
patch
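(The attached patch is the real instrumentation; the sketch below only shows the same pre/post pattern. The TRACE_CALL macro name is made up, and capturing the return value instead of returning directly, as is done a little later in the thread, is what lets the "Post" line print for the final call.)

```c
/*
 * Hypothetical sketch of the pre/post instrumentation used above: print
 * a line before and after each statement so the AMD-Vi IO_PAGE_FAULT
 * events can be ordered against the suspend sequence in dmesg.
 */
#define TRACE_CALL(stmt) do {			\
	pr_info("Pre: %s;\n", #stmt);		\
	stmt;					\
	pr_info("Post: %s;\n", #stmt);		\
} while (0)

	/* e.g. in the suspend path: */
	TRACE_CALL(ret = nvme_set_host_mem(ndev, 0));
	pr_info("ret = %d\n", ret);
	/* capture the return value rather than returning directly,
	 * otherwise the "Post" line can never print */
	TRACE_CALL(ret = nvme_disable_prepare_reset(ndev, true));
	return ret;
```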
It's not completely clear, since you "surround" a 'return' statement, so the post message never gets printed. However, I suspect it does complete prior to the page fault event. Now, I am speculating that the fault and the controller fatal status are related, but there's no way I can confirm that.

Going back through your dmesg... the fault always happens on the same address, c8961000. Where on earth is the device getting this address from, and why is it accessing it at this point?

The address falls within this e820 range:

    BIOS-e820: [mem 0x00000000c76d7000-0x00000000ccffdfff] reserved

So it's not usable. Something very bizarre is happening here: where did the device get this address, and why is it accessing it after the driver shut it down? I'm not sure there's anything we can do from the driver side to help here.

> It's not completely clear since you "surround" a 'return' statement, so the
> post message never gets printed. However, I suspect it does complete prior to
> the page fault event.

It was cavalier of me not to surround the return statement. But your suspicion is correct; I just fixed the loophole and got

    [ 67.967564] Pre: ret = nvme_set_host_mem(ndev, 0);
    [ 67.981593] Post: ret = nvme_set_host_mem(ndev, 0);
    [ 67.981598] ret = 0
    [ 67.981601] Pre: ret = nvme_disable_prepare_reset(ndev, true);
    [ 68.128321] Post: ret = nvme_disable_prepare_reset(ndev, true);

> So it's not usable. Something very bizarre is happening here: where did the
> device get this address, and why is it accessing it after the driver shut it
> down? I'm not sure there's anything we can do from the driver side to help
> here.

So, how do we take it from here? The immediate options I can see are:

- Ask KIOXIA about the stuck bit and the page fault, related or not. For this, the contact victor.gladkov@kioxia.com may be useful, having posted on this list in the past. If KIOXIA don't have a clue, ask HP.
- Figure out what the kernel does to trigger the page fault. Perhaps this would suggest moving the discussion to another kernel subsystem, where further progress could be made.

Reaching out to Kioxia is probably the most reasonable approach at this point.

Ok, I dropped them a message at their technical enquiry page (https://customer-us.kioxia.com/inquiry/product). The message says:

The KBG40ZNV256G SSD drive is behaving oddly under Linux, at least on the HP 14-fq1021nr laptop, resulting in broken suspend. The odd behaviour is as follows:

1. When the laptop wakes up from suspend, the drive is stuck in fatal status 0x3. The error bit remains stuck even after a controller reset. This leads to the drive not being usable after suspend.
2. Just before the laptop goes into suspend, the drive generates a page fault on address c8961000, which falls within a reserved address range according to the BIOS. This may or may not be helpful in diagnosing the first problem.

We'd be very grateful for your assistance in explaining this behaviour. The corresponding discussion on kernel.org is https://bugzilla.kernel.org/show_bug.cgi?id=214983.

Sorry if I've misrepresented what you've said!

Thanks, your message sounds good to me.

Following some obscure advice I came across on the Arch wiki, I found that booting with iommu=soft fixes the issue, though this can only be considered a temporary workaround until the real issue is fixed. So you were right that the page fault was related. Does this help to understand what's going on any better?
IIUC, that option should have the kernel bounce the destination through a valid address, so I guess it makes sense that a soft IOMMU would cure the IO page fault. This option is outside the driver, though, so I'm a little outside my domain here. If you want to take this to the mailing list, linux-nvme@lists.infradead.org may have more knowledgeable people.