Bug 215679 - NVMe/writeback wb_workfn/blocked for more than 30 seconds
Summary: NVMe/writeback wb_workfn/blocked for more than 30 seconds
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: NVMe
Hardware: x86-64 Linux
Importance: P1 high
Assignee: IO/NVME Virtual Default Assignee
 
Reported: 2022-03-13 20:13 UTC by Imre Deak
Modified: 2022-05-13 14:31 UTC
CC List: 5 users

See Also:
Kernel Version: 5.17.0-rc7
Tree: Mainline
Regression: Yes


Attachments
dmesg log after suspend resume, io stuck (80.18 KB, text/plain) - 2022-03-13 20:13 UTC, Imre Deak
dmesg log after suspend resume, io ok (68.33 KB, text/plain) - 2022-03-13 20:15 UTC, Imre Deak
attachment-31105-0.html (1.32 KB, text/html) - 2022-05-06 12:33 UTC, keith.busch
dmesg with the timeout on 5.18.0-rc5 (62.17 KB, patch) - 2022-05-06 18:29 UTC, Imre Deak
dmesg without the timeout on 5.18.0-rc5 (66.02 KB, patch) - 2022-05-06 18:30 UTC, Imre Deak

Description Imre Deak 2022-03-13 20:13:53 UTC
Created attachment 300564 [details]
dmesg log after suspend resume, io stuck

After system suspend/resume, filesystem IO stalls, producing a "kworker blocked for more than X seconds" warning in dmesg, and recovers only after a long delay. See the attached dmesg-suspend-resume-nvme-stuck.txt. I also noticed the same issue happening right after booting or after runtime suspend transitions.

The same issue also happens on multiple SKL systems in the i915 team's CI farm, see:

https://gitlab.freedesktop.org/drm/intel/-/issues/4547

I bisected the problem to
commit 4f5022453acd0f7b28012e20b7d048470f129894
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Oct 18 08:45:39 2021 -0600

    nvme: wire up completion batching for the IRQ path

With that commit reverted on top of 5.17.0-rc7, I can't reproduce the problem. The attached dmesg-suspend-resume-nvme-ok.txt was captured with the revert, after a few suspend/resume cycles.
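For reference, the revert test can be repeated roughly along these lines (the branch name and build steps are illustrative, and this assumes an existing .config and that the revert applies cleanly):

# sketch only: build 5.17-rc7 with the bisected commit reverted
$ git checkout -b nvme-batch-revert v5.17-rc7
$ git revert 4f5022453acd0f7b28012e20b7d048470f129894
$ make olddefconfig && make -j"$(nproc)"
$ sudo make modules_install install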
Comment 1 Imre Deak 2022-03-13 20:15:00 UTC
Created attachment 300565 [details]
dmesg log after suspend resume, io ok
Comment 2 Keith Busch 2022-03-13 20:29:02 UTC
Thanks for the bisect, that helps a lot. It sounds like a completion is missed somewhere in the batch attempt, I'll take a look this week. I'm not sure if we're getting an rc8 or not, so any potential fix may have to wait for stable.
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-17 13:41:50 UTC
(In reply to Keith Busch from comment #2)
> Thanks for the bisect, that helps a lot. It sounds like a completion is
> missed somewhere in the batch attempt, I'll take a look this week. I'm not
> sure if we're getting an rc8 or not, so any potential fix may have to wait
> for stable.

So, we got an rc8; was any progress made on getting this fixed? Otherwise I'll add it to the regression tracking so Linus becomes aware of it.
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-04-09 13:06:47 UTC
(In reply to Keith Busch from comment #2)
> Thanks for the bisect, that helps a lot. It sounds like a completion is
> missed somewhere in the batch attempt, I'll take a look this week. I'm not
> sure if we're getting an rc8 or not, so any potential fix may have to wait
> for stable.

Keith, is this still on your todo list? It looks like nothing has happened here in about a month.
Comment 5 Keith Busch 2022-04-14 16:03:42 UTC
Sorry for the delay. After not finding any issues from code inspection, I tried recreating this, but haven't had any luck. I'll post this to the mailing list since more people subscribe to that.
Comment 6 Keith Busch 2022-04-14 19:08:42 UTC
On Sun, Mar 13, 2022 at 08:13:53PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215679
>
> [...]

Forwarding to linux-nvme for higher visibility.
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-04-21 11:27:48 UTC
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Jens, this regression is now a month old and the culprit was already
identified when the issue was reported; nevertheless, it seems no real
progress has been made to address it, at least afaics. What's up there?
Should the culprit be reverted?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot poke

On 14.04.22 21:08, Keith Busch wrote:
> On Sun, Mar 13, 2022 at 08:13:53PM +0000, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215679
>>
>> [...]
>
> Forwarding to linux-nvme for higher visibility.
Comment 8 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-06 11:55:41 UTC
Imre, I'm pretty unhappy with how the maintainers are handling this report. Sorry, things shouldn't take this long for a bisected regression. But before escalating the problem further, could you please confirm that the regression is still present in the latest mainline, i.e. 5.18-rc5 or later?
Comment 9 keith.busch 2022-05-06 12:33:19 UTC
Created attachment 300895 [details]
attachment-31105-0.html

Take it to the list. As far as I can tell, this report isn't real, and I am
pretty unhappy with how you're handling this.

On Fri, May 6, 2022, 5:55 AM <bugzilla-daemon@kernel.org> wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=215679
>
> [...]
Comment 10 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-06 14:30:53 UTC
(In reply to keith.busch from comment #9)

> Take it to the list.

Sorry, now I'm confused. Take it *off* the list of tracked regression reports? Or take it to the mailing list for discussion? But the latter would hardly make sense if this is true:

> As far as I can tell, this report isn't real,

So I assume it's the former. A quick explanation of why you think this report isn't real (just a few words) would have been good (or is it mentioned somewhere in this bug report? If so, sorry, I missed it and can't find it now). FWIW, if Imre hadn't replied I'd likely have dropped it from tracking myself.

> and I am pretty unhappy with how you're handling this.

Why? From my point of view it looks a lot like this is a regression that is not being handled appropriately. If there is a reason why this report is bogus then sorry, I just missed it.
Comment 11 Keith Busch 2022-05-06 16:12:13 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10)
> (In reply to keith.busch from comment #9)
> 
> > Take it to the list.
> 
> Sorry, now I'm confused. Take it *off* the list of tracked regression
> reports? Or take it to the mailing list for discussion? But the latter
> would hardly make sense if this is true:

The linux-nvme mailing list. Bugzilla might be okay as a starting point, but the list is the appropriate place for NVMe discussion.

> > As far as I can tell, this report isn't real,
> 
> So I assume it's the former. A quick explanation of why you think this
> report isn't real (just a few words) would have been good (or is it
> mentioned somewhere in this bug report? If so, sorry, I missed it and
> can't find it now). FWIW, if Imre hadn't replied I'd likely have dropped
> it from tracking myself.

Because no one reported it to the official mailing list.

But truthfully, I apologize for coming off as impatient in my previous message.
Comment 12 Imre Deak 2022-05-06 18:29:17 UTC
Created attachment 300897 [details]
dmesg with the timeout on 5.18.0-rc5

FWIW, the problem is still reproducible on 5.18.0-rc5, and with the commit I mentioned reverted it is not reproducible. Please find the logs attached.
Comment 13 Imre Deak 2022-05-06 18:30:21 UTC
Created attachment 300898 [details]
dmesg without the timeout on 5.18.0-rc5
Comment 14 Keith Busch 2022-05-06 21:07:23 UTC
(In reply to Imre Deak from comment #12)
> Fwiw, the problem is still reproducible on 5.18.0-rc5, and without the
> commit I mentioned it's not reproducible.

As far as I can tell, it looks like you're just booting with an NVMe ext4 root partition. I have been unsuccessful at reproducing the issue. Do you have any other hints on how I might be able to recreate this?
Comment 15 Imre Deak 2022-05-06 21:43:00 UTC
I run 'rtcwake -m mem -s 7' right after booting to reproduce the problem. The timeout shows up in the log already during the first suspend/resume sequence, or if not, then during the second one; in at least some (if not all) cases it happens during the suspend phase.
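A minimal loop along these lines is enough to hit it (the iteration count and the final dmesg check are just illustrative; one or two cycles sufficed here):

# sketch only: run a couple of suspend/resume cycles, then check dmesg for the hung-task warning
$ for i in 1 2 3; do sudo rtcwake -m mem -s 7; sleep 10; done
$ dmesg | grep -i 'blocked for more than'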

If it helps I can work on arranging remote access for you to this machine.
Comment 16 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-09 05:51:07 UTC
(In reply to Keith Busch from comment #11)
>
> The linux-nvme mailing list. bugzilla might be okay as a starting point, but
> the list is the appropriate place for nvme discussion.

My problem is: I have to deal with a lot of subsystems and developers, and some of them prefer to keep things in bugzilla. I can't keep track of who prefers what, so I just look and try to continue in the place where they last interacted (to the best of my judgement, which sometimes is wrong).
 
> [...] 
> But truthfully, I apologize for coming off as impatient in my previous
> message.

No worries!
Comment 17 Keith Busch 2022-05-09 15:12:16 UTC
(In reply to Imre Deak from comment #15)
> If it helps I can work on arranging remote access for you to this machine.

I'm not having any luck reproducing this in virtual machines, but I don't have any real hardware on which I could attempt this kind of test. If you want to send me login details privately, you can reach me at kbusch@kernel.org and I'll take a look.
Comment 18 Imre Deak 2022-05-10 10:49:27 UTC
Keith, ok. Before doing that, could you check the stats below and tell whether it's worth continuing to debug the issue, or whether we should just replace the module (since these available_spare etc. metrics could themselves be causing the problem)?

testrunner@skl-6600u:~$ sudo nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0x1
temperature : 34 C
available_spare : 44%
available_spare_threshold : 50%
percentage_used : 39%
data_units_read : 153 585 846
data_units_written : 152 504 340
host_read_commands : 2 017 620 857
host_write_commands : 3 581 240 952
controller_busy_time : 67 613
power_cycles : 179 437
power_on_hours : 36 652
unsafe_shutdowns : 4 206
media_errors : 0
num_err_log_entries : 30
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 34 C
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
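FWIW, critical_warning is a bit field defined by the NVMe spec; bit 0 being set (hence the 0x1) means the available spare has fallen below the threshold, which matches available_spare 44% vs. available_spare_threshold 50% above. A quick way to pull out just those fields (same device path as in the command above):

$ sudo nvme smart-log /dev/nvme0n1 | grep -E 'critical_warning|available_spare|percentage_used'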
Comment 19 Keith Busch 2022-05-10 13:59:31 UTC
(In reply to Imre Deak from comment #18)
> Keith, ok. Before doing that could you check the stats below, if with that
> it's worth continuing debugging the issue, or we should just replace the
> module (since these particular available_spare etc. metrics could cause the
> problem)?

That doesn't sound related to your observation. Even if the hardware is becoming flaky as it ages, that shouldn't cause a stuck task.
Comment 20 Tomi Sarvela 2022-05-11 10:49:59 UTC
I'll take this hardware in for maintenance. The first thing to do is to update the Dell BIOS. This might not be the direct cause, but it shouldn't hurt either, and it's more comfortable to work on the newest BIOS.

If the issue persists after updating the BIOS, I'll see if the NVMe drive can be changed easily. This NVMe seems to be towards the end of its life; over half of the spare blocks are already used. That perhaps shouldn't cause delays in writes, but very often it does. The reason could also be related to the NVMe firmware (I haven't checked whether Toshiba offers updates). Changing the NVMe will clear these problems.

If the BIOS update and NVMe change don't help, we'll try to arrange remote access to this host.
Comment 21 Tomi Sarvela 2022-05-11 13:48:20 UTC
I've updated the BIOS and changed the NVMe: the BIOS was pretty vocal (audibly so) about the health of the NVMe.

This will probably fix the issue, as it seemed to be very much NVMe related. I still have the old Toshiba NVMe drive with half its spares gone, if there's interest in investigating driver behaviour with a faulty drive.
Comment 22 Tomi Sarvela 2022-05-13 12:34:32 UTC
Dell XPS 13 fi-skl-6600u is back in the farm, and looks all green:
https://intel-gfx-ci.01.org/tree/drm-tip/fi-skl-6600u.html

I tried to reproduce the issue with 'rtcwake -m mem -s 7' but no luck.

The i915-CI shard-skl hosts mentioned in the fd.o#4547 issue are Chromebooks with eMMC; the problem there is probably a different one.
Comment 23 Imre Deak 2022-05-13 14:31:05 UTC
Okay then, the BIOS update and NVMe module change seem to have fixed the issue; I'm fine with closing this ticket.
