Bug 215129 - Linux kernel hangs during power down
Summary: Linux kernel hangs during power down
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
Duplicates: 215359
Depends on:
Reported: 2021-11-24 21:14 UTC by Martin Stolpe
Modified: 2022-01-13 21:31 UTC
CC List: 7 users

See Also:
Kernel Version: 5.15
Regression: Yes
Bisected commit-id:

Kernel log after timeout occurred (7.96 KB, text/plain)
2021-11-24 21:14 UTC, Martin Stolpe

Description Martin Stolpe 2021-11-24 21:14:53 UTC
Created attachment 299703
Kernel log after timeout occurred

During shutdown on my system, the kernel waits for a task that never completes.

The commit which causes this behavior is: [f32a213765739f2a1db319346799f130a3d08820] ethtool: runtime-resume netdev parent before ethtool ioctl ops

This bug also causes the system to become unresponsive after starting Steam: https://steamcommunity.com/app/221410/discussions/2/3194736442566303600/
Comment 1 Martin Stolpe 2021-11-24 22:05:10 UTC
The wireless card is a QCA6174-based card using the ath10k_pci driver.
Comment 2 Artem S. Tashkinov 2021-11-25 09:15:57 UTC
CC'ing the author of the patch.
Comment 3 Heiner Kallweit 2021-11-25 10:06:05 UTC
The hint to ath10k_pci is misleading here; the actual issue is in the interaction between the net core and the Intel network driver (igb).

In a nutshell the issue is:
- The core changes result in the network driver's runtime_resume() being called from a context where RTNL is held.
- This conflicts with a few Intel drivers that take RTNL in their resume path: the resume then blocks on a lock its caller already holds.

This has been initially discussed e.g. here, but there's no tangible result yet.

As a workaround for the time being, you can disable Runtime Power Management for the network adapter:
echo on > /sys/class/net/<interface>/device/power/control
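
If the workaround has to be applied from a program rather than a shell, the same write can be sketched in C. The helper names (rpm_control_path, rpm_disable) are hypothetical; the only thing taken from the command above is the sysfs path and the value "on". Writing the file needs root.

```c
#include <stdio.h>

/* Build the sysfs runtime-PM control path for an interface name.
 * Returns 0 on success, -1 if the buffer is too small. */
static int rpm_control_path(const char *ifname, char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "/sys/class/net/%s/device/power/control", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "on" to the control file, which keeps the device active and
 * so disables runtime PM for it. Returns 0 on success, -1 on failure. */
static int rpm_disable(const char *ifname)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (rpm_control_path(ifname, path, sizeof(path)))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("on\n", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

Writing "auto" instead of "on" to the same file would re-enable runtime PM once a fixed kernel is installed.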
Comment 4 Martin Stolpe 2021-11-25 21:55:22 UTC
I've blacklisted the igb driver and the problem doesn't occur. Thanks!
Comment 5 Heiner Kallweit 2021-11-27 10:19:33 UTC
Right, blacklisting the driver also works, but only for people who don't need the wired network.
Following patch should solve the issue, however I have no test hw. Could you please test it (after removing igb blacklisting)?

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index dd208930f..8073cce73 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -9254,7 +9254,7 @@ static int __maybe_unused igb_suspend(struct device *dev)
 	return __igb_shutdown(to_pci_dev(dev), NULL, 0);
 }
 
-static int __maybe_unused igb_resume(struct device *dev)
+static int __maybe_unused __igb_resume(struct device *dev, bool rpm)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -9297,17 +9297,24 @@ static int __maybe_unused igb_resume(struct device *dev)
 
 	wr32(E1000_WUS, ~0);
 
-	rtnl_lock();
+	if (!rpm)
+		rtnl_lock();
 	if (!err && netif_running(netdev))
 		err = __igb_open(netdev, true);
 
 	if (!err)
 		netif_device_attach(netdev);
-	rtnl_unlock();
+	if (!rpm)
+		rtnl_unlock();
 
 	return err;
 }
 
+static int __maybe_unused igb_resume(struct device *dev)
+{
+	return __igb_resume(dev, false);
+}
+
 static int __maybe_unused igb_runtime_idle(struct device *dev)
 {
 	struct net_device *netdev = dev_get_drvdata(dev);
@@ -9326,7 +9333,7 @@ static int __maybe_unused igb_runtime_suspend(struct device *dev)
 
 static int __maybe_unused igb_runtime_resume(struct device *dev)
 {
-	return igb_resume(dev);
+	return __igb_resume(dev, true);
 }
 
 static void igb_shutdown(struct pci_dev *pdev)
@@ -9442,7 +9449,7 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
  *  @pdev: Pointer to PCI device
  *
  *  Restart the card from scratch, as if from a cold-boot. Implementation
- *  resembles the first-half of the igb_resume routine.
+ *  resembles the first-half of the __igb_resume routine.
  **/
 static pci_ers_result_t igb_io_slot_reset(struct pci_dev *pdev)
 {
@@ -9482,7 +9489,7 @@ static pci_ers_result_t igb_io_slot_reset(struct pci_dev *pdev)
  *
  *  This callback is called when the error recovery driver tells us that
  *  its OK to resume normal operation. Implementation resembles the
- *  second-half of the igb_resume routine.
+ *  second-half of the __igb_resume routine.
  */
 static void igb_io_resume(struct pci_dev *pdev)
 {
Comment 6 Martin Stolpe 2021-11-29 20:53:59 UTC
I've tried the patch with 5.15.5 and the problem does not occur. Thank you!
Comment 7 Benjamin Radel 2021-12-07 10:38:24 UTC
Just a quick follow-up: I can confirm that this patch fixes the issue on my hardware as well, thank you! Is there any chance that this gets merged into the main kernel soon? So far I haven't seen this patch in linux or linux-next and the issue is really rather annoying :).

Cheers, Benjamin
Comment 8 Heiner Kallweit 2021-12-07 11:38:00 UTC
The patch is on its way via the Intel network driver tree:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/tnguy/net-queue/+/refs/heads/dev-queue
Comment 9 aperotti 2021-12-10 15:01:50 UTC
(In reply to Heiner Kallweit from comment #5)
> Following patch should solve the issue, however I have no test hw. Could you
> please test it (after removing igb blacklisting)?

Tested on 5.15.7 with igb nics: worked like a charm, thanks!
Comment 10 The Linux kernel's regression tracker (Thorsten Leemhuis) 2021-12-17 09:55:39 UTC
Hi, this is your Linux kernel regression tracker speaking.

(In reply to Heiner Kallweit from comment #8)
> The patch is on its way via the Intel network driver tree:
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/tnguy/net-queue/+/
> refs/heads/dev-queue

thx for the patch, but what is taking this patch so long to get upstreamed (which is a requirement to get it backported to stable)? Or was it merged and I just missed it? Or were problems found?

Reminder, it's a regression in 5.15.y we are talking about. This is the only Linux version currently distributed by kernel.org that is available for users that need something from 5.11 or later and want a stable and secure kernel at the same time.
Comment 11 Enrico Demarin 2021-12-19 20:59:11 UTC
*** Bug 215359 has been marked as a duplicate of this bug. ***
Comment 12 Enrico Demarin 2021-12-22 16:29:42 UTC
Bug is still present in 5.11, some igb fixes made it through but not this one
Comment 13 Enrico Demarin 2021-12-22 16:30:16 UTC
I meant 5.15.11 just released :)
Comment 14 Mikhail Kondrashov 2021-12-23 03:06:23 UTC
I've same issue but with "igc".
Comment 15 The Linux kernel's regression tracker (Thorsten Leemhuis) 2021-12-23 06:51:15 UTC
(In reply to Enrico Demarin from comment #12)
> Bug is still present in 5.15.11, some igb fixes made it through but not this
> one

I recently poked the developers and the fix is now on its way:

(In reply to Mikhail Kondrashov from comment #14)
> I've same issue but with "igc".

Related, but different and fixed by:

Sadly this patch is not on the way yet it seems :-/
/me grumbles
Comment 16 Enrico Demarin 2022-01-13 21:31:21 UTC
I confirm this is fixed in 5.15.12
