Bug 202571 - WARN_ONCE when suspending igb
Summary: WARN_ONCE when suspending igb
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network (show other bugs)
Hardware: Intel Linux
: P1 low
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-13 16:50 UTC by Arvind Sankar
Modified: 2020-03-10 19:39 UTC (History)
5 users (show)

See Also:
Kernel Version: 4.20.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel config (89.69 KB, text/plain)
2019-02-19 04:51 UTC, Arvind Sankar
Details
lspci -vvv output for igb devices (22.20 KB, text/plain)
2019-02-19 04:56 UTC, Arvind Sankar
Details
Test patch (926 bytes, application/mbox)
2019-02-19 15:20 UTC, Kai-Heng Feng
Details
Possible patch (800 bytes, patch)
2019-02-19 16:52 UTC, Arvind Sankar
Details | Diff
igb: Fix WARN_ONCE on runtime suspend (4.80 KB, patch)
2019-03-02 15:52 UTC, Arvind Sankar
Details | Diff
normal lspci -vvv output (34.56 KB, text/plain)
2019-03-06 01:47 UTC, You-Sheng Yang
Details
lspci -vvv output when link is still down (34.56 KB, text/plain)
2019-03-06 01:50 UTC, You-Sheng Yang
Details

Description Arvind Sankar 2019-02-13 16:50:01 UTC
Since commit 1fb3a7a75e2efcc83ef21f2434069cddd6fae6f5 (igb: Fix an issue that PME is not enabled during runtime suspend), PCI PM code generates a WARN_ONCE upon boot.

[   11.946372] ------------[ cut here ]------------
[   11.946384] PCI PM: State of device not saved by igb_runtime_suspend+0x0/0x80
[   11.946406] WARNING: CPU: 27 PID: 265 at drivers/pci/pci-driver.c:1280 pci_pm_runtime_suspend+0x127/0x130
[   11.946407] Modules linked in: 8021q zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) zlib_inflate x86_pkg_temp_thermal kvm_intel kvm coretemp hwmon irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_uncore intel_rapl_perf i2c_i801 intel_pch_thermal ipmi_si ipmi_devintf ipmi_msghandler
[   11.946448] CPU: 27 PID: 265 Comm: kworker/27:1 Tainted: P           O    T 4.20.8+ #20
[   11.946450] Hardware name: Supermicro Super Server/X10SDV-7TP8F, BIOS 2.0 06/13/2018
[   11.946458] Workqueue: pm pm_runtime_work
[   11.946466] RIP: 0010:pci_pm_runtime_suspend+0x127/0x130
[   11.946470] Code: f7 0f 00 8b 44 24 04 eb a1 44 39 e8 74 9a 80 3d cb 7b d0 00 00 75 91 48 c7 c7 d8 b3 d5 bd c6 05 bb 7b d0 00 01 e8 fe 37 cf ff <0f> 0b 31 c0 e9 77 ff ff ff 41 55 41 54 45 31 e4 55 53 48 89 fb 48
[   11.946473] RSP: 0018:ffffb85142a0fd88 EFLAGS: 00010282
[   11.946476] RAX: 0000000000000000 RBX: ffff92ceffe690b0 RCX: ffffffffbde29cb8
[   11.946479] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffbed55d88
[   11.946482] RBP: ffffffffbdaa7980 R08: 0000000000000001 R09: 0000000000000446
[   11.946484] R10: 0000000000000003 R11: 0000000000000000 R12: ffff92ceffe69000
[   11.946487] R13: 0000000000000000 R14: ffffffffbd162550 R15: 0000000000000000
[   11.946491] FS:  0000000000000000(0000) GS:ffff92cedfac0000(0000) knlGS:0000000000000000
[   11.946494] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.946497] CR2: 00007fbdb68e6c38 CR3: 000000006e20a001 CR4: 00000000003606e0
[   11.946499] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   11.946502] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   11.946503] Call Trace:
[   11.946515]  ? __switch_to_asm+0x34/0x70
[   11.946523]  __rpm_callback+0x70/0x1b0
[   11.946529]  ? __switch_to_asm+0x34/0x70
[   11.946534]  ? pci_dev_put+0x20/0x20
[   11.946540]  rpm_callback+0x66/0x90
[   11.946546]  ? pci_dev_put+0x20/0x20
[   11.946552]  rpm_suspend+0x15a/0x570
[   11.946559]  pm_runtime_work+0x8c/0xa0
[   11.946566]  process_one_work+0x1d2/0x350
[   11.946573]  ? wq_calc_node_cpumask.constprop.50+0x20/0x20
[   11.946578]  worker_thread+0x28/0x3d0
[   11.946585]  ? wq_calc_node_cpumask.constprop.50+0x20/0x20
[   11.946589]  kthread+0x103/0x120
[   11.946594]  ? kthread_create_on_node+0x90/0x90
[   11.946600]  ret_from_fork+0x35/0x40
[   11.946604] ---[ end trace 7d317c4a3f4ca539 ]---
Comment 1 Arvind Sankar 2019-02-18 23:29:32 UTC
How was this change supposed to work? Looking at the code, it seems like it cannot avoid hitting the warning path, in which case it won't call pci_save_state/pci_finish_runtime_suspend anyway, since igb_runtime_suspend still sets the power state before returning?

In my case after returning from igb_runtime_suspend, it has gone from D0 (==prev) to D3Hot (==pci_dev->current_state), and hits this code in pci_pm_runtime_suspend which issues the warning and returns early.

        if (pm && pm->runtime_suspend
            && !pci_dev->state_saved && pci_dev->current_state != PCI_D0
            && pci_dev->current_state != PCI_UNKNOWN) {
                WARN_ONCE(pci_dev->current_state != prev,
                        "PCI PM: State of device not saved by %pF\n",
                        pm->runtime_suspend);
                return 0;
        }
Comment 2 Arvind Sankar 2019-02-19 04:51:41 UTC
Created attachment 281203 [details]
Kernel config
Comment 3 Arvind Sankar 2019-02-19 04:56:55 UTC
Created attachment 281205 [details]
lspci -vvv output for igb devices
Comment 4 Kai-Heng Feng 2019-02-19 15:20:01 UTC
Created attachment 281209 [details]
Test patch

Please test this patch, thanks!
Comment 5 Arvind Sankar 2019-02-19 15:44:29 UTC
(In reply to Kai-Heng Feng from comment #4)
> Created attachment 281209 [details]
> Test patch
> 
> Please test this patch, thanks!

It doesn't help -- not sure I understand the patch: it just makes it unconditionally not call pci_save_state, but how would that change anything, it's not getting called now anyway?

Also: should have mentioned in terms of hardware configuration -- none of the igb ports have a cable plugged into them in my setup at present.

In your tests, do you not see the warning?
Comment 6 Kai-Heng Feng 2019-02-19 15:51:01 UTC
Sorry I don't have access to igb devices right now - I'll work on this next week.
Comment 7 Arvind Sankar 2019-02-19 16:52:16 UTC
Created attachment 281211 [details]
Possible patch

This quiets the warning for me.
Comment 8 Arvind Sankar 2019-02-19 16:54:02 UTC
(In reply to Kai-Heng Feng from comment #6)
> Sorry I don't have access to igb devices right now - I'll work on this next
> week.

Thanks. I'll be traveling next week let me know if you want any additional information about the system.
Comment 9 Kai-Heng Feng 2019-02-26 08:20:16 UTC
I think your patch is correct. Do you plan to send it upstream?

My patch in #4 can come as a separate patch. I don't find anywhere in the function that touches config space, so just let PCI core to handle that.
Comment 10 Arvind Sankar 2019-02-27 05:29:04 UTC
(In reply to Kai-Heng Feng from comment #9)
> I think your patch is correct. Do you plan to send it upstream?
> 
> My patch in #4 can come as a separate patch. I don't find anywhere in the
> function that touches config space, so just let PCI core to handle that.

I'd like to understand how the original change was supposed to work in case something specific with my config is causing issues. From the code at least I can't see how it would have done anything but trigger warning, and still not call the finish runtime stuff. Did it actually work in your testing?
Comment 11 Kai-Heng Feng 2019-02-27 18:07:05 UTC
(In reply to Arvind Sankar from comment #10)
> I'd like to understand how the original change was supposed to work in case
> something specific with my config is causing issues. From the code at least

"Original change" means my patch which triggers the warning right?
Your config is perfectly fine.

> I can't see how it would have done anything but trigger warning, and still
> not call the finish runtime stuff. Did it actually work in your testing?

static int __maybe_unused igb_suspend(struct device *dev)
{
        int retval;
        bool wake;
        struct pci_dev *pdev = to_pci_dev(dev);

        retval = __igb_shutdown(pdev, &wake, 0);
        if (retval)
                return retval;

        if (wake) {
                pci_prepare_to_sleep(pdev);
        } else {
                pci_wake_from_d3(pdev, false);
                pci_set_power_state(pdev, PCI_D3hot);
        }

        return 0;
}

It depends on the variable 'wake'. The card I tested seems to support whatever igb_enable_mng_pass_thru() enables, so the 'wake' evals as TRUE, hence the device is still in D0.
Comment 12 Kai-Heng Feng 2019-02-27 18:22:25 UTC
Sorry, I pasted the wrong function, should be igb_runtiem_suspend()instead of igb_suspend(), but you get the idea.

Anyway, I think most of the pci_*() routines in igb_runtiem_suspend() and __igb_shutdown() can be skipped as PCI core already handles them.
Comment 13 Michael Niewöhner 2019-02-28 18:12:34 UTC
Hi there,

I can confirm this bug, too, on Linux 4.20.12:


[   48.481549] ------------[ cut here ]------------
[   48.481555] PCI PM: State of device not saved by igb_runtime_suspend+0x0/0x80 [igb]
[   48.481562] WARNING: CPU: 7 PID: 165 at drivers/pci/pci-driver.c:1280 pci_pm_runtime_suspend+0x173/0x180
[   48.481563] Modules linked in: nls_ascii nls_cp437 vfat fat raid1 md_mod arc4 wmi_bmof rtl8821ae intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp btcoexist kvm_intel rtl_pci rtlwifi efi_pstore kvm mac80211 irqbypass intel_cstate intel_uncore intel_rapl_perf cfg80211 iTCO_wdt efivars pcspkr iTCO_vendor_support rfkill intel_pch_thermal wmi button acpi_pad msr nct6683 efivarfs ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) zcommon(PO) znvpair(PO) spl(O) algif_skcipher af_alg dm_crypt dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper xhci_pci igb e1000e i2c_algo_bit dca xhci_hcd usbcore
[   48.481583] CPU: 7 PID: 165 Comm: kworker/7:1 Tainted: P           O      4.20.12 #1
[   48.481584] Hardware name: LENOVO 30BK0020GE/103D, BIOS S06KT39A 12/19/2018
[   48.481586] Workqueue: pm pm_runtime_work
[   48.481588] RIP: 0010:pci_pm_runtime_suspend+0x173/0x180
[   48.481588] Code: ff 44 39 e8 0f 84 64 ff ff ff 80 3d e9 61 d1 00 00 0f 85 57 ff ff ff 48 c7 c7 60 ee 82 8b c6 05 d5 61 d1 00 01 e8 47 2f ca ff <0f> 0b 31 c0 e9 3d ff ff ff 0f 1f 40 00 0f 1f 44 00 00 53 48 89 fb
[   48.481589] RSP: 0018:ffffae5ac3b7fd98 EFLAGS: 00010282
[   48.481590] RAX: 0000000000000000 RBX: ffff9120bb5800b0 RCX: 0000000000000006
[   48.481591] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9120bfbd54e0
[   48.481591] RBP: ffffffffc02cc7a0 R08: 0000000000000058 R09: 0000000000000004
[   48.481592] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9120bb580000
[   48.481592] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9120bb5800b0
[   48.481593] FS:  0000000000000000(0000) GS:ffff9120bfbc0000(0000) knlGS:0000000000000000
[   48.481593] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.481594] CR2: 000055e43062b5d8 CR3: 000000042020a001 CR4: 00000000003606e0
[   48.481595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   48.481595] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   48.481595] Call Trace:
[   48.481598]  ? __switch_to_asm+0x40/0x70
[   48.481600]  ? pci_has_legacy_pm_support+0x60/0x60
[   48.481601]  __rpm_callback+0x7d/0x1d0
[   48.481602]  rpm_callback+0x4f/0x70
[   48.481604]  ? pci_has_legacy_pm_support+0x60/0x60
[   48.481604]  rpm_suspend+0x138/0x660
[   48.481605]  ? __switch_to_asm+0x40/0x70
[   48.481607]  pm_runtime_work+0x7a/0x80
[   48.481609]  process_one_work+0x18c/0x380
[   48.481611]  worker_thread+0x30/0x370
[   48.481612]  kthread+0x10f/0x130
[   48.481614]  ? process_one_work+0x380/0x380
[   48.481615]  ? kthread_park+0x80/0x80
[   48.481616]  ret_from_fork+0x35/0x40
[   48.481617] ---[ end trace 0799d14756480c9a ]---
[  365.755070] igb 0000:03:00.3 ens3f3: PCIe link lost
[  365.760285] igb 0000:03:00.2 ens3f2: PCIe link lost
[  365.765474] igb 0000:03:00.1 ens3f1: PCIe link lost
[  365.770661] igb 0000:03:00.0 ens3f0: PCIe link lost
[103711.060321] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
Comment 14 Arvind Sankar 2019-03-01 21:25:20 UTC
(In reply to Kai-Heng Feng from comment #11)
> (In reply to Arvind Sankar from comment #10)
> > I'd like to understand how the original change was supposed to work in case
> > something specific with my config is causing issues. From the code at least
> 
> "Original change" means my patch which triggers the warning right?
> Your config is perfectly fine.
> 
> > I can't see how it would have done anything but trigger warning, and still
> > not call the finish runtime stuff. Did it actually work in your testing?
> 
> static int __maybe_unused igb_suspend(struct device *dev)
> {
>         int retval;
>         bool wake;
>         struct pci_dev *pdev = to_pci_dev(dev);
> 
>         retval = __igb_shutdown(pdev, &wake, 0);
>         if (retval)
>                 return retval;
> 
>         if (wake) {
>                 pci_prepare_to_sleep(pdev);
>         } else {
>                 pci_wake_from_d3(pdev, false);
>                 pci_set_power_state(pdev, PCI_D3hot);
>         }
> 
>         return 0;
> }
> 
> It depends on the variable 'wake'. The card I tested seems to support
> whatever igb_enable_mng_pass_thru() enables, so the 'wake' evals as TRUE,
> hence the device is still in D0.

Was that card was not capable of generating PME from D3? According to lspci at least my igb is capable of PME from D3 so even with wake being true it would have been powered down into D3Hot.

If you remove the pci_save_state unconditionally I think you also need to apply my patch to the igb_suspend function as well otherwise I think it will generate the same WARN when suspending.
Comment 15 Arvind Sankar 2019-03-02 15:52:11 UTC
Created attachment 281461 [details]
igb: Fix WARN_ONCE on runtime suspend

Here's a revised patch to make suspend and runtime_suspend consistent, removing  pci_save_state and power state setting from both.
Comment 16 You-Sheng Yang 2019-03-05 10:26:41 UTC
(In reply to Arvind Sankar from comment #15)
> Created attachment 281461 [details]
> igb: Fix WARN_ONCE on runtime suspend
> 
> Here's a revised patch to make suspend and runtime_suspend consistent,
> removing  pci_save_state and power state setting from both.

Maybe it's another issue, but when I applied this patch on 4.15 & 4.19, there are chances that the first few hotplugs still leave interface in down state after booting without cable connected. /proc/interrupts lists no instance of this card, and there is no ACPI_NOTIFY_DEVICE_WAKE event ever propagated either. But when it begins to respond, the problem vanishes until the next boot (1/3 chances or so).
Comment 17 Kai-Heng Feng 2019-03-05 10:31:56 UTC
(In reply to You-Sheng Yang from comment #16)
> Maybe it's another issue, but when I applied this patch on 4.15 & 4.19,
> there are chances that the first few hotplugs still leave interface in down
> state after booting without cable connected. /proc/interrupts lists no
> instance of this card, and there is no ACPI_NOTIFY_DEVICE_WAKE event ever
> propagated either. But when it begins to respond, the problem vanishes until
> the next boot (1/3 chances or so).

Hmm, this card should use PME interrupts directly instead of ACPI GPE.

Can you attach `lspci -vvv` for this device when issue happens?
Comment 18 You-Sheng Yang 2019-03-06 01:47:14 UTC
Created attachment 281521 [details]
normal lspci -vvv output

Captured right after boot with cable disconnected. The link is up as expected when cable is plugged.
Comment 19 You-Sheng Yang 2019-03-06 01:50:06 UTC
Created attachment 281523 [details]
lspci -vvv output when link is still down

Captured right after boot with cable disconnected. The link is NOT up as expected when cable is plugged.

The difference between the two lspci output:

$ diff -Nu Desktop/lspci.vvv.normal Desktop/lspci.vvv.link-not-up 
--- Desktop/lspci.vvv.normal	2019-03-06 01:43:06.000000000 +0800
+++ Desktop/lspci.vvv.link-not-up	2019-03-06 01:43:06.000000000 +0800
@@ -431,7 +431,7 @@
 	Region 4: Memory at 4000000000 (64-bit, non-prefetchable) [size=1M]
 	Capabilities: [50] Power Management version 3
 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
-		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
+		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
 	Capabilities: [80] Vendor Specific Information: Len=14 <?>
 	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
 		Address: 00000000fee004f8  Data: 0000
Comment 20 Arvind Sankar 2019-03-06 01:57:58 UTC
(In reply to You-Sheng Yang from comment #16)
> (In reply to Arvind Sankar from comment #15)
> > Created attachment 281461 [details]
> > igb: Fix WARN_ONCE on runtime suspend
> > 
> > Here's a revised patch to make suspend and runtime_suspend consistent,
> > removing  pci_save_state and power state setting from both.
> 
> Maybe it's another issue, but when I applied this patch on 4.15 & 4.19,
> there are chances that the first few hotplugs still leave interface in down
> state after booting without cable connected. /proc/interrupts lists no
> instance of this card, and there is no ACPI_NOTIFY_DEVICE_WAKE event ever
> propagated either. But when it begins to respond, the problem vanishes until
> the next boot (1/3 chances or so).

Does your card wakeup work without the patch?
Comment 21 Kai-Heng Feng 2019-03-06 03:30:15 UTC
Please try linux-next, particular for commit c528f7bd362b097eeeafa6fbbeccd9750b79c7ba ("Revert "PCI/PME: Implement runtime PM callbacks").
Comment 22 You-Sheng Yang 2019-03-06 07:43:00 UTC
(In reply to Kai-Heng Feng from comment #21)
> Please try linux-next, particular for commit
> c528f7bd362b097eeeafa6fbbeccd9750b79c7ba ("Revert "PCI/PME: Implement
> runtime PM callbacks").

Packages with specified commits built, but don't have the card at hand currently. Probably gonna take a couple of days.
Comment 23 You-Sheng Yang 2019-03-14 05:34:34 UTC
(In reply to Kai-Heng Feng from comment #21)
> Please try linux-next, particular for commit
> c528f7bd362b097eeeafa6fbbeccd9750b79c7ba ("Revert "PCI/PME: Implement
> runtime PM callbacks").

Hi, tested with kernel built from linux-next tree. No more previous problem observed.

Note You need to log in before you can comment on or make changes to this bug.