Bug 212767

Summary: `freeze`-suspend causes stacktrace, and "config space inaccessible" for multiple PCI devices
Product: Power Management
Reporter: Konstantin Kharlamov (hi-angel)
Component: Hibernation/Suspend
Assignee: Rafael J. Wysocki (rjw)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: rui.zhang, yu.c.chen
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.10.9
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  `dmesg -w` for `echo freeze > /sys/power/state`
  `dmesg -w` for `echo mem > /sys/power/state`
  Fix for the bug
  sudo lspci -vvv

Description Konstantin Kharlamov 2021-04-25 22:58:24 UTC
Created attachment 296473
`dmesg -w` for `echo freeze > /sys/power/state`

As the summary says. The problem is seen on a 2013 MacBook Air (MacBookAir6,2).

From the docs I get the impression that `freeze` is the safest suspend method. However, on the hardware in question `mem`-suspend works fine, whereas `freeze` causes a bunch of errors and stack traces.
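
For context on the two methods: `freeze` always selects suspend-to-idle (s2idle), while the variant that `mem` maps to is listed in `/sys/power/mem_sleep`, with the active entry in square brackets. A minimal sketch for reading that entry is below; the helper name `current_mem_sleep` is made up for illustration and is not part of any existing tool.

```shell
# Hypothetical helper (an assumption, not from the original report):
# given the contents of /sys/power/mem_sleep, for example "s2idle [deep]",
# print the variant currently used by `echo mem > /sys/power/state`,
# i.e. the entry enclosed in square brackets.
current_mem_sleep() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Typical use on a live system:
#   current_mem_sleep "$(cat /sys/power/mem_sleep)"
```

If the bracketed entry is `deep`, then `mem` goes through the firmware-assisted suspend path, which is a different resume path from the one `freeze` takes.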

The user-visible problem is that an external monitor can no longer be detected until either α) a reboot or β) a `mem`-suspend cycle is done.

Logs for the successful suspend (`mem`) and the unsuccessful one (`freeze`) are attached.

Kernel versions tested: 5.10.9, 5.11.15.

# Steps to reproduce

1. As root, execute `echo freeze > /sys/power/state`
2. Look at `dmesg`
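
Step 2 can be narrowed down to the relevant lines. This is a rough sketch that assumes the messages look like the ones quoted in this report; `filter_resume_errors` is a hypothetical helper name, not an existing command.

```shell
# Hypothetical helper (an assumption, not from the original report):
# reads kernel log text on stdin and prints only the PCI power-state
# failures and the WARNING headers that open a stack trace.
filter_resume_errors() {
    grep -E "can't change power state|config space inaccessible|WARNING: CPU:"
}

# Typical use after resuming from `echo freeze > /sys/power/state`:
#   dmesg | filter_resume_errors
```

No output (and a non-zero exit status from `grep`) means none of these patterns were found in the log.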

## Expected

Dmesg has no PCI errors or stack traces.

## Actual

Dmesg has errors such as:

    [   70.592149] pcieport 0000:05:00.0: can't change power state from D3hot to D0 (config space inaccessible)
    [   70.594304] pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
    [   70.596369] thunderbolt 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
    [   70.658615] ------------[ cut here ]------------
    [   70.658616] tb_cfg_write: -108
    [   70.658647] WARNING: CPU: 1 PID: 2592 at drivers/thunderbolt/ctl.c:1017 tb_cfg_write+0xb6/0xd0 [thunderbolt]
    [   70.658662] Modules linked in: uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat tun bridge stp llc ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc nls_utf8 hfsplus intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btusb kvm btrtl iTCO_wdt btbcm snd_hda_codec_cirrus btintel snd_hda_codec_generic intel_pmc_bxt mei_hdcp ledtrig_audio snd_hda_codec_hdmi iTCO_vendor_support snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation snd_soc_core bluetooth irqbypass rapl snd_compress intel_cstate snd_pcm_dmaengine
    [   70.658707]  intel_uncore soundwire_cadence applesmc thunderbolt snd_hda_codec i2c_i801 pcspkr i2c_smbus lpc_ich snd_hda_core ac97_bus snd_hwdep ecdh_generic snd_seq rfkill ecc snd_seq_device joydev bcm5974 apple_mfi_fastcharge snd_pcm mei_me snd_timer snd mei acpi_als soundcore kfifo_buf sbs industrialio sbshc apple_bl spi_pxa2xx_platform dw_dmac binfmt_misc ip_tables xfs i915 hid_apple i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel cec uas drm usb_storage video applespi fuse
    [   70.658740] CPU: 1 PID: 2592 Comm: kworker/u8:6 Not tainted 5.11.15-200.fc33.x86_64 #1
    [   70.658742] Hardware name: Apple Inc. MacBookAir6,2/Mac-7DF21CB3ED6977E5, BIOS MBA61.88Z.0107.B00.1804111137 04/11/2018
    [   70.658744] Workqueue: events_unbound async_run_entry_fn
    [   70.658749] RIP: 0010:tb_cfg_write+0xb6/0xd0 [thunderbolt]
    [   70.658759] Code: ff 83 fa 0f 74 ca 83 fa 01 19 c0 48 83 c4 18 83 e0 9a 5b 5d 83 e8 05 41 5c 41 5d c3 89 c6 48 c7 c7 4a ae 9a c0 e8 60 bc 1e c6 <0f> 0b 8b 44 24 0c 48 83 c4 18 5b 5d 41 5c 41 5d c3 66 0f 1f 84 00
    [   70.658761] RSP: 0018:ffffa9438120fd50 EFLAGS: 00010282
    [   70.658763] RAX: 0000000000000012 RBX: 0000000000000002 RCX: ffff93a92b298ac8
    [   70.658764] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff93a92b298ac0
    [   70.658766] RBP: ffff93a8c9731540 R08: ffffffff87a64ec0 R09: 0000000000000002
    [   70.658767] R10: 0000000000000001 R11: ffffffff8853b122 R12: 0000000000000002
    [   70.658768] R13: 0000000000000000 R14: 0000000000000010 R15: 0000000000000000
    [   70.658769] FS:  0000000000000000(0000) GS:ffff93a92b280000(0000) knlGS:0000000000000000
    [   70.658771] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   70.658772] CR2: 000055eb12309fc6 CR3: 000000013fa10004 CR4: 00000000001706e0
    [   70.658774] Call Trace:
    [   70.658778]  tb_switch_reset+0x71/0x120 [thunderbolt]
    [   70.658790]  tb_resume_noirq+0x1d/0x140 [thunderbolt]
    [   70.658800]  tb_domain_resume_noirq+0x43/0x60 [thunderbolt]
    [   70.658813]  ? pci_pm_thaw_noirq+0x80/0x80
    [   70.658818]  dpm_run_callback+0x4c/0x120
    [   70.658823]  device_resume_noirq+0x13b/0x220
    [   70.658827]  async_resume_noirq+0x19/0x30
    [   70.658831]  async_run_entry_fn+0x39/0x160
    [   70.658834]  process_one_work+0x1ec/0x380
    [   70.658837]  worker_thread+0x53/0x3e0
    [   70.658839]  ? process_one_work+0x380/0x380
    [   70.658841]  kthread+0x11b/0x140
    [   70.658845]  ? __kthread_bind_mask+0x60/0x60
    [   70.658848]  ret_from_fork+0x22/0x30
    [   70.658854] ---[ end trace 5f8714c360d0db9d ]---
    [   70.658857] ------------[ cut here ]------------

Comment 1 Konstantin Kharlamov 2021-04-25 23:00:17 UTC
Created attachment 296475
`dmesg -w` for `echo mem > /sys/power/state`

Comment 2 Konstantin Kharlamov 2021-05-05 18:42:16 UTC
It seems to me that the problem is that `quirk_apple_poweroff_thunderbolt` is not supposed to be called on s2idle (aka freeze). I gotta try disabling it.

Does anybody know how I can distinguish suspend modes from within the kernel code? That is, whether the current suspend mode is s2idle or another one?

Comment 3 Konstantin Kharlamov 2021-05-05 23:23:05 UTC
So, I made a test by live-patching this quirk out and doing s2idle, and the problems are no longer observed. This means I'm on the right track. I only need to figure out how to detect s2idle from within the quirk, then send a patch.

Comment 4 Konstantin Kharlamov 2021-05-06 20:39:23 UTC
FTR: I sent a patch; it is here: https://lore.kernel.org/linux-pci/20210506173820.21876-1-Hi-Angel@yandex.ru/T/#u

Comment 5 Konstantin Kharlamov 2021-05-06 20:40:03 UTC
Created attachment 296677
Fix for the bug

Comment 6 Konstantin Kharlamov 2021-05-07 13:45:30 UTC
Created attachment 296689
sudo lspci -vvv

Comment 7 Konstantin Kharlamov 2021-07-19 12:53:49 UTC
FTR: the patch is included in kernel 5.13.3 and later (and it was backported to older releases as well).
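
A quick way to compare a kernel version against 5.13.3 is sketched below. It assumes a `sort` that supports `-V` (GNU coreutils), and `kernel_at_least` is a hypothetical helper name. Because of the backports, a version below 5.13.3 does not prove the fix is missing, so treat a negative result only as a hint to check the stable changelog.

```shell
# Hypothetical helper (an assumption, not from the original report):
# succeeds if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) putting the smaller version first.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Typical use, stripping the distro suffix from `uname -r` first:
#   kernel_at_least "$(uname -r | cut -d- -f1)" 5.13.3 && echo "5.13.3 or newer"
```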