Bug 212767 - `freeze`-suspend causes stacktrace, and "config space inaccessible" for multiple PCI devices
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All Linux
Importance: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-25 22:58 UTC by Konstantin Kharlamov
Modified: 2021-07-19 12:53 UTC
CC List: 2 users

See Also:
Kernel Version: 5.10.9
Subsystem:
Regression: No
Bisected commit-id:


Attachments
`dmesg -w` for `echo freeze > /sys/power/state` (125.14 KB, application/octet-stream)
2021-04-25 22:58 UTC, Konstantin Kharlamov
`dmesg -w` for `echo mem > /sys/power/state` (117.03 KB, application/octet-stream)
2021-04-25 23:00 UTC, Konstantin Kharlamov
Fix for the bug (1.57 KB, application/mbox)
2021-05-06 20:40 UTC, Konstantin Kharlamov
sudo lspci -vvv (65.61 KB, text/plain)
2021-05-07 13:45 UTC, Konstantin Kharlamov

Description Konstantin Kharlamov 2021-04-25 22:58:24 UTC
Created attachment 296473
`dmesg -w` for `echo freeze > /sys/power/state`

As the summary says. The problem is seen on a 2013 MacBook Air (MacBookAir6,2).

From the docs I get the impression that `freeze` is the safest suspend method. However, on this hardware `mem`-suspend works fine, whereas `freeze` produces a number of PCI errors and stack traces.
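A related detail when comparing the two: writing `freeze` to `/sys/power/state` always requests suspend-to-idle (s2idle), whereas `mem` uses whichever mode is currently selected in `/sys/power/mem_sleep`, which is shown in brackets (e.g. `s2idle [deep]`, where `deep` is ACPI S3). A minimal C sketch for reading that selection, assuming the bracketed single-line format; `selected_mem_sleep()` is a hypothetical helper, not a kernel or libc API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the currently selected mode out of a
 * /sys/power/mem_sleep line such as "s2idle [deep]\n" into `out`
 * (NUL-terminated, at most out_len bytes including the NUL).
 * Returns 0 on success, -1 if there is no non-empty bracketed entry
 * or it does not fit in `out`.
 */
int selected_mem_sleep(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len == 0 || len >= out_len)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

Usage: read the single line of `/sys/power/mem_sleep` with `fgets()` and pass it in; the result is the mode that `echo mem > /sys/power/state` will use.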

The user-visible problem is that an external monitor can no longer be detected until either α) a reboot or β) a `mem`-suspend cycle is done.

Logs for the successful (`mem`) and unsuccessful (`freeze`) suspends are attached.

Kernel versions tested: 5.10.9, 5.11.15.

# Steps to reproduce

1. As root, execute `echo freeze > /sys/power/state`
2. Look at `dmesg`
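The reproduction step can also be scripted. A minimal C sketch, assuming root privileges and the usual space-separated contents of `/sys/power/state` (such as `freeze mem disk`); `state_listed()` and `request_sleep_state()` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if `want` appears as a whole space-separated word in
 * `states`, e.g. the contents of /sys/power/state ("freeze mem disk\n").
 */
bool state_listed(const char *states, const char *want)
{
    size_t want_len = strlen(want);
    const char *p = states;

    assert(want_len > 0);
    while (*p != '\0') {
        /* Skip separators (space, tab, newline). */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        size_t tok_len = strcspn(p, " \t\n");
        if (tok_len == want_len && strncmp(p, want, want_len) == 0)
            return true;
        p += tok_len;
    }
    return false;
}

/*
 * Write `state` to /sys/power/state, like step 1 above. Must run as root.
 * The buffered write reaches the kernel in fclose(), and that write only
 * returns after the system has resumed. Returns 0 on success, -1 on error.
 */
int request_sleep_state(const char *state)
{
    FILE *f = fopen("/sys/power/state", "w");
    if (f == NULL)
        return -1;
    int ok = fputs(state, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, check `state_listed(buf, "freeze")` on the line read from `/sys/power/state`, then call `request_sleep_state("freeze")`; once it returns, `dmesg` can be inspected as in step 2.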

## Expected

`dmesg` shows no PCI errors or stack traces.

## Actual

`dmesg` shows errors such as:

    [   70.592149] pcieport 0000:05:00.0: can't change power state from D3hot to D0 (config space inaccessible)
    [   70.594304] pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
    [   70.596369] thunderbolt 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
    [   70.658615] ------------[ cut here ]------------
    [   70.658616] tb_cfg_write: -108
    [   70.658647] WARNING: CPU: 1 PID: 2592 at drivers/thunderbolt/ctl.c:1017 tb_cfg_write+0xb6/0xd0 [thunderbolt]
    [   70.658662] Modules linked in: uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat tun bridge stp llc ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc nls_utf8 hfsplus intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btusb kvm btrtl iTCO_wdt btbcm snd_hda_codec_cirrus btintel snd_hda_codec_generic intel_pmc_bxt mei_hdcp ledtrig_audio snd_hda_codec_hdmi iTCO_vendor_support snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation snd_soc_core bluetooth irqbypass rapl snd_compress intel_cstate snd_pcm_dmaengine
    [   70.658707]  intel_uncore soundwire_cadence applesmc thunderbolt snd_hda_codec i2c_i801 pcspkr i2c_smbus lpc_ich snd_hda_core ac97_bus snd_hwdep ecdh_generic snd_seq rfkill ecc snd_seq_device joydev bcm5974 apple_mfi_fastcharge snd_pcm mei_me snd_timer snd mei acpi_als soundcore kfifo_buf sbs industrialio sbshc apple_bl spi_pxa2xx_platform dw_dmac binfmt_misc ip_tables xfs i915 hid_apple i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel cec uas drm usb_storage video applespi fuse
    [   70.658740] CPU: 1 PID: 2592 Comm: kworker/u8:6 Not tainted 5.11.15-200.fc33.x86_64 #1
    [   70.658742] Hardware name: Apple Inc. MacBookAir6,2/Mac-7DF21CB3ED6977E5, BIOS MBA61.88Z.0107.B00.1804111137 04/11/2018
    [   70.658744] Workqueue: events_unbound async_run_entry_fn
    [   70.658749] RIP: 0010:tb_cfg_write+0xb6/0xd0 [thunderbolt]
    [   70.658759] Code: ff 83 fa 0f 74 ca 83 fa 01 19 c0 48 83 c4 18 83 e0 9a 5b 5d 83 e8 05 41 5c 41 5d c3 89 c6 48 c7 c7 4a ae 9a c0 e8 60 bc 1e c6 <0f> 0b 8b 44 24 0c 48 83 c4 18 5b 5d 41 5c 41 5d c3 66 0f 1f 84 00
    [   70.658761] RSP: 0018:ffffa9438120fd50 EFLAGS: 00010282
    [   70.658763] RAX: 0000000000000012 RBX: 0000000000000002 RCX: ffff93a92b298ac8
    [   70.658764] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff93a92b298ac0
    [   70.658766] RBP: ffff93a8c9731540 R08: ffffffff87a64ec0 R09: 0000000000000002
    [   70.658767] R10: 0000000000000001 R11: ffffffff8853b122 R12: 0000000000000002
    [   70.658768] R13: 0000000000000000 R14: 0000000000000010 R15: 0000000000000000
    [   70.658769] FS:  0000000000000000(0000) GS:ffff93a92b280000(0000) knlGS:0000000000000000
    [   70.658771] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   70.658772] CR2: 000055eb12309fc6 CR3: 000000013fa10004 CR4: 00000000001706e0
    [   70.658774] Call Trace:
    [   70.658778]  tb_switch_reset+0x71/0x120 [thunderbolt]
    [   70.658790]  tb_resume_noirq+0x1d/0x140 [thunderbolt]
    [   70.658800]  tb_domain_resume_noirq+0x43/0x60 [thunderbolt]
    [   70.658813]  ? pci_pm_thaw_noirq+0x80/0x80
    [   70.658818]  dpm_run_callback+0x4c/0x120
    [   70.658823]  device_resume_noirq+0x13b/0x220
    [   70.658827]  async_resume_noirq+0x19/0x30
    [   70.658831]  async_run_entry_fn+0x39/0x160
    [   70.658834]  process_one_work+0x1ec/0x380
    [   70.658837]  worker_thread+0x53/0x3e0
    [   70.658839]  ? process_one_work+0x380/0x380
    [   70.658841]  kthread+0x11b/0x140
    [   70.658845]  ? __kthread_bind_mask+0x60/0x60
    [   70.658848]  ret_from_fork+0x22/0x30
    [   70.658854] ---[ end trace 5f8714c360d0db9d ]---
    [   70.658857] ------------[ cut here ]------------
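To check a captured log for this failure mode, the affected devices can be pulled out of the `can't change power state ... (config space inaccessible)` lines. A minimal C sketch, assuming the `[timestamp] driver DDDD:BB:DD.F: message` layout seen in the log above; `inaccessible_device()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * If `line` is a "(config space inaccessible)" message like the ones in
 * this report, copy the PCI address (e.g. "0000:05:00.0") into `bdf` and
 * return 1; otherwise return 0. Assumes "driver DDDD:BB:DD.F: message",
 * optionally preceded by a "[ timestamp]" prefix.
 */
int inaccessible_device(const char *line, char *bdf, size_t bdf_len)
{
    unsigned dom, bus, dev, fn;

    assert(bdf_len >= 13); /* "DDDD:BB:DD.F" plus the NUL */
    if (strstr(line, "(config space inaccessible)") == NULL)
        return 0;

    /* Skip the "[ timestamp]" prefix, if present. */
    const char *p = strchr(line, ']');
    p = (p != NULL) ? p + 1 : line;

    /* Skip the driver name, then parse domain:bus:device.function. */
    if (sscanf(p, " %*s %x:%x:%x.%x:", &dom, &bus, &dev, &fn) != 4)
        return 0;
    snprintf(bdf, bdf_len, "%04x:%02x:%02x.%x", dom, bus, dev, fn);
    return 1;
}
```

Run over the three such lines in the log above, this yields `0000:05:00.0`, `0000:06:00.0` and `0000:07:00.0`: the two PCIe ports and the Thunderbolt controller.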
Comment 1 Konstantin Kharlamov 2021-04-25 23:00:17 UTC
Created attachment 296475
`dmesg -w` for `echo mem > /sys/power/state`
Comment 2 Konstantin Kharlamov 2021-05-05 18:42:16 UTC
It seems to me that the problem is that `quirk_apple_poweroff_thunderbolt` is not supposed to be called on s2idle (aka freeze). I'll try disabling it.

Does anybody know how I can distinguish suspend modes from within kernel code? That is, how to tell whether the current suspend mode is s2idle or another one?
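For reference, the kernel has `pm_suspend_via_firmware()` (declared in `include/linux/suspend.h`), which reports whether platform firmware takes part in the current system suspend; on ACPI platforms it is set for S3 (`deep`) transitions, not for s2idle. The user-space model below sketches how a quirk could gate the power-off on such a check. It is not the submitted patch: `enum sleep_kind`, `fw_involved()` and `quirk_should_power_off()` are hypothetical stand-ins, and the model ignores any other conditions the real quirk checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two modes discussed in this report. */
enum sleep_kind {
    SLEEP_S2IDLE, /* "freeze", or "mem" with mem_sleep = s2idle */
    SLEEP_DEEP,   /* "mem" with mem_sleep = deep (ACPI S3) */
};

/*
 * Stand-in for the kernel's pm_suspend_via_firmware(): true only when
 * the platform firmware performs the transition (S3), false for s2idle.
 */
static bool fw_involved(enum sleep_kind kind)
{
    return kind == SLEEP_DEEP;
}

/*
 * Hypothetical decision for the Apple Thunderbolt power-off quirk: only
 * cut power when firmware handles the suspend. Per this report, the
 * controller comes back after `mem` but not after `freeze`.
 */
bool quirk_should_power_off(enum sleep_kind kind)
{
    assert(kind == SLEEP_S2IDLE || kind == SLEEP_DEEP);
    return fw_involved(kind);
}
```

The design point is that the quirk keys off *who* performs the transition rather than off the string written to `/sys/power/state`, since `mem` can itself map to s2idle.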
Comment 3 Konstantin Kharlamov 2021-05-05 23:23:05 UTC
So, I tested by live-patching this quirk out and doing an s2idle cycle, and the problems are no longer observed. This means I'm on the right track. I only need to figure out how to detect s2idle from within the quirk, then send a patch.
Comment 4 Konstantin Kharlamov 2021-05-06 20:39:23 UTC
FTR: I sent a patch; it is here: https://lore.kernel.org/linux-pci/20210506173820.21876-1-Hi-Angel@yandex.ru/T/#u
Comment 5 Konstantin Kharlamov 2021-05-06 20:40:03 UTC
Created attachment 296677
Fix for the bug
Comment 6 Konstantin Kharlamov 2021-05-07 13:45:30 UTC
Created attachment 296689
sudo lspci -vvv
Comment 7 Konstantin Kharlamov 2021-07-19 12:53:49 UTC
FTR: the patch is included in kernel 5.13.3 and later (and it was backported to older releases as well).
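To check whether a running kernel is at least 5.13.3, the `uname -r` string can be compared numerically. A minimal C sketch; `kernel_at_least()` is a hypothetical helper. Because the fix was also backported to older releases, a result of 0 only means the version alone does not guarantee the fix:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 if a kernel release string such as "5.11.15-200.fc33.x86_64"
 * (as printed by `uname -r`) is at least major.minor.patch, 0 if it is
 * older, and -1 if it cannot be parsed. A missing patch component
 * (e.g. "5.14-rc1") is treated as 0.
 */
int kernel_at_least(const char *release, int major, int minor, int patch)
{
    int ma = 0, mi = 0, pa = 0;

    assert(release != NULL);
    int n = sscanf(release, "%d.%d.%d", &ma, &mi, &pa);
    if (n < 2)
        return -1; /* need at least "X.Y" */
    if (ma != major)
        return ma > major;
    if (mi != minor)
        return mi > minor;
    return pa >= patch;
}
```

For example, `kernel_at_least("5.11.15-200.fc33.x86_64", 5, 13, 3)` (the kernel from the trace above) returns 0.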
