Bug 217278 - ast 0000:03:00.0: PM: **** DPM device timeout **** during S4 resuming
Summary: ast 0000:03:00.0: PM: **** DPM device timeout **** during S4 resuming
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel) (show other bugs)
Hardware: Intel Linux
: P1 blocking
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-31 01:43 UTC by wendy.wang
Modified: 2023-04-19 03:11 UTC (History)
0 users

See Also:
Kernel Version: v6.3-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Failure log in mainline v6.1 (6.26 KB, text/plain)
2023-04-02 02:53 UTC, wendy.wang
Details
Failure log in mainline v6.3-rc4 (6.11 KB, text/plain)
2023-04-02 02:54 UTC, wendy.wang
Details
Failure log in mainline v6.1 (6.26 KB, text/plain)
2023-04-02 02:54 UTC, wendy.wang
Details
0001-drm-ast-Fix-long-timeout-which-could-lead-to-panic.patch_v6.3-rc5 (4.76 KB, application/mbox)
2023-04-14 03:43 UTC, wendy.wang
Details

Description wendy.wang 2023-03-31 01:43:52 UTC
On Intel Emerald Rapids Server
ast driver will trigger unrecoverable kernel panic during S4 resuming.
If add "modprobe.blacklist=ast" kernel parameter to disable ast driver
Can avoid the kernel call trace.

[  456.012967] CPU251 is up
[  456.015985] smpboot: ++++++++++++++++++++=_---CPU UP  252
[  456.022183] smpboot: Booting Node 1 Processor 252 APIC 0xf9
[  456.028552] smpboot: Setting warm reset code and vector.
[  456.034650] smpboot: Asserting INIT
[  456.038648] smpboot: Waiting for send to finish...
[  456.044128] smpboot: Deasserting INIT
[  456.048334] smpboot: Waiting for send to finish...
[  456.053818] smpboot: #startup loops: 2
[  456.058131] smpboot: Sending STARTUP #1
[  456.062542] smpboot: After apic_write
[  456.066768] smpboot: Startup point 1
[  456.070898] smpboot: Waiting for send to finish...
[  456.076398] smpboot: Sending STARTUP #2
[  456.080832] smpboot: After apic_write
[  456.085082] smpboot: Startup point 1
[  456.089215] smpboot: Waiting for send to finish...
[  456.094718] smpboot: After Startup
[  456.124666] microcode: Load ucode for AP
[  456.129185] microcode: __load_ucode_intel: cp_data = 0x0000000000000000 cp_size = 0x0
000000000000 cp_size = 0x0ack at about 00000000937ed697
[  456.310542] smpboot: Stack at about 000000007168126e
[  456.316238] microcode: mc_cpu_starting: CPU253, err: 3
[  456.359827] CPU253 is up
[  456.362823] smpboot: ++++++++++++++++++++=_---CPU UP  254
[  456.369003] smpboot: Booting Node 1 Processor 254 APIC 0xfd
[  456.375378] smpboot: Setting warm reset code and vector.
[  456.381450] smpboot: Asserting INIT
[  456.385449] smpboot: Waiting for send to finish...
[  456.390924] smpboot: Deasserting INIT
[  456.395144] smpboot: Waiting for send to finish...
[  456.400590] smpboot: #startup loops: 2
[  456.404900] smpboot: Sending STARTUP #1
[  456.409299] smpboot: After apic_write
[  456.413555] smpboot: Startup point 1
[  456.417686] smpboot: Waiting for send to finish...
[  456.423193] smpboot: Sending STARTUP #2
[  456.427619] smpboot: After apic_write
[  456.431869] smpboot: Startup point 1
[  456.435993] smpboot: Waiting for send to finish...
[  456.441476] smpboot: After Startup
[  456.469575] microcode: Load ucode for AP
[  456.474101] microcode: __load_ucode_intel: cp_data = 0x0000000000000000 cp_size = 0x0
[  456.483382] smpboot: Stack [  457.507476] ttyS ttyS0: 1 input overrun(s)
[  457.509389] power_meter ACPI000D:00: Found ACPI power meter.
[  457.519717] serial 00:04: activated
[  457.524889] serial 00:05: activated
[  457.543671] nvme nvme0: 8/0/0 default/read/poll queues
[  457.840429] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  457.847500] ata6: SATA link down (SStatus 4 SControl 300)
[  457.853654] ata5: SATA link down (SStatus 4 SControl 300)
[  457.859819] ata1: SATA link down (SStatus 4 SControl 300)
[  457.865967] ata8: SATA link down (SStatus 4 SControl 300)
[  457.872129] ata4: SATA link down (SStatus 4 SControl 300)
[  457.878292] ata2: SATA link down (SStatus 4 SControl 300)
[  457.884470] ata3: SATA link down (SStatus 4 SControl 300)
[  457.890626] ata7.00: supports DRM functions and may not be fully accessible
[  457.900562] ata7.00: supports DRM functions and may not be fully accessible
[  457.909510] ata7.00: configured for UDMA/133
[  458.532192] virbr0: port 1(ens3) entered disabled state
[  461.085827] igb 0000:27:00.0 ens1: igb: ens1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  461.233851] igb 0000:a8:00.0 ens3: igb: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  461.347436] virbr0: port 1(ens3) entered blocking state
[  461.353412] virbr0: port 1(ens3) entered forwarding state
[  461.377813] igc 0000:01:00.0 enp1s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  469.667476] ast 0000:03:00.0: PM: **** DPM device timeout ****
[  469.674142] Call Trace:
[  469.676983]  <TASK>
[  469.679427]  __schedule+0x27a/0x6e0
[  469.683446]  ? _raw_spin_unlock_irqrestore+0x22/0x40
[  469.689117]  schedule+0x61/0xe0
[  469.692739]  schedule_timeout+0x7a/0xf0
[  469.697126]  ? __pfx_process_timeout+0x10/0x10
[  469.702216]  msleep+0x2d/0x40
[  469.705623]  ast_dp_launch+0x36/0xc0 [ast]
[  469.710328]  ast_post_gpu+0x1fa/0x240 [ast]
[  469.715096]  ast_pm_thaw+0x16/0x20 [ast]
[  469.719554]  pci_pm_thaw+0x44/0xa0
[  469.723450]  ? __pfx_pci_pm_thaw+0x10/0x10
[  469.728146]  dpm_run_callback+0x95/0x200
[  469.732653]  device_resume+0xea/0x210
[  469.736865]  ? __pfx_dpm_watchdog_handler+0x10/0x10
[  469.742413]  async_resume+0x1e/0x60
[  469.746401]  async_run_entry_fn+0x39/0x140
[  469.751098]  process_one_work+0x196/0x3c0
[  469.755691]  worker_thread+0x51/0x3a0
[  469.759902]  ? __pfx_worker_thread+0x10/0x10
[  469.764792]  kthread+0xea/0x120
[  469.768405]  ? __pfx_kthread+0x10/0x10
[  469.772695]  ret_from_fork+0x29/0x50
[  469.776793]  </TASK>
[  469.779346] Kernel panic - not syncing: ast 0000:03:00.0: unrecoverable failure
[  469.787630] CPU: 153 PID: 0 Comm: swapper/153 Tainted: G        W           6.3.0-rc4-2023-03-28-intel-next-02154-g257279c18b21 #1
[  469.800891] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0100.D45.2303121331 03/12/2023
[  469.812782] Call Trace:
[  469.815614]  <IRQ>
[  469.817926]  dump_stack_lvl+0x36/0x50
[  469.822124]  panic+0x343/0x350
[  469.825665]  ? __pfx_dpm_watchdog_handler+0x10/0x10
[  469.831240]  ? __pfx_dpm_watchdog_handler+0x10/0x10
[  469.836775]  dpm_watchdog_handler+0x56/0x60
[  469.841571]  call_timer_fn+0x2b/0x140
[  469.845766]  __run_timers.part.0+0x1e7/0x280
[  469.850652]  ? tick_sched_timer+0x73/0x90
[  469.855257]  ? _raw_spin_lock_irq+0x18/0x30
[  469.860041]  ? __hrtimer_run_queues+0x13c/0x2d0
[  469.865220]  ? lapic_timer_shutdown.part.0+0x33/0x50
[  469.870897]  ? lapic_timer_shutdown+0x14/0x20
[  469.875882]  run_timer_softirq+0x35/0x70
[  469.880361]  __do_softirq+0x102/0x31d
[  469.884545]  irq_exit_rcu+0xc1/0x130
[  469.888663]  sysvec_apic_timer_interrupt+0xa2/0xd0
[  469.894150]  </IRQ>
[  469.896589]  <TASK>
[  469.899023]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  469.904892] RIP: 0010:cpuidle_enter_state+0xd0/0x420
[  469.910547] Code: ff ff 8b 53 04 49 89 c5 8b 05 84 b5 ff 00 85 c0 0f 8f e3 02 00 00 31 ff e8 6d ab e7 fe 45 84 ff 0f 85 e3 01 00 00 fb 45 85 f6 <0f> 88 1e 01 00 00 48 8b 04 24 49 63 ce 48 6b d1 68 49 29 c5 48 89
[  469.931691] RSP: 0000:ff85092306c3fe88 EFLAGS: 00000206
[  469.937654] RAX: ff44661abdc40000 RBX: ffb7091ad765af00 RCX: 0000000000000000
[  469.945736] RDX: 0000000000000099 RSI: ffffffff99c06314 RDI: ffffffff99c0ac47
[  469.953814] RBP: 0000000000000003 R08: ffffffca120fb4c4 R09: 000000004b4b4b4b
[  469.961903] R10: 0000000000000010 R11: 00000000000025d6 R12: ffffffff9a27c940
[  469.970007] R13: 0000006d5a5cd0ff R14: 0000000000000003 R15: 0000000000000000
[  469.978117]  ? cpuidle_enter_state+0xc3/0x420
[  469.983056]  cpuidle_enter+0x2d/0x40
[  469.987155]  cpuidle_idle_call+0x101/0x170
[  469.991825]  do_idle+0x74/0xc0
[  469.995336]  cpu_startup_entry+0x1d/0x20
[  469.999822]  start_secondary+0x12d/0x130
[  470.004310]  secondary_startup_64_no_verify+0xe0/0xeb
[  470.010063]  </TASK>
[  470.125520] Kernel Offset: 0x17000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  470.575721] ---[ end Kernel panic - not syncing: ast 0000:03:00.0: unrecoverable failure ]---

[root@emr-2s7 ~]# lspci -vvv -s 03:00.0
03:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) (prog-if 00 [VGA controller])
        Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 16
        NUMA node: 0
        IOMMU group: 72
        Region 0: Memory at 94000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 95000000 (32-bit, non-prefetchable) [size=256K]
        Region 2: I/O ports at 1000 [size=128]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Kernel modules: ast
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-31 05:33:19 UTC
Did this work fine or earlier kernels? Which one?
Comment 2 wendy.wang 2023-03-31 09:14:47 UTC
Needs time to check mainline v6.1 fail or not.
Comment 3 wendy.wang 2023-04-02 02:53:54 UTC
Created attachment 304074 [details]
Failure log in mainline v6.1
Comment 4 wendy.wang 2023-04-02 02:54:28 UTC
Created attachment 304075 [details]
Failure log in mainline v6.3-rc4
Comment 5 wendy.wang 2023-04-02 02:54:50 UTC
Created attachment 304076 [details]
Failure log in mainline v6.1
Comment 6 wendy.wang 2023-04-02 02:55:39 UTC
Earlier mainline v6.1 version ast driver also failed.
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-04-02 08:15:03 UTC
Then it sounds like this is not a regression and thus nothing for me.

But FWIW, I doubt your report here will reach the maintainers of the driver. You might want to contact them by mail to Joel Stanley <joel@jms.id.au>, CCed to linux-aspeed@lists.ozlabs.org
Comment 8 wendy.wang 2023-04-14 03:43:53 UTC
Created attachment 304133 [details]
0001-drm-ast-Fix-long-timeout-which-could-lead-to-panic.patch_v6.3-rc5
Comment 9 wendy.wang 2023-04-14 03:47:58 UTC
Thanks Thorsten for your information,
I sent email and got response.
Jammy Huang <jammy_huang@aspeedtech.com> gave me the patch and ast driver rpm package, we confirmed that it can unblock the kernel panic during S4 entry.

Note You need to log in before you can comment on or make changes to this bug.