Bug 200585

Summary: [snd_hda_codec] general protection fault: 0000 while suspending
Product: Drivers Reporter: Martin Peres (martin.peres)
Component: Sound(ALSA)Assignee: Jaroslav Kysela (perex)
Status: RESOLVED UNREPRODUCIBLE    
Severity: normal CC: lakshminarayana.vudum, tiwai
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.18.0-rc3 Subsystem:
Regression: No Bisected commit-id:

Description Martin Peres 2018-07-18 12:17:17 UTC
<3>[  236.837169] Freezing of tasks failed after 20.000 seconds (2 tasks refusing to freeze, wq_busy=0):
<6>[  236.837198] systemd-udevd   D12744   302    252 0x80000104
<4>[  236.837214] Call Trace:
<4>[  236.837228]  ? __schedule+0x364/0xbf0
<4>[  236.837242]  schedule+0x2d/0x90
<4>[  236.837252]  __pm_runtime_barrier+0x9c/0x160
<4>[  236.837264]  ? wait_woken+0xa0/0xa0
<4>[  236.837275]  __pm_runtime_disable+0x84/0xe0
<4>[  236.837292]  snd_hda_codec_cleanup_for_unbind+0x203/0x210 [snd_hda_codec]
<4>[  236.837311]  hda_codec_driver_probe+0x47/0x100 [snd_hda_codec]
<4>[  236.837324]  driver_probe_device+0x306/0x480
<4>[  236.837335]  __driver_attach+0xdb/0x100
<4>[  236.837344]  ? driver_probe_device+0x480/0x480
<4>[  236.837354]  ? driver_probe_device+0x480/0x480
<4>[  236.837365]  bus_for_each_dev+0x74/0xc0
<4>[  236.837377]  bus_add_driver+0x15f/0x250
<4>[  236.837387]  ? 0xffffffffa00e3000
<4>[  236.837395]  driver_register+0x56/0xe0
<4>[  236.837404]  ? 0xffffffffa00e3000
<4>[  236.837413]  do_one_initcall+0x58/0x370
<4>[  236.837425]  ? do_init_module+0x1d/0x1ea
<4>[  236.837436]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  236.837447]  ? kmem_cache_alloc_trace+0x282/0x2e0
<4>[  236.837460]  do_init_module+0x56/0x1ea
<4>[  236.837471]  load_module+0x2435/0x2b20
<4>[  236.837490]  ? __se_sys_finit_module+0xd3/0xf0
<4>[  236.837501]  __se_sys_finit_module+0xd3/0xf0
<4>[  236.837515]  do_syscall_64+0x55/0x190
<4>[  236.837526]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  236.837537] RIP: 0033:0x7f58dc89d839
<4>[  236.837545] Code: Bad RIP value.
<4>[  236.837559] RSP: 002b:00007ffd689bab78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[  236.837574] RAX: ffffffffffffffda RBX: 000055df23929a20 RCX: 00007f58dc89d839
<4>[  236.837587] RDX: 0000000000000000 RSI: 00007f58dc57c0e5 RDI: 0000000000000011
<4>[  236.837599] RBP: 00007f58dc57c0e5 R08: 0000000000000000 R09: 00007ffd689bac90
<4>[  236.837612] R10: 0000000000000011 R11: 0000000000000246 R12: 0000000000000000
<4>[  236.837624] R13: 000055df23957ca0 R14: 0000000000020000 R15: 000055df23929a20
<6>[  236.837648] alsactl         D11880   627      1 0x80000106
<4>[  236.837662] Call Trace:
<4>[  236.837671]  ? __schedule+0x364/0xbf0
<4>[  236.837681]  ? rwsem_down_write_failed+0xa0/0x2d0
<4>[  236.837693]  ? rwsem_down_write_failed+0x1c2/0x2d0
<4>[  236.837704]  schedule+0x2d/0x90
<4>[  236.837712]  rwsem_down_write_failed+0x1c7/0x2d0
<4>[  236.837728]  ? call_rwsem_down_write_failed+0x13/0x20
<4>[  236.837740]  call_rwsem_down_write_failed+0x13/0x20
<4>[  236.837752]  ? snd_ctl_release+0x7e/0x120
<4>[  236.837762]  down_write+0x4e/0x60
<4>[  236.837770]  ? snd_ctl_release+0x7e/0x120
<4>[  236.837779]  snd_ctl_release+0x7e/0x120
<4>[  236.837791]  __fput+0xaf/0x1f0
<4>[  236.837801]  task_work_run+0x8e/0xc0
<4>[  236.837813]  do_exit+0x396/0xd40
<4>[  236.837823]  ? ksys_ioctl+0x35/0x60
<4>[  236.837835]  rewind_stack_do_exit+0x17/0x20
<6>[  236.837858] OOM killer enabled.

Boot logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4454/fi-kbl-8809g/boot0.log

Logs during the execution of our testsuite: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4454/fi-kbl-8809g/dmesg0.log

All the test artefacts: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4454/fi-kbl-8809g/igt@gem_exec_suspend@basic-s4-devices.html
Comment 1 Martin Peres 2018-07-18 12:24:10 UTC
The oops file has a better description:

<4>[    6.285553] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4>[    6.285619] CPU: 0 PID: 626 Comm: kworker/0:3 Not tainted 4.18.0-rc3-CI-CI_DRM_4454+ #1
<4>[    6.285685] Hardware name: Intel Corporation NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0040.2018.0516.1521 05/16/2018
<4>[    6.285741] Workqueue: pm pm_runtime_work
<4>[    6.285765] RIP: 0010:__x86_indirect_thunk_rax+0x10/0x20
<4>[    6.285793] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 <c3> 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 e8 07 00 00 00 f3 
<4>[    6.285963] RSP: 0018:ffffc900010c3d20 EFLAGS: 00010202
<4>[    6.285991] RAX: 6b6b6b6b6b6b6b6b RBX: ffff880270118008 RCX: 0000000000000001
<4>[    6.286029] RDX: 0000000000000000 RSI: 00000000aafad925 RDI: ffff880270118008
<4>[    6.286065] RBP: ffff8802591db3f8 R08: 00000000d381941d R09: 0000000000000001
<4>[    6.286103] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880270118560
<4>[    6.286139] R13: 0000000000000000 R14: ffffffffa0194a30 R15: 0000000000000008
<4>[    6.286176] FS:  0000000000000000(0000) GS:ffff88027ec00000(0000) knlGS:0000000000000000
<4>[    6.286218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    6.286248] CR2: 000055f779ae60a8 CR3: 0000000005210005 CR4: 00000000003606f0
<4>[    6.286285] Call Trace:
<4>[    6.286303]  ? alc_suspend+0x36/0x50 [snd_hda_codec_realtek]
<6>[    6.286331] input: HDA Intel PCH HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input15
<4>[    6.286383]  ? hda_call_codec_suspend+0x1d/0xa0 [snd_hda_codec]
<4>[    6.286416]  ? hda_codec_runtime_suspend+0x54/0xa0 [snd_hda_codec]
<4>[    6.286450]  ? __rpm_callback+0xb3/0x1b0
<4>[    6.286471]  ? hda_codec_runtime_resume+0x40/0x40 [snd_hda_codec]
<4>[    6.286502]  ? rpm_callback+0x1a/0x70
<4>[    6.286522]  ? hda_codec_runtime_resume+0x40/0x40 [snd_hda_codec]
<4>[    6.286555]  ? rpm_suspend+0x125/0x6d0
<4>[    6.286577]  ? pm_runtime_work+0x6e/0xb0
<4>[    6.286599]  ? process_one_work+0x248/0x6c0
<4>[    6.287861]  ? worker_thread+0x37/0x380
<4>[    6.289111]  ? process_one_work+0x6c0/0x6c0
<4>[    6.290352]  ? kthread+0x119/0x130
<4>[    6.291593]  ? kthread_flush_work_fn+0x10/0x10
<4>[    6.292822]  ? ret_from_fork+0x3a/0x50
<4>[    6.294046] Modules linked in: snd_hda_codec_realtek(+) snd_hda_codec_generic amdgpu i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul btusb btrtl btbcm snd_hda_codec_hdmi btintel bluetooth snd_hda_intel snd_hda_codec ghash_clmulni_intel chash gpu_sched e1000e ecdh_generic snd_hwdep ttm snd_hda_core igb snd_pcm mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel
<6>[    6.295228] input: HDA Intel PCH Line Out as /devices/pci0000:00/0000:00:1f.3/sound/card0/input16
<4>[    6.296904] ---[ end trace 8b569215a2200dc6 ]---
Comment 2 Takashi Iwai 2018-07-18 13:37:29 UTC
How can I reproduce the bug?
Comment 3 Martin Peres 2018-07-18 13:40:47 UTC
(In reply to Takashi Iwai from comment #2)
> How can I reproduce the bug?

Try suspending/resuming in a loop for a couple of thousand times :s I hope this will not require one of these bad boys to reproduce: https://www.intel.com/content/www/us/en/products/boards-kits/nuc/kits/nuc8i7hvk.html

So far, we have only seen this issue once. I will keep you updated when we get more runs with this issue.
Comment 4 Takashi Iwai 2018-07-18 15:37:37 UTC
Do you mean S4 or S3?  The log you showed indicates the GPF at early boot, and is this while suspend/resume testing?  Somehow the information doesn't match with the description...

The more information is appreciated.  Thanks.
Comment 5 Martin Peres 2018-07-19 08:35:05 UTC
Sorry about that. I followed the code and it looks like the test is verifying that hibernating while the GPU is active is done properly.

The hibernation is triggered like this:

# echo devices > /sys/power/pm_test 
# echo disk > /sys/power/state

I hope this helps!
Comment 6 Lakshminarayana Vudum 2019-02-14 08:58:06 UTC
Last seen CI_DRM_4454 (7 months, 1 week / 4171 runs ago). This issue is not seen lately.
Comment 7 Martin Peres 2019-03-08 12:01:12 UTC
Still have not reproduced 8 months later. So much has changed since this bug was filed that it makes no sense to keep it open. Closing!