Bug 200585 - [snd_hda_codec] general protection fault: 0000 while suspending
Summary: [snd_hda_codec] general protection fault: 0000 while suspending
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Sound(ALSA) (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Jaroslav Kysela
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-18 12:17 UTC by Martin Peres
Modified: 2019-03-08 12:01 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.18.0-rc3
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Martin Peres 2018-07-18 12:17:17 UTC
<3>[  236.837169] Freezing of tasks failed after 20.000 seconds (2 tasks refusing to freeze, wq_busy=0):
<6>[  236.837198] systemd-udevd   D12744   302    252 0x80000104
<4>[  236.837214] Call Trace:
<4>[  236.837228]  ? __schedule+0x364/0xbf0
<4>[  236.837242]  schedule+0x2d/0x90
<4>[  236.837252]  __pm_runtime_barrier+0x9c/0x160
<4>[  236.837264]  ? wait_woken+0xa0/0xa0
<4>[  236.837275]  __pm_runtime_disable+0x84/0xe0
<4>[  236.837292]  snd_hda_codec_cleanup_for_unbind+0x203/0x210 [snd_hda_codec]
<4>[  236.837311]  hda_codec_driver_probe+0x47/0x100 [snd_hda_codec]
<4>[  236.837324]  driver_probe_device+0x306/0x480
<4>[  236.837335]  __driver_attach+0xdb/0x100
<4>[  236.837344]  ? driver_probe_device+0x480/0x480
<4>[  236.837354]  ? driver_probe_device+0x480/0x480
<4>[  236.837365]  bus_for_each_dev+0x74/0xc0
<4>[  236.837377]  bus_add_driver+0x15f/0x250
<4>[  236.837387]  ? 0xffffffffa00e3000
<4>[  236.837395]  driver_register+0x56/0xe0
<4>[  236.837404]  ? 0xffffffffa00e3000
<4>[  236.837413]  do_one_initcall+0x58/0x370
<4>[  236.837425]  ? do_init_module+0x1d/0x1ea
<4>[  236.837436]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  236.837447]  ? kmem_cache_alloc_trace+0x282/0x2e0
<4>[  236.837460]  do_init_module+0x56/0x1ea
<4>[  236.837471]  load_module+0x2435/0x2b20
<4>[  236.837490]  ? __se_sys_finit_module+0xd3/0xf0
<4>[  236.837501]  __se_sys_finit_module+0xd3/0xf0
<4>[  236.837515]  do_syscall_64+0x55/0x190
<4>[  236.837526]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  236.837537] RIP: 0033:0x7f58dc89d839
<4>[  236.837545] Code: Bad RIP value.
<4>[  236.837559] RSP: 002b:00007ffd689bab78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[  236.837574] RAX: ffffffffffffffda RBX: 000055df23929a20 RCX: 00007f58dc89d839
<4>[  236.837587] RDX: 0000000000000000 RSI: 00007f58dc57c0e5 RDI: 0000000000000011
<4>[  236.837599] RBP: 00007f58dc57c0e5 R08: 0000000000000000 R09: 00007ffd689bac90
<4>[  236.837612] R10: 0000000000000011 R11: 0000000000000246 R12: 0000000000000000
<4>[  236.837624] R13: 000055df23957ca0 R14: 0000000000020000 R15: 000055df23929a20
<6>[  236.837648] alsactl         D11880   627      1 0x80000106
<4>[  236.837662] Call Trace:
<4>[  236.837671]  ? __schedule+0x364/0xbf0
<4>[  236.837681]  ? rwsem_down_write_failed+0xa0/0x2d0
<4>[  236.837693]  ? rwsem_down_write_failed+0x1c2/0x2d0
<4>[  236.837704]  schedule+0x2d/0x90
<4>[  236.837712]  rwsem_down_write_failed+0x1c7/0x2d0
<4>[  236.837728]  ? call_rwsem_down_write_failed+0x13/0x20
<4>[  236.837740]  call_rwsem_down_write_failed+0x13/0x20
<4>[  236.837752]  ? snd_ctl_release+0x7e/0x120
<4>[  236.837762]  down_write+0x4e/0x60
<4>[  236.837770]  ? snd_ctl_release+0x7e/0x120
<4>[  236.837779]  snd_ctl_release+0x7e/0x120
<4>[  236.837791]  __fput+0xaf/0x1f0
<4>[  236.837801]  task_work_run+0x8e/0xc0
<4>[  236.837813]  do_exit+0x396/0xd40
<4>[  236.837823]  ? ksys_ioctl+0x35/0x60
<4>[  236.837835]  rewind_stack_do_exit+0x17/0x20
<6>[  236.837858] OOM killer enabled.

Boot logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4454/fi-kbl-8809g/boot0.log

Logs during the execution of our testsuite: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4454/fi-kbl-8809g/dmesg0.log

All the test artefacts: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4454/fi-kbl-8809g/igt@gem_exec_suspend@basic-s4-devices.html
Comment 1 Martin Peres 2018-07-18 12:24:10 UTC
The oops file has a better description:

<4>[    6.285553] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4>[    6.285619] CPU: 0 PID: 626 Comm: kworker/0:3 Not tainted 4.18.0-rc3-CI-CI_DRM_4454+ #1
<4>[    6.285685] Hardware name: Intel Corporation NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0040.2018.0516.1521 05/16/2018
<4>[    6.285741] Workqueue: pm pm_runtime_work
<4>[    6.285765] RIP: 0010:__x86_indirect_thunk_rax+0x10/0x20
<4>[    6.285793] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 <c3> 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 e8 07 00 00 00 f3 
<4>[    6.285963] RSP: 0018:ffffc900010c3d20 EFLAGS: 00010202
<4>[    6.285991] RAX: 6b6b6b6b6b6b6b6b RBX: ffff880270118008 RCX: 0000000000000001
<4>[    6.286029] RDX: 0000000000000000 RSI: 00000000aafad925 RDI: ffff880270118008
<4>[    6.286065] RBP: ffff8802591db3f8 R08: 00000000d381941d R09: 0000000000000001
<4>[    6.286103] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880270118560
<4>[    6.286139] R13: 0000000000000000 R14: ffffffffa0194a30 R15: 0000000000000008
<4>[    6.286176] FS:  0000000000000000(0000) GS:ffff88027ec00000(0000) knlGS:0000000000000000
<4>[    6.286218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    6.286248] CR2: 000055f779ae60a8 CR3: 0000000005210005 CR4: 00000000003606f0
<4>[    6.286285] Call Trace:
<4>[    6.286303]  ? alc_suspend+0x36/0x50 [snd_hda_codec_realtek]
<6>[    6.286331] input: HDA Intel PCH HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input15
<4>[    6.286383]  ? hda_call_codec_suspend+0x1d/0xa0 [snd_hda_codec]
<4>[    6.286416]  ? hda_codec_runtime_suspend+0x54/0xa0 [snd_hda_codec]
<4>[    6.286450]  ? __rpm_callback+0xb3/0x1b0
<4>[    6.286471]  ? hda_codec_runtime_resume+0x40/0x40 [snd_hda_codec]
<4>[    6.286502]  ? rpm_callback+0x1a/0x70
<4>[    6.286522]  ? hda_codec_runtime_resume+0x40/0x40 [snd_hda_codec]
<4>[    6.286555]  ? rpm_suspend+0x125/0x6d0
<4>[    6.286577]  ? pm_runtime_work+0x6e/0xb0
<4>[    6.286599]  ? process_one_work+0x248/0x6c0
<4>[    6.287861]  ? worker_thread+0x37/0x380
<4>[    6.289111]  ? process_one_work+0x6c0/0x6c0
<4>[    6.290352]  ? kthread+0x119/0x130
<4>[    6.291593]  ? kthread_flush_work_fn+0x10/0x10
<4>[    6.292822]  ? ret_from_fork+0x3a/0x50
<4>[    6.294046] Modules linked in: snd_hda_codec_realtek(+) snd_hda_codec_generic amdgpu i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul btusb btrtl btbcm snd_hda_codec_hdmi btintel bluetooth snd_hda_intel snd_hda_codec ghash_clmulni_intel chash gpu_sched e1000e ecdh_generic snd_hwdep ttm snd_hda_core igb snd_pcm mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel
<6>[    6.295228] input: HDA Intel PCH Line Out as /devices/pci0000:00/0000:00:1f.3/sound/card0/input16
<4>[    6.296904] ---[ end trace 8b569215a2200dc6 ]---
Comment 2 Takashi Iwai 2018-07-18 13:37:29 UTC
How can I reproduce the bug?
Comment 3 Martin Peres 2018-07-18 13:40:47 UTC
(In reply to Takashi Iwai from comment #2)
> How can I reproduce the bug?

Try suspending/resuming in a loop for a couple of thousand times :s I hope this will not require one of these bad boys to reproduce: https://www.intel.com/content/www/us/en/products/boards-kits/nuc/kits/nuc8i7hvk.html

So far, we have only seen this issue once. I will keep you updated when we get more runs with this issue.
Comment 4 Takashi Iwai 2018-07-18 15:37:37 UTC
Do you mean S4 or S3?  The log you showed indicates the GPF at early boot, and is this while suspend/resume testing?  Somehow the information doesn't match with the description...

The more information is appreciated.  Thanks.
Comment 5 Martin Peres 2018-07-19 08:35:05 UTC
Sorry about that. I followed the code and it looks like the test is verifying that hibernating while the GPU is active is done properly.

The hibernation is triggered like this:

# echo devices > /sys/power/pm_test 
# echo disk > /sys/power/state

I hope this helps!
Comment 6 Lakshminarayana Vudum 2019-02-14 08:58:06 UTC
Last seen CI_DRM_4454 (7 months, 1 week / 4171 runs ago). This issue is not seen lately.
Comment 7 Martin Peres 2019-03-08 12:01:12 UTC
Still have not reproduced 8 months later. So much has changed since this bug was filed that it makes no sense to keep it open. Closing!

Note You need to log in before you can comment on or make changes to this bug.