One of the following commits introduced the a warning on Intel-GFX-CI's Brodwell and Haswell machines: 43f6c8d ALSA: hda - Minor code refactoring for Intel HDMI codec parsers fcc88d9 ALSA: hda - Bind with i915 component before codec binding 17890880 ALSA: hda - Skip card registration when no codec is found d94815f ALSA: hda - Fix endless loop of codec configure Here is the actual warning that we are seeing: [ 437.622081] ------------[ cut here ]------------ [ 437.622092] WARNING: CPU: 6 PID: 3239 at sound/hda/hdac_i915.c:367 snd_hdac_i915_init+0x13f/0x160 [snd_hda_core] [ 437.622093] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec crct10dif_pclmul snd_hwdep crc32_pclmul snd_hda_core ghash_clmulni_intel snd_pcm e1000e mei_me mei ptp pps_core prime_numbers lpc_ich [last unloaded: i915] [ 437.622127] CPU: 6 PID: 3239 Comm: kworker/6:3 Tainted: G U W 4.12.0-rc7-CI-CI_DRM_2782+ #1 [ 437.622129] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013 [ 437.622132] Workqueue: events azx_probe_work [snd_hda_intel] [ 437.622135] task: ffff8804080ea740 task.stack: ffffc90000cb8000 [ 437.622138] RIP: 0010:snd_hdac_i915_init+0x13f/0x160 [snd_hda_core] [ 437.622140] RSP: 0018:ffffc90000cbbda0 EFLAGS: 00010286 [ 437.622143] RAX: 0000000000000001 RBX: ffff880405e38008 RCX: 0000000000000001 [ 437.622145] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff880405e38008 [ 437.622146] RBP: ffffc90000cbbdd0 R08: ffff8804080eaf78 R09: 0000000000000000 [ 437.622148] R10: ffffc90000cbbdc8 R11: 0000000000000001 R12: ffff88041eb9a180 [ 437.622150] R13: ffff88040b996888 R14: ffff880405e387c0 R15: ffff880405e387c0 [ 437.622151] FS: 0000000000000000(0000) GS:ffff88041eb80000(0000) knlGS:0000000000000000 [ 437.622153] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 437.622155] CR2: 00007f9310357b00 CR3: 000000040046c000 CR4: 00000000001406e0 [ 437.622156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 437.622158] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 437.622159] Call Trace: [ 437.622164] azx_probe_work+0x4dd/0x930 [snd_hda_intel] [ 437.622171] process_one_work+0x1fe/0x670 [ 437.622175] worker_thread+0x49/0x3b0 [ 437.622179] kthread+0x10f/0x150 [ 437.622182] ? process_one_work+0x670/0x670 [ 437.622184] ? kthread_create_on_node+0x40/0x40 [ 437.622189] ret_from_fork+0x27/0x40 [ 437.622193] Code: 49 c7 84 24 48 05 00 00 00 00 00 00 44 89 fa 48 c7 c6 a8 6c 0c a0 4c 89 ef 48 c7 05 98 4f 00 00 00 00 00 00 e8 b3 18 4f e1 eb 8b <0f> ff 41 be f0 ff ff ff eb 81 41 be f4 ff ff ff e9 76 ff ff ff [ 437.622275] ---[ end trace f713c52cd85ad78a ]--- [ 437.631259] Setting dangerous option reset - tainting kernel
Thanks for the report, it must be fcc88d9. For HSW and BDW, we need to avoid calling snd_hdac_i915_init() twice. I'll work on it.
Could you try the patch below?
Created attachment 257225 [details] Fix patch
Thanks, I'll try it tomorrow!
Tested-by: Martin Peres <martin.peres@linux.intel.com> Thanks, I'll let you close the bug when you have pushed it upstream :) I'll use your patch in our CI for now.
Thanks! I already merged in this morning as I could check it locally, so sorry, your tested-by slipped :)
Wonderful, thanks for being so reactive :) Let's see when the next bug comes :)
I seem to have ill-tested the patch as it seems like it is is now preventing i915 to be reloaded. This could be to improper refcounting but I can't check until Monday. I at least verified that it really was caused by this patch before leaving the office. If you know what is going on, then go for it. Otherwise, let's debug this on Tuesday (when I am back from my short vacation) :)
I checked HSW machine here but couldn't find a problem. The difference is that now no sound card instance is registered for HDMI when modeset is disabled while it appeared registered uselessly. This may change the card index number.
I understand what you mean, but this does not seem to be our problem. What I see is that all our driver-reload tests are now failing. Looking at the code, it is the unload part that does not work, which could suggest that something is depending on i915, which prevents its unloading. I did not have time today to look more into that. Tell me if this is something you would want/need.
Do you mean "all" as really all i915 platforms failing, not only HSW/BDW? Also, can you give a way to reproduce the problem manually?
Yes, it is on all gen4+ platforms: https://intel-gfx-ci.01.org/CI/igt@drv_module_reload@basic-reload.html Here are the unit tests: https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/ Please run the following commands: ./autogen.sh make ./tests/drv_module_reload --run-subtest basic-reload That should reproduce the problem Thanks for looking into it!
Never mind, I found a typo causing the problem. diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 03e34edc8f24..5ae8ddab6412 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1385,7 +1385,7 @@ static int azx_free(struct azx *chip) if (hda->need_i915_power) snd_hdac_display_power(bus, false); } - if (chip->driver_type & AZX_DCAPS_I915_COMPONENT) + if (chip->driver_caps & AZX_DCAPS_I915_COMPONENT) snd_hdac_i915_exit(bus); kfree(hda); I'll merge the fix for 4.13.
Created attachment 257349 [details] Fix a typo leading to the module reference unbalance
Wonderful! We should have it in drm-tip soon then. Thanks :)
This fixed it: fc18282 ALSA: hda - Fix unbalance of i915 module refcount