Bug 196219

Summary: WARNING at sound/hda/hdac_i915.c:367 on HSW and BDW platforms
Product: Drivers Reporter: Martin Peres (martin.peres)
Component: Sound(ALSA)Assignee: Jaroslav Kysela (perex)
Status: RESOLVED CODE_FIX    
Severity: normal CC: tiwai, tomi.p.sarvela
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: Linux-next Subsystem:
Regression: No Bisected commit-id:
Attachments: Fix patch
Fix a typo leading to the module reference unbalance

Description Martin Peres 2017-06-29 13:52:17 UTC
One of the following commits introduced the a warning on Intel-GFX-CI's Brodwell and Haswell machines:

43f6c8d ALSA: hda - Minor code refactoring for Intel HDMI codec parsers
fcc88d9 ALSA: hda - Bind with i915 component before codec binding
17890880 ALSA: hda - Skip card registration when no codec is found
d94815f ALSA: hda - Fix endless loop of codec configure

Here is the actual warning that we are seeing:

[  437.622081] ------------[ cut here ]------------
[  437.622092] WARNING: CPU: 6 PID: 3239 at sound/hda/hdac_i915.c:367 snd_hdac_i915_init+0x13f/0x160 [snd_hda_core]
[  437.622093] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec crct10dif_pclmul snd_hwdep crc32_pclmul snd_hda_core ghash_clmulni_intel snd_pcm e1000e mei_me mei ptp pps_core prime_numbers lpc_ich [last unloaded: i915]
[  437.622127] CPU: 6 PID: 3239 Comm: kworker/6:3 Tainted: G     U  W       4.12.0-rc7-CI-CI_DRM_2782+ #1
[  437.622129] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
[  437.622132] Workqueue: events azx_probe_work [snd_hda_intel]
[  437.622135] task: ffff8804080ea740 task.stack: ffffc90000cb8000
[  437.622138] RIP: 0010:snd_hdac_i915_init+0x13f/0x160 [snd_hda_core]
[  437.622140] RSP: 0018:ffffc90000cbbda0 EFLAGS: 00010286
[  437.622143] RAX: 0000000000000001 RBX: ffff880405e38008 RCX: 0000000000000001
[  437.622145] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff880405e38008
[  437.622146] RBP: ffffc90000cbbdd0 R08: ffff8804080eaf78 R09: 0000000000000000
[  437.622148] R10: ffffc90000cbbdc8 R11: 0000000000000001 R12: ffff88041eb9a180
[  437.622150] R13: ffff88040b996888 R14: ffff880405e387c0 R15: ffff880405e387c0
[  437.622151] FS:  0000000000000000(0000) GS:ffff88041eb80000(0000) knlGS:0000000000000000
[  437.622153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  437.622155] CR2: 00007f9310357b00 CR3: 000000040046c000 CR4: 00000000001406e0
[  437.622156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  437.622158] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  437.622159] Call Trace:
[  437.622164]  azx_probe_work+0x4dd/0x930 [snd_hda_intel]
[  437.622171]  process_one_work+0x1fe/0x670
[  437.622175]  worker_thread+0x49/0x3b0
[  437.622179]  kthread+0x10f/0x150
[  437.622182]  ? process_one_work+0x670/0x670
[  437.622184]  ? kthread_create_on_node+0x40/0x40
[  437.622189]  ret_from_fork+0x27/0x40
[  437.622193] Code: 49 c7 84 24 48 05 00 00 00 00 00 00 44 89 fa 48 c7 c6 a8 6c 0c a0 4c 89 ef 48 c7 05 98 4f 00 00 00 00 00 00 e8 b3 18 4f e1 eb 8b <0f> ff 41 be f0 ff ff ff eb 81 41 be f4 ff ff ff e9 76 ff ff ff 
[  437.622275] ---[ end trace f713c52cd85ad78a ]---
[  437.631259] Setting dangerous option reset - tainting kernel
Comment 1 Takashi Iwai 2017-06-29 13:54:54 UTC
Thanks for the report, it must be fcc88d9.

For HSW and BDW, we need to avoid calling snd_hdac_i915_init() twice.  I'll work on it.
Comment 2 Takashi Iwai 2017-06-29 14:26:45 UTC
Could you try the patch below?
Comment 3 Takashi Iwai 2017-06-29 14:28:47 UTC
Created attachment 257225 [details]
Fix patch
Comment 4 Martin Peres 2017-06-29 21:24:01 UTC
Thanks, I'll try it tomorrow!
Comment 5 Martin Peres 2017-06-30 09:33:19 UTC
Tested-by: Martin Peres <martin.peres@linux.intel.com>

Thanks, I'll let you close the bug when you have pushed it upstream :) I'll use your patch in our CI for now.
Comment 6 Takashi Iwai 2017-06-30 09:35:41 UTC
Thanks!  I already merged in this morning as I could check it locally, so sorry, your tested-by slipped :)
Comment 7 Martin Peres 2017-06-30 12:04:40 UTC
Wonderful, thanks for being so reactive :) Let's see when the next bug comes :)
Comment 8 Martin Peres 2017-06-30 23:40:52 UTC
I seem to have ill-tested the patch as it seems like it is is now preventing i915 to be reloaded.

This could be to improper refcounting but I can't check until Monday. I at least verified that it really was caused by this patch before leaving the office.

If you know what is going on, then go for it. Otherwise, let's debug this on Tuesday (when I am back from my short vacation) :)
Comment 9 Takashi Iwai 2017-07-01 09:01:32 UTC
I checked HSW machine here but couldn't find a problem.

The difference is that now no sound card instance is registered for HDMI when modeset is disabled while it appeared registered uselessly.  This may change the card index number.
Comment 10 Martin Peres 2017-07-04 13:17:54 UTC
I understand what you mean, but this does not seem to be our problem.

What I see is that all our driver-reload tests are now failing. Looking at the code, it is the unload part that does not work, which could suggest that something is depending on i915, which prevents its unloading.

I did not have time today to look more into that. Tell me if this is something you would want/need.
Comment 11 Takashi Iwai 2017-07-04 13:46:07 UTC
Do you mean "all" as really all i915 platforms failing, not only HSW/BDW?
Also, can you give a way to reproduce the problem manually?
Comment 12 Martin Peres 2017-07-04 13:49:44 UTC
Yes, it is on all gen4+ platforms: https://intel-gfx-ci.01.org/CI/igt@drv_module_reload@basic-reload.html

Here are the unit tests: https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/

Please run the following commands:

./autogen.sh
make	
./tests/drv_module_reload --run-subtest basic-reload

That should reproduce the problem

Thanks for looking into it!
Comment 13 Takashi Iwai 2017-07-04 14:03:10 UTC
Never mind, I found a typo causing the problem.

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 03e34edc8f24..5ae8ddab6412 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1385,7 +1385,7 @@ static int azx_free(struct azx *chip)
 		if (hda->need_i915_power)
 			snd_hdac_display_power(bus, false);
 	}
-	if (chip->driver_type & AZX_DCAPS_I915_COMPONENT)
+	if (chip->driver_caps & AZX_DCAPS_I915_COMPONENT)
 		snd_hdac_i915_exit(bus);
 	kfree(hda);
 
I'll merge the fix for 4.13.
Comment 14 Takashi Iwai 2017-07-04 14:07:17 UTC
Created attachment 257349 [details]
Fix a typo leading to the module reference unbalance
Comment 15 Martin Peres 2017-07-04 17:52:25 UTC
Wonderful! We should have it in drm-tip soon then. Thanks :)
Comment 16 Tomi Sarvela 2017-07-05 07:21:33 UTC
This fixed it:
fc18282 ALSA: hda - Fix unbalance of i915 module refcount