Created attachment 281925 [details]
Boot log

The following message is seen in the boot log: "general protection fault: 0000 [#1] PREEMPT SMP NOPTI". The impact of this bug is that we are unable to get results for IGT tests. This issue was noticed on the Icelake platform, which is yet to be released. As this issue is not related to the i915 driver, I created the bug here.

Machine details can be found here: https://intel-gfx-ci.01.org/hardware.html#fi-icl-u2

The full boot log is attached.

===========================================================================
14.226066] general protection fault: 0000 [#1] PREEMPT SMP NOPTI
<4>[ 14.226120] CPU: 5 PID: 60 Comm: kworker/5:1 Not tainted 5.0.0-g0a2a982693ac-drmtip_244+ #1
<4>[ 14.226189] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4>[ 14.226300] Workqueue: events azx_probe_work [snd_hda_intel]
<4>[ 14.226349] RIP: 0010:kobject_get+0xc/0x50
<4>[ 14.226386] Code: 88 fb 11 84 e8 0e 35 7b ff e8 35 d8 ff ff eb a6 48 c7 c2 58 fb 11 84 eb cb 0f 1f 44 00 00 48 85 ff 48 89 f8 74 37 48 83 ec 08 <f6> 47 3c 01 74 0f f0 ff 40 38 0f 88 a1 59 01 00 48 83 c4 08 c3 48
<4>[ 14.226387] RSP: 0018:ffffb1b2002c3c88 EFLAGS: 00010296
<4>[ 14.226389] RAX: 6b6b6b6b6b6b6b6b RBX: ffff90a4170d5908 RCX: 0000000000000001
<4>[ 14.226391] RDX: 0000000080000001 RSI: 00000000ffffffff RDI: 6b6b6b6b6b6b6b6b
<4>[ 14.226392] RBP: ffffb1b2002c3d40 R08: 0000000000000020 R09: 0000000000000001
<4>[ 14.226393] R10: 0000000000000000 R11: ffff90a417117aea R12: 0000000000000000
<4>[ 14.226395] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff90a406055f00
<4>[ 14.226398] FS: 0000000000000000(0000) GS:ffff90a41ff40000(0000) knlGS:0000000000000000
<4>[ 14.226399] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 14.226400] CR2: 000056288cd19ab0 CR3: 0000000369214002 CR4: 0000000000760ee0
<4>[ 14.226401] PKRU: 55555554
<4>[ 14.226402] Call Trace:
<4>[ 14.226408] kobject_add_internal+0x36/0x2d0
<4>[ 14.226412] kobject_add+0x71/0xd0
<4>[ 14.226419] ? add_widget_node+0x2d/0xa0 [snd_hda_core]
<4>[ 14.226423] ? rcu_read_lock_sched_held+0x6f/0x80
<4>[ 14.227067] add_widget_node+0x59/0xa0 [snd_hda_core]
<4>[ 14.227074] widget_tree_create+0xb9/0x110 [snd_hda_core]
<4>[ 14.227139] hda_widget_sysfs_init+0x1a/0x40 [snd_hda_core]
<4>[ 14.227143] snd_hdac_device_register+0x19/0x40 [snd_hda_core]
<4>[ 14.227148] snd_hda_codec_configure+0x39/0x160 [snd_hda_codec]
<4>[ 14.227153] azx_codec_configure+0x2a/0x60 [snd_hda_codec]
<4>[ 14.227156] azx_probe_work+0x43e/0x7f0 [snd_hda_intel]
<4>[ 14.227165] process_one_work+0x245/0x610
<4>[ 14.227169] worker_thread+0x37/0x380
<4>[ 14.227172] ? process_one_work+0x610/0x610
<4>[ 14.227174] kthread+0x119/0x130
<4>[ 14.227176] ? kthread_park+0x80/0x80
<4>[ 14.227179] ret_from_fork+0x24/0x50
<4>[ 14.227184] Modules linked in: snd_hda_codec_hdmi(+) snd_hda_codec_realtek(+) snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul btusb btrtl crc32_pclmul btbcm btintel ghash_clmulni_intel i915 snd_hda_intel bluetooth snd_hda_codec snd_hwdep cdc_ether snd_hda_core usbnet snd_pcm mii e1000e ptp ecdh_generic pps_core i2c_i801 mei_me mei prime_numbers
<4>[ 14.227330] ---[ end trace 42f058964571473f ]---
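A note on reading the register dump: RAX, RDI, R13 and R14 all hold 6b6b6b6b6b6b6b6b. The byte 0x6b is POISON_FREE from include/linux/poison.h, the pattern that slab poisoning writes over freed objects, so this may point to a pointer being read out of already-freed memory. That is a hint, not a confirmed diagnosis. The user-space sketch below shows the check; the helper name looks_like_freed_slab is hypothetical and not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as POISON_FREE in include/linux/poison.h: the byte that
 * slab poisoning writes over freed objects. */
#define POISON_FREE 0x6b

/* Hypothetical helper (not a kernel API): returns true when every byte
 * of a 64-bit register value equals POISON_FREE. */
bool looks_like_freed_slab(uint64_t value)
{
	for (int i = 0; i < 8; i++) {
		if (((value >> (8 * i)) & 0xff) != POISON_FREE)
			return false;
	}
	return true;
}
```

Applied to the dump above, the check would flag RAX/RDI/R13/R14 but not RBX (ffff90a4170d5908), which looks like an ordinary kernel address.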
Hrm, it's rather straightforward code. Maybe the hardware provided some weird parameters so that the code tries to create too many nodes?

For example, try applying the patch below and check the additional debug message it prints.

--- a/sound/hda/hdac_sysfs.c
+++ b/sound/hda/hdac_sysfs.c
@@ -402,6 +402,10 @@ int hda_widget_sysfs_init(struct hdac_device *codec)
 	if (codec->widgets)
 		return 0; /* already created */
 
+	pr_info("XXX codec %d: num_nodes=%d, start_nid=%d, afg=%d\n",
+		codec->addr, codec->num_nodes, codec->start_nid, codec->afg);
+	return 0;
+
 	err = widget_tree_create(codec);
 	if (err < 0) {
 		widget_tree_free(codec);
Takashi, would it be possible for you to apply the above patch yourself? I will then gather the latest logs and attach them here.
It's a debug patch, so it's obviously not for upstream; it's better to apply it on your side just for testing. It'll help reveal any weird hardware configuration on this specific chip.
Created attachment 282107 [details]
Boot log with debug patch

The original issue occurred on fi-icl-u2, but the attached log is from fi-icl-u3. The differences between these machines are listed at https://intel-gfx-ci.01.org/hardware.html
Takashi, thanks for the proposed patch. This issue has only happened once so far (a month ago), and we are still waiting for another reproduction before applying the patch you proposed (minus the return 0?). Sorry for not getting involved more and letting this bug rot...
This issue was last seen 3 months, 4 weeks ago. I propose to close it.
The CI Bug Log issue associated with this bug has been archived. New failures matching the above filters will no longer be associated with this bug.
On CI, this issue was last seen on drmtip_694 (8 months, 3 weeks ago).