We added a new machine to our CI lab (Intel's upcoming Coffee Lake platform) and hit the following BUG when running the intel-gpu-tools test that reloads the i915 driver (igt@drv_module_reload@basic-reload):

<1>[ 421.459968] BUG: unable to handle kernel paging request at ffffc90000c13fff
<1>[ 421.459991] IP: pci_azx_readl+0x9/0x10 [snd_hda_intel]
<4>[ 421.460005] PGD 45b626067
<4>[ 421.460006] P4D 45b626067
<4>[ 421.460014] PUD 45b627067
<4>[ 421.460022] PMD 45262d067
<4>[ 421.460030] PTE 0
<4>[ 421.460039]
<4>[ 421.460051] Oops: 0000 [#1] PREEMPT SMP
<4>[ 421.460062] Modules linked in: snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm i915 vgem ax88179_178a usbnet x86_pkg_temp_thermal mii intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000e ptp pps_core prime_numbers i2c_hid [last unloaded: i915]
<4>[ 421.460129] CPU: 4 PID: 226 Comm: kworker/4:4 Tainted: G U 4.13.0-rc6-CI-CI_DRM_3001+ #1
<4>[ 421.460150] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X095.A04.1707240838 07/24/2017
<4>[ 421.460181] Workqueue: events azx_probe_work [snd_hda_intel]
<4>[ 421.460196] task: ffff880455c1a880 task.stack: ffffc90000b9c000
<4>[ 421.460211] RIP: 0010:pci_azx_readl+0x9/0x10 [snd_hda_intel]
<4>[ 421.460226] RSP: 0018:ffffc90000b9fd90 EFLAGS: 00010286
<4>[ 421.460239] RAX: ffffffffa00b9da0 RBX: 0000000000000fff RCX: 0000000000000001
<4>[ 421.460260] RDX: 0000000080000001 RSI: 000000004675af53 RDI: ffffc90000c13fff
<4>[ 421.460279] RBP: ffffc90000b9fd90 R08: ffff880455c1b170 R09: 0000000000000000
<4>[ 421.460297] R10: 000000008f7e55c2 R11: 000000009cacc5d1 R12: 000000000000ffff
<4>[ 421.460314] R13: 000000000000ffff R14: 00000000ffffffff R15: ffff88044bb64548
<4>[ 421.460331] FS: 0000000000000000(0000) GS:ffff88045d300000(0000) knlGS:0000000000000000
<4>[ 421.460351] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 421.460365] CR2: ffffc90000c13fff CR3: 00000004550b8000 CR4: 00000000003406e0
<4>[ 421.460385] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 421.460402] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 421.460419] Call Trace:
<4>[ 421.460430]  snd_hdac_bus_parse_capabilities+0x44/0x1f0 [snd_hda_core]
<4>[ 421.460447]  azx_probe_work+0x508/0x970 [snd_hda_intel]
<4>[ 421.460463]  process_one_work+0x224/0x650
<4>[ 421.460475]  worker_thread+0x4e/0x3b0
<4>[ 421.460486]  kthread+0x114/0x150
<4>[ 421.460495]  ? process_one_work+0x650/0x650
<4>[ 421.460507]  ? kthread_create_on_node+0x40/0x40
<4>[ 421.460520]  ret_from_fork+0x27/0x40
<4>[ 421.460531] Code: 88 10 06 00 00 04 31 c0 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 89 3e 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 <8b> 07 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 66 89 3e 5d c3
<1>[ 421.460601] RIP: pci_azx_readl+0x9/0x10 [snd_hda_intel] RSP: ffffc90000b9fd90
<4>[ 421.460618] CR2: ffffc90000c13fff
<4>[ 421.460628] ---[ end trace cd07de60fc683e7b ]---
<3>[ 421.811931] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33
<3>[ 421.811978] in_atomic(): 0, irqs_disabled(): 1, pid: 226, name: kworker/4:4
<4>[ 421.812020] INFO: lockdep is turned off.
<4>[ 421.812815] irq event stamp: 160086
<4>[ 421.813562] hardirqs last enabled at (160085): [<ffffffff811f8e99>] __slab_alloc.isra.23.constprop.28+0x59/0x80
<4>[ 421.814345] hardirqs last disabled at (160086): [<ffffffff818c565f>] error_entry+0x6f/0xc0
<4>[ 421.815062] softirqs last enabled at (159980): [<ffffffff818c8a86>] __do_softirq+0x3a6/0x4ae
<4>[ 421.815820] softirqs last disabled at (159973): [<ffffffff81086d5e>] irq_exit+0xae/0xc0
<4>[ 421.816598] CPU: 4 PID: 226 Comm: kworker/4:4 Tainted: G UD 4.13.0-rc6-CI-CI_DRM_3001+ #1
<4>[ 421.817411] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X095.A04.1707240838 07/24/2017
<4>[ 421.818144] Workqueue: events azx_probe_work [snd_hda_intel]
<4>[ 421.818917] Call Trace:
<4>[ 421.819686]  dump_stack+0x68/0x9f
<4>[ 421.820479]  ___might_sleep+0x1e5/0x240
<4>[ 421.821189]  __might_sleep+0x4a/0x80
<4>[ 421.821945]  exit_signals+0x24/0x2a0
<4>[ 421.822709]  do_exit+0x95/0xcf0
<4>[ 421.823464]  ? kthread+0x114/0x150
<4>[ 421.824151]  ? process_one_work+0x650/0x650
<4>[ 421.824888]  rewind_stack_do_exit+0x17/0x20

We do not yet know the reproduction rate of this issue. We will share it in the coming days, once our failure statistics include more samples. Is there anything you would like us to try in the meantime?
The bug has been fixed, closing!