Bug 198603

Summary: [nouveau] Card overheating with HDMI → DVI output plugged in, WARNING in dmesg
Product: Drivers
Component: Video(DRI - non Intel)
Reporter: Bruno Pagani (bruno.n.pagani)
Assignee: drivers_video-dri
CC: bruno.n.pagani
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.14
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output from the boot

Description Bruno Pagani 2018-01-28 21:24:11 UTC
At my desk, I have an external monitor with a DVI input. I use an HDMI → DVI adapter to plug it in.

My HDMI output is driven by the Nvidia chip in an Optimus configuration, so I use Reverse PRIME with xrandr to be able to use it while running on the Intel chip.

From time to time, my laptop starts overheating. Unplugging the monitor and plugging it back in is enough for the laptop to cool down and return to normal. Right when I unplug it, the following appears in dmesg:
```
[  618.325914] ------------[ cut here ]------------
[  618.325956] WARNING: CPU: 0 PID: 178 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x9b/0x330 [nouveau]
[  618.325957] Modules linked in: fuse joydev mousedev arc4 snd_hda_codec_conexant snd_hda_codec_generic input_leds nls_iso8859_1 iwlmvm psmouse nls_cp437 serio_raw vfat fat atkbd mac80211 iTCO_wdt libps2 iTCO_vendor_support nouveau hp_wmi snd_hda_intel sparse_keymap iwlwifi wmi_bmof snd_hda_codec intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_core snd_hwdep kvm snd_pcm hp_accel tpm_infineon e1000e cfg80211 snd_timer snd irqbypass intel_cstate intel_rapl_perf idma64 mxm_wmi ptp pps_core rtsx_pci_ms i2c_i801 soundcore ttm memstick rfkill intel_pch_thermal intel_lpss_pci processor_thermal_device shpchp intel_soc_dts_iosf thermal lis3lv02d i8042 input_polldev battery pinctrl_sunrisepoint int3403_thermal intel_lpss_acpi int340x_thermal_zone serio pinctrl_intel wmi intel_lpss tpm_tis led_class
[  618.325984]  tpm_tis_core tpm int3400_thermal acpi_thermal_rel hp_wireless acpi_pad ac evdev mac_hid sch_fq_codel coretemp msr ip_tables x_tables btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mod rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ahci aesni_intel aes_x86_64 crypto_simd libahci glue_helper cryptd nvme xhci_pci libata xhci_hcd nvme_core rtsx_pci scsi_mod usbcore usb_common i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart crc32c_intel
[  618.326007] CPU: 0 PID: 178 Comm: kworker/0:2 Tainted: G     U          4.14.15-1-ARCH #1
[  618.326008] Hardware name: HP HP ZBook Studio G3/80D4, BIOS N82 Ver. 01.16 04/14/2017
[  618.326028] Workqueue: events nouveau_display_hpd_work [nouveau]
[  618.326029] task: ffff9ac9d7672dc0 task.stack: ffffab8983d7c000
[  618.326047] RIP: 0010:nouveau_dp_detect+0x9b/0x330 [nouveau]
[  618.326048] RSP: 0018:ffffab8983d7fce0 EFLAGS: 00010293
[  618.326049] RAX: 0000000000000000 RBX: ffff9ac9d7643200 RCX: 0000000000000000
[  618.326049] RDX: ffffab899100e4e4 RSI: ffffab899100e4e4 RDI: 0000000001009007
[  618.326050] RBP: ffff9ac9ca1ac000 R08: ffffab8983d7fcf0 R09: ffffab8983d7fcea
[  618.326051] R10: 0000000000000000 R11: 0000000000000010 R12: ffff9ac9ca1a8800
[  618.326051] R13: ffff9ac9d9049000 R14: ffff9ac9d904f328 R15: ffff9ac9d7643218
[  618.326052] FS:  0000000000000000(0000) GS:ffff9ac9ff400000(0000) knlGS:0000000000000000
[  618.326053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  618.326053] CR2: 00007f5f62b5d000 CR3: 00000007e100a003 CR4: 00000000003606f0
[  618.326054] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  618.326055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  618.326055] Call Trace:
[  618.326074]  nouveau_connector_detect+0x2d7/0x4c0 [nouveau]
[  618.326090]  ? nouveau_display_acpi_ntfy+0x4c/0x60 [nouveau]
[  618.326093]  ? notifier_call_chain+0x47/0x70
[  618.326097]  ? drm_helper_probe_detect_ctx+0xbc/0xe0 [drm_kms_helper]
[  618.326100]  drm_helper_probe_detect_ctx+0xbc/0xe0 [drm_kms_helper]
[  618.326104]  drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper]
[  618.326120]  nouveau_display_hpd_work+0x2a/0x70 [nouveau]
[  618.326122]  process_one_work+0x1e0/0x420
[  618.326124]  worker_thread+0x2b/0x3d0
[  618.326126]  ? process_one_work+0x420/0x420
[  618.326127]  kthread+0x11a/0x130
[  618.326128]  ? kthread_create_on_node+0x70/0x70
[  618.326130]  ret_from_fork+0x35/0x40
[  618.326131] Code: 4c 24 0a 4c 8d 44 24 10 31 c9 ba 09 00 00 00 be 01 00 00 00 48 89 ef e8 14 64 f8 ff 85 c0 0f 85 8f 00 00 00 80 7c 24 0a 08 74 02 <0f> ff 48 89 ef e8 eb 61 f8 ff 44 0f b6 44 24 11 0f b6 4c 24 12 
[  618.326152] ---[ end trace 6c96f679868d0c6e ]---
[  618.326196] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-3
```
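The location of the WARNING may help triage. Reading the Code: line as x86 instructions (an inference that is not verified against the exact 4.14.15 build), the driver issues an AUX read of type 9 at address 0 with retry enabled. It then tests the return value and compares the returned size with 8 before hitting the trap instruction. That pattern would match a check of the form "warn if the transfer succeeded but returned a different number of bytes than requested" inside the inline AUX read helper in nvkm/subdev/i2c.h. The sketch below is a simplified, hypothetical model of that condition and is not the kernel's code. The names `aux_read_would_warn` and `DPCD_READ_LEN` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Requested length of the DPCD read. The "cmpb $0x8" in the Code: line
 * suggests 8 bytes are requested (an inference, not verified). */
enum { DPCD_READ_LEN = 8 };

/*
 * Hypothetical model of the condition that appears to trigger the WARNING:
 * the AUX transfer reports success (ret == 0), but the sink returned a
 * different number of bytes than requested. 'ret' follows the kernel
 * convention: 0 on success, a negative errno on failure.
 */
static bool aux_read_would_warn(int ret, unsigned requested, unsigned returned)
{
    /* A single DP AUX transaction carries at most 16 bytes of payload. */
    assert(requested > 0 && requested <= 16);
    return ret == 0 && returned != requested;
}
```

Under this reading, a failed transfer (negative `ret`) would not warn. A transfer that is acknowledged but comes back short or empty during unplug would warn. That would fit the "DDC responded, but no EDID" line that follows the trace, but this is a hypothesis for the developers to confirm.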

If I disconnect the monitor when the laptop is not overheating, nothing appears in dmesg. This issue already existed before (at least in 4.12), but it is much more frequent now: it often starts within 2 minutes.

Please tell me what information you need, or what I should do to help you fix this. Thanks!
Comment 1 Bruno Pagani 2018-01-29 11:31:29 UTC
An update: the overheating and the error message no longer seem to be correlated. The error message now appears every time I unplug the monitor (even after only 5 s). I have yet to see the laptop get as hot as it did before, although this may be quite subjective. I will keep running with two screens from now on and see what happens. At the very least, this WARNING should still be investigated.

I took a look at my older kernel logs from when the overheating coincided with an error on unplug, and the output looked like this:
```
[ 7039.070478] ------------[ cut here ]------------
[ 7039.070493] WARNING: CPU: 0 PID: 1652 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:169 nouveau_dp_detect+0x9b/0x320 [nouveau]
[ 7039.070493] Modules linked in: nouveau ttm fuse mousedev hp_wmi joydev mxm_wmi sparse_keymap iTCO_wdt iTCO_vendor_support i2c_designware_platform i2c_designware_core intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm irqbypass intel_cstate intel_rapl_perf input_leds psmouse snd_hda_codec_conexant snd_hda_codec_generic arc4 nls_iso8859_1 nls_cp437 snd_hda_intel vfat fat snd_hda_codec snd_hda_core snd_hwdep snd_pcm e1000e snd_timer iwlmvm snd soundcore ptp pps_core i2c_i801 mac80211 iwlwifi rtsx_pci_ms memstick cfg80211 rfkill idma64 intel_lpss_pci intel_pch_thermal shpchp processor_thermal_device intel_soc_dts_iosf tpm_infineon thermal battery int3403_thermal int340x_thermal_zone wmi hp_accel intel_lpss_acpi lis3lv02d intel_lpss input_polldev led_class int3400_thermal evdev acpi_thermal_rel mac_hid
[ 7039.070517]  hp_wireless tpm_tis tpm_tis_core acpi_pad ac tpm sch_fq_codel coretemp msr ip_tables x_tables btrfs xor raid6_pq algif_skcipher af_alg dm_crypt dm_mod dax rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ahci xhci_pci libahci nvme xhci_hcd libata nvme_core rtsx_pci scsi_mod usbcore usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crc32c_intel [last unloaded: bbswitch]
[ 7039.070538] CPU: 0 PID: 1652 Comm: kworker/0:0 Tainted: G        W  O    4.12.10-1-ARCH #1
[ 7039.070539] Hardware name: HP HP ZBook Studio G3/80D4, BIOS N82 Ver. 01.16 04/14/2017
[ 7039.070549] Workqueue: events nouveau_display_hpd_work [nouveau]
[ 7039.070550] task: ffff88d259c40000 task.stack: ffff9bfb03ffc000
[ 7039.070559] RIP: 0010:nouveau_dp_detect+0x9b/0x320 [nouveau]
[ 7039.070560] RSP: 0018:ffff9bfb03fffc88 EFLAGS: 00010293
[ 7039.070561] RAX: 0000000000000000 RBX: ffff88d232a05400 RCX: 0000000000000000
[ 7039.070561] RDX: 0000000000000008 RSI: ffff9bfb1100e4e4 RDI: 0000000001009007
[ 7039.070562] RBP: ffff9bfb03fffcd0 R08: ffff9bfb03fffc98 R09: ffff9bfb03fffc92
[ 7039.070562] R10: 0000000000000000 R11: 0000000000000010 R12: ffff88d234f1f800
[ 7039.070563] R13: ffff88d234f18800 R14: ffff88d21908f000 R15: ffff88d232a05418
[ 7039.070563] FS:  0000000000000000(0000) GS:ffff88d27f400000(0000) knlGS:0000000000000000
[ 7039.070564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7039.070564] CR2: 00002bfdda997140 CR3: 000000064ea09000 CR4: 00000000003406f0
[ 7039.070565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7039.070565] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7039.070566] Call Trace:
[ 7039.070569]  ? radix_tree_lookup+0xd/0x10
[ 7039.070579]  nouveau_connector_detect+0x2d1/0x4c0 [nouveau]
[ 7039.070582]  drm_helper_probe_detect_ctx+0xc2/0xe0 [drm_kms_helper]
[ 7039.070583]  ? drm_helper_probe_detect_ctx+0xc2/0xe0 [drm_kms_helper]
[ 7039.070585]  drm_helper_hpd_irq_event+0xa2/0x120 [drm_kms_helper]
[ 7039.070594]  nouveau_display_hpd_work+0x2e/0x70 [nouveau]
[ 7039.070596]  process_one_work+0x1de/0x430
[ 7039.070597]  worker_thread+0x47/0x3f0
[ 7039.070598]  kthread+0x125/0x140
[ 7039.070599]  ? process_one_work+0x430/0x430
[ 7039.070600]  ? kthread_create_on_node+0x70/0x70
[ 7039.070602]  ret_from_fork+0x25/0x30
[ 7039.070603] Code: c3 4c 8d 4d c2 4c 8d 45 c8 31 c9 ba 09 00 00 00 be 01 00 00 00 4c 89 e7 e8 53 4e f8 ff 85 c0 0f 85 8a 00 00 00 80 7d c2 08 74 02 <0f> ff 4c 89 e7 e8 3b 4c f8 ff 44 0f b6 45 c9 0f b6 4d ca 8b 15 
[ 7039.070620] ---[ end trace e2badeedc3409171 ]---
[ 7039.070646] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-3
```

It looks quite similar to me, but maybe it isn’t…
Comment 2 Bruno Pagani 2018-01-29 13:01:55 UTC
I got overheating again, but the dmesg output was the same as in the original description, so the old log should probably be disregarded.

My hope is that fixing the cause of this WARNING will also fix the overheating, since I no longer see any message specific to the overheating.
Comment 3 Bruno Pagani 2018-01-29 13:02:58 UTC
Created attachment 273913
dmesg output from the boot

I’m attaching the full dmesg output.

At 43.x is when I plug in the monitor.

From 94.x onward is the unplugging.