This device is: 01:00.1 Audio device: NVIDIA Corporation GT216 HDMI Audio Controller (rev a1) After some minutes the read succeeds, but then the system rapidly becomes unusable with all cpu time being used in various kworker and migration tasks. 3.11 works ok. When .../config is read these messages appear in the log: [ 109.460605] ACPI Warning: \_SB_.PCI0.P0P2.DGPU._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20131115/nsarguments-95) [ 109.606870] ACPI: \_SB_.PCI0: Bus check notify on hotplug_event_root [ 109.842723] hda-intel 0000:01:00.1: Enabling via VGA-switcheroo This is Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=740933
Ok, I built a vanilla 3.12 to try. The behaviour is not exactly the same as the Debian kernel I was using, now the lspci "works" but the system chokes up in a similar way soon after. Next, vanilla 3.11, then maybe bisect. Ugh.
Trying vanilla 3.11, working OK so far.
5addcf0a5f0fadceba6bd562d0616a1c5d4c1a4d is the first bad commit commit 5addcf0a5f0fadceba6bd562d0616a1c5d4c1a4d Author: Dave Airlie <airlied@redhat.com> Date: Mon Sep 10 14:20:51 2012 +1000 nouveau: add runtime PM support (v0.9) This hooks nouveau up to the runtime PM system to enable dynamic power management for secondary GPUs in switchable and optimus laptops. a) rewrite suspend/resume printks to hide them during dynamic s/r to avoid cluttering logs b) add runtime pm suspend to irq handler, crtc display, ioctl handler, connector status, c) handle hdmi audio dynamic power on/off using magic register. v0.5: make sure we hit D3 properly fix fbdev_set_suspend locking interaction, we only will poweroff if we have no active crtcs/fbcon anyways. add reference for active crtcs. sprinkle mark last busy for autosuspend timeout v0.6: allow more flexible debugging - to avoid log spam add option to enable/disable dynpm got to D3Cold v0.7: add hdmi audio support. v0.8: call autosuspend from idle, so pci config space access doesn't go straight back to sleep, this makes starting X faster. only signal usage if we actually handle the irq, otherwise usb keeps us awake. fix nv50 display active powerdown v0.9: use masking function to enable hdmi audio set busy when we fail to suspend Signed-off-by: Dave Airlie <airlied@redhat.com> :040000 040000 dd34cf3bf24d742d91d5cb3565c828bdc818fa44 fcbbabdd4314fd2bc6e6273bf9914e22693bf327 M drivers
Created attachment 128991 [details] sysrq-t during lspci hang
Created attachment 129001 [details] sysrq-t after lspci hang - kworker looping?
This is what lspci seems to be doing when it's hung: [ 95.522084] lspci D ffff88024f551b40 0 2318 2294 0x00000000 [ 95.522088] ffff88024f551800 0000000000000082 0000000000014280 0000000000014280 [ 95.522092] ffff8800369ebfd8 ffff88024f551800 ffffffff819eec40 ffff8800369ebac8 [ 95.522096] ffff8800369eba40 00000000ffff3831 ffffffff819eec40 000000000000e118 [ 95.522101] Call Trace: [ 95.522105] [<ffffffff814a245d>] ? schedule_timeout+0x16d/0x2c0 [ 95.522110] [<ffffffff81066cb0>] ? ftrace_raw_output_tick_stop+0x60/0x60 [ 95.522115] [<ffffffff810115bd>] ? __switch_to+0x15d/0x4b0 [ 95.522119] [<ffffffff81068277>] ? msleep+0x27/0x30 [ 95.522127] [<ffffffffa058c8fa>] ? azx_get_response+0x7a/0x2a0 [snd_hda_intel] [ 95.522139] [<ffffffffa0602353>] ? codec_exec_verb+0x163/0x220 [snd_hda_codec] [ 95.522149] [<ffffffffa06010f4>] ? snd_hda_codec_read+0x54/0x80 [snd_hda_codec] [ 95.522159] [<ffffffffa0601d87>] ? hda_set_power_state+0x97/0x100 [snd_hda_codec] [ 95.522168] [<ffffffffa0601ec1>] ? hda_call_codec_resume+0xd1/0x1c0 [snd_hda_codec] [ 95.522178] [<ffffffffa0604280>] ? snd_hda_resume+0x20/0x40 [snd_hda_codec] [ 95.522185] [<ffffffffa058d7eb>] ? azx_resume+0x9b/0x150 [snd_hda_intel] [ 95.522191] [<ffffffff8134e735>] ? set_audio_state+0x55/0x60 [ 95.522219] [<ffffffffa04beabc>] ? nouveau_pmops_runtime_resume+0xcc/0x100 [nouveau] [ 95.522228] [<ffffffff812a0a43>] ? pci_pm_runtime_resume+0x73/0xb0 [ 95.522234] [<ffffffff8135e767>] ? __rpm_callback+0x27/0x70 [ 95.522238] [<ffffffff8135e7ca>] ? rpm_callback+0x1a/0x80 [ 95.522243] [<ffffffff8135f726>] ? rpm_resume+0x466/0x660 [ 95.522247] [<ffffffff8118ec56>] ? dput+0x16/0xf0 [ 95.522252] [<ffffffff8135f963>] ? __pm_runtime_resume+0x43/0x70 [ 95.522257] [<ffffffff8134ebd1>] ? vga_switcheroo_runtime_resume_hdmi_audio+0x81/0xd0 [ 95.522262] [<ffffffff8135e767>] ? __rpm_callback+0x27/0x70 [ 95.522266] [<ffffffff8135e7ca>] ? rpm_callback+0x1a/0x80 [ 95.522271] [<ffffffff8135f726>] ? rpm_resume+0x466/0x660 [ 95.522276] [<ffffffff8135f963>] ? __pm_runtime_resume+0x43/0x70 [ 95.522282] [<ffffffff812a313f>] ? pci_read_config+0x7f/0x2a0 [ 95.522290] [<ffffffff811e71d5>] ? sysfs_bin_read+0xd5/0x1b0 [ 95.522294] [<ffffffff8117935b>] ? vfs_read+0x8b/0x160 [ 95.522299] [<ffffffff81179fa3>] ? SyS_pread64+0x63/0xa0 [ 95.522304] [<ffffffff814adb39>] ? system_call_fastpath+0x16/0x1b
And this is what the kworker thread seems to be doing after lspci finally exits: [ 258.779028] kworker/u16:2 D ffff88024f85e350 0 206 2 0x00000000 [ 258.779037] Workqueue: hd-audio1 hdmi_repoll_eld [snd_hda_codec_hdmi] [ 258.779039] ffff88024f85e010 0000000000000046 0000000000014280 0000000000014280 [ 258.779043] ffff88024fae9fd8 ffff88024f85e010 ffff880253210000 ffff88024fae9cd8 [ 258.779048] ffff88024fae9c50 00000000ffffd79f ffff880253210000 000000000000e118 [ 258.779052] Call Trace: [ 258.779057] [<ffffffff814a245d>] ? schedule_timeout+0x16d/0x2c0 [ 258.779061] [<ffffffff81066cb0>] ? ftrace_raw_output_tick_stop+0x60/0x60 [ 258.779066] [<ffffffff81068277>] ? msleep+0x27/0x30 [ 258.779073] [<ffffffffa06118fa>] ? azx_get_response+0x7a/0x2a0 [snd_hda_intel] [ 258.779084] [<ffffffffa0672353>] ? codec_exec_verb+0x163/0x220 [snd_hda_codec] [ 258.779095] [<ffffffffa06710f4>] ? snd_hda_codec_read+0x54/0x80 [snd_hda_codec] [ 258.779101] [<ffffffffa04373dc>] ? snd_hdmi_get_eld+0x2c/0xb0 [snd_hda_codec_hdmi] [ 258.779107] [<ffffffffa0435694>] ? hdmi_present_sense+0x94/0x390 [snd_hda_codec_hdmi] [ 258.779112] [<ffffffffa04359b4>] ? hdmi_repoll_eld+0x24/0x60 [snd_hda_codec_hdmi] [ 258.779117] [<ffffffff81074c2d>] ? process_one_work+0x16d/0x420 [ 258.779121] [<ffffffff810757f6>] ? worker_thread+0x116/0x3b0 [ 258.779126] [<ffffffff810756e0>] ? rescuer_thread+0x330/0x330 [ 258.779129] [<ffffffff8107bb21>] ? kthread+0xc1/0xe0 [ 258.779133] [<ffffffff8107ba60>] ? kthread_create_on_node+0x180/0x180 [ 258.779138] [<ffffffff814ada8c>] ? ret_from_fork+0x7c/0xb0 [ 258.779142] [<ffffffff8107ba60>] ? kthread_create_on_node+0x180/0x180
Well, that was a ridiculous waste of time on my part - the bug is fixed in (at least) 3.14.0-rc6
Well, maybe I spoke too soon, it isn't exactly fixed - the system doesn't hang, but I get this in the log: [ 301.226037] ACPI Warning: \_SB_.PCI0.P0P2.DGPU._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20131218/nsarguments-95) [ 301.684458] hda-intel 0000:01:00.1: Enabling via VGA-switcheroo [ 306.923788] hda-intel 0000:01:00.1: Disabling via VGA-switcheroo [ 307.144358] hda-intel 0000:01:00.1: Cannot lock devices! [ 309.355868] ACPI Warning: \_SB_.PCI0.P0P2.DGPU._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20131218/nsarguments-95)