Bug 71631

Summary: On a Sony VAIO Z1 reading /sys/bus/pci/devices/0000:01:00.1/config hangs
Product: Other Reporter: John Hughes (john)
Component: OtherAssignee: other_other
Status: NEW ---    
Severity: normal CC: ghisha, john
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.12 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: sysrq-t during lspci hang
sysrq-t after lspci hang - kworker looping?

Description John Hughes 2014-03-07 10:47:41 UTC
This device is:

01:00.1 Audio device: NVIDIA Corporation GT216 HDMI Audio Controller (rev a1)

After some minutes the read succeeds, but then the system rapidly becomes unusable with all cpu time being used in various kworker and migration tasks.

3.11 works ok.

When .../config is read these messages appear in the log:


[  109.460605] ACPI Warning: \_SB_.PCI0.P0P2.DGPU._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20131115/nsarguments-95)
[  109.606870] ACPI: \_SB_.PCI0: Bus check notify on hotplug_event_root
[  109.842723] hda-intel 0000:01:00.1: Enabling via VGA-switcheroo

This is Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=740933
Comment 1 John Hughes 2014-03-07 14:48:45 UTC
Ok, I built a vanilla 3.12 to try.

The behaviour is not exactly the same as the Debian kernel I was using, now the lspci "works" but the system chokes up in a similar way soon after.

Next, vanilla 3.11, then maybe bisect.  Ugh.
Comment 2 John Hughes 2014-03-07 16:22:17 UTC
Trying vanilla 3.11, working OK so far.
Comment 3 John Hughes 2014-03-08 15:16:20 UTC
5addcf0a5f0fadceba6bd562d0616a1c5d4c1a4d is the first bad commit
commit 5addcf0a5f0fadceba6bd562d0616a1c5d4c1a4d
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Sep 10 14:20:51 2012 +1000

    nouveau: add runtime PM support (v0.9)
    
    This hooks nouveau up to the runtime PM system to enable
    dynamic power management for secondary GPUs in switchable
    and optimus laptops.
    
    a) rewrite suspend/resume printks to hide them during dynamic s/r
    to avoid cluttering logs
    b) add runtime pm suspend to irq handler, crtc display, ioctl handler,
    connector status,
    c) handle hdmi audio dynamic power on/off using magic register.
    
    v0.5:
    make sure we hit D3 properly
    fix fbdev_set_suspend locking interaction, we only will poweroff if we have no
    active crtcs/fbcon anyways.
    add reference for active crtcs.
    sprinkle mark last busy for autosuspend timeout
    
    v0.6:
    allow more flexible debugging - to avoid log spam
    add option to enable/disable dynpm
    got to D3Cold
    
    v0.7:
    add hdmi audio support.
    
    v0.8:
    call autosuspend from idle, so pci config space access doesn't go straight
    back to sleep, this makes starting X faster.
    only signal usage if we actually handle the irq, otherwise usb keeps us awake.
    fix nv50 display active powerdown
    
    v0.9:
    use masking function to enable hdmi audio
    set busy when we fail to suspend
    
    Signed-off-by: Dave Airlie <airlied@redhat.com>

:040000 040000 dd34cf3bf24d742d91d5cb3565c828bdc818fa44 fcbbabdd4314fd2bc6e6273bf9914e22693bf327 M	drivers
Comment 4 John Hughes 2014-03-11 19:42:31 UTC
Created attachment 128991 [details]
sysrq-t during lspci hang
Comment 5 John Hughes 2014-03-11 19:43:27 UTC
Created attachment 129001 [details]
sysrq-t after lspci hang - kworker looping?
Comment 6 John Hughes 2014-03-11 19:44:51 UTC
This is what lspci seems to be doing when it's hung:

[   95.522084] lspci           D ffff88024f551b40     0  2318   2294 0x00000000
[   95.522088]  ffff88024f551800 0000000000000082 0000000000014280 0000000000014280
[   95.522092]  ffff8800369ebfd8 ffff88024f551800 ffffffff819eec40 ffff8800369ebac8
[   95.522096]  ffff8800369eba40 00000000ffff3831 ffffffff819eec40 000000000000e118
[   95.522101] Call Trace:
[   95.522105]  [<ffffffff814a245d>] ? schedule_timeout+0x16d/0x2c0
[   95.522110]  [<ffffffff81066cb0>] ? ftrace_raw_output_tick_stop+0x60/0x60
[   95.522115]  [<ffffffff810115bd>] ? __switch_to+0x15d/0x4b0
[   95.522119]  [<ffffffff81068277>] ? msleep+0x27/0x30
[   95.522127]  [<ffffffffa058c8fa>] ? azx_get_response+0x7a/0x2a0 [snd_hda_intel]
[   95.522139]  [<ffffffffa0602353>] ? codec_exec_verb+0x163/0x220 [snd_hda_codec]
[   95.522149]  [<ffffffffa06010f4>] ? snd_hda_codec_read+0x54/0x80 [snd_hda_codec]
[   95.522159]  [<ffffffffa0601d87>] ? hda_set_power_state+0x97/0x100 [snd_hda_codec]
[   95.522168]  [<ffffffffa0601ec1>] ? hda_call_codec_resume+0xd1/0x1c0 [snd_hda_codec]
[   95.522178]  [<ffffffffa0604280>] ? snd_hda_resume+0x20/0x40 [snd_hda_codec]
[   95.522185]  [<ffffffffa058d7eb>] ? azx_resume+0x9b/0x150 [snd_hda_intel]
[   95.522191]  [<ffffffff8134e735>] ? set_audio_state+0x55/0x60
[   95.522219]  [<ffffffffa04beabc>] ? nouveau_pmops_runtime_resume+0xcc/0x100 [nouveau]
[   95.522228]  [<ffffffff812a0a43>] ? pci_pm_runtime_resume+0x73/0xb0
[   95.522234]  [<ffffffff8135e767>] ? __rpm_callback+0x27/0x70
[   95.522238]  [<ffffffff8135e7ca>] ? rpm_callback+0x1a/0x80
[   95.522243]  [<ffffffff8135f726>] ? rpm_resume+0x466/0x660
[   95.522247]  [<ffffffff8118ec56>] ? dput+0x16/0xf0
[   95.522252]  [<ffffffff8135f963>] ? __pm_runtime_resume+0x43/0x70
[   95.522257]  [<ffffffff8134ebd1>] ? vga_switcheroo_runtime_resume_hdmi_audio+0x81/0xd0
[   95.522262]  [<ffffffff8135e767>] ? __rpm_callback+0x27/0x70
[   95.522266]  [<ffffffff8135e7ca>] ? rpm_callback+0x1a/0x80
[   95.522271]  [<ffffffff8135f726>] ? rpm_resume+0x466/0x660
[   95.522276]  [<ffffffff8135f963>] ? __pm_runtime_resume+0x43/0x70
[   95.522282]  [<ffffffff812a313f>] ? pci_read_config+0x7f/0x2a0
[   95.522290]  [<ffffffff811e71d5>] ? sysfs_bin_read+0xd5/0x1b0
[   95.522294]  [<ffffffff8117935b>] ? vfs_read+0x8b/0x160
[   95.522299]  [<ffffffff81179fa3>] ? SyS_pread64+0x63/0xa0
[   95.522304]  [<ffffffff814adb39>] ? system_call_fastpath+0x16/0x1b
Comment 7 John Hughes 2014-03-11 19:46:24 UTC
And this is what the kworker thread seems to be doing after lspci finally exits:

[  258.779028] kworker/u16:2   D ffff88024f85e350     0   206      2 0x00000000
[  258.779037] Workqueue: hd-audio1 hdmi_repoll_eld [snd_hda_codec_hdmi]
[  258.779039]  ffff88024f85e010 0000000000000046 0000000000014280 0000000000014280
[  258.779043]  ffff88024fae9fd8 ffff88024f85e010 ffff880253210000 ffff88024fae9cd8
[  258.779048]  ffff88024fae9c50 00000000ffffd79f ffff880253210000 000000000000e118
[  258.779052] Call Trace:
[  258.779057]  [<ffffffff814a245d>] ? schedule_timeout+0x16d/0x2c0
[  258.779061]  [<ffffffff81066cb0>] ? ftrace_raw_output_tick_stop+0x60/0x60
[  258.779066]  [<ffffffff81068277>] ? msleep+0x27/0x30
[  258.779073]  [<ffffffffa06118fa>] ? azx_get_response+0x7a/0x2a0 [snd_hda_intel]
[  258.779084]  [<ffffffffa0672353>] ? codec_exec_verb+0x163/0x220 [snd_hda_codec]
[  258.779095]  [<ffffffffa06710f4>] ? snd_hda_codec_read+0x54/0x80 [snd_hda_codec]
[  258.779101]  [<ffffffffa04373dc>] ? snd_hdmi_get_eld+0x2c/0xb0 [snd_hda_codec_hdmi]
[  258.779107]  [<ffffffffa0435694>] ? hdmi_present_sense+0x94/0x390 [snd_hda_codec_hdmi]
[  258.779112]  [<ffffffffa04359b4>] ? hdmi_repoll_eld+0x24/0x60 [snd_hda_codec_hdmi]
[  258.779117]  [<ffffffff81074c2d>] ? process_one_work+0x16d/0x420
[  258.779121]  [<ffffffff810757f6>] ? worker_thread+0x116/0x3b0
[  258.779126]  [<ffffffff810756e0>] ? rescuer_thread+0x330/0x330
[  258.779129]  [<ffffffff8107bb21>] ? kthread+0xc1/0xe0
[  258.779133]  [<ffffffff8107ba60>] ? kthread_create_on_node+0x180/0x180
[  258.779138]  [<ffffffff814ada8c>] ? ret_from_fork+0x7c/0xb0
[  258.779142]  [<ffffffff8107ba60>] ? kthread_create_on_node+0x180/0x180
Comment 8 John Hughes 2014-03-12 14:23:24 UTC
Well, that was a ridiculous waste of time on my part - the bug is fixed in (at least) 3.14.0-rc6
Comment 9 John Hughes 2014-03-12 14:33:03 UTC
Well, maybe I spoke too soon, it isn't exactly fixed - the system doesn't hang, but I get this in the log:

[  301.226037] ACPI Warning: \_SB_.PCI0.P0P2.DGPU._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20131218/nsarguments-95)
[  301.684458] hda-intel 0000:01:00.1: Enabling via VGA-switcheroo
[  306.923788] hda-intel 0000:01:00.1: Disabling via VGA-switcheroo
[  307.144358] hda-intel 0000:01:00.1: Cannot lock devices!
[  309.355868] ACPI Warning: \_SB_.PCI0.P0P2.DGPU._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20131218/nsarguments-95)