Bug 38442
Description
Dominik Köppl
2011-06-28 15:20:00 UTC
Created attachment 63742 [details]
dmesg output
Created attachment 63752 [details]
lspci output
Problem still exists for kernel version 3.2.9 Can you please check out where the i915 kworker is chewing up so much cpu with either perf or sysprof or whatever's your favourite kernel profiler? Created attachment 72717 [details]
kernel profiling with `perf record -a`
Created attachment 72718 [details]
profiling kworker/u:0
I've no particular favorite when it comes to kernel profiling, so I've sticked to the pref-tool. `pref top` reveals: PerfTop: 1098 irqs/sec kernel:99.7% exact: 0.0% [1000Hz cycles], (all, 2 CPUs) ------------------------------------------------------------------------------------------------------------------------------- 27.23% [kernel] [k] yenta_interrupt 21.87% [kernel] [k] delay_tsc 8.65% [kernel] [k] i915_read32 8.37% [kernel] [k] read_hpet 4.94% [firewire_ohci] [k] irq_handler 4.89% [kernel] [k] uhci_irq 3.26% [kernel] [k] acpi_idle_do_entry 2.57% [kernel] [k] set_clock 1.90% [kernel] [k] acpi_os_read_port 1.48% [kernel] [k] set_data 1.39% [kernel] [k] irq_entries_start 1.30% [kernel] [k] get_clock 1.07% [kernel] [k] native_sched_clock 0.75% [kernel] [k] acpi_os_write_port 0.60% [kernel] [k] _raw_spin_lock_irqsave 0.60% [kernel] [k] i915_driver_irq_handler 0.58% [kernel] [k] sched_clock_local 0.47% [kernel] [k] handle_irq_event_percpu 0.42% [kernel] [k] menu_select 0.34% [kernel] [k] _raw_spin_lock 0.33% [kernel] [k] _raw_spin_unlock_irqrestore 0.33% [kernel] [k] trace_i915_reg_rw ... Furthermore, I've made an odd observation while profiling: [ 785.322447] Uhhuh. NMI received for unknown reason 21 on CPU 1. [ 785.323008] Do you have a strange power saving mode enabled? [ 785.323008] Dazed and confused, but trying to continue Hm, 1k irq/s is pretty high. Can you also attach the output of /proc/interrupts - I guess interrupt sharing is part of the problem. Also, can you attach a sample run of vmstat 1 (wait until it looks steady) just to check whether this kind of irq load is normal for your system? 1000 irq/s is just perf using NMI for wakeups. Looks like your cardbus driver has gone beserk. Mind you, it looks like i915 is quite busy as well, but that could just be vblank loading. Created attachment 72722 [details]
/proc/interrupts
Created attachment 72723 [details]
vmstat 1
According to vmstat we don't have anything much running and the irq numbers can't explain a sluggish system. Also. I can't find the i915 interrupt in your /proc/interrupt. Are these really with i915 loaded and modesetting enabled? Created attachment 72724 [details]
vmstat 1 with i915-kms
Created attachment 72725 [details]
/proc/interrupts with i915-kms
Comment on attachment 72722 [details]
/proc/interrupts
without i915 module
Comment on attachment 72723 [details]
vmstat 1
without i915 module
Sorry, my mistake. Uploaded the probes of the system without kms enabled. Ok, 45k irq/s that sounds more like the problem we're looking for ;-) And given that things only go beserk once you load i915.ko the culprit seems clear. I'll write a debug patch to find out where the irqs are actually from within the gpu. Created attachment 72726 [details]
detailed interrupt sources for i915
Please apply this patch and then dump the i915_gem_interrupt file from debugfs in a 10 second interval and attach both, i.e.
cat i915_gem_interrupt > i915_irq1; sleep 10 ; cat i915_gem_interrupt > i915_irq2
Created attachment 72729 [details]
cat /sys/kernel/debug/dri/0/i915_gem_interrupt
Created attachment 72730 [details]
cat /sys/kernel/debug/dri/0/i915_gem_interrupt after sleep 10
The picture is clear: Hotplug gone mad. Iirc we've had a report that at least on one machine this is caused by the bios charging the battery. Can you try out what happens if you're not connected to the wall plug? Meanwhile I'll try to come up with new ideas. Similar scenario when running on battery: /proc/interrupts 16: 6411896 0 IO-APIC-fasteoi i915, yenta, uhci_hcd:usb5, firewire_ohci vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 0 0 963892 11488 23792 0 0 120 10 29433 541 5 44 48 2 1 0 0 963884 11488 23796 0 0 0 0 46522 689 0 44 56 0 1 0 0 963884 11488 23796 0 0 0 0 45372 608 0 43 57 0 1 0 0 963884 11488 23796 0 0 0 0 45265 620 0 42 58 0 PerfTop: 798 irqs/sec kernel:93.9% exact: 0.0% [1000Hz cycles], (all, 2 CPUs) -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 32.43% [kernel] [k] yenta_interrupt 24.57% [kernel] [k] delay_tsc 7.79% [kernel] [k] i915_read32 5.02% [kernel] [k] set_clock Created attachment 72737 [details]
dump hotplug regs
Oops, I've created this patch yesterday and forgot to attach it. Please apply this on top of the other patch and grab the full dmesg. It should put the hotplug reg into dmesg (ratelimited) so we know whether it's a specific pin.
Your new patch renders the system completely useless - Whenever the kernel switches from low to high resolution the display turns completely black (e.g. no backlight) and the cpu fan goes to maximum. The notebook keeps frozen, I cannot even ping the ethernet connection. Created attachment 72743 [details]
dmesg output after applying patch
Okay, there was a bug in the patch - the semicolon in the line
+ if (__ratelimit(&rl_state));
caused trouble.
After fixing it, I got this `dmesg` output.
So it's bit6 HDMIB/SDVOB causing trouble ... I'll look into this tomorrow, thanks for testing (and fixing up the patch). Another thing: Can you please boot with drm.debug=0xe added to your kernel cmdline and attach the full output (modesetting enabled)? It doesn't matter when there's still another patch applied, I need to have all the drm output stuff. Also, if it's not too painful: Can you grab xrandr output? You need to start X for that, though ... Created attachment 72752 [details]
xrandr output
The machine currently runs an old version of X (x11-base/xorg-server-1.7.7-r1, x11-drivers/xf86-video-intel-2.9.1), hope that this fact is not crucial.
Created attachment 72754 [details] huge kernel log with drm.debug=0xe Adding drm.debug=0xe to the kernel line results in a hugh output in dmesg. I hope, I could fetch all important messages. I've also taken a larger snapshot of the messages under http://web403.webbox555.server-home.org/drake/kern.log.bz2 (around 1469541 lines...) Created attachment 72769 [details]
disable sdvob hotplug
A quick test patch for you to try - it should get rid of the hotplug irq storm for now. If it works, please boot againg with drm.debug=0xe and grab the full log (from the beginnig) - unfortunately the interesting part was cut off and totally swamped by irq messages.
Also, is the xrandr output correct and you do have a DVI output on your machine? Or could that DVI output be attached only to an optional base-station?
If you have a DVI connector, please check whether it works (you won't get hotplug notification though with this patch applied).
Created attachment 72775 [details]
complete dmesg with drm.debug=0xe
Patch worked out - the dmesg output seems complete.
Created attachment 72776 [details]
xrandr output with kernel 2.6.36-gentoo-r8 and no kms
My machine does not possess a DVI output.
I have a proprietary S-Video output and a 15-pin D-sub. Both connectors are analog.
Strangely enough, switching back to kernel 2.6.36 while keeping everything the same (Xorg, intel-drivers, etc.), the output of `xrandr` seems correct.
Created attachment 72777 [details]
printk hotplug_active
A new debug patch, please apply it on top of the other patch (so we can debug things without an irq storm) and grab the output of it.
Your xrandr has a DVI output too, btw: user mode setting tended to call DVI TMDS, but it's the same thing.
Created attachment 72783 [details]
dmesg - printk hotplug_active
The new line is:
[ 1.068171] hotplug active is 00,00
Good to know to have a DVI-capable video card without DVI output -.-
Created attachment 72927 [details]
retrieve sdo irq event source response
Please try this patch.
Created attachment 72928 [details]
fixup sdvo hotplug confusion
Now that I have the spec at hand, yet another patch for you to try.
Ok, I've pushed out a branch with the current set of sdvo patches I have to the for-dominik branch at http://cgit.freedesktop.org/~danvet/drm/ Please try whether that fixes anything for you. Unfortunately, the new kernel 3.4.0-rc2+ from your git repository does not deliver the desired result. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2004 root 20 0 5572 1912 1520 R 99 0.2 2:54.28 syslog-ng 14 root 20 0 0 0 0 R 97 0.0 0:57.64 kworker/u:1 806 root 20 0 0 0 0 S 2 0.0 1:13.89 kworker/u:4 1287 root 20 0 0 0 0 D 2 0.0 0:23.26 kworker/0:3 dmesg is again polluted with lines like [ 323.552089] [drm:i915_driver_irq_handler], hotplug event received, stat 0x00000040 Can you please install intel-gpu-tools v1.2 and attach the output file of intel_bios_dumper? Created attachment 72942 [details]
intel_bios_dumper output
Output of intel_bios_dumper from package intel-gpu-tools-1.2
Daniel, I found an erratum for 945GM SDVO hotplug, workaround is not to use it. :( Created attachment 73097 [details]
disable sdvo hotplug on i945
Please test this patch, thanks.
Excellent. I've applied your provided patch on vanilla kernel version 3.2.9. The system runs now stable, CPU usage of kworker dropped to nearly 0%. Perfect. SDVO hotplug has only been enabled in 3.1, so strictly speaking this is a regression. The upshot is that I'll fast-forward the bugfix to 3.4, cc: stable ;-) commit 768b107e4b3be0acf6f58e914afe4f337c00932b Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Fri May 4 11:29:56 2012 +0200 drm/i915: disable sdvo hotplug on i945g/gm Chris Wilson dug out a hw erratum saying that there's noise on the interrupt line on i945G chips. We also have a bug report from a i945GM chip with an sdvo hotplug interrupt storm (and no apparent cause). Play it safe and disable sdvo hotplug on all i945 variants. Note that this is a regression that has been introduced in 3.1, when we've enabled sdvo hotplug support with commit cc68c81aed7d892deaf12d720d5455208e94cd0a Author: Simon Farnsworth <simon.farnsworth@onelan.co.uk> Date: Wed Sep 21 17:13:30 2011 +0100 drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI Cc: stable@kernel.org Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=38442 Reported-and-tested-by: Dominik Köppl <dominik@devwork.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Pull request is out and already merged in drm-next, patch should land in Linus' tree shortly. Currently only patch "disable sdvo hotplug on i945" ( https://bugzilla.kernel.org/attachment.cgi?id=73097 ) has been merged into the vanilla kernel. Speaking of kernel version 3.5.0, I'll get again noise from the kworkers, but this time approx. 50% occupation of the cpu. Applying manually the patch "disable sdvob hotplug" ( https://bugzilla.kernel.org/attachment.cgi?id=72769 ) kills the noise. I don't know if I'm disabling with this patch any great feature, but the system seems now stable and quite. Created attachment 77451 [details]
don't enable the sdvo hotplug if not needed
Does this patch help (on top of the one already merged)?
Created attachment 77611 [details]
patch for kernel-3.5.1 : don't enable the sdvo hotplug if not needed
Slightly changed your provided patch to suite kernel-3.5.1 (stable).
Unfortunately, the noise still exists and is slowing down the system.
Created attachment 78741 [details]
only enable sdvo hotplug irq if needed
Dominik, please try this patch instead and report back. Thanks.
Also available in the for-dominik branch of git://gitorious.org/jani/drm.git (which is on top of drm-intel-next-queued).
Jani, thanks for your submission! Kernel 3.6.0-rc3 from the for-dominik branch of git://gitorious.org/jani/drm.git solved the problem. I've testing your modified kernel without noticing any performance regression. Patch merged to -fixes as commit fcbc50da7753b210b4442ca9abc4efbd4e481f6e Author: Jani Nikula <jani.nikula@intel.com> Date: Wed Aug 29 14:08:42 2012 +0300 drm/i915: only enable sdvo hotplug irq if needed Should pop up in 3.5 soonish (1-2 weeks). I probably have the same issue with the Intel 82Q35 chipset here, but using kernel 2.6.32 (my installation is based on CentOS 6). However, the patches provided here did not solve the issue. So in the end, additionally I prevented the SDVO device hotplugging from getting re-enabled when an interrupt happens. This significantly lowered the system load. I'd like to provide my patch for 2.6.32 and to discuss this solution with others. Would it be right to discuss it here, or should I open a separate related issue therefore? > --- Comment #54 from Björn Gerhart <bjoern.gerhart@wincor-nixdorf.com> > 2012-09-03 14:19:52 --- > I probably have the same issue with the Intel 82Q35 chipset here, but using > kernel 2.6.32 (my installation is based on CentOS 6). However, the patches > provided here did not solve the issue. So in the end, additionally I > prevented > the SDVO device hotplugging from getting re-enabled when an interrupt > happens. > This significantly lowered the system load. > I'd like to provide my patch for 2.6.32 and to discuss this solution with > others. Would it be right to discuss it here, or should I open a separate > related issue therefore? 3.6.32 is, especially wrt drm drivers ridiculously old. But iirc rhel does backport more recent drm drivers from recent kernels to not be so outdated, so you need to direct your support request at redhat/centos/whomever you've got your kernel from. Obviously if this is an issue on recent mainline kernels, please file a bug report (Q35 is a different chipset, so likely a different issue at hand). -- Daniel Vetter daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch A patch referencing this bug report has been merged in Linux v3.7-rc1: commit ff04b35af0e40956764596e3d032f786e5451238 Author: Jani Nikula <jani.nikula@intel.com> Date: Wed Aug 29 14:08:42 2012 +0300 drm/i915: only enable sdvo hotplug irq if needed |