Latest working kernel version: 2.6.24.7 Earliest failing kernel version: 2.6.25 Distribution: Gentoo Hardware Environment: ThinkPad X60 Software Environment: Problem Description: When upgrade to 2.6.25, I find that OpenGL program no longer work after suspend, and find this line in dmesg. [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 16309 emitted: 16310 This seems can be workaround by reload drm and i915 kernel modules after suspend. After doing this, OpenGl program does not have problem with suspending. I find the only difference between the first suspend and the suspend after reload drm kernel module are these lines in dmesg. first suspend: [ 47.879220] Switched to high resolution mode on CPU 1 [ 47.882557] ACPI: EC: missing OBF confirmation, don't expect it any longer. [ 47.885890] ACPI: EC: missing write data confirmation, don't expect it any longer. [ 47.969252] PM: Writing back config space on device 0000:00:02.1 at offset 1 (was 900000, writing 900003) [ 47.972596] PM: Writing back config space on device 0000:00:1b.0 at offset 1 (was 100106, writing 100102) second suspend (after reload drm and i915): [ 345.656760] Switched to high resolution mode on CPU 1 [ 345.726355] PM: Writing back config space on device 0000:00:02.0 at offset 1 (was 900003, writing 900007) [ 345.752645] ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16 [ 345.752652] PCI: Setting latency timer of device 0000:00:02.0 to 64 [ 345.752669] PM: Writing back config space on device 0000:00:02.1 at offset 1 (was 900000, writing 900003) [ 345.765859] PM: Writing back config space on device 0000:00:1b.0 at offset 1 (was 100106, writing 100102) It seems some error happened when ACPI setting PCI Interrupt for device 0000:00:02.0, which is the graphic card. 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) Steps to reproduce: 1. Start with 2.6.25 kernel on a ThinkPad X60. 2. Suspend to ram.
Created attachment 16353 [details] dmesg
Dave, do you think this regression is an ACPI thing or a DRM thing? Thanks.
This happen even without DRM modules loaded and X started. Doing a suspend from console and then start X, OpenGL program will failed with a "[drm:i915_wait_irq] *ERROR* EBUSY...".
(In reply to comment #3) > This happen even without DRM modules loaded and X started. I made a mistake. This is wrong. I find comment out following code in i915_suspend() make OpenGL program work after suspend to ram. if (state.event == PM_EVENT_SUSPEND) { /* Shut down the device */ pci_disable_device(dev->pdev); pci_set_power_state(dev->pdev, PCI_D3hot); } You still can see the "ACPI: EC: ..." in dmesg. [ 95.151000] ACPI: EC: missing OBF confirmation, don't expect it any longer. [ 95.154333] ACPI: EC: missing write data confirmation, don't expect it any longer. [ 95.223541] PM: Writing back config space on device 0000:00:02.0 at offset 1 (was 900003, writing 900007) [ 95.237760] ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16 [ 95.237768] PCI: Setting latency timer of device 0000:00:02.0 to 64 [ 95.237784] PM: Writing back config space on device 0000:00:02.1 at offset 1 (was 900000, writing 900003) [ 95.241035] PM: Writing back config space on device 0000:00:1b.0 at offset 1 (was 100106, writing 100102) But this time seems it can "Writing back config space on device 0000:00:02.0 ...", so the IRQ problem gone. Hope this can help resolve this problem.
Yeah, that code path has been a problem in the past. Hopefully this sort of thing will go away with Rafael's suspend/hibernate callback rework. In the meantime I can test whether we actually need the disable_device & set_power_state calls (btw did you try commenting out just one of those calls? if so, which one made the difference?). Thanks, Jesse
(In reply to comment #5) > Yeah, that code path has been a problem in the past. Hopefully this sort of > thing will go away with Rafael's suspend/hibernate callback rework. In the > meantime I can test whether we actually need the disable_device & > set_power_state calls (btw did you try commenting out just one of those > calls? > if so, which one made the difference?). Just commenting out disable_device is enough. Thanks.
Well, there's at least one bug in our restore code: we're not even bothering to restore the interrupt configuration! It may be that disable_device (which disables interrupts) ends up clearing our IER regs or something. If so, this patch may help: diff --git a/drivers/char/drm/i915_drv.c b/drivers/char/drm/i915_drv.c index e8f3d68..a4da696 100644 --- a/drivers/char/drm/i915_drv.c +++ b/drivers/char/drm/i915_drv.c @@ -499,6 +499,11 @@ static int i915_resume(struct drm_device *dev) /* FIXME: restore TV & SDVO state */ + /* Interrupt state */ + I915_WRITE(I915REG_INT_IDENTITY_R, dev_priv->saveIIR); + I915_WRITE(I915REG_INT_ENABLE_R, dev_priv->saveIER); + I915_WRITE(I915REG_INT_MASK_R, dev_priv->saveIMR); + /* FBC info */ I915_WRITE(FBC_CFB_BASE, dev_priv->saveFBC_CFB_BASE); I915_WRITE(FBC_LL_BASE, dev_priv->saveFBC_LL_BASE);
(In reply to comment #7) > Well, there's at least one bug in our restore code: we're not even bothering > to restore the interrupt configuration! It seems this patch is not enough to fix my problem. I still need to comment out disable_device. As I have mentioned in my bug report, if I reload i915 module after suspend, it works even without comment out disable_device. So there may be some bugs in the disable_device code path as well. And it seems that the video device will be disabled correctly even without call from drm, is this call necessary?
Yeah, something else in the disable_device path must be hosing things. The call may not be necessary, but I'd still like to figure out why it breaks things.
Created attachment 16420 [details] Restore pipestat config as well Oops, missed a couple of regs in my last patch. Can you try this one out? Thanks, Jesse
Created attachment 16421 [details] diff of reg_dump before and after suspend Still need to comment out disable_device. I attached the difference of intel_reg_dump before and after suspend, and the difference of intel_reg_dump with and without this patch. Maybe I should enable PCI debug to see if there are more information.
Created attachment 16422 [details] diff of reg_dump with and without patch
The trick of reload i915 kernel modules seems just make disable_device fail to disable IRQ 16, so it just has the same effect as comment out disable_device. The "ACPI: EC" is another problem, I didn't see any problem with it currently.
It seams add pci_set_master(dev->pdev) after pci_enable_device make OpenGL works after resume at least. But pci_enable_device didn't enable video device immediately, it takes some time to enable the irq 16.
Created attachment 16559 [details] Call pci_set_master after pci_enable_device during resume time I just verify this by doing a clean build. When I compare the suspend code of hda-intel sound driver and the i915 drm module, I find this fix. The hda-intel's suspend code also take care of many things which maybe also should be cared in i915 driver, such as MSI. But I'm just a user, it's up to you to decide which can make i915's suspend code more robust.
Forget to mention, this bug also exits in current git kernel. I'm glade if it could be fixed before 2.6.26 release. Thanks.
Thanks for testing, Jie, I'll see if I can push a fix upstream quickly.
Thanks Jie, just pushed the fix to Linus.
please reopen this bug. i have sam issue: [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 1 i use kernel 2.6.28-rc5-slub-00172-gee2f6cc x86_64
Created attachment 18950 [details] dmesg_2.6.28_lex
Possibly yet another DUP of the problem fixed by http://lists.freedesktop.org/archives/intel-gfx/2008- November/000614.html?
i can't apply this patch to the latest git.
this issue is fixed with latest git