Bug 10844
Summary: | [drm:i915_wait_irq] *ERROR* EBUSY after suspend | ||
---|---|---|---|
Product: | Drivers | Reporter: | Jie Luo (clotho67) |
Component: | Video(DRI - non Intel) | Assignee: | Jesse Barnes (jbarnes) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla, airlied, akpm, bug-track, jbarnes, rjw, rui.zhang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25.4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 9832 | ||
Attachments: |
dmesg
Restore pipestat config as well diff of reg_dump before and after suspend diff of reg_dump with and without patch Call pci_set_master after pci_enable_device during resume time dmesg_2.6.28_lex |
Description
Jie Luo
2008-05-31 22:15:13 UTC
Created attachment 16353 [details]
dmesg
Dave, do you think this regression is an ACPI thing or a DRM thing? Thanks. This happen even without DRM modules loaded and X started. Doing a suspend from console and then start X, OpenGL program will failed with a "[drm:i915_wait_irq] *ERROR* EBUSY...". (In reply to comment #3) > This happen even without DRM modules loaded and X started. I made a mistake. This is wrong. I find comment out following code in i915_suspend() make OpenGL program work after suspend to ram. if (state.event == PM_EVENT_SUSPEND) { /* Shut down the device */ pci_disable_device(dev->pdev); pci_set_power_state(dev->pdev, PCI_D3hot); } You still can see the "ACPI: EC: ..." in dmesg. [ 95.151000] ACPI: EC: missing OBF confirmation, don't expect it any longer. [ 95.154333] ACPI: EC: missing write data confirmation, don't expect it any longer. [ 95.223541] PM: Writing back config space on device 0000:00:02.0 at offset 1 (was 900003, writing 900007) [ 95.237760] ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16 [ 95.237768] PCI: Setting latency timer of device 0000:00:02.0 to 64 [ 95.237784] PM: Writing back config space on device 0000:00:02.1 at offset 1 (was 900000, writing 900003) [ 95.241035] PM: Writing back config space on device 0000:00:1b.0 at offset 1 (was 100106, writing 100102) But this time seems it can "Writing back config space on device 0000:00:02.0 ...", so the IRQ problem gone. Hope this can help resolve this problem. Yeah, that code path has been a problem in the past. Hopefully this sort of thing will go away with Rafael's suspend/hibernate callback rework. In the meantime I can test whether we actually need the disable_device & set_power_state calls (btw did you try commenting out just one of those calls? if so, which one made the difference?). Thanks, Jesse (In reply to comment #5) > Yeah, that code path has been a problem in the past. Hopefully this sort of > thing will go away with Rafael's suspend/hibernate callback rework. In the > meantime I can test whether we actually need the disable_device & > set_power_state calls (btw did you try commenting out just one of those > calls? > if so, which one made the difference?). Just commenting out disable_device is enough. Thanks. Well, there's at least one bug in our restore code: we're not even bothering to restore the interrupt configuration! It may be that disable_device (which disables interrupts) ends up clearing our IER regs or something. If so, this patch may help: diff --git a/drivers/char/drm/i915_drv.c b/drivers/char/drm/i915_drv.c index e8f3d68..a4da696 100644 --- a/drivers/char/drm/i915_drv.c +++ b/drivers/char/drm/i915_drv.c @@ -499,6 +499,11 @@ static int i915_resume(struct drm_device *dev) /* FIXME: restore TV & SDVO state */ + /* Interrupt state */ + I915_WRITE(I915REG_INT_IDENTITY_R, dev_priv->saveIIR); + I915_WRITE(I915REG_INT_ENABLE_R, dev_priv->saveIER); + I915_WRITE(I915REG_INT_MASK_R, dev_priv->saveIMR); + /* FBC info */ I915_WRITE(FBC_CFB_BASE, dev_priv->saveFBC_CFB_BASE); I915_WRITE(FBC_LL_BASE, dev_priv->saveFBC_LL_BASE); (In reply to comment #7) > Well, there's at least one bug in our restore code: we're not even bothering > to restore the interrupt configuration! It seems this patch is not enough to fix my problem. I still need to comment out disable_device. As I have mentioned in my bug report, if I reload i915 module after suspend, it works even without comment out disable_device. So there may be some bugs in the disable_device code path as well. And it seems that the video device will be disabled correctly even without call from drm, is this call necessary? Yeah, something else in the disable_device path must be hosing things. The call may not be necessary, but I'd still like to figure out why it breaks things. Created attachment 16420 [details]
Restore pipestat config as well
Oops, missed a couple of regs in my last patch. Can you try this one out?
Thanks,
Jesse
Created attachment 16421 [details]
diff of reg_dump before and after suspend
Still need to comment out disable_device. I attached the difference of intel_reg_dump before and after suspend, and the difference of intel_reg_dump with and without this patch. Maybe I should enable PCI debug to see if there are more information.
Created attachment 16422 [details]
diff of reg_dump with and without patch
The trick of reload i915 kernel modules seems just make disable_device fail to disable IRQ 16, so it just has the same effect as comment out disable_device. The "ACPI: EC" is another problem, I didn't see any problem with it currently. It seams add pci_set_master(dev->pdev) after pci_enable_device make OpenGL works after resume at least. But pci_enable_device didn't enable video device immediately, it takes some time to enable the irq 16. Created attachment 16559 [details]
Call pci_set_master after pci_enable_device during resume time
I just verify this by doing a clean build.
When I compare the suspend code of hda-intel sound driver and the i915 drm module, I find this fix.
The hda-intel's suspend code also take care of many things which maybe also should be cared in i915 driver, such as MSI.
But I'm just a user, it's up to you to decide which can make i915's suspend code more robust.
Forget to mention, this bug also exits in current git kernel. I'm glade if it could be fixed before 2.6.26 release. Thanks. Thanks for testing, Jie, I'll see if I can push a fix upstream quickly. Thanks Jie, just pushed the fix to Linus. please reopen this bug. i have sam issue: [drm:i915_wait_irq] *ERROR* EBUSY -- rec: 0 emitted: 1 i use kernel 2.6.28-rc5-slub-00172-gee2f6cc x86_64 Created attachment 18950 [details]
dmesg_2.6.28_lex
Possibly yet another DUP of the problem fixed by http://lists.freedesktop.org/archives/intel-gfx/2008- November/000614.html? i can't apply this patch to the latest git. this issue is fixed with latest git |