Bug 4607 - Crashes after resume with Savage DRI
Summary: Crashes after resume with Savage DRI
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: i386 Linux
Importance: P2 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Duplicates: 7417
Depends on:
Blocks: 7216
Reported: 2005-05-09 18:28 UTC by Johan Brannlund
Modified: 2011-04-18 21:16 UTC

See Also:
Kernel Version: 2.6.23
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel messages (268.10 KB, text/plain)
2011-01-17 21:54 UTC, Tormod Volden
Reenable with previously set mode on resume (1.42 KB, patch)
2011-04-16 21:45 UTC, Tormod Volden

Description Johan Brannlund 2005-05-09 18:28:38 UTC
Distribution: Ubuntu Breezy
Hardware Environment: Acer Aspire 1304LC, Savage Twister VT8636A [ProSavage KN133]
Software Environment: xorg 6.8.2, any recent Savage DRI driver snapshot
from http://dri.freedesktop.org/snapshots/

Problem Description: When using DRI drivers for my Savage graphics card,
everything works fine until after a suspend/resume cycle (I use the suspend2
patches). After resuming, glxinfo still reports "direct rendering: Yes" but
running glxgears locks the system hard directly, the glxgears window doesn't
even open. According to Savage developer Felix K
Comment 1 Johan Brannlund 2005-07-04 19:09:52 UTC
Just tried 2.6.12 with the latest DRI snapshot, no change.
Comment 2 Dave Jones 2006-01-04 22:49:25 UTC
Is this still an issue with 2.6.15?
(And please test with the in-kernel suspend implementation. If you want to debug
issues with suspend2, they have their own bugtracker at
http://bugzilla.suspend2.net/)

If it is still repeatable under that situation, please attach output of lspci
and dmesg | grep -i agp
Comment 3 Johan Brannlund 2006-01-04 22:57:30 UTC
I got a new laptop, so I'm afraid I can't help you with this bug anymore.
Comment 4 Tormod Volden 2006-05-10 02:02:19 UTC
I am using 2.6.15 from Ubuntu (6.06 Beta 2), and I have the exact same problem.
There is some verbose drm output in my Ubuntu bug report:
https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/43007
Comment 5 Tormod Volden 2006-10-10 15:59:48 UTC
I still have DRI/drm crashing after hibernation (no suspend2) in Ubuntu Edgy Beta:
kernel 2.6.17-10.28
drm 1.0.1 20051102
savage 2.4.1 20050313
xserver-xorg-video-savage 1:2.1.1-0ubuntu3

If I try to run glxgears after hibernation, the machine locks up hard with this
last message:

Oct 10 02:42:14 viki kernel: [17192405.404000] [drm:savage_bci_wait_event_shadow] *ERROR* failed!
Oct 10 02:42:14 viki kernel: [17192405.404000] [drm] status=0x00003d7e, e=0x3d88
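
One way to read the two numbers in that message, as a minimal sketch: assuming the driver treats the low 16 bits of status as a wrapping event counter and waits until it reaches e (the helper names below are hypothetical, not taken from the driver source), the counter here is still short of the awaited event, which would fit hardware that stopped processing commands after resume.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the 16-bit counter in `status` has reached or passed event `e`,
 * treating both as counters that wrap modulo 2^16. */
bool event_reached(uint32_t status, uint16_t e)
{
    uint16_t delta = (uint16_t)((status & 0xffffu) - e);
    return delta <= 0x7fffu;
}

/* Number of events the counter still has to advance (0 if already reached). */
uint16_t events_behind(uint32_t status, uint16_t e)
{
    if (event_reached(status, e))
        return 0;
    return (uint16_t)(e - (status & 0xffffu));
}
```

With the logged values, event_reached(0x00003d7e, 0x3d88) is false and events_behind() gives 10, so under this reading the wait timed out with the counter ten events short.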

https://launchpad.net/bugs/37218
Comment 6 Tormod Volden 2006-11-18 12:39:59 UTC
It seems like I can avoid the crashes with Option "BusType" "PCI" alone. Using
Option "DmaMode" "None" alone helps for glxgears, but crashes with other 3D
screensavers.
Comment 7 Tormod Volden 2007-03-26 14:47:30 UTC
I still get these crashes using Ubuntu 7.04 Beta with kernel 2.6.20 and Xorg
7.2. Can anyone suggest a way to debug this further?
Comment 8 Rafael J. Wysocki 2007-05-30 11:24:03 UTC
I think we have a general problem with DRI.  Bug #7417 seems to be related to
this one.
Comment 9 Tormod Volden 2007-05-30 14:48:43 UTC
It seems to be AGP in particular, from my experience. Many Savage cards need
BusType PCI instead of AGP to not crash, and the same is true for some ATI AGP
cards. It could be that the relevant parts of the savage drivers are based on the
ati drivers, but it could also be a more general problem in drm or agp modules.
Examples:
https://bugs.launchpad.net/bugs/33617 (savage)
https://bugs.launchpad.net/bugs/114520 (ati)
(The last one is a PCI card, but w/comments from AGP users)

Note that in bug #7417 only Xorg crashes or goes bananas, and not the kernel, as
in my case. And I don't get problems if I shut down Xorg before hibernation
(it's just that it makes hibernation almost pointless).

Please tell me if I can help with some debugging/testing.
Comment 10 Rafael J. Wysocki 2007-05-31 10:08:23 UTC
Out of curiosity, have you tried X without DRI?
Comment 11 Tormod Volden 2007-05-31 10:12:21 UTC
Sure! It is only a problem with DRI enabled. That goes for all these savage/ati
issues.
Comment 12 Rafael J. Wysocki 2007-06-04 09:39:20 UTC
*** Bug 7417 has been marked as a duplicate of this bug. ***
Comment 13 Rafael J. Wysocki 2007-10-15 03:43:48 UTC
Tormod, can you please confirm that the problem is still present in 2.6.23?
Comment 14 Tormod Volden 2007-10-15 13:23:20 UTC
Yes, same issue with 2.6.23.1, running on top of Ubuntu 7.10. Like before, after waking up from hibernation, as soon as I start glxgears it hangs. I can move the mouse pointer for a second, leaving a trace of mouse pointers on the screen. Then everything is stuck, no sysrq, only hard power-off is possible.
Comment 15 Erik Andr 2010-01-05 20:58:06 UTC
Is this still an issue with 2.6.32?
Comment 16 Rafael J. Wysocki 2010-12-29 23:49:07 UTC
Is the problem reproducible with 2.6.37-rc8?
Comment 17 Tormod Volden 2011-01-16 21:24:03 UTC
Tested with 2.6.37 (Ubuntu 2.6.37-12-generic on top of Ubuntu 10.10) and it works fine, both with AGP and PCI. Thanks!
Comment 18 Tormod Volden 2011-01-17 21:54:32 UTC
Created attachment 43882
kernel messages

That was too good to be true. I tested yesterday by using PCI mode after booting, sleep/resume - antspotlight - sleep/resume - antspotlight, then restarted X with AGP mode and repeated the testing. Which all seemed to work fine.

Today I booted with AGP mode, and then antspotlight caused page allocation failure, with this seen in strace (fd 4 is /dev/dri/card0):
[pid  2089] ioctl(4, 0x40246441, 0xbf9b5aac) = -1 ENOMEM (Cannot allocate memory)
[pid  2089] write(2, "cmdbuf ioctl returned -12\n", 26) = 26
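
The raw request number in the strace line can be split into its fields. This is a minimal sketch assuming the usual Linux ioctl encoding on i386 (8-bit number, 8-bit type, 14-bit size, 2-bit direction); the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a Linux ioctl request number (generic layout, as on i386). */
struct ioctl_fields {
    unsigned dir;   /* 1 = write (user to kernel), 2 = read, 3 = both */
    unsigned size;  /* size of the argument structure in bytes */
    unsigned type;  /* subsystem "magic" character */
    unsigned nr;    /* command number within the subsystem */
};

struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f;
    f.nr   = req & 0xffu;
    f.type = (req >> 8) & 0xffu;
    f.size = (req >> 16) & 0x3fffu;
    f.dir  = (req >> 30) & 0x3u;
    return f;
}
```

For 0x40246441 this gives dir 1 (write), size 36, type 0x64 ('d', the DRM magic) and nr 0x41. DRM driver-specific commands start at 0x40, so this would be driver command 1, which matches the "cmdbuf ioctl" wording in the error message.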

After sleep/resume I got a solid lock-up when starting antspotlight.

Starting up with AGP mode, then restarting X with PCI mode, I can now sleep/resume again.

However I think there is an issue in the DDX driver. I have been investigating https://bugs.freedesktop.org/show_bug.cgi?id=32511 where mesa has been writing into the wrong framebuffer aperture. Running drawpix after resume, I can see that the back buffer aperture becomes wrong - it is now tiled (I mean it draws into small squares on the top of the screen instead of displaying the picture in the window). The front buffer is fine. So I guess there is something missing in the DDX that should set up the apertures again at resume. If for instance the command buffer also gets mismapped after resume I guess a lock-up is not far away.

So I think this kernel bug task can stay closed until we make sure the DDX does the right thing.
Comment 19 Tormod Volden 2011-04-16 20:04:52 UTC
I think I am closing in on this 6-year-old bug, and that the problem is the VIA AGP bridge. Hacking the DDX to call drmAgpEnable after resume (in EnterVT) seems to fix the lock-up issue.

I am a bit puzzled by this appearing on boot and resume:
agpgart-via 0000:00:00.0: putting AGP V2 device into 0x mode

although the drmAgpEnable is called with mode=1. Is the 0x expected? Can it be bridge_agpstat is read out from PCI_AGP_STATUS with the mode bits 0 and this falls through all the cracks in agp_collect_device_status and agp_v2_parse_one?
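
The hypothesis above can be modelled roughly as follows. This is a simplified sketch, not the kernel's agp_v2_parse_one: it assumes the programmed rate is the fastest one advertised by the bridge, advertised by the card, and requested by the caller, using the AGP 2.0 rate bits (1x = bit 0, 2x = bit 1, 4x = bit 2). The function name and the card status value in the usage note are assumptions.

```c
#include <stdint.h>

#define RATE_1X   0x1u
#define RATE_2X   0x2u
#define RATE_4X   0x4u
#define RATE_MASK 0x7u

/* Simplified model of AGP v2 rate negotiation: returns the single rate bit
 * that would be programmed, or 0 when no rate is common to all three inputs. */
uint32_t negotiate_v2_rate(uint32_t bridge_stat, uint32_t card_stat,
                           uint32_t requested)
{
    uint32_t common = bridge_stat & card_stat & requested & RATE_MASK;

    if (common & RATE_4X)
        return RATE_4X;
    if (common & RATE_2X)
        return RATE_2X;
    if (common & RATE_1X)
        return RATE_1X;
    return 0; /* nothing survives: would be reported as "0x" */
}
```

In this model a request of mode=1 against a bridge status with all rate bits set yields 1x, while the same request against a bridge status whose rate bits read back as zero yields 0, which is the kind of fall-through the question is about.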

Should there be something in the agp driver (or via-agp) that sets up the mode correctly again on resume? I believe there is something special about this VIA bridge in that the pci_restore_state is not enough.
Comment 20 Tormod Volden 2011-04-16 21:45:19 UTC
Created attachment 54512
Reenable with previously set mode on resume

Would this (untested) patch make sense?
Comment 21 Tormod Volden 2011-04-16 21:56:15 UTC
BTW, with regard to the 0x above, drmAgpGetMode always returns 0x1f000207.
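
For reference, a small sketch that splits such a mode word into fields, assuming the AGP 2.0 status register layout (rate bits 2:0, side-band addressing in bit 9, request queue depth field in bits 31:24). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Selected fields of an AGP 2.0 status word. */
struct agp_status_fields {
    unsigned rates;    /* bit 0 = 1x, bit 1 = 2x, bit 2 = 4x supported */
    int      sba;      /* non-zero if side-band addressing is supported */
    unsigned rq_field; /* raw request queue depth field */
};

struct agp_status_fields decode_agp_status(uint32_t stat)
{
    struct agp_status_fields f;
    f.rates    = stat & 0x7u;
    f.sba      = (stat >> 9) & 0x1u;
    f.rq_field = (stat >> 24) & 0xffu;
    return f;
}
```

Decoding 0x1f000207 this way gives rates 0x7 (1x, 2x and 4x all advertised), SBA set, and a request queue field of 0x1f, so the rate bits in this value would not by themselves explain a 0x result.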
Comment 22 Tormod Volden 2011-04-18 21:16:27 UTC
I looked at other DDX's and the radeon DDX in fact calls drmAgpEnable in EnterVT after resume. So I will just go on and commit that to the savage DDX as well.

It would be appreciated though if someone can comment on whether this is how it is supposed to work, or if this is a workaround.
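
The pattern described here (remember the AGP mode chosen at init, apply it again when the VT is re-entered after resume) could look roughly like the sketch below. Everything in it is hypothetical: the struct, the function names, and the recording stand-in are for illustration only. In a real DDX the callback would be libdrm's drmAgpEnable(fd, mode), which is not linked here so that the sketch stays self-contained.

```c
#include <stddef.h>

/* Same shape as drmAgpEnable(int fd, unsigned long mode). */
typedef int (*agp_enable_fn)(int fd, unsigned long mode);

/* Hypothetical per-screen state kept by the driver. */
struct agp_resume_state {
    int           drm_fd;
    int           uses_agp; /* 0 when the card is driven in PCI mode */
    unsigned long mode;     /* AGP mode chosen at screen init */
    agp_enable_fn enable;
};

/* Called from the EnterVT path: re-enable AGP with the saved mode. */
int reenable_agp_on_enter_vt(const struct agp_resume_state *st)
{
    if (st == NULL || !st->uses_agp || st->enable == NULL)
        return 0; /* nothing to do in PCI mode */
    return st->enable(st->drm_fd, st->mode);
}

/* Stand-in for drmAgpEnable, for demonstration: records what was requested. */
unsigned long last_enabled_mode;
int enable_calls;

int recording_enable(int fd, unsigned long mode)
{
    (void)fd;
    last_enabled_mode = mode;
    enable_calls++;
    return 0;
}
```

Whether the bridge should instead restore its own mode on resume is the open question raised above; this sketch only shows the DDX-side approach.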
