Created attachment 26667 [details] dmesg output shortly before suspending and after resume On my ThinkPad T40p notebook (Radeon RV250, AGP), resuming from suspend to RAM has been broken since KMS was introduced into mainline. With newer kernel versions it is at least possible to switch from the garbled screen to a console and reboot cleanly. With UMS, suspend and resume work as expected. See the attached dmesg output.
Created attachment 26668 [details] output of lspci
I have this issue too, but with an RV280 chipset. It does not happen with kernel 2.6.33; the regression is in 2.6.34. KMS is always active in both versions.
lspci VGA line:
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200] (rev 01)
dmesg log:
[ 965.493155] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(0).
[ 965.493164] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 965.493702] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(1).
[ 965.493707] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 965.993269] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(2).
[ 965.993277] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[ 965.993785] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(3).
[ 965.993790] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
.....
[ 968.318575] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(15).
Does disabling AGP work any better? Boot with radeon.agpmode=-1
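Whether the parameter suggested above actually reached the kernel can be checked against the boot command line. The following is only a minimal sketch for a POSIX shell; the function name agpmode_from_cmdline is made up for illustration and is not part of any tool mentioned in this report.

```shell
# Print the value of radeon.agpmode found in a kernel command line string,
# or "unset" if the parameter is absent. If the parameter is repeated,
# the last occurrence is reported.
agpmode_from_cmdline() {
    value=unset
    for word in $1; do
        case "$word" in
            radeon.agpmode=*) value=${word#radeon.agpmode=} ;;
        esac
    done
    printf '%s\n' "$value"
}
```

On a running system it would be called as: agpmode_from_cmdline "$(cat /proc/cmdline)". This only covers the case where the option is passed on the kernel command line, as suggested here.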
Hi, this does work for me, tested on 2.6.31 and 2.6.34.
I think this should be fixed in: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=10b06122afcc78468bd1d009633cb71e528acdc5
If so, this bug and probably bug 16273 are dupes of bug 15969.
After applying this to 2.6.34, I get this after resume: Jun 24 18:57:23 T40p kernel: [ 110.028385] PM: resume of devices complete after 1578.173 msecs Jun 24 18:57:23 T40p kernel: [ 110.028598] Restarting tasks ... Jun 24 18:57:23 T40p kernel: [ 110.031390] BUG: unable to handle kernel paging request at f8f50000 Jun 24 18:57:23 T40p kernel: [ 110.031659] IP: [<f8aad0da>] radeon_cs_update_pages+0x15a/0x190 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.031955] *pde = 3654f067 *pte = 00000000 Jun 24 18:57:23 T40p kernel: [ 110.032008] Oops: 0002 [#1] PREEMPT Jun 24 18:57:23 T40p kernel: [ 110.032008] last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:00/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/uevent Jun 24 18:57:23 T40p kernel: [ 110.032008] Modules linked in: xts gf128mul bluetooth autofs4 nfsd lockd auth_rpcgss sunrpc snd_seq snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi snd_seq_device dm_crypt dm_mod acpi_cpufreq fuse radeon ttm snd_intel8x0 ath5k thinkpad_acpi drm_kms_helper snd_ac97_codec mac80211 drm ath ac97_bus hwmon snd_pcm cfg80211 snd_timer cfbcopyarea snd ehci_hcd cfbimgblt sr_mod rfkill soundcore uhci_hcd yenta_socket evdev led_class rtc cdrom pcmcia_core ac battery cfbfillrect snd_page_alloc sg nvram usbcore thermal processor button Jun 24 18:57:23 T40p kernel: [ 110.032008] Jun 24 18:57:23 T40p kernel: [ 110.032008] Pid: 1530, comm: X Tainted: G W 2.6.34 #3 2373G1G/2373G1G Jun 24 18:57:23 T40p kernel: [ 110.032008] EIP: 0060:[<f8aad0da>] EFLAGS: 00010202 CPU: 0 Jun 24 18:57:23 T40p kernel: [ 110.032008] EIP is at radeon_cs_update_pages+0x15a/0x190 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.032008] EAX: 00000000 EBX: 00000005 ECX: 000000bc EDX: f735481c Jun 24 18:57:23 T40p kernel: [ 110.032008] ESI: f63bb000 EDI: f8f50000 EBP: ee4b43c0 ESP: f6207d0c Jun 24 18:57:23 T40p kernel: [ 110.032008] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Jun 24 18:57:23 T40p kernel: [ 110.032008] Process X (pid: 1530, ti=f6206000 task=f64bb900 
task.ti=f6206000) Jun 24 18:57:23 T40p kernel: [ 110.032008] Stack: Jun 24 18:57:23 T40p kernel: [ 110.032008] 00000000 00000001 000002f0 f6207da0 ee4b43c0 f6207de4 00000000 f8ab720d Jun 24 18:57:23 T40p kernel: [ 110.032008] <0> f73543f4 f8a97000 00000001 f8985a53 00000000 00000202 00000000 f6207de4 Jun 24 18:57:23 T40p kernel: [ 110.032008] <0> f63b8000 f7354000 f7355574 f8ab7555 00000001 f36bf814 f89868a4 00000001 Jun 24 18:57:23 T40p kernel: [ 110.032008] Call Trace: Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f8ab720d>] ? r100_cs_packet_parse+0x5d/0x1d0 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f8a97000>] ? radeon_bo_move+0x0/0x330 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f8985a53>] ? ttm_bo_handle_move_mem+0x263/0x330 [ttm] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f8ab7555>] ? r100_cs_parse+0x35/0x680 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f89868a4>] ? ttm_bo_move_buffer+0x114/0x140 [ttm] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f898695d>] ? ttm_bo_validate+0x8d/0x110 [ttm] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f8a982b5>] ? radeon_bo_list_validate+0x55/0x90 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f8aacea1>] ? radeon_cs_ioctl+0x111/0x1f0 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f83d589b>] ? drm_ioctl+0x14b/0x390 [drm] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f8aacd90>] ? radeon_cs_ioctl+0x0/0x1f0 [radeon] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c100a2fb>] ? restore_i387_fxsave+0x6b/0x80 Jun 24 18:57:23 T40p kernel: [ 110.032008] [<f83d5750>] ? drm_ioctl+0x0/0x390 [drm] Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c10a50db>] ? vfs_ioctl+0x2b/0xb0 Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c10a5959>] ? do_vfs_ioctl+0x79/0x600 Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c10438bd>] ? __remove_hrtimer+0x2d/0x90 Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c102ec74>] ? do_setitimer+0x154/0x1a0 Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c102ed09>] ? 
sys_setitimer+0x49/0xb0 Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c10a5f1d>] ? sys_ioctl+0x3d/0x70 Jun 24 18:57:23 T40p kernel: [ 110.032008] [<c1002c50>] ? sysenter_do_call+0x12/0x26 Jun 24 18:57:23 T40p kernel: [ 110.032008] Code: ff ff 89 44 24 08 e9 63 ff ff ff f7 c7 01 00 00 00 75 1f f7 c7 02 00 00 00 75 30 f7 c7 04 00 00 00 75 19 89 c1 83 e0 03 c1 e9 02 <f3> a5 e9 76 ff ff ff 0f b6 16 48 46 88 17 47 eb d7 8b 16 83 e8 Jun 24 18:57:23 T40p kernel: [ 110.032008] EIP: [<f8aad0da>] radeon_cs_update_pages+0x15a/0x190 [radeon] SS:ESP 0068:f6207d0c Jun 24 18:57:23 T40p kernel: [ 110.032008] CR2: 00000000f8f50000 Jun 24 18:57:23 T40p kernel: [ 110.032008] ---[ end trace 38fee5bfe123df75 ]--- Jun 24 18:57:23 T40p kernel: [ 110.070427] done. Jun 24 18:57:23 T40p kernel: [ 110.155076] [drm:drm_release] *ERROR* Device busy: 1
Kernel 2.6.35-rc3, which already has those patches applied, resumes with a black screen; I can't do anything except trigger SysRq.
On a T40p with RV250 I have the same issues here with 2.6.35-git12 Linux kernel: [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(0). [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB ! This happens after I force a suspend via: # /usr/sbin/pm-suspend - Sedat -
Created attachment 27429 [details] lspci output for 2.6.35-git12 kernel
Created attachment 27430 [details] dmesg for 2.6.35-git12 kernel
Created attachment 27431 [details] Output of 'LIBGL_VERBOSE=debug glxinfo 2>/dev/null > glxinfo.txt'
I tested again on 2.6.36-rc2-git4 plus drm-fixes pulled in from Dave's drm-2.6 tree. The problem remains. Booting with "radeon.agpmode=-1" makes 'pm-suspend' successful. I will add the video BIOS dump in case it helps.
Created attachment 28161 [details] Video-BIOS dump of RV250 in an IBM T40p
How to dump the video BIOS (here on a 2.6.36 upstream kernel):
# lspci | grep "VGA compatible controller"
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 [Mobility FireGL 9000] (rev 02)
# find /sys/ -name rom | grep 01:00.0
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
# echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
# cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom > /tmp/vbios_rv250.bin
# echo 0 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/rom
I did a debug run with 2.6.36-rc4-git3 + (radeon) backlight-type patches [1,2]: booted with radeon.modeset=1 and drm.debug=15. Logs attached. - Sedat - [1] https://patchwork.kernel.org/patch/163971/ [2] https://patchwork.kernel.org/patch/182352/
Created attachment 30322 [details] 2.6.36-rc4-git3 kern.log (pm-suspend + pm-resume)
Created attachment 30332 [details] 2.6.36-rc4-git3 debug (pm-suspend + pm-resume)
Created attachment 30342 [details] 2.6.36-rc4-git3 syslog (pm-suspend + pm-resume)
Comment on attachment 30332 [details] 2.6.36-rc4-git3 debug (pm-suspend + pm-resume) Bit late over here... turns out this was 7zXZ...
You can't read? It was done with xz-utils from official Debian/sid. Shall I attach it in another archive format?
Created attachment 32762 [details] debug patch to see where we fail to set up the ring
You have this in your kern.log file:
[drm] radeon: ring at 0x00000000D0000000
which doesn't look so good. From there on it goes bad:
Sep 17 13:12:12 tbox kernel: [ 381.304184] [drm:r100_ring_test] *ERROR* radeon: ring test failed (sracth(0x15E4)=0xCAFEDEAD)
Sep 17 13:12:12 tbox kernel: [ 381.304187] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
Sep 17 13:12:12 tbox kernel: [ 381.304191] radeon 0000:01:00.0: failled initializing CP (-22).
which explains the "[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(0)." message: cp.ready is false because r100_cp_init failed. Let's see if we can narrow down what fails. Can you apply this patch and post the resulting kern.log?
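The distinction drawn above (a failed ring test at resume versus the later IB scheduling errors that follow from it) can be checked mechanically in a log. This is a rough sketch only; the function name and the three labels it prints are invented for illustration, and the match strings are taken from the messages quoted in this report.

```shell
# Read kernel log text on stdin and report which radeon failure, if any,
# it contains. A ring test failure is reported first because, as explained
# above, the IB scheduling errors are a consequence of it.
classify_radeon_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q "ring test failed"; then
        echo "ring-test-failed"
    elif printf '%s\n' "$log" | grep -q "couldn't schedule IB"; then
        echo "ib-schedule-failed"
    else
        echo "no-radeon-error"
    fi
}
```

For example: classify_radeon_log < /var/log/kern.log. Matching on "ring test failed" rather than on the full message means both the "sracth" and "scratch" spellings of the message are caught.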
(In reply to comment #20) > You can't read? I wondered that myself...
Thanks, Florian, for taking care of this BR. I applied your patch from above against 2.6.36-rc7 and will attach the logs. Here is the legend for my logs (messages, kern.log and debug):
1: 2010-10-08 10:35: boot-up into runlevel-3
2: 2010-10-08 10:36: startx
3: 2010-10-08 10:37: pm-suspend
4: 2010-10-08 10:38: power-on/pm-resume
5: 2010-10-08 10:39: poweroff
EXAMPLE: "3_debug.txt" covers all logs written to /var/log/debug after running the "pm-suspend" command. I hope this helps to narrow down the problem.
Created attachment 32812 [details] Tarball of logs for Florian Mickler
Created attachment 32822 [details] Checksum for tarball of logs for Florian Mickler
Created attachment 32852 [details] Some more debugging output in r100_cp_init() + Fix typos in r100_ring_test() With the attached patch, I now get the warning below. Does this look more like an ACPI issue? - Sedat - [ /var/log/kern.log ] ... Oct 8 13:38:09 tbox kernel: [ 94.308199] [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD) Oct 8 13:38:09 tbox kernel: [ 94.308202] ------------[ cut here ]------------ Oct 8 13:38:09 tbox kernel: [ 94.308236] WARNING: at /home/sd/src/linux-2.6/linux-2.6.36-rc7/debian/build/source_i386_none/drivers/gpu/drm/radeon/r100.c:1028 r100_cp_init+0x5b6/0x5db [radeon]() Oct 8 13:38:09 tbox kernel: [ 94.308240] Hardware name: 2374SG6 Oct 8 13:38:09 tbox kernel: [ 94.308242] Modules linked in: sco bnep rfcomm l2cap bluetooth aes_i586 acpi_cpufreq mperf aes_generic cpufreq_stats cpufreq_userspace cpufreq_conservative ppdev cpufreq_powersave lp dm_crypt binfmt_misc ext4 snd_intel8x0m snd_intel8x0 snd_ac97_codec jbd2 ac97_bus crc16 snd_pcm_oss radeon snd_mixer_oss thinkpad_acpi snd_pcm arc4 snd_seq_midi ecb ath5k snd_rawmidi snd_seq_midi_event mac80211 ttm snd_seq ath drm_kms_helper pcmcia drm cfg80211 i2c_algo_bit rfkill i2c_i801 nsc_ircc yenta_socket i2c_core pcmcia_rsrc snd_timer snd_seq_device joydev pcmcia_core tpm_tis irda shpchp snd tpm parport_pc crc_ccitt snd_page_alloc pci_hotplug soundcore serio_raw led_class psmouse tpm_bios video nvram processor battery ac parport rng_core pcspkr output button evdev fuse autofs4 ext3 jbd mbcache dm_mod usbhid hid sg sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix libata uhci_hcd ehci_hcd usbcore scsi_mod thermal e1000 floppy thermal_sys nls_base [last unloaded: scsi_wait_scan] Oct 8 13:38:09 tbox kernel: [ 94.308314] Pid: 2052, comm: kworker/u:5 Not tainted 2.6.36-rc7-686 #1 Oct 8 13:38:09 tbox kernel: [ 94.308317] Call Trace: Oct 8 13:38:09 tbox kernel: [ 94.308327] [<c102eff1>] ? warn_slowpath_common+0x6a/0x7b Oct 8 13:38:09 tbox kernel: [ 94.308345] [<f8f5aceb>] ?
r100_cp_init+0x5b6/0x5db [radeon] Oct 8 13:38:09 tbox kernel: [ 94.308349] [<c102f00f>] ? warn_slowpath_null+0xd/0x10 Oct 8 13:38:09 tbox kernel: [ 94.308369] [<f8f5aceb>] ? r100_cp_init+0x5b6/0x5db [radeon] Oct 8 13:38:09 tbox kernel: [ 94.308388] [<f8f5ba4b>] ? r100_startup+0x1f8/0x246 [radeon] Oct 8 13:38:09 tbox kernel: [ 94.308403] [<f8f33378>] ? radeon_resume_kms+0x77/0xe6 [radeon] Oct 8 13:38:09 tbox kernel: [ 94.308407] [<c114bcf5>] ? pci_legacy_resume+0x23/0x2c Oct 8 13:38:09 tbox kernel: [ 94.308411] [<c114bdab>] ? pci_pm_resume+0x0/0x60 Oct 8 13:38:09 tbox kernel: [ 94.308418] [<c11c15b5>] ? pm_op+0x8f/0x13f Oct 8 13:38:09 tbox kernel: [ 94.308422] [<c11c19aa>] ? device_resume+0x3a/0xb3 Oct 8 13:38:09 tbox kernel: [ 94.308425] [<c11c1cfd>] ? async_resume+0x13/0x33 Oct 8 13:38:09 tbox kernel: [ 94.308430] [<c1048df9>] ? async_run_entry_fn+0x8b/0x121 Oct 8 13:38:09 tbox kernel: [ 94.308436] [<c103fe46>] ? process_one_work+0x181/0x25e Oct 8 13:38:09 tbox kernel: [ 94.308440] [<c1048d6e>] ? async_run_entry_fn+0x0/0x121 Oct 8 13:38:09 tbox kernel: [ 94.308444] [<c1041319>] ? worker_thread+0xf3/0x1ed Oct 8 13:38:09 tbox kernel: [ 94.308448] [<c1041226>] ? worker_thread+0x0/0x1ed Oct 8 13:38:09 tbox kernel: [ 94.308452] [<c1043926>] ? kthread+0x63/0x68 Oct 8 13:38:09 tbox kernel: [ 94.308455] [<c10438c3>] ? kthread+0x0/0x68 Oct 8 13:38:09 tbox kernel: [ 94.308461] [<c100357e>] ? kernel_thread_helper+0x6/0x10 Oct 8 13:38:09 tbox kernel: [ 94.308464] ---[ end trace a248808f0af92caf ]--- Oct 8 13:38:09 tbox kernel: [ 94.308467] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22). Oct 8 13:38:09 tbox kernel: [ 94.308470] radeon 0000:01:00.0: failled initializing CP (-22).
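To read a trace like the one above more easily, the function names can be pulled out one per line. This is a small sketch that assumes the log has one trace entry per line in the "[<address>] ? function+0xoffset/0xsize" form shown here; the function name trace_functions is hypothetical.

```shell
# Print one function name per call-trace line read on stdin, in the order
# the entries appear in the log (innermost frame first). Lines that do not
# look like trace entries are skipped.
trace_functions() {
    sed -n 's/.*\[<[0-9a-f]*>] ? \([A-Za-z0-9_.]*\)+0x.*/\1/p'
}
```

For example: grep 'kernel:' /var/log/kern.log | trace_functions. Duplicate entries, such as r100_cp_init appearing twice in the trace above, are printed as they occur.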
Created attachment 32862 [details] kern.log.xz
Created attachment 32872 [details] debug.xz
Created attachment 32882 [details] messages.xz
No, that's one of the warnings I put in place with the debug patch. It is definitely the radeon driver that is failing. Maybe it is caused by some other failure in the system, but I guess there is a bug somewhere in the suspend or resume phase of the radeon driver. I'm not really familiar with the radeon driver, but I will look into it for obvious or easy-to-spot errors.
Oh, and please don't fix those typos! They are the only way to distinguish which code path is actually running... you just threw me off there for a while, wondering why it's not the typoed scratch message...
(For the patch, see https://bugzilla.kernel.org/show_bug.cgi?id=16140#c26)
With my patch from above applied, it is this section (Line #1028 and following) of r100_cp_init() that causes the problem:
940 int r100_cp_init(struct radeon_device *rdev, unsigned ring_size)
...
1026 radeon_ring_start(rdev);
1027 r = radeon_ring_test(rdev);
1028 if (r) {
1029 DRM_ERROR("radeon: cp isn't working (%d).\n", r);
1030 return r;
1031 }
1032 rdev->cp.ready = true;
1033 return 0;
1034 }
...
Replacing "if (r) {" with "if (WARN_ON(r)) {" produces the call trace above.
I looked into the r600.c source code and put "rdev->cp.ready = true;" before the line "r = radeon_ring_test(rdev);", which did not help.
Again inspired by r600.c, I put Line #966, "r100_cp_load_microcode(rdev);", after "r = radeon_ring_init(rdev, ring_size);". This resulted in a not-so-garbled screen, after hanging: pm-resume in X -> switching to vt-1 -> killing X -> restarting startx
This does no harm, see my logs:
- DRM_ERROR("radeon: ring test failed (sracth(0x%04X)=0x%08X)\n",
+ DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",
I am not sure what you mean by "radeon driver": the one in the kernel or the DDX (xf86-video-ati)?
One NOTE: at Line #3728 there is a commented-out "r100_gpu_init(rdev);" that is not defined anywhere. In r600.c I see a *_gpu_init() and a *_cp_start() in the resume case. Just a hint, in case you want to compare or dig into it.
IIRC it would make sense to interpret the call trace correctly, but I am not that familiar with "the internals".
Sorry, I don't want to experiment with older Linux kernels: my graphics driver stack is mostly latest-stable or from Git, and I am not sure what would happen.
[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/gpu/drm/radeon/r100.c;h=e151f16a8f86d73090ec6a4eb17a3590661868db;hb=HEAD#l1028 [2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/gpu/drm/radeon/r100.c;h=e151f16a8f86d73090ec6a4eb17a3590661868db;hb=HEAD#l966 [3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/gpu/drm/radeon/r100.c;h=e151f16a8f86d73090ec6a4eb17a3590661868db;hb=HEAD#l3728
Hi! I did find an rv280 card. On that card, the screen is garbled after resume, but the ring test doesn't fail. It is using the same code paths as far as I can see. So we can probably conclude:
1. The garbled screen and the ring setup failure are independent failures.
2. The ring setup failure is something specific to your card or chipset.
Do you see differences in lspci -vv output before and after suspend?
(In reply to comment #32)
> (Patch see https://bugzilla.kernel.org/show_bug.cgi?id=16140#c26)
>
> With applying my patch from above, it's this section (Line #1028 and following)
> from r100_cp_init() doing the problem:
>
> 940 int r100_cp_init(struct radeon_device *rdev, unsigned ring_size)
> ...
> 1026 radeon_ring_start(rdev);
> 1027 r = radeon_ring_test(rdev);
> 1028 if (r) {
> 1029 DRM_ERROR("radeon: cp isn't working (%d).\n", r);
> 1030 return r;
> 1031 }
> 1032 rdev->cp.ready = true;
> 1033 return 0;
> 1034 }
> ...
>
> Replacing "if (r) {" with "if (WARN_ON(r)) {" shows the above Call-trace.
Yes. This can also be seen from the "radeon: cp isn't working (-22)." line in your dmesg. But of course the call stack is handy to verify we are looking at the right code. I didn't put a WARN there, because we already knew it failed. I wondered whether some tests without error messages were failing and put the WARNs there. But in retrospect we would have seen that, because the above error message would not have been preceded by the ring-test error message.
> I looked into r600.c source-code and put "rdev->cp.ready = true;" before Line
> "r = radeon_ring_test(rdev);", not helping.
If you are interested in how the driver works, have a look at http://www.botchco.com/agd5f/?p=50
The "ring" is a buffer into which the driver writes commands; the GPU reads those commands and executes them. It's a ring buffer: http://en.wikipedia.org/wiki/Circular_buffer
If you set cp.ready and the hardware isn't really ready, that won't help.
The ring test works like so: the driver writes a value (0xCAFEDEAD) into the scratch register and instructs the GPU via the ring buffer to overwrite it with 0xDEADBEEF. Then the driver checks whether the GPU does it. If, after N udelay(1) calls, the GPU has not written the expected value into that register, the test fails. But of course, we are left to wonder why.
> Again inspired from r600.c, I put Line #966 "r100_cp_load_microcode(rdev);"
> after "r = radeon_ring_init(rdev, ring_size);", this resulted in a
> not-so-garbled screen, after hanging:
> pm-resume in X -> switching to vt-1 -> killing X -> restarting startx
That's interesting. Can you elaborate on the hanging?
> This is doing no harm, see my logs.
> - DRM_ERROR("radeon: ring test failed (sracth(0x%04X)=0x%08X)\n",
> + DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",
True, but it is inconvenient. If you 'grep -r' for that error message you only get the r100 one. With the typo corrected, you get both the r100 and the r600 one. I agree, not a big deal, but...
> I am not sure what you mean with "radeon driver": the one in the kernel or the
> DDX (xf86-video-ati).
Always the kernel one, at the moment.
> One NOTE:
> In Line #3728 there is a commented "r100_gpu_init(rdev);", it is nowhere
> "defined". I see in r600.c a *_gpu_init() and a *_cp_start() in case of
> resuming. Just a hint, if you wanna compare or dig into it.
Yes. I wondered about that too. 'git-blame' shows it is a leftover from:
commit 90aca4d2740255bd130ea71a91530b9920c70abe
Author: Jerome Glisse <jglisse@redhat.com>
Date: Tue Mar 9 14:45:12 2010 +0000
drm/radeon/kms: simplify & improve GPU reset V2
...
> IIRC it would make sense to interprete correctly the Call-trace, I am not that
> familiar with "the internals".
The call trace is not complicated. The topmost function is the function that is currently executing. The second entry is the function it will return to. The third is the function the second will return to, and so on. See: http://en.wikipedia.org/wiki/Call_stack
I don't know about items 1 to 3 in that trace, but I guess they are just artifacts of the WARN_ON macro. If you look into the code, you see that the call trace is to be expected. What has to be considered bad is that the ring test fails because the GPU doesn't process the ring buffer in time.
In comment #12 you said that turning off AGP would fix the suspend issue? Which one was that? The ring-test error message, or the garbled screen, or both? In my setup (rv280) it only worked once out of ten times. The first time, it came back without a garbled screen, but all subsequent suspend/resume cycles did garble the screen.
On that screen garble I have a few thoughts. It is somewhat periodic and always follows a pattern for me. I can clear the corruption by changing consoles, for example. Then it always scribbles in a predetermined pattern on the framebuffer, where it stays (overwriting itself at a high frequency) until I change consoles. Same for you?
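The ring test described in the comment above can be modelled in a few lines. This is a toy simulation only, not driver code: nothing here touches hardware, the "GPU" is a shell function, the ring is reduced to a single slot, and every name (CP_RUNNING, driver_queue_write, gpu_tick, ring_test) is made up for illustration.

```shell
SCRATCH=0xCAFEDEAD   # stand-in for the scratch register
RING_CMD=""          # single-slot stand-in for the ring buffer
CP_RUNNING=1         # set to 0 to model a command processor that never starts

# Driver side: queue a command asking the GPU to overwrite the scratch register.
driver_queue_write() {
    RING_CMD="write_scratch $1"
}

# GPU side: consume the queued command, but only if the CP is running.
gpu_tick() {
    [ "$CP_RUNNING" -eq 1 ] || return 0
    case "$RING_CMD" in
        "write_scratch "*) SCRATCH=${RING_CMD#* }; RING_CMD="" ;;
    esac
}

# Seed the register, queue the write, then poll up to $1 times for the
# expected value. A non-zero exit status means the test failed.
ring_test() {
    max_polls=$1
    SCRATCH=0xCAFEDEAD
    driver_queue_write 0xDEADBEEF
    i=0
    while [ "$i" -lt "$max_polls" ]; do
        gpu_tick
        [ "$SCRATCH" = "0xDEADBEEF" ] && return 0
        i=$((i + 1))
    done
    echo "ring test failed (scratch=$SCRATCH)" >&2
    return 1
}
```

With CP_RUNNING=0 the register still holds 0xCAFEDEAD after the polling loop, which is the same value reported in the "ring test failed" messages in this report: the command queued on the ring was never executed.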
Sorry, but I don't think I will follow this BR for a while. Currently, busy with other stuff.
I have an RV250 (T40p, same hardware as above) and I have been experiencing the same problems as Sedat for several months (on resume, a garbled screen in X AND the ring setup failure). I still can't resume properly on 2.6.32-rc7. To follow up on the previous message: turning off AGP solves both problems, completely. Can I do something to help?
I have had the same problem for several months. I disabled KMS with radeon.modeset=0, and everything worked fine until today. But the latest upgrades (I use Arch Linux) broke it. First, X didn't start, so I enabled KMS, but then I lost suspend-to-RAM (black screen on resume). I tried radeon.agpmode=-1, but it doesn't work. What can I do? I really need s2ram on my laptop (it is so old that booting takes several minutes!). Thanks for your help. Cactus. System information: kernel 2.6.36.1-3, xf86-video-ati 6.13.2-2, ati-dri 7.9-1, libgl 7.9-1, mesa 7.9-1. Older versions (no problem with radeon.modeset=0): kernel 2.6.35.8-1, xf86-video-ati 6.13.2-1, ati-dri 7.8.2-3, libgl 7.8.2-3, mesa 7.8.2-3
I followed the Gentoo documentation to enable standby on my workstation. When I run the command "hibernate-ram" in the console, the system turns off, but it freezes on resume from suspend/standby/sleep (the monitor shows a vertical rainbow, and Alt+SysRq+B cannot reboot the system). Hardware: dual-Xeon motherboard with only one 5540 CPU, very old PS/2 mouse and keyboard, old X300SE (RV370) ATI VGA card. Kernel: gentoo-sources (2.6.34 r12) with the original kernel drivers. When I reconfigured the kernel not to use the ATI drivers (I used the VESA drivers instead), suspend worked without problems. Now I'm sure there is a bug in the radeon drivers. In my next experiment I configured the kernel to use only the ATI driver under Direct Rendering Manager (i.e. I removed the ATI framebuffer driver). This time the system suspends, and on resume there is a colored page with a blinking cursor, but no working console. Alt+SysRq+B can reboot my box.
Michael, do you still have this issue with a more recent kernel? Other people having issues: please open your own bug, but first try the latest kernel to check whether the issue is resolved for you. Note also that the kernel framebuffer driver is outdated and shouldn't be used; we only actively support KMS for radeon.
I have tested with linux-next (next-20110307): 1. KMS enabled = NOPE 2. KMS enabled + radeon.agpmode=-1 = OK (dmesg: [drm] Forcing AGP to PCI mode) My Xorg stack changed to libdrm-2.4.23, mesa-7.10.1, ddx-1:6.14.0 and xserver-1.10-rc3. Killing the X server from runlevel 3 several times does not help; the GPU seems to hang. Even after I attached so much information to this BR, no one has told me how to force a GPU reset in rl-3. Can you please enlighten me and others? - Sedat -
Just in case it matters, here is my PM userspace:
# dpkg -l | egrep -i 'acpid|pm-utils'
ii acpid 1:2.0.8-2 Advanced Configuration and Power Interface event daemon
ii pm-utils 1.4.1-6 utilities and scripts for power management
I still suspend via the pm-suspend (pm-utils) command. - Sedat -
Jérôme, unfortunately I can only confirm what Sedat reported previously. I was testing it with the most recent git-kernel.
Created attachment 50332 [details] Most recent resume trace
This same bug is also reported against Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=531825 There it was determined that it is not necessary to disable AGP completely; dropping from AGP 4X to AGP 1X is sufficient to let the system resume. In other words, boot with: radeon.agpmode=1 There is, though, a serious performance penalty in doing so for certain applications (games) that copy around large textures.
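When radeon is built as a module, the same parameter can usually be set persistently through a modprobe options file instead of the boot command line. The sketch below is an assumption-laden illustration: the file name radeon-agpmode.conf and the function name are made up, and on systems that load radeon from an initramfs the initramfs may need to be regenerated for the option to take effect.

```shell
# Write a modprobe options file that sets the radeon agpmode parameter.
# $1: directory to write into (normally /etc/modprobe.d), $2: AGP mode.
write_radeon_agpmode_conf() {
    conf_dir=$1
    mode=$2
    printf 'options radeon agpmode=%s\n' "$mode" > "$conf_dir/radeon-agpmode.conf"
}
```

As root this would be called as: write_radeon_agpmode_conf /etc/modprobe.d 1. If the driver is built into the kernel rather than loaded as a module, the boot parameter form radeon.agpmode=1 given above is the one to use.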
I'm still seeing this on Ubuntu 11.04 with 2.6.38-11-generic. Any hope that this might be fixed?
No fix in sight, but there is a workaround: set a primary password in the BIOS. The BIOS will initialize the video card on wake-up and allow Linux to resume normally. Inconvenient? Yes, but it's better than bottlenecking your already out-of-date graphics card. I posted this workaround on the Canonical Launchpad bug report some time ago; I guess it hasn't been shared outside as of yet.
(In reply to comment #45) > I posted this workaround on the Canonical Launchpad > bug report some time ago; I guess it hasn't been shared outside as of yet. Could you add a link to that bug report?
(In reply to comment #45) > No fix in sight, but there is a workaround: set a primary password in the > BIOS. > The BIOS will initialize the video card on wake-up and allow Linux to resume > normally. 0) This wasn't on a ThinkPad, was it? Because on a ThinkPad T41 that triggers this issue there's no "primary" BIOS password. Fiddling with other BIOS passwords doesn't seem to help: they're not even asked for on resume. 1) Could you please provide some further details?
The trick mentioned in comment #45 applies to the Dell Latitude D600 model. Here is the link to Canonical's bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/559163 Paul Bolle correctly said that this trick depends on the primary BIOS password being asked for upon resume. It did not work for me on a Dell Inspiron 600m (which is a "consumer" sister model of the D600). FWIW, here are some related bug pages on Ubuntu/Launchpad: "resume broken on ATI radeon RV250" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/557224 (affecting the ThinkPad T41) "[Dell Computer Corporation Inspiron 600m] suspend/resume failure" https://bugs.launchpad.net/linux/+bug/471872 (affecting my Dell Inspiron 600m computer)
Just FYI: There is now an upstream fix (see [1]). commit 45171002b01b2e2ec4f991eca81ffd8430fd0aec "radeon: add AGPMode 1 quirk for RV250" http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=45171002b01b2e2ec4f991eca81ffd8430fd0aec - Sedat -