Bug 11891
Summary: | resume from disk broken on hp/compaq nx7000 (DRM problem) | ||
---|---|---|---|
Product: | Power Management | Reporter: | Markus Meier (maekke) |
Component: | Hibernation/Suspend | Assignee: | Jesse Barnes (jbarnes) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | 1i5t5.duncan, airlied, jbarnes, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.28-rc1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 11808 | ||
Attachments: | .config, dmesg-2.6.28-rc2, map registers at load time | ||
Description
Markus Meier
2008-10-29 14:42:26 UTC
Created attachment 18499 [details]
.config
kernel config for 2.6.28-rc2
Please try with the fixes from bug #11827 and bug #11845.

the patches do not help. the last output lines are:

PM: Loading image data pages (63781 pages) ... done
PM: Read 255124 kbytes in 10.05 seconds (25.38 MB/s)
Suspending console(s) (use no_console_suspend to debug)
_

Created attachment 18505 [details]
dmesg-2.6.28-rc2
dmesg output
Please boot with 'init=/bin/bash', run 'mount /sys && mount /proc && echo mem > /sys/power/state' and see what happens.

Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done
Suspending console(s) (use no_console_suspend to debug)

and it freezes (caps-lock blink). does not happen with 2.6.27.3 (although it doesn't resume there - but that's not a regression.)

bisected, hopefully this is correct...

$ git bisect log
git bisect start
# good: [3fa8749e584b55f1180411ab1b51117190bac1e5] Linux 2.6.27
git bisect good 3fa8749e584b55f1180411ab1b51117190bac1e5
# bad: [57f8f7b60db6f1ed2c6918ab9230c4623a9dbe37] Linux 2.6.28-rc1
git bisect bad 57f8f7b60db6f1ed2c6918ab9230c4623a9dbe37
# good: [cf2fa66055d718ae13e62451bb546505f63906a2] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git bisect good cf2fa66055d718ae13e62451bb546505f63906a2
# bad: [1d8cca44b6a244b7e378546d719041819049a0f9] byteorder: provide swabb.h generically in asm/byteorder.h
git bisect bad 1d8cca44b6a244b7e378546d719041819049a0f9
# skip: [cb23832e3987a02428a274c8f259336f706b17e9] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git bisect skip cb23832e3987a02428a274c8f259336f706b17e9
# good: [65ae24b1811650f2bc5b0b85ea8b0bff6b5bf4a9] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect good 65ae24b1811650f2bc5b0b85ea8b0bff6b5bf4a9
# good: [8eb88c80d444fd249edaa7d895666cde79e7b3b8] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect good 8eb88c80d444fd249edaa7d895666cde79e7b3b8
# bad: [f7ea4a4ba84f382e8eb143e435551de0feee5b4b] Merge branch 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect bad f7ea4a4ba84f382e8eb143e435551de0feee5b4b
# bad: [9e0b97e37fddaf5419d8af24362015ab684eff7e] drm: make CONFIG_DRM depend on CONFIG_SHMEM.
git bisect bad 9e0b97e37fddaf5419d8af24362015ab684eff7e
# good: [bdbf0ac7e187b2b757216e653e64f8b808b9077e] Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
git bisect good bdbf0ac7e187b2b757216e653e64f8b808b9077e
# good: [398c9cb20b5c6c5d1313912b937d653a46fec578] i915: Initialize hardware status page at device load when possible.
git bisect good 398c9cb20b5c6c5d1313912b937d653a46fec578
# bad: [4f481ed22ec0d412336a13dc4477f6d0f3688882] drm: Avoid oops in GEM execbuffers with bad arguments.
git bisect bad 4f481ed22ec0d412336a13dc4477f6d0f3688882
# bad: [0a3e67a4caac273a3bfc4ced3da364830b1ab241] drm: Rework vblank-wait handling to allow interrupt reduction.
git bisect bad 0a3e67a4caac273a3bfc4ced3da364830b1ab241
# good: [6b79d521e07aae155303a992245abb539974dbaa] radeon: fix writeback across suspend/resume.
git bisect good 6b79d521e07aae155303a992245abb539974dbaa
# good: [b9bfdfe6703eb089839d48316a79c84924a3c335] new chip name is GM45
git bisect good b9bfdfe6703eb089839d48316a79c84924a3c335
# good: [2df68b439fcb97a4c55f81516206ef4ee325e28d] drm/cred: wrap task credential accesses in the drm driver.
git bisect good 2df68b439fcb97a4c55f81516206ef4ee325e28d

Hm, do I understand correctly that commit 0a3e67a4caac273a3bfc4ced3da364830b1ab241 "drm: Rework vblank-wait handling to allow interrupt reduction" is the first bad one? If this is correct, have you tried to revert this patch alone and see if that works?

yep, reverting the commit made it work. I also tried a vanilla-2.6.28-rc2 with CONFIG_DRM=n, which also worked.

Thanks for verifying this.

Caused by:

commit 0a3e67a4caac273a3bfc4ced3da364830b1ab241
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Tue Sep 30 12:14:26 2008 -0700

drm: Rework vblank-wait handling to allow interrupt reduction.
Co-author: Michel Dänzer <michel@tungstengraphics.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Looks like this is actually a radeon related issue (according to the lspci & dmesg radeon would be in use here). And according to comment #6 it might be suspend/resume related (maybe radeon is assuming some of the vblank structs are set up at suspend time?).

It looks like the radeon IRQ handler could call drm_handle_vblank before calling drm_vblank_init, depending on the state of the vblank interrupt bits. This patch should catch that situation and give you a backtrace if it happens, in which case we'll have to fix the radeon driver to be more careful.

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 212a94f..cab7e7d 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -694,6 +694,8 @@ static void drm_vbl_send_signals(struct drm_device *dev, int
  */
 void drm_handle_vblank(struct drm_device *dev, int crtc)
 {
+    BUG_ON(!dev->num_crtcs);
+
     atomic_inc(&dev->_vblank_count[crtc]);
     DRM_WAKEUP(&dev->vbl_queue[crtc]);
     drm_vbl_send_signals(dev, crtc);

Markus, can you test this patch, please?

with the patch applied:

# echo mem > /sys/power/state
in init=/bin/bash results in http://dev.gentoo.org/~maekke/img_0146.jpg

resuming after
# echo disk > /sys/power/state
(X running) results in http://dev.gentoo.org/~maekke/img_0148.jpg

hope this helps.

Created attachment 18644 [details]
map registers at load time
Looks like the registers aren't mapped when you go to do your suspend. Does this patch at least get things working when you do the suspend/resume from the console?
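For readers following the diagnosis above, here is a minimal userspace sketch of why a suspend path can fail when the register window is only mapped on first open, and why mapping at load time avoids that. It is a toy model under stated assumptions: `fake_dev`, `fake_load`, `fake_first_open` and `fake_suspend` are hypothetical names, not radeon or DRM functions, and the attached patch itself is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for driver state; not the real radeon structures. */
struct fake_dev {
    volatile uint32_t *mmio; /* NULL until the register window is "mapped" */
    uint32_t saved_reg;      /* value captured by the suspend path */
};

/* Pretend register file standing in for the card's MMIO window. */
static uint32_t fake_registers[4] = { 0x1234, 0, 0, 0 };

/* Driver load: optionally map the registers right away. */
static void fake_load(struct fake_dev *dev, int map_at_load)
{
    assert(dev != NULL);
    dev->mmio = map_at_load ? fake_registers : NULL;
    dev->saved_reg = 0;
}

/* First open by userspace (for example X): map lazily if still unmapped. */
static void fake_first_open(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (dev->mmio == NULL)
        dev->mmio = fake_registers;
}

/* Suspend must read a register. Returns 0 on success, -1 if unmapped.
 * A real driver without such a check would dereference NULL here. */
static int fake_suspend(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (dev->mmio == NULL)
        return -1;
    dev->saved_reg = dev->mmio[0];
    return 0;
}
```

In this model, calling `fake_suspend` after `fake_load(&dev, 0)` but before `fake_first_open` fails, which loosely mirrors suspending from an `init=/bin/bash` console where X has never opened the device. With `fake_load(&dev, 1)` the same call succeeds.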
yes, with this patch, I'm able to suspend/resume from console.

Does it fix hibernation too?

yes, hibernation is fixed, too.

Great, thanks for testing. I've submitted the patch to Dave for inclusion into 2.6.28.

I'm sorry, I misunderstood you, the issue is not yet fixed. with your patch, I'm unable to start X. Here's the syslog part, after `startx`:

Nov 5 23:01:13 schleppi agpgart-intel 0000:00:00.0: AGP 2.0 bridge
Nov 5 23:01:13 schleppi agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode
Nov 5 23:01:13 schleppi radeonfb 0000:01:00.0: putting AGP V2 device into 4x mode
Nov 5 23:01:14 schleppi BUG: unable to handle kernel NULL pointer dereference at 00000010
Nov 5 23:01:14 schleppi IP: [<c02989b8>] radeon_read_fb_location+0x98/0xa7
Nov 5 23:01:14 schleppi *pde = 35899067 *pte = 00000000
Nov 5 23:01:14 schleppi Oops: 0000 [#1] PREEMPT
Nov 5 23:01:14 schleppi last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/enable
Nov 5 23:01:14 schleppi Modules linked in: michael_mic arc4 ecb ieee80211_crypt_tkip ipv6 rfcomm bnep l2cap snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device bluetooth firewire_ohci firewire_core crc_itu_t usbhid snd_intel8x0 snd_intel8x0m ohci1394 yenta_socket ieee1394 rsrc_nonstatic rtc snd_ac97_codec ac97_bus i2c_i801 pcmcia_core ipw2100 snd_pcm ieee80211 ehci_hcd snd_timer uhci_hcd 8139cp snd snd_page_alloc ieee80211_crypt thermal processor button video battery ac evdev [last unloaded: hci_usb]
Nov 5 23:01:14 schleppi
Nov 5 23:01:14 schleppi Pid: 3767, comm: X Not tainted (2.6.28-rc2 #6) HP compaq nx7000 (DG706A#UUZ)
Nov 5 23:01:14 schleppi EIP: 0060:[<c02989b8>] EFLAGS: 00213283 CPU: 0
Nov 5 23:01:14 schleppi EIP is at radeon_read_fb_location+0x98/0xa7
Nov 5 23:01:14 schleppi EAX: 00000000 EBX: f71d8800 ECX: f71d8800 EDX: 00000006
Nov 5 23:01:14 schleppi ESI: f712f000 EDI: f628dcc0 EBP: f58b5ecc ESP: f58b5ec8
Nov 5 23:01:14 schleppi DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Nov 5 23:01:14 schleppi Process X (pid: 3767, ti=f58b4000 task=f58be720 task.ti=f58b4000)
Nov 5 23:01:14 schleppi Stack:
Nov 5 23:01:14 schleppi f71d8800 f58b5ee8 c029b05b c029255e f712f124 fffffff4 f628dcc0 40546440
Nov 5 23:01:14 schleppi f58b5f0c c0292590 f614d380 f712f000 c029ab82 f712f074 c0508ce4 f62bce40
Nov 5 23:01:14 schleppi bff3e04c f58b5f28 c0176801 bff3e04c 40546440 f6b50af8 fffffff7 40546440
Nov 5 23:01:14 schleppi Call Trace:
Nov 5 23:01:14 schleppi [<c029b05b>] ? radeon_cp_init+0x4d9/0x946
Nov 5 23:01:14 schleppi [<c029255e>] ? drm_ioctl+0x17e/0x224
Nov 5 23:01:14 schleppi [<c0292590>] ? drm_ioctl+0x1b0/0x224
Nov 5 23:01:14 schleppi [<c029ab82>] ? radeon_cp_init+0x0/0x946
Nov 5 23:01:14 schleppi [<c0176801>] ? vfs_ioctl+0x50/0x69
Nov 5 23:01:14 schleppi [<c0176be0>] ? do_vfs_ioctl+0x3c6/0x3f7
Nov 5 23:01:14 schleppi [<c016ceb9>] ? vfs_write+0xf0/0x12c
Nov 5 23:01:14 schleppi [<c0176c3d>] ? sys_ioctl+0x2c/0x45
Nov 5 23:01:14 schleppi [<c0102ead>] ? sysenter_do_call+0x12/0x30
Nov 5 23:01:14 schleppi Code: 8b 40 10 ba 04 00 7f 00 83 c0 70 89 10 8b 81 e4 00 00 00 8b 40 10 83 c0 74 8b 18 8b 81 e4 00 00 00 31 d2 8b 40 10 83 c0 70 eb c6 <8b> 40 10 05 48 01 00 00 8b 18 89 d8 5b 5d c3 55 89 c1 89 e5 53
Nov 5 23:01:14 schleppi EIP: [<c02989b8>] radeon_read_fb_location+0x98/0xa7 SS:ESP 0068:f58b5ec8
Nov 5 23:01:14 schleppi ---[ end trace a4c2035e18842d92 ]---
Nov 5 23:01:14 schleppi [drm:drm_release] *ERROR* Device busy: 1 0

Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>
First-Bad-Commit : 0a3e67a4caac273a3bfc4ced3da364830b1ab241

Weird, so that crash looks like the first register read after cp_init... But the registers should already be mapped. I must be missing something about how the DRM & 2D drivers interact. Any ideas Dave?

please try the fix I've sent upstream in the drm-fixes tree. the radeon driver does bad things with memset on drm open/close.

*** Bug 12005 has been marked as a duplicate of this bug. ***

just tried 2.6.28-rc4-git4, which works perfectly. thanks for fixing it. closing this bug.
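
The "bad things with memset on drm open/close" remark can be illustrated with a small userspace sketch. It assumes the hazard is a whole-struct `memset` wiping a pointer that was set up once at load time, which would be consistent with the NULL dereference in the oops but is not confirmed by this thread. The names `fake_private`, `reset_with_memset` and `reset_session_only` are hypothetical, and this is not the fix that went into drm-fixes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical private state mixing load-time and per-session fields. */
struct fake_private {
    void *mmio;      /* set once at load time; must survive open/close */
    int cp_running;  /* per-session state, safe to reset */
    int fb_location; /* per-session state, safe to reset */
};

/* Hazardous reset: clears everything, including the load-time mapping. */
static void reset_with_memset(struct fake_private *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
}

/* Safer reset: clear session state but carry the mapping across. */
static void reset_session_only(struct fake_private *p)
{
    void *mmio;

    assert(p != NULL);
    mmio = p->mmio;
    memset(p, 0, sizeof(*p));
    p->mmio = mmio;
}
```

After `reset_with_memset`, any later code that reads through `mmio` would dereference NULL, while `reset_session_only` leaves the mapping intact and still zeroes the per-session fields.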