Bug 11891

Summary: resume from disk broken on hp/compaq nx7000 (DRM problem)
Product: Power Management
Reporter: Markus Meier (maekke)
Component: Hibernation/Suspend
Assignee: Jesse Barnes (jbarnes)
Severity: normal
CC: 1i5t5.duncan, airlied, jbarnes, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28-rc1
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 11808
Attachments: .config, map registers at load time

Description Markus Meier 2008-10-29 14:42:26 UTC
Latest working kernel version:
Earliest failing kernel version: 2.6.28-rc1
Distribution: Gentoo
Hardware Environment:
# lspci
00:00.0 Host bridge: Intel Corporation 82855PM Processor to I/O Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 82855PM Processor to AGP Controller (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 [Mobility FireGL 9000] (rev 01)
02:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306 Fire II IEEE 1394 OHCI Link Layer Controller (rev 80)
02:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
02:02.0 Network controller: Intel Corporation PRO/Wireless LAN 2100 3B Mini PCI Adapter (rev 04)
02:04.0 CardBus bridge: ENE Technology Inc CB1410 Cardbus Controller

Software Environment:
Problem Description:
Resuming from hard disk ends with a blinking Caps Lock LED just before the resume would complete (X was active while suspending to disk).
Fails with 2.6.28-rc1 and 2.6.28-rc2.

Steps to reproduce:
Comment 1 Markus Meier 2008-10-29 14:43:35 UTC
Created attachment 18499
kernel config for 2.6.28-rc2
Comment 2 Rafael J. Wysocki 2008-10-29 14:52:53 UTC
Please try with the fixes from bug #11827 and bug #11845.
Comment 3 Markus Meier 2008-10-29 16:11:18 UTC
the patches do not help.

the last output lines are:
PM: Loading image data pages (63781 pages) ... done
PM: Read 255124 kbytes in 10.05 seconds (25.38 MB/s)
Suspending console(s) (use no_console_suspend to debug)
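For reference, the figures in the log above are self-consistent. A minimal sketch in C of the arithmetic, assuming 4 KiB pages (usual on 32-bit x86, but not stated in the log) and elapsed time counted in hundredths of a second; the function names are illustrative and are not taken from the kernel source:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on 32-bit x86. */
enum { PAGE_KBYTES = 4 };

/* Size of the hibernation image in kbytes for a given page count. */
static uint64_t image_kbytes(uint64_t pages)
{
    return pages * PAGE_KBYTES;
}

/*
 * Throughput in kbytes per second, with the elapsed time given in
 * centiseconds. Integer division truncates the result.
 */
static uint64_t kbytes_per_second(uint64_t kbytes, uint64_t centisecs)
{
    if (centisecs == 0)
        return 0; /* avoid dividing by zero for very short reads */
    return kbytes * 100 / centisecs;
}
```

With the values from the log, image_kbytes(63781) gives 255124, and kbytes_per_second(255124, 1005) gives 25385, which corresponds to the reported 25.38 MB/s once truncated to two decimals.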
Comment 4 Markus Meier 2008-10-29 16:11:48 UTC
Created attachment 18505
dmesg output
Comment 5 Rafael J. Wysocki 2008-10-29 16:15:51 UTC
Please boot with 'init=/bin/bash', run 'mount /sys && mount /proc && echo mem > /sys/power/state' and see what happens.
Comment 6 Markus Meier 2008-10-29 22:49:35 UTC
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done
Suspending console(s) (use no_console_suspend to debug)

and it freezes (caps-lock blink).
does not happen with (although it doesn't resume there - but that's not a regression.)
Comment 7 Markus Meier 2008-11-01 09:57:33 UTC
bisected, hopefully this is correct...

$ git bisect log
git bisect start
# good: [3fa8749e584b55f1180411ab1b51117190bac1e5] Linux 2.6.27
git bisect good 3fa8749e584b55f1180411ab1b51117190bac1e5
# bad: [57f8f7b60db6f1ed2c6918ab9230c4623a9dbe37] Linux 2.6.28-rc1
git bisect bad 57f8f7b60db6f1ed2c6918ab9230c4623a9dbe37
# good: [cf2fa66055d718ae13e62451bb546505f63906a2] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git bisect good cf2fa66055d718ae13e62451bb546505f63906a2
# bad: [1d8cca44b6a244b7e378546d719041819049a0f9] byteorder: provide swabb.h generically in asm/byteorder.h
git bisect bad 1d8cca44b6a244b7e378546d719041819049a0f9
# skip: [cb23832e3987a02428a274c8f259336f706b17e9] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git bisect skip cb23832e3987a02428a274c8f259336f706b17e9
# good: [65ae24b1811650f2bc5b0b85ea8b0bff6b5bf4a9] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect good 65ae24b1811650f2bc5b0b85ea8b0bff6b5bf4a9
# good: [8eb88c80d444fd249edaa7d895666cde79e7b3b8] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect good 8eb88c80d444fd249edaa7d895666cde79e7b3b8
# bad: [f7ea4a4ba84f382e8eb143e435551de0feee5b4b] Merge branch 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect bad f7ea4a4ba84f382e8eb143e435551de0feee5b4b
# bad: [9e0b97e37fddaf5419d8af24362015ab684eff7e] drm: make CONFIG_DRM depend on CONFIG_SHMEM.
git bisect bad 9e0b97e37fddaf5419d8af24362015ab684eff7e
# good: [bdbf0ac7e187b2b757216e653e64f8b808b9077e] Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
git bisect good bdbf0ac7e187b2b757216e653e64f8b808b9077e
# good: [398c9cb20b5c6c5d1313912b937d653a46fec578] i915: Initialize hardware status page at device load when possible.
git bisect good 398c9cb20b5c6c5d1313912b937d653a46fec578
# bad: [4f481ed22ec0d412336a13dc4477f6d0f3688882] drm: Avoid oops in GEM execbuffers with bad arguments.
git bisect bad 4f481ed22ec0d412336a13dc4477f6d0f3688882
# bad: [0a3e67a4caac273a3bfc4ced3da364830b1ab241] drm: Rework vblank-wait handling to allow interrupt reduction.
git bisect bad 0a3e67a4caac273a3bfc4ced3da364830b1ab241
# good: [6b79d521e07aae155303a992245abb539974dbaa] radeon: fix writeback across suspend/resume.
git bisect good 6b79d521e07aae155303a992245abb539974dbaa
# good: [b9bfdfe6703eb089839d48316a79c84924a3c335] new chip name is GM45
git bisect good b9bfdfe6703eb089839d48316a79c84924a3c335
# good: [2df68b439fcb97a4c55f81516206ef4ee325e28d] drm/cred: wrap task credential accesses in the drm driver.
git bisect good 2df68b439fcb97a4c55f81516206ef4ee325e28d
Comment 8 Rafael J. Wysocki 2008-11-01 14:33:02 UTC
Hm, do I understand correctly that commit 0a3e67a4caac273a3bfc4ced3da364830b1ab241 "drm: Rework vblank-wait handling to allow interrupt reduction" is the first bad one?

If this is correct, have you tried to revert this patch alone and see if that works?
Comment 9 Markus Meier 2008-11-02 09:07:13 UTC
yep, reverting the commit made it work.
I also tried a vanilla-2.6.28-rc2 with CONFIG_DRM=n, which also worked.
Comment 10 Rafael J. Wysocki 2008-11-02 09:30:01 UTC
Thanks for verifying this.

Caused by:

commit 0a3e67a4caac273a3bfc4ced3da364830b1ab241
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Sep 30 12:14:26 2008 -0700

    drm: Rework vblank-wait handling to allow interrupt reduction.

    Co-author: Michel Dänzer <michel@tungstengraphics.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
Comment 11 Jesse Barnes 2008-11-03 13:48:09 UTC
Looks like this is actually a radeon related issue (according to the lspci & dmesg radeon would be in use here).  And according to comment #6 it might be suspend/resume related (maybe radeon is assuming some of the vblank structs are set up at suspend time?).
Comment 12 Jesse Barnes 2008-11-03 13:58:07 UTC
It looks like the radeon IRQ handler could call drm_handle_vblank before calling drm_vblank_init, depending on the state of the vblank interrupt bits.  This patch should catch that situation and give you a backtrace if it happens, in which case we'll have to fix the radeon driver to be more careful.

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 212a94f..cab7e7d 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -694,6 +694,8 @@ static void drm_vbl_send_signals(struct drm_device *dev, int
 void drm_handle_vblank(struct drm_device *dev, int crtc)
+       BUG_ON(!dev->num_crtcs);
        drm_vbl_send_signals(dev, crtc);
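The ordering problem described in this comment can be modeled outside the kernel. This is a minimal sketch in C, not the DRM code: `struct toy_device`, `toy_vblank_init` and `toy_handle_vblank` are hypothetical stand-ins, and the handler returns an error where the patch above would trigger BUG_ON.

```c
/* Hypothetical stand-in for the device state; not the real struct drm_device. */
struct toy_device {
    int num_crtcs;           /* stays 0 until vblank support is initialized */
    unsigned int counts[2];  /* per-CRTC vblank counters */
};

/* Models the initialization step that must run before any vblank IRQ is handled. */
static int toy_vblank_init(struct toy_device *dev, int num_crtcs)
{
    if (num_crtcs <= 0 || num_crtcs > 2)
        return -1;
    dev->num_crtcs = num_crtcs;
    dev->counts[0] = 0;
    dev->counts[1] = 0;
    return 0;
}

/* Models the handler; returns -1 instead of crashing when called too early. */
static int toy_handle_vblank(struct toy_device *dev, int crtc)
{
    if (dev->num_crtcs == 0)
        return -1;           /* the situation the BUG_ON is meant to catch */
    if (crtc < 0 || crtc >= dev->num_crtcs)
        return -1;
    dev->counts[crtc]++;
    return 0;
}
```

Calling toy_handle_vblank on a zero-initialized device fails, while the same call after toy_vblank_init succeeds; the real patch reports the first case with a backtrace so the offending caller can be identified.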
Comment 13 Rafael J. Wysocki 2008-11-03 14:24:36 UTC
Markus, can you test this patch, please?
Comment 14 Markus Meier 2008-11-03 15:19:17 UTC
with the patch applied:

# echo mem > /sys/power/state
in init=/bin/bash results in http://dev.gentoo.org/~maekke/img_0146.jpg

resuming after # echo disk > /sys/power/state (X running) results in http://dev.gentoo.org/~maekke/img_0148.jpg

hope this helps.
Comment 15 Jesse Barnes 2008-11-03 16:16:22 UTC
Created attachment 18644
map registers at load time

Looks like the registers aren't mapped when you go to do your suspend.  Does this patch at least get things working when you do the suspend/resume from the console?
Comment 16 Markus Meier 2008-11-05 10:50:07 UTC
yes, with this patch, I'm able to suspend/resume from console.
Comment 17 Jesse Barnes 2008-11-05 11:02:50 UTC
Does it fix hibernation too?
Comment 18 Markus Meier 2008-11-05 11:37:48 UTC
yes, hibernation is fixed, too.
Comment 19 Jesse Barnes 2008-11-05 13:03:11 UTC
Great, thanks for testing.  I've submitted the patch to Dave for inclusion into 2.6.28.
Comment 20 Markus Meier 2008-11-05 14:07:11 UTC
I'm sorry, I misunderstood you; the issue is not yet fixed. With your patch, I'm unable to start X. Here's the syslog part, after `startx`:

Nov  5 23:01:13 schleppi agpgart-intel 0000:00:00.0: AGP 2.0 bridge
Nov  5 23:01:13 schleppi agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode
Nov  5 23:01:13 schleppi radeonfb 0000:01:00.0: putting AGP V2 device into 4x mode
Nov  5 23:01:14 schleppi BUG: unable to handle kernel NULL pointer dereference at 00000010
Nov  5 23:01:14 schleppi IP: [<c02989b8>] radeon_read_fb_location+0x98/0xa7
Nov  5 23:01:14 schleppi *pde = 35899067 *pte = 00000000
Nov  5 23:01:14 schleppi Oops: 0000 [#1] PREEMPT
Nov  5 23:01:14 schleppi last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/enable
Nov  5 23:01:14 schleppi Modules linked in: michael_mic arc4 ecb ieee80211_crypt_tkip ipv6 rfcomm bnep l2cap snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device bluetooth firewire_ohci firewire_core crc_itu_t usbhid snd_intel8x0 snd_intel8x0m ohci1394 yenta_socket ieee1394 rsrc_nonstatic rtc snd_ac97_codec ac97_bus i2c_i801 pcmcia_core ipw2100 snd_pcm ieee80211 ehci_hcd snd_timer uhci_hcd 8139cp snd snd_page_alloc ieee80211_crypt thermal processor button video battery ac evdev [last unloaded: hci_usb]
Nov  5 23:01:14 schleppi
Nov  5 23:01:14 schleppi Pid: 3767, comm: X Not tainted (2.6.28-rc2 #6) HP compaq nx7000 (DG706A#UUZ)
Nov  5 23:01:14 schleppi EIP: 0060:[<c02989b8>] EFLAGS: 00213283 CPU: 0
Nov  5 23:01:14 schleppi EIP is at radeon_read_fb_location+0x98/0xa7
Nov  5 23:01:14 schleppi EAX: 00000000 EBX: f71d8800 ECX: f71d8800 EDX: 00000006
Nov  5 23:01:14 schleppi ESI: f712f000 EDI: f628dcc0 EBP: f58b5ecc ESP: f58b5ec8
Nov  5 23:01:14 schleppi DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Nov  5 23:01:14 schleppi Process X (pid: 3767, ti=f58b4000 task=f58be720 task.ti=f58b4000)
Nov  5 23:01:14 schleppi Stack:
Nov  5 23:01:14 schleppi f71d8800 f58b5ee8 c029b05b c029255e f712f124 fffffff4 f628dcc0 40546440
Nov  5 23:01:14 schleppi f58b5f0c c0292590 f614d380 f712f000 c029ab82 f712f074 c0508ce4 f62bce40
Nov  5 23:01:14 schleppi bff3e04c f58b5f28 c0176801 bff3e04c 40546440 f6b50af8 fffffff7 40546440
Nov  5 23:01:14 schleppi Call Trace:
Nov  5 23:01:14 schleppi [<c029b05b>] ? radeon_cp_init+0x4d9/0x946
Nov  5 23:01:14 schleppi [<c029255e>] ? drm_ioctl+0x17e/0x224
Nov  5 23:01:14 schleppi [<c0292590>] ? drm_ioctl+0x1b0/0x224
Nov  5 23:01:14 schleppi [<c029ab82>] ? radeon_cp_init+0x0/0x946
Nov  5 23:01:14 schleppi [<c0176801>] ? vfs_ioctl+0x50/0x69
Nov  5 23:01:14 schleppi [<c0176be0>] ? do_vfs_ioctl+0x3c6/0x3f7
Nov  5 23:01:14 schleppi [<c016ceb9>] ? vfs_write+0xf0/0x12c
Nov  5 23:01:14 schleppi [<c0176c3d>] ? sys_ioctl+0x2c/0x45
Nov  5 23:01:14 schleppi [<c0102ead>] ? sysenter_do_call+0x12/0x30
Nov  5 23:01:14 schleppi Code: 8b 40 10 ba 04 00 7f 00 83 c0 70 89 10 8b 81 e4 00 00 00 8b 40 10 83 c0 74 8b 18 8b 81 e4 00 00 00 31 d2 8b 40 10 83 c0 70 eb c6 <8b> 40 10 05 48 01 00 00 8b 18 89 d8 5b 5d c3 55 89 c1 89 e5 53
Nov  5 23:01:14 schleppi EIP: [<c02989b8>] radeon_read_fb_location+0x98/0xa7 SS:ESP 0068:f58b5ec8
Nov  5 23:01:14 schleppi ---[ end trace a4c2035e18842d92 ]---
Nov  5 23:01:14 schleppi [drm:drm_release] *ERROR* Device busy: 1 0
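One way to read the oops above: the faulting instruction bytes `<8b> 40 10` decode as `mov 0x10(%eax),%eax`, and EAX is 00000000, so the code read a field at offset 0x10 through a NULL pointer, which matches the fault address 00000010. A minimal sketch in C of that arithmetic follows; `struct toy_map` and `accessed_address` are hypothetical names chosen for illustration, the layout is an assumed 32-bit one, and the log alone does not establish which kernel structure was NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout, for illustration only; not copied from kernel headers. */
struct toy_map {
    uint32_t offset;
    uint32_t size;
    uint32_t type;
    uint32_t flags;
    uint32_t handle;  /* stands in for a 32-bit pointer; lands at offset 0x10 */
};

/* Address touched when a field `field_offset` bytes into an object at `base` is read. */
static uint32_t accessed_address(uint32_t base, size_t field_offset)
{
    return base + (uint32_t)field_offset;
}
```

With a NULL base, accessed_address(0, offsetof(struct toy_map, handle)) evaluates to 0x10, the address reported in the "unable to handle kernel NULL pointer dereference" line.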
Comment 21 Rafael J. Wysocki 2008-11-09 09:47:46 UTC
Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>
Comment 22 Rafael J. Wysocki 2008-11-09 11:13:03 UTC
First-Bad-Commit : 0a3e67a4caac273a3bfc4ced3da364830b1ab241
Comment 23 Jesse Barnes 2008-11-10 15:16:04 UTC
Weird, so that crash looks like the first register read after cp_init... But the registers should already be mapped.  I must be missing something about how the DRM & 2D drivers interact.  Any ideas Dave?
Comment 24 Dave Airlie 2008-11-11 00:38:17 UTC
please try the fix I've sent upstream in the drm-fixes tree.

the radeon driver does bad things with memset on drm open/close.
Comment 25 Rafael J. Wysocki 2008-11-11 05:22:55 UTC
*** Bug 12005 has been marked as a duplicate of this bug. ***
Comment 26 Markus Meier 2008-11-13 12:08:15 UTC
just tried 2.6.28-rc4-git4, which works perfectly. thanks for fixing it.
closing this bug.