Bug 99421 - radeon: pm suspend/resume issue: hardware cursor not restored on resume (vt switch removed?)
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Console/Framebuffers
Hardware: All Linux
Importance: P1 low
Assignee: James Simmons
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-03 20:50 UTC by Andreas Mohr
Modified: 2016-03-23 18:34 UTC
CC List: 3 users

See Also:
Kernel Version: 4.1-rc5
Subsystem:
Regression: Yes
Bisected commit-id:



Description Andreas Mohr 2015-06-03 20:50:21 UTC
Hi,

I had 3.16.0 earlier and have now moved to 4.1-rc5. Relevant parts of my config:

#
# Direct Rendering Manager
#
CONFIG_DRM=y
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_KMS_FB_HELPER=y
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
CONFIG_DRM_TTM=m

# CONFIG_DRM_TDFX is not set
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_RADEON_USERPTR=y
# CONFIG_DRM_RADEON_UMS is not set
# CONFIG_DRM_NOUVEAU is not set
# CONFIG_DRM_I810 is not set
# CONFIG_DRM_I915 is not set
CONFIG_DRM_MGA=y
# CONFIG_DRM_SIS is not set
# CONFIG_DRM_VIA is not set
CONFIG_DRM_SAVAGE=m
CONFIG_DRM_VGEM=m
# CONFIG_DRM_VMWGFX is not set
# CONFIG_DRM_GMA500 is not set
# CONFIG_DRM_UDL is not set
# CONFIG_DRM_AST is not set
# CONFIG_DRM_MGAG200 is not set
# CONFIG_DRM_CIRRUS_QEMU is not set
# CONFIG_DRM_QXL is not set
# CONFIG_DRM_BOCHS is not set

#
# Frame buffer Devices
#
CONFIG_FB=y
CONFIG_FIRMWARE_EDID=y
CONFIG_FB_CMDLINE=y
CONFIG_FB_DDC=m
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
CONFIG_FB_SVGALIB=m
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
CONFIG_FB_VESA=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_FB_RIVA=m
CONFIG_FB_RIVA_I2C=y
# CONFIG_FB_RIVA_DEBUG is not set
CONFIG_FB_RIVA_BACKLIGHT=y
# CONFIG_FB_I740 is not set
# CONFIG_FB_I810 is not set
# CONFIG_FB_LE80578 is not set
CONFIG_FB_MATROX=m
CONFIG_FB_MATROX_MILLENIUM=y
CONFIG_FB_MATROX_MYSTIQUE=y
CONFIG_FB_MATROX_G=y
CONFIG_FB_MATROX_I2C=m
CONFIG_FB_MATROX_MAVEN=m
CONFIG_FB_RADEON=m
CONFIG_FB_RADEON_I2C=y
CONFIG_FB_RADEON_BACKLIGHT=y
CONFIG_FB_RADEON_DEBUG=y
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
CONFIG_FB_S3=m
CONFIG_FB_S3_DDC=y
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
# CONFIG_FB_AUO_K190X is not set
# CONFIG_FB_SIMPLE is not set
.
.
.
CONFIG_VGASTATE=m
CONFIG_HDMI=y

I noticed on 4.1-rc5 that the hardware(?) cursor of my
    01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV250 [Radeon 9000 Series] (rev 01)
turned into an ugly mis-colored square (about 3x3 cm?) once I had resumed into X.org, using
    ii  xserver-xorg-video-radeon             1:7.4.0-2                          i386         X.Org X server -- AMD/ATI Radeon display driver

I have a fairly strong hunch that the culprit might be

commit b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe
Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date:   Tue Jan 13 09:40:13 2015 +0100

    drm/radeon: dont switch vt on suspend
    
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

(~v4.0-rc4), especially since manually switching VTs (Ctrl-Alt-F1) and back after resume immediately restores a proper X.org hardware(?) cursor.

So it would appear that this is *not* truly a "regression" but rather a "side-gression" :)
(the driver has progressed to advertising proper non-VT-switch support without actually supporting it fully).

So, does it look like restoring the radeon hardware(?) cursor on resume does not work properly on this particular hardware, and do you have any ideas on what it would take to actually implement it?

Thanks for all that you guys are doing,

Andreas Mohr
Comment 1 Maarten Lankhorst 2015-06-04 03:21:56 UTC
If manually switching to a VT and back to X fixes things, that seems likely. I didn't test on that hardware, but you should check whether the cursor is restored somewhere in the resume path. :)
Comment 2 Alex Deucher 2015-06-04 15:07:53 UTC
radeon_cursor_reset() is called at the end of the modeset sequence in the kernel to restore the cursor.  The cursor is getting enabled properly (otherwise it wouldn't be visible), but it seems the buffer object used for the cursor has garbage in it.
Comment 3 Andreas Mohr 2015-06-04 15:29:51 UTC
Yes, I had done enough source research to realize that calling radeon_cursor_reset() near the end of the resume handler might help (not tested yet, and judging by your remarks it would be useless anyway), but also that it is already called from the mode-setting code (as you say).

Which code manages the buffer object storage? Is it owned by the driver, or actually kept in X.org? In other words, are we talking about missing/incorrect resume handling in the fb driver, or does X.org lack a proper notification that it ought to re-provide the bitmap for the hardware cursor?

To be precise about the state: the cursor *is* already visible directly after resume (quote: "otherwise it wouldn't be visible"); it is just a garbage square. It gets restored properly either by a VT switch or (much more easily) once I move it outside the boundaries ("non-client" area) of the currently mapped window.

Thanks!
Comment 4 Andreas Mohr 2015-06-04 16:01:40 UTC
I've done some more experimentation:

Case 1: suspending over blank desktop space (no window) and then resuming
shows the ugly garbage square, which contains the original cursor shape *and* some oh-so-familiar jagged-pattern memory corruption (just like the many BIOS initialization effects seen when graphics memory is written in a different graphics plane mode). Moving the mouse does not restore the cursor, since there are no window frame transitions that would trigger the usual window-related cursor change notifications.

Case 2: suspending within a text console, then manually switching back to X after resume: the cursor is immediately restored properly, rather unsurprisingly.

For the GEM memory object that holds the cursor content, we're most likely talking about this?

$ git grep cursor_bo
radeon_cursor.c:        if (radeon_crtc->cursor_bo) {
radeon_cursor.c:                struct radeon_bo *robj = gem_to_radeon_bo(radeon_crtc->cursor_bo);
radeon_cursor.c:                if (radeon_crtc->cursor_bo != obj)
radeon_cursor.c:                        drm_gem_object_unreference_unlocked(radeon_crtc->cursor_bo);
radeon_cursor.c:        radeon_crtc->cursor_bo = obj;
radeon_cursor.c:        if (radeon_crtc->cursor_bo) {
radeon_cursor.c:                ret = radeon_set_cursor(crtc, radeon_crtc->cursor_bo);
radeon_mode.h:  struct drm_gem_object *cursor_bo;

So, rethinking things: we *do* have a driver-side buffer object that keeps (well, is expected to keep) the cursor bitmap data, and that data gets updated (by userspace layers!) e.g. whenever a window switch happens. But since the driver maintains the cursor object and *does* carry out radeon_cursor_reset() on resume (otherwise the cursor would not be visible at all), we would expect it to reinitialize the cursor properly.

So, does it look like something is actively corrupting cursor_bo memory on resume? OTOH cursor_bo is a GEM memory object, and these are kept directly in graphics memory, right? If so, is that graphics memory perhaps used during resume for certain graphics mode operations that overwrite (/re-init) it? (That would explain why cursor_bo regains valid content only at the next userspace cursor update request.)

So, if GEM BOs are not "safe" (i.e., persistent) across suspend/resume, who is supposed to restore their proper content?



Potentially helpful parts of the dmesg resume log:

[42315.000059] PM: noirq suspend of devices complete after 21.205 msecs
[42315.000202] ACPI: Preparing to enter system sleep state S3
[42315.180216] PM: Saving platform NVS memory
[42315.180250] reserve_memtype added [mem 0x1fff1000-0x1fff1fff], track write-back, req write-back, ret write-back
[42315.180307] reserve_memtype added [mem 0x1fff2000-0x1fff2fff], track write-back, req write-back, ret write-back
[42315.180307] ACPI: Low-level resume complete
[42315.180307] PM: Restoring platform NVS memory
[42315.180307] ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20
[42315.180307] ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21
[42315.180307] ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22
[42315.180307] ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23
[42315.180307] free_memtype request [mem 0x1fff1000-0x1fff1fff]
[42315.180307] free_memtype request [mem 0x1fff2000-0x1fff2fff]
[42315.180307] ACPI: Waking up from system sleep state S3
[42315.200220] uhci_hcd 0000:00:10.0: System wakeup disabled by ACPI
[42315.200274] uhci_hcd 0000:00:10.1: System wakeup disabled by ACPI
[42315.200328] uhci_hcd 0000:00:10.2: System wakeup disabled by ACPI
[42315.200670] PM: noirq resume of devices complete after 15.219 msecs
[42315.201508] PM: early resume of devices complete after 0.720 msecs
[42315.202506] snd_azt3328 0000:00:0d.0: missing read emulation for AC97 register 0x1e!
[42315.202557] usb usb1: root hub lost power or was reset
[42315.202591] usb usb2: root hub lost power or was reset
[42315.202621] usb usb3: root hub lost power or was reset
[42315.206534] [drm] AGP mode requested: 4
[42315.206541] agpgart-via 0000:00:00.0: AGP 2.0 bridge
[42315.206555] agpgart-via 0000:00:00.0: putting AGP V2 device into 4x mode
[42315.206596] radeon 0000:01:00.0: putting AGP V2 device into 4x mode
[42315.206605] radeon 0000:01:00.0: GTT: 256M 0xC0000000 - 0xCFFFFFFF
[42315.228286] radeon 0000:01:00.0: WB disabled
[42315.228294] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x00000000c0000000 and cpu addr 0xe0812000
[42315.228368] [drm] radeon: ring at 0x00000000C0001000
[42315.230262] sd 0:0:0:0: [sda] Starting disk
[42315.234383] rtc_cmos 00:01: System wakeup disabled by ACPI
[42315.235403] serial 00:03: activated
[42315.236396] serial 00:04: activated
[42315.236914] [drm] ring test succeeded in 0 usecs
[42315.236935] [drm] ib test succeeded in 0 usecs
[42315.780079] usb 2-1: reset low-speed USB device number 2 using uhci_hcd
[42316.340061] usb 2-2: reset low-speed USB device number 3 using uhci_hcd

[42318.240033] floppy driver state

Thanks!
Comment 5 Alex Deucher 2015-06-05 15:13:53 UTC
(In reply to Andreas Mohr from comment #4)
> 
> For the GEM memory object which manages cursor content, we're most likely
> talking about ?:
> 
> $ git grep cursor_bo
> radeon_cursor.c:        if (radeon_crtc->cursor_bo) {
> radeon_cursor.c:                struct radeon_bo *robj = gem_to_radeon_bo(radeon_crtc->cursor_bo);
> radeon_cursor.c:                if (radeon_crtc->cursor_bo != obj)
> radeon_cursor.c:                        drm_gem_object_unreference_unlocked(radeon_crtc->cursor_bo);
> radeon_cursor.c:        radeon_crtc->cursor_bo = obj;
> radeon_cursor.c:        if (radeon_crtc->cursor_bo) {
> radeon_cursor.c:                ret = radeon_set_cursor(crtc, radeon_crtc->cursor_bo);
> radeon_mode.h:  struct drm_gem_object *cursor_bo;

yes.

> 
> So, rethinking things: we *do* have a driver-side buffer object which does
> (well, is expected to) keep the cursor bitmap data, and that cursor bitmap
> data will get updated (by userspace layers!) e.g. whenever a window switch
> happens, but since the driver does maintain the cursor object and it *does*
> carry out radeon_cursor_reset() on resume (otherwise cursor would not be
> visible at all), we would expect it to reinitialize the cursor properly.
> 
> So, does it look like something is actively corrupting cursor_bo memory
> parts on resume? OTOH cursor_bo is a GEM memory object, and these are kept
> directly in graphics memory areas, right?? If so, then probably that
> graphics memory during resume gets used for certain graphics mode operations
> which overwrite (/re-init) these memory areas?? (which would explain that
> cursor_bo will re-gain valid content only at the next userspace cursor
> update request).
> 
> So, if GEM bo parts are not "safe" (i.e., persistent) across suspend/resume,
> who is the one that is supposed to restore them to their proper content?
> 

Driver buffer objects are persistent.  They are copied to system memory on suspend and copied back to vram on resume.  It would seem that that buffer is getting corrupted somewhere.
