Bug 99421
Summary: | radeon: pm suspend/resume issue: hardware cursor not restored on resume (vt switch removed?) | | |
---|---|---|---|
Product: | Drivers | Reporter: | Andreas Mohr (andi) |
Component: | Console/Framebuffers | Assignee: | James Simmons (jsimmons) |
Status: | NEW --- | | |
Severity: | low | CC: | alexdeucher, bugs, szg00000 |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 4.1-rc5 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Description
Andreas Mohr
2015-06-03 20:50:21 UTC
If manually switching to a VT and back to X fixes things, that seems to be likely. I didn't test on that hardware, but it seems you should check if the cursor is restored somewhere in the resume path somewhere. :)

radeon_cursor_reset() is called at the end of the modeset sequence in the kernel to restore the cursor. The cursor is getting enabled properly (otherwise it wouldn't be visible), but it seems the buffer object used for the cursor has garbage in it.

Yeah, I've had at least certain amounts of source research insofar as that allowed me to realize that calling radeon_cursor_reset() somewhere near the end of the resume handler could be successful (not tested yet, and when interpreting your remarks it's useless anyway), but that it is called from mode setting areas as well (as you also say).

Which areas manage the buffer object storage? Is this something governed by driver areas, or actually kept in X.org? IOW, are we talking about missing/incorrect resume parts in the fb driver, or does X.org not have proper notification that it ought to re-provide the bitmap for the hardware cursor?

To give precise info about state: The cursor *is* visible directly post-resume already (quote: "otherwise it wouldn't be visible"), it's just that it's a garbage square, and it does get restored properly either via vt switch or (much easier) once I'm leaving the boundaries ("non-client" area) of the current mapped window.

Thanks!

I've done some more experimentation:

Case 1: suspending with a blank desktop space (no window) and then resuming will show the ugly garbage square which contains the original cursor shape *and* some oh-so-familiar jagged-pattern memory corruption stuff (just like with so many BIOS initialization effects when writing graphics memory in other graphics plane modes). Moving the mouse does not restore the cursor, since there are no window frame transitions which would have the usual window-related cursor change notifications.
Case 2: suspending within a text console, then after resume manually switching back to X: cursor immediately properly restored, rather unsurprisingly.

For the GEM memory object which manages cursor content, we're most likely talking about ?:

$ git grep cursor_bo
radeon_cursor.c: if (radeon_crtc->cursor_bo) {
radeon_cursor.c: struct radeon_bo *robj = gem_to_radeon_bo(radeon_crtc->cursor_bo);
radeon_cursor.c: if (radeon_crtc->cursor_bo != obj)
radeon_cursor.c: drm_gem_object_unreference_unlocked(radeon_crtc->cursor_bo);
radeon_cursor.c: radeon_crtc->cursor_bo = obj;
radeon_cursor.c: if (radeon_crtc->cursor_bo) {
radeon_cursor.c: ret = radeon_set_cursor(crtc, radeon_crtc->cursor_bo);
radeon_mode.h: struct drm_gem_object *cursor_bo;

So, rethinking things: we *do* have a driver-side buffer object which does (well, is expected to) keep the cursor bitmap data, and that cursor bitmap data will get updated (by userspace layers!) e.g. whenever a window switch happens, but since the driver does maintain the cursor object and it *does* carry out radeon_cursor_reset() on resume (otherwise cursor would not be visible at all), we would expect it to reinitialize the cursor properly.

So, does it look like something is actively corrupting cursor_bo memory parts on resume? OTOH cursor_bo is a GEM memory object, and these are kept directly in graphics memory areas, right?? If so, then probably that graphics memory during resume gets used for certain graphics mode operations which overwrite (/re-init) these memory areas?? (which would explain that cursor_bo will re-gain valid content only at the next userspace cursor update request).

So, if GEM bo parts are not "safe" (i.e., persistent) across suspend/resume, who is the one that is supposed to restore them to their proper content?
Potentially helpful dmesg log resume parts:

[42315.000059] PM: noirq suspend of devices complete after 21.205 msecs
[42315.000202] ACPI: Preparing to enter system sleep state S3
[42315.180216] PM: Saving platform NVS memory
[42315.180250] reserve_memtype added [mem 0x1fff1000-0x1fff1fff], track write-back, req write-back, ret write-back
[42315.180307] reserve_memtype added [mem 0x1fff2000-0x1fff2fff], track write-back, req write-back, ret write-back
[42315.180307] ACPI: Low-level resume complete
[42315.180307] PM: Restoring platform NVS memory
[42315.180307] ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20
[42315.180307] ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21
[42315.180307] ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22
[42315.180307] ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23
[42315.180307] free_memtype request [mem 0x1fff1000-0x1fff1fff]
[42315.180307] free_memtype request [mem 0x1fff2000-0x1fff2fff]
[42315.180307] ACPI: Waking up from system sleep state S3
[42315.200220] uhci_hcd 0000:00:10.0: System wakeup disabled by ACPI
[42315.200274] uhci_hcd 0000:00:10.1: System wakeup disabled by ACPI
[42315.200328] uhci_hcd 0000:00:10.2: System wakeup disabled by ACPI
[42315.200670] PM: noirq resume of devices complete after 15.219 msecs
[42315.201508] PM: early resume of devices complete after 0.720 msecs
[42315.202506] snd_azt3328 0000:00:0d.0: missing read emulation for AC97 register 0x1e!
[42315.202557] usb usb1: root hub lost power or was reset
[42315.202591] usb usb2: root hub lost power or was reset
[42315.202621] usb usb3: root hub lost power or was reset
[42315.206534] [drm] AGP mode requested: 4
[42315.206541] agpgart-via 0000:00:00.0: AGP 2.0 bridge
[42315.206555] agpgart-via 0000:00:00.0: putting AGP V2 device into 4x mode
[42315.206596] radeon 0000:01:00.0: putting AGP V2 device into 4x mode
[42315.206605] radeon 0000:01:00.0: GTT: 256M 0xC0000000 - 0xCFFFFFFF
[42315.228286] radeon 0000:01:00.0: WB disabled
[42315.228294] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x00000000c0000000 and cpu addr 0xe0812000
[42315.228368] [drm] radeon: ring at 0x00000000C0001000
[42315.230262] sd 0:0:0:0: [sda] Starting disk
[42315.234383] rtc_cmos 00:01: System wakeup disabled by ACPI
[42315.235403] serial 00:03: activated
[42315.236396] serial 00:04: activated
[42315.236914] [drm] ring test succeeded in 0 usecs
[42315.236935] [drm] ib test succeeded in 0 usecs
[42315.780079] usb 2-1: reset low-speed USB device number 2 using uhci_hcd
[42316.340061] usb 2-2: reset low-speed USB device number 3 using uhci_hcd
[42318.240033] floppy driver state

Thanks!

(In reply to Andreas Mohr from comment #4)
>
> For the GEM memory object which manages cursor content, we're most likely
> talking about ?:
>
> $ git grep cursor_bo
> radeon_cursor.c: if (radeon_crtc->cursor_bo) {
> radeon_cursor.c: struct radeon_bo *robj = gem_to_radeon_bo(radeon_crtc->cursor_bo);
> radeon_cursor.c: if (radeon_crtc->cursor_bo != obj)
> radeon_cursor.c: drm_gem_object_unreference_unlocked(radeon_crtc->cursor_bo);
> radeon_cursor.c: radeon_crtc->cursor_bo = obj;
> radeon_cursor.c: if (radeon_crtc->cursor_bo) {
> radeon_cursor.c: ret = radeon_set_cursor(crtc, radeon_crtc->cursor_bo);
> radeon_mode.h: struct drm_gem_object *cursor_bo;

yes.
>
> So, rethinking things: we *do* have a driver-side buffer object which does
> (well, is expected to) keep the cursor bitmap data, and that cursor bitmap
> data will get updated (by userspace layers!) e.g. whenever a window switch
> happens, but since the driver does maintain the cursor object and it *does*
> carry out radeon_cursor_reset() on resume (otherwise cursor would not be
> visible at all), we would expect it to reinitialize the cursor properly.
>
> So, does it look like something is actively corrupting cursor_bo memory
> parts on resume? OTOH cursor_bo is a GEM memory object, and these are kept
> directly in graphics memory areas, right?? If so, then probably that
> graphics memory during resume gets used for certain graphics mode operations
> which overwrite (/re-init) these memory areas?? (which would explain that
> cursor_bo will re-gain valid content only at the next userspace cursor
> update request).
>
> So, if GEM bo parts are not "safe" (i.e., persistent) across suspend/resume,
> who is the one that is supposed to restore them to their proper content?
>

Driver buffer objects are persistent. They are copied to system memory on suspend and copied back to vram on resume. It would seem that that buffer is getting corrupted somewhere.
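
To make the "copied to system memory on suspend, copied back to vram on resume" expectation concrete, here is a minimal user-space sketch. It is only a toy model: every name in it (struct toy_bo, toy_bo_suspend(), toy_bo_resume(), toy_vram_power_loss(), toy_cursor_survives()) is hypothetical and none of it is radeon, TTM or GEM code. It assumes a 64x64 ARGB cursor image and simulates VRAM with a heap buffer that gets overwritten while "suspended". It does not say where the corruption in this report actually comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed cursor size for the model: 64x64 pixels, 4 bytes per pixel. */
#define CURSOR_BYTES ((size_t)64 * 64 * sizeof(uint32_t))

/* Hypothetical stand-in for a buffer object placed in video memory. */
struct toy_bo {
    uint8_t *vram;   /* simulated VRAM; contents are lost while suspended */
    uint8_t *shadow; /* system-memory copy, valid only between suspend and resume */
    size_t   size;
};

int toy_bo_init(struct toy_bo *bo, size_t size)
{
    bo->vram = calloc(1, size);
    bo->shadow = NULL;
    bo->size = size;
    return bo->vram ? 0 : -1;
}

void toy_bo_fini(struct toy_bo *bo)
{
    free(bo->vram);
    free(bo->shadow);
    bo->vram = bo->shadow = NULL;
}

/* Suspend: save the VRAM contents to system memory. */
int toy_bo_suspend(struct toy_bo *bo)
{
    assert(bo->shadow == NULL);
    bo->shadow = malloc(bo->size);
    if (!bo->shadow)
        return -1;
    memcpy(bo->shadow, bo->vram, bo->size);
    return 0;
}

/* Model the card losing power: VRAM comes back holding garbage. */
void toy_vram_power_loss(struct toy_bo *bo)
{
    memset(bo->vram, 0xA5, bo->size);
}

/* Resume: copy the saved contents back into VRAM and drop the shadow. */
void toy_bo_resume(struct toy_bo *bo)
{
    assert(bo->shadow != NULL);
    memcpy(bo->vram, bo->shadow, bo->size);
    free(bo->shadow);
    bo->shadow = NULL;
}

/*
 * Run one suspend/resume cycle on a patterned "cursor image".
 * Returns 1 if the image is intact afterwards, 0 if it is not,
 * and -1 on allocation failure. A nonzero skip_restore leaves out
 * the copy-back step to show what a missing restore looks like.
 */
int toy_cursor_survives(int skip_restore)
{
    struct toy_bo bo;
    uint8_t *expected;
    int same;

    if (toy_bo_init(&bo, CURSOR_BYTES) != 0)
        return -1;
    expected = malloc(bo.size);
    if (!expected) {
        toy_bo_fini(&bo);
        return -1;
    }

    /* Fill the simulated cursor with a non-constant byte pattern. */
    for (size_t i = 0; i < bo.size; i++)
        bo.vram[i] = (uint8_t)(i * 31u + 7u);
    memcpy(expected, bo.vram, bo.size);

    if (toy_bo_suspend(&bo) != 0) {
        free(expected);
        toy_bo_fini(&bo);
        return -1;
    }
    toy_vram_power_loss(&bo);
    if (!skip_restore)
        toy_bo_resume(&bo);

    same = memcmp(bo.vram, expected, bo.size) == 0;
    free(expected);
    toy_bo_fini(&bo);
    return same;
}
```

In this model toy_cursor_survives(0) returns 1 (the image is intact after the copy-back), while toy_cursor_survives(1) returns 0 (the buffer still holds the garbage written during the simulated power loss). The second case is only one conceivable way to end up with a garbage cursor; a buffer that is restored correctly and then overwritten later in the resume sequence would look the same from the outside.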