Bug 32692 - Resume from suspend-to-RAM (S3) broken by drm/i915
Summary: Resume from suspend-to-RAM (S3) broken by drm/i915
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel) (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_video-dri-intel@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-05 09:55 UTC by Michael Doube
Modified: 2015-11-09 09:50 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.39-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Michael Doube 2011-04-05 09:55:40 UTC
The following commit breaks suspend to RAM on my Sony Vaio SZ650 with GM965 display adapter. During bisect I noticed that the more recent commits had different, more severe, behaviour than the commits closer to the first bad commit. The older behaviour was that I could Ctrl+Alt+F1 to get a terminal login when resume failed, the more recent behaviour, in v2.6.39-rc1 is for a complete hang showing the Ubuntu (10.10) bootup screen (Ubuntu text with orange dots beneath).

Apologies for reporting this a bit late, but it only came up when another bug got some attention: 
https://bugzilla.kernel.org/show_bug.cgi?id=26682

a7a75c8f70d6f6a2f16c9f627f938bbee2d32718 is the first bad commit
commit a7a75c8f70d6f6a2f16c9f627f938bbee2d32718
Author: Zhenyu Wang <zhenyuw@linux.intel.com>
Date:   Wed Mar 2 13:52:37 2011 +0800

    drm/i915: Don't save/restore hardware status page address register
    
    It's cleaned before saving and re-initialized after restoring.
    So don't need to save/restore it. And also new chip has new address
    for hardware status page register, don't write to old address.
    
    Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 9f7f3f61301c898be07cabd27a47877b68cc2e48 7f7bfea28f8957c7130a352a5f64e26da9bdfbf8 M	drivers
Comment 1 Chris Wilson 2011-04-05 09:59:44 UTC
commit f0c860246472248a534656d6cdbed5a36d1feb2e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 23 17:53:28 2011 +0000

    Revert "drm/i915: Don't save/restore hardware status page address register"
    
    This reverts commit a7a75c8f70d6f6a2f16c9f627f938bbee2d32718.
    
    There are two different variations on how Intel hardware addresses the
    "Hardware Status Page". One as a location in physical memory and the
    other as an offset into the virtual memory of the GPU, used in more
    recent chipsets. (The HWS itself is a cacheable region of memory which
    the GPU can write to without requiring CPU synchronisation, used for
    updating various details of hardware state, such as the position of
    the GPU head in the ringbuffer, the last breadcrumb seqno, etc).
      
    These two types of addresses were updated in different locations of code
    - one inline with the ringbuffer initialisation, and the other during
    device initialisation. (The HWS page is logically associated with
    the rings, and there is one HWS page per ring.) During resume, only the
    ringbuffers were being re-initialised along with the virtual HWS page,
    leaving the older physical address HWS untouched. This then caused a
    hang on the older gen3/4 (915GM, 945GM, 965GM) the first time we tried
    to synchronise the GPU as the breadcrumbs were never being updated.
    
    Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Reported-by: Jan Niehusmann <jan@gondor.com>
    Reported-and-tested-by: Justin P. Mattock <justinmattock@gmail.com>
    Reported-and-tested-by: Michael "brot" Groh <brot@minad.de>
    Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Comment 2 Michael Doube 2011-04-05 10:03:07 UTC
Lovely, thanks Chris, I will get back to testing the killswitch.
Comment 3 Michael Doube 2011-04-06 13:42:30 UTC
Actually, there is another S3 resume bug in -rc1 (I see -rc2 is out a couple of hours ago, will test there too), the one I was actually trying to find bisected to:

40c7f2112ce18fa5eb6dc209c50dd0f046790191 is the first bad commit

Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6    

* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (177 commits)
      drm/radeon: fixup refcounts in radeon dumb create ioctl.
      drm: radeon: *_cs_packet_parse_vline() cleanup
      radeon: merge list_del()/list_add_tail() to list_move_tail()
      drm: Retry i2c transfer of EDID block after failure
      drm/radeon/kms: fix typo in atom overscan setup
      drm: Hold the mode mutex whilst probing for sysfs status
      drm/nouveau: fix __nouveau_fence_wait performance
      drm/nv40: attempt to reserve just enough vram for all 32 channels
      drm/nv50: check for vm traps on every gr irq
      drm/nv50: decode vm faults some more
      drm/nouveau: add nouveau_enum_find() util function
      drm/nouveau: properly handle pushbuffer check failures
      drm/nvc0: remove vm hack forcing large/small pages to not share a PDE
      drm/i915: disable opregion lid detection for now.
      drm/i915: Only wait on a pending flip if we intend to write to the buffer
      drm/i915/dp: Sanity check eDP existence
      drm: add cap bit to denote if dumb ioctl is available or not.
      drm/core: add ioctl to query device/driver capabilities
      drm/radeon/kms: allow max clock of 340 Mhz on hdmi 1.3+
      drm/radeon/kms: add cayman pci ids
      ...

How should I proceed from here?
Comment 4 Michael Doube 2011-04-06 14:52:12 UTC
Just tested 2.6.39-rc2 and it's present there too.
Comment 5 Michael Doube 2011-04-13 12:16:25 UTC
So I checked out drm-core-next to investigate what broke during the merge above and found this bug in resuming from suspend.

a7a75c8f70d6f6a2f16c9f627f938bbee2d32718 is the first bad commit
commit a7a75c8f70d6f6a2f16c9f627f938bbee2d32718
Author: Zhenyu Wang <zhenyuw@linux.intel.com>
Date:   Wed Mar 2 13:52:37 2011 +0800

    drm/i915: Don't save/restore hardware status page address register
    
    It's cleaned before saving and re-initialized after restoring.
    So don't need to save/restore it. And also new chip has new address
    for hardware status page register, don't write to old address.
    
    Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 9f7f3f61301c898be07cabd27a47877b68cc2e48 7f7bfea28f8957c7130a352a5f64e26da9bdfbf8 M	drivers
Comment 6 Michael Doube 2011-04-13 12:17:12 UTC
Ah phooey.  It's the same one.
Comment 7 Gilbie Rivas 2015-11-09 09:50:49 UTC
(In reply to Michael Doube from comment #3)
> Actually, there is another S3 resume bug in -rc1 (I see -rc2 is out a couple
> of hours ago, will test there too), the one I was actually trying to find
> bisected to:
> 
> 40c7f2112ce18fa5eb6dc209c50dd0f046790191 is the first bad commit
> 
> Merge branch 'drm-core-next' of
> git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6    
> 
> * 'drm-core-next' of
> git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (177 commits)
>       drm/radeon: fixup refcounts in radeon dumb create ioctl.
>       drm: radeon: *_cs_packet_parse_vline() cleanup
>       radeon: merge list_del()/list_add_tail() to list_move_tail()
>       drm: Retry i2c transfer of EDID block after failure
>       drm/radeon/kms: fix typo in atom overscan setup
>       drm: Hold the mode mutex whilst probing for sysfs status
>       drm/nouveau: fix __nouveau_fence_wait performance
>       drm/nv40: attempt to reserve just enough vram for all 32 channels
>       drm/nv50: check for vm traps on every gr irq
>       drm/nv50: decode vm faults some more
>       drm/nouveau: add nouveau_enum_find() util function
>       drm/nouveau: properly handle pushbuffer check failures
>       drm/nvc0: remove vm hack forcing large/small pages to not share a PDE
>       drm/i915: disable opregion lid detection for now.
>       drm/i915: Only wait on a pending flip if we intend to write to the
> buffer
>       drm/i915/dp: Sanity check eDP existence
>       drm: add cap bit to denote if dumb ioctl is available or not.
>       drm/core: add ioctl to query device/driver capabilities
>       drm/radeon/kms: allow max clock of 340 Mhz on hdmi 1.3+
>       drm/radeon/kms: add cayman pci ids
>       ...
> 
> How should I proceed from here?

Thought-provoking commentary , Just to add my thoughts , if your company are interested in merging of two images , my husband saw a service here <a href="http://www.altomerge.com/" >http://www.altomerge.com/</a>.

Note You need to log in before you can comment on or make changes to this bug.