Bug 38332 - System hang when enable rc6
Status: CLOSED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assigned To: drivers_video-dri-intel@kernel-bugs.osdl.org
Depends on:
Blocks: 36912
Reported: 2011-06-27 08:16 UTC by Gu Rui
Modified: 2011-08-14 19:22 UTC
6 users

See Also:
Kernel Version: 3.0-rc4+
Tree: Mainline
Regression: Yes


Attachments
enable rc6 for earlier commit (852 bytes, patch)
2011-06-27 22:35 UTC, Ben Widawsky
dmesg of kernel that rc6 enabled (120.54 KB, text/plain)
2011-06-29 05:19 UTC, Gu Rui
dmesg of rc6 disabled, but try to cat /sys/kernel/debug/dri/0/i915_context_status (146.46 KB, text/plain)
2011-06-29 05:20 UTC, Gu Rui
more info (1.24 KB, patch)
2011-06-30 00:15 UTC, Ben Widawsky
dmesg of 4a246cfc3c with rc6 enabled (54.17 KB, text/plain)
2011-06-30 01:40 UTC, Gu Rui
dmesg of 4a246cfc3c with rc6 enabled, patch bca7bd19c9cf9 applied (83.46 KB, text/plain)
2011-06-30 02:04 UTC, Gu Rui
disable rc6 while setting context (822 bytes, patch)
2011-06-30 03:02 UTC, Ben Widawsky
better version of 63952 (906 bytes, patch)
2011-06-30 03:13 UTC, Ben Widawsky
dumb delay to replicate the debug prints (1.31 KB, patch)
2011-06-30 06:48 UTC, Ben Widawsky
see if reading pwrctx fix was the (723 bytes, patch)
2011-06-30 06:49 UTC, Ben Widawsky
more debug info on the rc6 disable patch (895 bytes, patch)
2011-06-30 06:50 UTC, Ben Widawsky
dmesg of patch Comment #19 (58.52 KB, text/plain)
2011-06-30 16:25 UTC, Gu Rui
dmesg of patch Comment #20 (55.03 KB, text/plain)
2011-06-30 16:26 UTC, Gu Rui
dmesg of patch Comment #21 (54.57 KB, text/plain)
2011-06-30 16:29 UTC, Gu Rui
some more info on rstdbyctl (1.23 KB, patch)
2011-06-30 18:00 UTC, Ben Widawsky
prevent any rc states while setting pwrcta (1.89 KB, patch)
2011-06-30 23:59 UTC, Ben Widawsky
dmesg of 64112 applied (52.76 KB, text/plain)
2011-07-01 00:56 UTC, Gu Rui

Description Gu Rui 2011-06-27 08:16:48 UTC
Commit a51f7a66fb5e gives rc6 a try, but I am the unlucky one: enabling rc6 causes the system to hang (no keyboard response, etc.).

I have booted 3.0-rc4+ with i915.i915_enable_rc6=0 and it boots up fine.

My box is a Dell Vostro 3500 with a Core i5 inside.
Comment 1 Ben Widawsky 2011-06-27 22:35:04 UTC
Created attachment 63642
enable rc6 for earlier commit
Comment 2 Ben Widawsky 2011-06-27 22:35:45 UTC
Please answer in this order:

1. In the previous bug you filed, the hang occurred during X startup. When does this hang occur?

2. When you get the hang, could you try to capture the kernel output using netconsole, with drm.debug=0xe? You can find documentation in Documentation/networking/netconsole.txt.

3. Could you try e76d3630810b0 with the patch and see whether it still hangs? I want to know if we can get back to a working state for you.
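For reference, drm.debug is a bitmask that selects which DRM debug categories get printed. The sketch below decodes a mask such as 0xe. The category values (DRM_UT_CORE 0x01, DRM_UT_DRIVER 0x02, DRM_UT_KMS 0x04, DRM_UT_MODE 0x08) are an assumption recalled from drmP.h of that era, not something stated in this report, so check them against your tree.

```c
#include <stddef.h>
#include <string.h>

/* Debug category bits as we recall them from drmP.h circa 2011 (assumption). */
#define DRM_UT_CORE   0x01
#define DRM_UT_DRIVER 0x02
#define DRM_UT_KMS    0x04
#define DRM_UT_MODE   0x08

/* Write a space-separated list of the enabled categories into buf (len >= 1). */
void drm_debug_categories(unsigned int mask, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } cats[] = {
		{ DRM_UT_CORE,   "CORE" },
		{ DRM_UT_DRIVER, "DRIVER" },
		{ DRM_UT_KMS,    "KMS" },
		{ DRM_UT_MODE,   "MODE" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(cats) / sizeof(cats[0]); i++) {
		if (!(mask & cats[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		/* strncat never writes past len, even if the names don't fit */
		strncat(buf, cats[i].name, len - strlen(buf) - 1);
	}
}
```

Under those assumed values, drm.debug=0xe turns on the DRIVER, KMS and MODE messages but not the CORE ones.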
Comment 3 Gu Rui 2011-06-29 05:14:24 UTC
OK, I have the data now ;)

1, X hangs during KDE startup. A "watch" cursor (I don't know exactly what it is; it looks like the one plain X shows) appears on the screen, and then the system locks up. No KDE splash screen or anything else.

2, I will attach the dmesg in the following posts.

3, e76d3630810b0 adds a new debugfs entry, so I tried it on an rc6-disabled kernel. But when I cat /sys/kernel/debug/dri/0/i915_context_status, the kernel oopses. I will attach that dmesg as well.
Comment 4 Gu Rui 2011-06-29 05:17:38 UTC
Comment 5 Gu Rui 2011-06-29 05:19:48 UTC
Created attachment 63842
dmesg of kernel that rc6 enabled
Comment 6 Gu Rui 2011-06-29 05:20:41 UTC
Created attachment 63852
dmesg of rc6 disabled, but try to cat /sys/kernel/debug/dri/0/i915_context_status
Comment 7 Ben Widawsky 2011-06-29 18:55:39 UTC
I cannot read your attachments; I'm not sure what happened. Can you please try again?
Comment 8 Ben Widawsky 2011-06-29 18:59:05 UTC
(In reply to comment #4)
> Ok, I got the data now ;)
> 
> 1, X hangs during KDE startup. There is a "watch"(I don't know exactly what it
> is, something like the plain X shows) shown on the screen, then locked. No KDE
> splash screen or whatever.
> 
> 2, I will attach the dmesg in the following posts.
> 
> 3, e76d3630810b0 is a new debugfs entry. So I tried it on rc6-disabled kernel.
> But when I try to cat /sys/kernel/debug/dri/0/i915_context_status, kernel oops.
> I will attach dmesg as well.

The fact that the kernel oopses is actually rather important. When loading the driver, we should have valid values for those objects. As I said in the other comment, I cannot read your attachments, but I would really like to see the output from this (specifically, whether the value is NULL or some other garbage).
Comment 9 Keith Packard 2011-06-29 20:08:28 UTC
The attachment is UTF-16 encoded, just switch the encoding in your browser.
Comment 10 Ben Widawsky 2011-06-29 22:02:29 UTC
(In reply to comment #6)
> Created an attachment (id=63852)
> dmesg of rc6 disabled, but try to cat
> /sys/kernel/debug/dri/0/i915_context_status

I just read this one again. Yes, there was (and still is) a bug where, if the contexts aren't
set (as with rc6 disabled), reading them from debugfs oopses.

I submitted the fix to keithp, but this data point is irrelevant to the hang.
https://bugs.freedesktop.org/show_bug.cgi?id=38777
https://patchwork.kernel.org/patch/930062/
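The oops described here follows a common pattern: a debugfs reader dereferences objects that are only allocated when rc6 is enabled. The sketch below shows the general shape of a defensive fix in plain C. The struct and function names (ctx_obj, ctx_state, context_status_show) are hypothetical stand-ins, not the i915 code and not the patch linked above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's power/render context objects. */
struct ctx_obj {
	unsigned long gtt_offset;
};

struct ctx_state {
	struct ctx_obj *pwrctx;    /* NULL when rc6 is disabled */
	struct ctx_obj *renderctx; /* NULL when rc6 is disabled */
};

/* Format one context line, checking for NULL before dereferencing. */
static int show_ctx(char *buf, size_t len, const char *name,
		    const struct ctx_obj *obj)
{
	if (obj == NULL)
		return snprintf(buf, len, "%s: not set\n", name);
	return snprintf(buf, len, "%s: 0x%08lx\n", name, obj->gtt_offset);
}

/* Hypothetical debugfs-style show function; returns characters written. */
int context_status_show(const struct ctx_state *st, char *buf, size_t len)
{
	int n = show_ctx(buf, len, "power context", st->pwrctx);

	if (n < 0 || (size_t)n >= len)
		return n; /* error or truncated: stop here */
	return n + show_ctx(buf + n, len - (size_t)n, "render context",
			    st->renderctx);
}
```

With rc6 disabled, both pointers are NULL and the reader prints "not set" instead of oopsing.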
Comment 11 Ben Widawsky 2011-06-29 22:05:27 UTC
(In reply to comment #4)
> Ok, I got the data now ;)

> 3, e76d3630810b0 is a new debugfs entry. So I tried it on rc6-disabled kernel.
> But when I try to cat /sys/kernel/debug/dri/0/i915_context_status, kernel oops.
> I will attach dmesg as well.

This was, as my best guess, the last point in time where the kernel worked for you. The patch was only meant to enable rc6 by default. You can skip the patch and just set i915.i915_enable_rc6=1 on the command line instead.

We did have this working for you at some point, and I want to try to get back to that so we can bisect.
Comment 12 Ben Widawsky 2011-06-29 22:50:47 UTC
My current best guess for what broke this is:
2c7111dbaec72b01c804afb8ad77c6c7523986fd

The last patch I submitted, which you said worked for you, is:
4a246cfc3c337ecb800d508ee5ed906534edb25c

So I'd like you to try the following, in order (dmesg for everything, please):
4a246cfc3c33 - I expect this to work, as you tested this previously
7df8721beb9c
2c7111dbaec7

It's important to verify that 4a246cfc3c33 still works for you, so we have a known-good point to bisect from.
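The plan above is ordinary bisection: with one commit known to be good and one known to be bad, test the midpoint and halve the range until the first bad commit is isolated. A minimal sketch in C, assuming a linear history indexed by integers and a hypothetical caller-supplied test function; git bisect does the real work over the commit graph.

```c
#include <stddef.h>

/* Returns nonzero if the kernel built at commit index i hangs ("bad"). */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/*
 * Commits are indexed in history order. Preconditions: good < bad, commit
 * 'good' works, commit 'bad' hangs, and once a commit is bad every later one
 * is too. Returns the first bad index after about log2(bad - good) tests.
 */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;  /* regression is at or before mid */
		else
			good = mid; /* regression is after mid */
	}
	return bad;
}

/* Hypothetical toy predicate: every commit from *(size_t *)ctx onward is bad. */
int bad_from(size_t i, void *ctx)
{
	return i >= *(const size_t *)ctx;
}
```

This is also why a verified good point matters: if 4a246cfc3c33 is not actually good, the precondition fails and the search can converge on the wrong commit.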
Comment 13 Ben Widawsky 2011-06-30 00:15:30 UTC
Created attachment 63922
more info

You can also try using this patch with any of the failing cases to get more info.
Comment 14 Gu Rui 2011-06-30 01:40:14 UTC
Created attachment 63932
dmesg of 4a246cfc3c with rc6 enabled

Unfortunately, I found that 4a246cfc3c doesn't work if I set i915.i915_enable_rc6=1. I think we overlooked the fact that rc6 is disabled by default at that commit.

So, which good point should we start bisecting from?

P.S. Sorry about the UTF-16 encoding. I have only one Linux box on my LAN, so I captured the dmesgs on another Windows 7 machine, which caused the problem. I converted the file with iconv, so this attachment should be UTF-8 now.
Comment 15 Gu Rui 2011-06-30 02:04:46 UTC
Created attachment 63942
dmesg of 4a246cfc3c with rc6 enabled, patch bca7bd19c9cf9 applied

Strangely, 4a246cfc3c starts working after I apply the patch from Comment #13. I don't know why.
Comment 16 Ben Widawsky 2011-06-30 03:02:59 UTC
Created attachment 63952
disable rc6 while setting context

I don't think this should make a difference... but I've been wrong every other time, so why not try :)
Comment 17 Ben Widawsky 2011-06-30 03:13:16 UTC
Created attachment 63962
better version of 63952

This is safer than 63952. Let's go with this.
Comment 18 Gu Rui 2011-06-30 05:31:36 UTC
No, Comment #17 doesn't work... The other Windows 7 machines are busy with gaming, so I can't provide the dmesg yet... ;(
Comment 19 Ben Widawsky 2011-06-30 06:48:51 UTC
Created attachment 63972
dumb delay to replicate the debug prints
Comment 20 Ben Widawsky 2011-06-30 06:49:34 UTC
Created attachment 63982
see if reading pwrctx fix was the
Comment 21 Ben Widawsky 2011-06-30 06:50:16 UTC
Created attachment 63992
more debug info on the rc6 disable patch
Comment 22 Ben Widawsky 2011-06-30 06:53:30 UTC
Three new patches to try. You can skip the dmesg from 63962, since 63922 gives me more info anyway. A dmesg for all three would be great.

I'll bring my Ironlake out of retirement tomorrow morning.

The first two should help us narrow down why the debug print patch worked.
The third gives me more info on the patch I thought should work.
Comment 23 Gu Rui 2011-06-30 16:23:38 UTC
Actually, Ironlake isn't always happy with the patch in Comment #13; I've seen it hang with that patch as well. More bad news: none of the last 3 patches work. I will attach the dmesgs one by one.

The last patch is broken, because it passes the register value directly as the format string. I changed DRM_INFO(I915_READ(RSTDBYCTL)); to DRM_INFO("RSTDBYCTL: %x\n", I915_READ(RSTDBYCTL));, which should be correct.

Note that the RSTDBYCTL value when Ironlake works (Comment #15) differs from the value when it hangs. I don't know whether that helps.
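DRM_INFO, like printk, expects a printf-style format string as its first argument, so passing a bare register value makes the kernel interpret an integer as a pointer to a format string. A userspace sketch of the corrected pattern, where log_info is a hypothetical stand-in for DRM_INFO that formats into a buffer (so the result can be checked) and the caller passes a fixed value in place of I915_READ(RSTDBYCTL):

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for DRM_INFO: printf-style, format string first. */
int log_info(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Broken:  log_info(buf, len, reg);   the integer is not a format string.
 * Fixed:   label the value and print it with a %x conversion.
 */
int log_rstdbyctl(char *buf, size_t len, uint32_t reg)
{
	return log_info(buf, len, "RSTDBYCTL: %x\n", (unsigned int)reg);
}
```

For example, log_rstdbyctl(buf, sizeof buf, 0x479c3000) produces "RSTDBYCTL: 479c3000\n", the same form as the values quoted in the following comments.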
Comment 24 Gu Rui 2011-06-30 16:25:58 UTC
Created attachment 64022
dmesg of patch Comment #19
Comment 25 Gu Rui 2011-06-30 16:26:57 UTC
Created attachment 64032
dmesg of patch Comment #20
Comment 26 Gu Rui 2011-06-30 16:29:24 UTC
Created attachment 64041
dmesg of patch Comment #21

Note that RSTDBYCTL is 479c3000 here (hang), while in Comment #15 it is 471c3000 (no hang).
Comment 27 Ben Widawsky 2011-06-30 17:59:41 UTC
(In reply to comment #26)
> Created an attachment (id=64041)
> dmesg of patch Comment #21
> 
> note that RSTDBYCTL is 479c3000(hang) while in Comment #15 is 471c3000(no
> hang).

Bit 23 is expected: it indicates that we've asked the GPU not to enter any of the low-power states. However, I was hoping to see 0x478c3000, which would indicate that the GPU has exited the low-power modes. Instead, the GPU is still in a low-power mode. I'm going to attach a patch which should give slightly more info. Can you please try it at your convenience?
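To make the three register values easier to compare, the sketch below splits them into the fields discussed above. The bit layout is an assumption based on our reading of i915_reg.h from that period (RCX_SW_EXIT as bit 23, RSX_STATUS_MASK as bits 22:20, RSX_STATUS_ON as 0); only the macro names appear in this report, in the patch from Comment 29.

```c
#include <stdint.h>

/* Field layout assumed from i915_reg.h of the era; verify against your tree. */
#define RCX_SW_EXIT      (1u << 23)               /* software exit request */
#define RSX_STATUS_SHIFT 20
#define RSX_STATUS_MASK  (7u << RSX_STATUS_SHIFT)
#define RSX_STATUS_ON    (0u << RSX_STATUS_SHIFT) /* render unit fully on */

/* Nonzero if software has asked the GPU to leave the RCx states. */
int rstdbyctl_sw_exit(uint32_t v)
{
	return (v & RCX_SW_EXIT) != 0;
}

/* Raw 3-bit status field: 0 means "on", anything else a low-power state. */
unsigned int rstdbyctl_status(uint32_t v)
{
	return (v & RSX_STATUS_MASK) >> RSX_STATUS_SHIFT;
}

/* Bits that differ between two samples, e.g. hang vs. no-hang. */
uint32_t rstdbyctl_diff(uint32_t a, uint32_t b)
{
	return a ^ b;
}
```

Under these assumptions, 0x479c3000 (hang) has the exit request set but a status field of 1, and 0x471c3000 (no hang) differs from it only in bit 23. The hoped-for 0x478c3000 has the exit request set and status 0, which is consistent with the reading above.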
Comment 28 Ben Widawsky 2011-06-30 18:00:45 UTC
Created attachment 64042
some more info on rstdbyctl

Please try this as well, and attach dmesg.
Comment 29 Jesse Barnes 2011-06-30 20:18:51 UTC
Did we already try something as simple as this?

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_d
index 27d7722..a190443 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7633,6 +7633,10 @@ void ironlake_enable_rc6(struct drm_device *dev)
                return;
        }
 
+       I915_WRITE(RSTDBYCTL, I915_READ(RSTDBYCTL) | RCX_SW_EXIT);
+       wait_for(((I915_READ(RSTDBYCTL) & RSX_STATUS_MASK) == RSX_STATUS_ON),
+                50);
+
        /*
         * GPU can automatically power down the render unit if given a page
         * to save state.
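The patch above sets RCX_SW_EXIT and then uses wait_for() to poll until the status field reads RSX_STATUS_ON, giving up after 50 ms. A userspace sketch of that poll-with-timeout pattern follows. The simulated register, its field layout, and the helper names are hypothetical; the kernel macro differs in detail (for example, it returns -ETIMEDOUT on timeout).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define RSX_STATUS_MASK (7u << 20) /* assumed layout, as in i915_reg.h */
#define RSX_STATUS_ON   (0u << 20)

/* Hypothetical simulated register: reports a low-power status until
 * reads_until_on reads have happened, then reports "on". */
struct fake_reg {
	uint32_t low_power_value;
	unsigned int reads_until_on;
	unsigned int reads;
};

static uint32_t fake_read(struct fake_reg *r)
{
	r->reads++;
	if (r->reads > r->reads_until_on) /* e.g. 0x479c3000 -> 0x478c3000 */
		return r->low_power_value & ~RSX_STATUS_MASK;
	return r->low_power_value;
}

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Poll until the status field reads ON: 0 on success, -1 on timeout. */
int wait_for_rsx_on(struct fake_reg *r, double timeout_ms)
{
	const struct timespec one_ms = { 0, 1000000L };
	double deadline = now_ms() + timeout_ms;

	while ((fake_read(r) & RSX_STATUS_MASK) != RSX_STATUS_ON) {
		if (now_ms() > deadline)
			return -1;
		nanosleep(&one_ms, NULL); /* sleep between polls */
	}
	return 0;
}
```

Note that the patch ignores wait_for()'s return value, so a GPU that never reports "on" would not be detected at this point.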
Comment 30 Ben Widawsky 2011-06-30 23:59:15 UTC
Created attachment 64112
prevent any rc states while setting pwrcta

Please try this patch. My Ironlake behaves as I expect with this patch, and hopefully it will fix your hang.
Comment 31 Gu Rui 2011-07-01 00:56:46 UTC
Created attachment 64192
dmesg of 64112 applied

No luck. It still hangs...
Comment 32 Gu Rui 2011-07-01 00:59:55 UTC
(In reply to comment #29)
> Did we already try something as simple as this?
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c
> b/drivers/gpu/drm/i915/intel_d
> index 27d7722..a190443 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -7633,6 +7633,10 @@ void ironlake_enable_rc6(struct drm_device *dev)
>                 return;
>         }
> 
> +       I915_WRITE(RSTDBYCTL, I915_READ(RSTDBYCTL) | RCX_SW_EXIT);
> +       wait_for(((I915_READ(RSTDBYCTL) & RSX_STATUS_MASK) == RSX_STATUS_ON),
> +                50);
> +
>         /*
>          * GPU can automatically power down the render unit if given a page
>          * to save state.

Yes, we have. But it doesn't help...
Comment 33 Ben Widawsky 2011-07-01 02:22:04 UTC
(In reply to comment #31)
> Created an attachment (id=64192)
> dmesg of 64112 applied
> 
> No luck. It still hangs...

I think this fairly safely rules out rc modes changing while the context is being set up as the cause. I'm pretty stumped; we may just have to blacklist this.

Gu, would you be willing to go back to the last bug and try to get something working? https://bugzilla.kernel.org/show_bug.cgi?id=28582

Maybe one of Chris's patches that didn't make it into Keith's tree is what made it work.
Comment 34 Ben Widawsky 2011-07-01 16:05:31 UTC
The other thing I thought of: previously you were using fbc. Even though we've seen issues with RC6+fbc on ILK before, you could try enabling fbc to see whether it makes any difference.

i915_enable_fbc=1
Comment 35 Gu Rui 2011-07-02 03:56:45 UTC
But at commit 4a246cfc3c337e, i915_enable_fbc didn't exist yet, so I tried a51f7a66fb5e4af instead. It still didn't work...

BTW, none of the patches in https://bugzilla.kernel.org/show_bug.cgi?id=28582 work for me now. I adjusted some function calls to the new API, but it still hangs. Does it make sense for me to go back to 2.6.37 and try that?
Comment 36 Ben Widawsky 2011-07-02 05:35:38 UTC
(In reply to comment #35)
> But at the time of commit 4a246cfc3c337e, there is no i915_enable_fbc yet. So I
> tried with a51f7a66fb5e4af. It refused to work...
> 
> BTW, none of the patches in https://bugzilla.kernel.org/show_bug.cgi?id=28582
> works for me now. I adjusted some function calls to new API but it still hangs.
> Does it make sense that I reverse to 2.6.37 and have a try?

It probably doesn't make sense to go back to 2.6.37. FBC should have been enabled by default before the i915_enable_fbc parameter existed, so I think you've covered that.

Do you have any BIOS settings to increase the voltage for the GPU? Are there any available BIOS updates for your motherboard?
Comment 37 Gu Rui 2011-07-02 08:47:10 UTC
My BIOS is up to date (http://support.dell.com/support/downloads/download.aspx?c=us&cs=04&l=en&s=bsd&releaseid=R286249&SystemID=VOS_N_3500&servicetag=&os=W732&osl=en&deviceid=23110&devlib=0&typecnt=0&vercnt=7&catid=-1&impid=-1&formatcnt=0&libid=1&typeid=-1&dateid=-1&formatid=-1&source=-1&fileid=425120). There are no settings for GPU voltage...

Anyway, if I am the only one with this problem, I can use i915.i915_enable_rc6=0 to cope with it.

I know nothing about Ironlake, but is it possible that we've been searching in the wrong place? I mean, the bug may lie somewhere else. It's just a guess ;)
Comment 38 Ben Widawsky 2011-07-12 21:50:17 UTC
Chris Wilson found a bug in the code which could very well explain this regression.

Can you please try the latest -fixes branch from keithp's tree? You will need to set i915_enable_rc6=1 when you load the module.

FYI you can find keithp's tree here:
git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6.git
Comment 39 Gu Rui 2011-07-13 15:19:54 UTC
To be clear, I tested git head a94919eaddaa3f (drm/i915/ringbuffer: Idling requires waiting for the ring to be empty), but X still hangs...
Comment 40 Florian Mickler 2011-08-08 08:12:46 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit 4e20fa65a3ea789510eed1a15deb9e8aab2b8202
Author: Keith Packard <keithp@keithp.com>
Date:   Wed Aug 3 10:52:24 2011 -0700

    drm/i915: Try enabling RC6 by default (again)
Comment 41 Florian Mickler 2011-08-08 08:42:40 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.1-rc1:

commit 39060a07781b4930656752943cf5d66376d0533c
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Aug 5 10:56:29 2011 +0100

    Revert "drm/i915: Try enabling RC6 by default (again)"
