13509 – VT switch causes system to lockup

Summary: VT switch causes system to lockup

Status:	RESOLVED WILL_NOT_FIX

Alias:	None

Product:	Drivers
Classification:	Unclassified
Component:	Console/Framebuffers
Hardware:	All Linux

Importance:	P1 high
Assignee:	James Simmons

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-06-11 18:42 UTC by Tobias Jakobi
Modified:	2009-12-05 12:01 UTC
CC List:	1 user

See Also:
Kernel Version:	2.6.30-gentoo-r1
Subsystem:
Regression:	Yes
Bisected commit-id:

Attachments
zipped config from the kernel (10.50 KB, application/octet-stream) 2009-06-11 18:54 UTC, Tobias Jakobi
dmesg output (15.12 KB, text/plain) 2009-06-11 18:56 UTC, Tobias Jakobi
unable to handle kernel NULL pointer dereference (3.47 KB, text/plain) 2009-06-24 20:00 UTC, Tobias Jakobi
working system (lspci) (938 bytes, text/plain) 2009-07-03 21:14 UTC, Jan Bücken
NOT working system (modules) (1.70 KB, text/plain) 2009-07-03 21:14 UTC, Jan Bücken
not working system (lspci) (2.26 KB, text/plain) 2009-07-03 21:16 UTC, Jan Bücken
working system (lsmod) (633 bytes, text/plain) 2009-07-03 21:19 UTC, Jan Bücken
log of bisect between (vanilla) 2.6.29 and 2.6.30-rc1 (2.25 KB, text/plain) 2009-08-06 15:59 UTC, Jan Bücken

Description Tobias Jakobi 2009-06-11 18:42:29 UTC

Hi there,

I just updated my amd64 system from a 2.6.27 kernel to a fresh 2.6.30-gentoo-r1 one. The system boots fine, but as soon as I switch to another VT the whole system locks up.

The screen just freezes, still showing the old content of VT1. ACPI buttons don't work, and ctrl-alt-del doesn't cause a restart. I haven't tried remote ssh login yet, nor magic SysRq (going to do this next).

To emphasize this: just working on the first console is perfectly fine, and the system is rock stable that way. So this is not a general instability; the lockup only happens on a VT switch.

Anyway, I'm using uvesafb here (the gfx card is an integrated Radeon HD 3200).
More information to follow.

Greets,
Tobias

Comment 1 Jan Bücken 2009-06-11 18:49:41 UTC

Confirming with Radeon HD Mobility 2600 and 2.6.30-gentoo (not r1).

Comment 2 Tobias Jakobi 2009-06-11 18:54:24 UTC

Created attachment 21857 [details]
zipped config from the kernel

Comment 3 Tobias Jakobi 2009-06-11 18:56:36 UTC

Created attachment 21858 [details]
dmesg output

output when just staying on VT1

I didn't find any oops message in my /var/log/kern.log file, so I'm not sure how to debug this issue / provide any interesting logs.

Comment 4 Jan Bücken 2009-06-11 22:50:54 UTC

magic SysRq does work!
I used uvesafb (as a module). Without uvesafb, I can switch to another VT!

Comment 5 Andrew Morton 2009-06-11 23:04:58 UTC

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 11 Jun 2009 18:42:30 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13509
> 
>            Summary: VT switch causes system to lockup
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.30-gentoo-r1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Console/Framebuffers
>         AssignedTo: jsimmons@infradead.org
>         ReportedBy: liquid.acid@gmx.net
>         Regression: Yes
> 
> 
> Hi there,
> 
> I just updated my amd64 system from a 2.6.27 kernel to a fresh
> 2.6.30-gentoo-r1
> one. The system boots fine, but as soon as I switch to another VT the whole
> system lockups.
> 
> The screen just freezes, still showing the old content of VT1. ACPI buttons
> don't work, and ctrl-alt-del doesn't cause a restart. I haven't tried remote
> ssh login yet, nor magic SysRq (going to do this next).
> 
> To emphasize this: Just doing work on the first console is perfectly fine -
> the
> system is rockstable this way. So it's not some instability that is triggered
> by the VT switch.
> 
> Anyway, I'm using uvesafb here (gfx card is a integrated Radeon HD 3200).
> More informations to follow.

hm, 2.6.27->2.6.30 is a large hop.

Perhaps you were using vesafb in 2.6.27 and you're now using uvesafb?

Comment 6 Tobias Jakobi 2009-06-12 08:40:06 UTC

Andrew Morton wrote:
> 
> hm, 2.6.27->2.6.30 is a large hop.
Hi Andrew!

Yeah, I know. I was going to build a 2.6.29 though, since the fglrx
driver seems to have some problems with the 30 one.
I'm going to check then if the problem also occurs with 29.

> 
> Perhaps you were using vesafb in 2.6.27 and you're now using uvesafb?
> 
Nope, I can assure you that. :)
The 27 kernel has uvesafb activated and is also using it. In fact I just
used make oldconfig to port the 27 config over to 30 (I'm not using any
distro tools for kernel compiling, just plain make).

Greets,
Tobias

Comment 8 Anonymous Emailer 2009-06-12 10:27:09 UTC

Reply-To: jbuecken@gmx.de

Tobias Jakobi wrote:
> Andrew Morton wrote:
>   
>> hm, 2.6.27->2.6.30 is a large hop.
>>     
> Hi Andrew!
>
> Yeah, I know. I was going to build a 2.6.29 though, since the fglrx
> driver seems to have some problems with the 30 one.
> I'm going to check then if the problem also occurs with 29.
>   
I used the 29 before with uvesafb: There it was not a problem.
Seems to be a regression between 29 and 30.
>
>   
>> Perhaps you were using vesafb in 2.6.27 and you're now using uvesafb?
>>
>>     
> Nope, I can assure you that. :)
> The 27 kernel has uvesafb activated and is also using it. In fact I just
> used make oldconfig to port the 27 config over to 30 (I'm not using any
> distro tools for kernel compiling, just plain make).
>
> Greets,
> Tobias
>
>

Comment 10 Tobias Jakobi 2009-06-14 23:32:19 UTC

Update:
Not loading the uvesafb module when using the 2.6.30 kernel solves the problem. As Jan stated, SysRq still works to bring the system down.

The problem does not exist with the 2.6.29 kernel.
Did not find anything useful in the kernel logs though. What should I look out for?

I'm going to try SSHing into the box next, maybe I can get more info that way.

Comment 11 Tobias Jakobi 2009-06-24 20:00:07 UTC

Created attachment 22083 [details]
unable to handle kernel NULL pointer dereference

This is the dmesg log I fetched via ssh.

I recompiled the kernel to make sure everything was correct, selecting uvesafb as a module. Then I rebooted the system with this kernel. Once on the console I ran modprobe uvesafb, which worked and switched to the correct resolution.
Then I tried a VT switch and the screen froze. I logged in via ssh from my other machine and ran dmesg to see what was going on. The dmesg snippet is attached.

Comment 12 Jan Bücken 2009-07-03 20:51:47 UTC

New info:
I have an old system (Sempron 2200+ at 1.5 GHz, GeForce 2 MX/MX 400; no X driver yet, because I want to try nouveau; the system is a fresh Gentoo set-up with the same gentoo-sources, 2.6.30-r1)
and I DON'T have this problem!

Comment 13 Jan Bücken 2009-07-03 20:58:04 UTC

But I don't know whether this is a problem with my ATI card (ATI Mobility Radeon HD 2600) or with fglrx. I uninstalled the fglrx ati-drivers AND deleted the kernel module (after uninstalling the driver and rebooting, fglrx was still loaded),
and the bug remains.

Comment 14 Jan Bücken 2009-07-03 21:14:06 UTC

Created attachment 22198 [details]
working system (lspci)

lspci of the working system with uvesafb

Comment 15 Jan Bücken 2009-07-03 21:14:55 UTC

Created attachment 22199 [details]
NOT working system (modules)

loaded modules on the working machine

Comment 16 Jan Bücken 2009-07-03 21:16:02 UTC

Created attachment 22200 [details]
not working system (lspci)

System on which uvesafb fails

Comment 17 Jan Bücken 2009-07-03 21:17:25 UTC

Comment on attachment 22199 [details]
NOT working system (modules)

System on which uvesafb fails

Comment 18 Jan Bücken 2009-07-03 21:19:45 UTC

Created attachment 22202 [details]
working system (lsmod)

running modules on the system on which uvesafb doesn't fail!

Comment 19 Jan Bücken 2009-07-03 21:22:04 UTC

The info above may help to find the problem by comparing the loaded modules!?

Comment 20 Jan Bücken 2009-08-04 10:57:35 UTC

still present in vanilla 2.6.31_rc4

Comment 21 Jan Bücken 2009-08-04 15:57:41 UTC

new test with vanilla kernels: 
regression was introduced between 2.6.29 and 2.6.30_rc1

Comment 22 Jan Bücken 2009-08-06 15:57:14 UTC

Bisect done (going to verify this by reverting the commit in 2.6.30 and last failing kernel in bisect):

1cc9fb6dbf915e5c7e7e59bb7fab10572ddbb349 is first bad commit
commit 1cc9fb6dbf915e5c7e7e59bb7fab10572ddbb349
Author: Roel Kluin <roel.kluin@gmail.com>
Date:   Tue Mar 31 15:25:35 2009 -0700

    uvesafb: bitwise OR has higher precedence than ?:
    
    Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
    Acked-by: Michal Januszewski <michalj@gmail.com>
    Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 a1a33f2dc7ba811f4a70c5b23126c3db3549b2b5 0af8f8a2c431118333dfa89befc76c056953d3e6 M      drivers

Comment 23 Jan Bücken 2009-08-06 15:59:31 UTC

Created attachment 22626 [details]
log of bisect between (vanilla) 2.6.29 and 2.6.30-rc1

Should I add Kluin to CC?

Comment 24 Jan Bücken 2009-08-06 16:27:33 UTC

(In reply to comment #22)
> Bisect done (going to verify this by reverting the commit in 2.6.30 and last
> failing kernel in bisect):

And yes: reverting the commit fixes this bug, also in gentoo-sources-2.6.30-r4.

Comment 25 Tobias Jakobi 2009-08-06 16:34:01 UTC

Funny thing is that vesafb also got such an update:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b83734ec0975e1f53420b7a2d454612fc905a9d0;hp=1cc9fb6dbf915e5c7e7e59bb7fab10572ddbb349

(the other commit is http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1cc9fb6dbf915e5c7e7e59bb7fab10572ddbb349;hp=b935257b1f98291ec1c8cbf7dbccbe0b20665bf6)

Comment 26 Jan Bücken 2009-08-07 13:11:24 UTC

Replacing
        info->flags = FBINFO_FLAG_DEFAULT |
               (ypan ? FBINFO_HWACCEL_YPAN : 0);
by 

        info->flags = 0;

or

        info->flags = FBINFO_HWACCEL_YPAN;

fixes the bug, too.


With the current (broken) code the behaviour does not change if you use the scroll=redraw or ypan or ywrap option.
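
For readers following along, here is a minimal sketch of the precedence issue named in the commit title from comment 22. It assumes the pre-commit line was the same assignment without the parentheses around the ?: part; that line is not quoted in this report. The DEMO_* constants and both function names are hypothetical stand-ins, not the definitions from include/linux/fb.h.

```c
/* Stand-in values for illustration only (assumption): the real
 * FBINFO_FLAG_DEFAULT and FBINFO_HWACCEL_YPAN come from include/linux/fb.h
 * and may differ between kernel versions and build configurations. */
#define DEMO_FLAG_DEFAULT 0x0001u
#define DEMO_HWACCEL_YPAN 0x2000u

/* Unparenthesized form: '|' binds tighter than '?:', so the compiler reads
 * this as (def | ypan) ? DEMO_HWACCEL_YPAN : 0. The default bits only act
 * as part of the condition and never reach the result. */
unsigned int flags_unparenthesized(unsigned int def, unsigned int ypan)
{
    return def | ypan ? DEMO_HWACCEL_YPAN : 0;
}

/* Parenthesized form, matching the assignment quoted in this comment:
 * the default bits are always kept and the YPAN bit is OR-ed in on demand. */
unsigned int flags_parenthesized(unsigned int def, unsigned int ypan)
{
    return def | (ypan ? DEMO_HWACCEL_YPAN : 0);
}
```

Under these assumptions the unparenthesized form can only ever yield 0 or the YPAN flag alone, which are the two values reported above as avoiding the lockup. The parenthesized form additionally keeps the default bits, so the two forms differ whenever the default flags are non-zero.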

Comment 27 Jan Bücken 2009-12-03 19:34:47 UTC

I added a lot of information to the bug, but did not notice this request:

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).

So please have a look at the bug:

http://bugzilla.kernel.org/show_bug.cgi?id=13509

As for me, I use radeon KMS now and am leaving this bug, so you can set it
to abandoned or fix it...

Greetings
Jan

Comment 29 Tobias Jakobi 2009-12-05 12:01:46 UTC

I'd like to set this one to abandoned, but bugzilla doesn't offer me an option to do so.

Well, I think this issue only happens on radeon hardware anyway and there we've got KMS which is mostly working. So people experiencing this issue should just move from uvesafb to radeon+KMS.

So I'm setting the bug to RESOLVED: I'm now using KMS on my hardware which works very well :)

Greets,
Tobias
