Bug 13869 - Radeon framebuffer (w/o KMS) corruption at boot.
Summary: Radeon framebuffer (w/o KMS) corruption at boot.
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks: 13615
 
Reported: 2009-07-29 16:44 UTC by Duncan
Modified: 2009-10-06 21:06 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.31-rc4-198-g7d3e91b
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Duncan 2009-07-29 16:44:58 UTC
I have an older rv280 Radeon 9200 SE AGP, dual CRTC, DVI + VGA (plus unconnected TV-Out).  To it I have connected dual 1920x1200 monitors, one each to the DVI and VGA out ports.

I run the radeon framebuffer in native 1920x1200 mode at the text console, and haven't yet enabled KMS.

I've noted that for most of the 2.6.31 cycle, thru rc4-198-g7d3e91b pulled just this morning, at boot, sometimes both monitors come up fine, sometimes the VGA connected monitor comes up fine in framebuffer, while the DVI connected monitor has the characteristic larger print stair-step scramble of unmatched hardware and software resolution.  Normally, they come up as clones of each other.

I strongly suspect that the changes introducing Radeon KMS (even tho I don't have it enabled) disrupted the hardware mode reset of the DVI CRTC, such that it stays in whatever mode grub or early-boot uses, before the framebuffer mode switch.  The VGA CRTC switches just fine, thus allowing me to actually see what I'm doing on it, login, do whatever, startx, etc.  I'm guessing the code now only checks for and resets one of the CRTCs instead of both of them, as it did before.   It's only when I startx and its mode switches kick in that the DVI connected monitor gets reset to normal, after which I can VT-switch back to a text VT, and they both come up fine.  However, before starting X, simply switching between text/framebuffer mode VTs doesn't unscramble the DVI connected one.

However, sometimes it works just fine.  I /think/ it has something to do with whether it's a cold startup, or a warm C-A-D based reboot, possibly with whatever mode it was in before the reboot as a triggering factor.  Whatever.  I've not been able to pin that angle down specifically.  But that, combined with other factors including the post-hibernate load bug (bug 13750, see there for much more detail on my system, hardware, kernel config, compiler, etc) that was just fixed prior to rc4, meant that every time I was about ready to file a bug, it seemed to go away, only to return a bit later.

I can git bisect this if necessary, but hopefully the above is sufficient to nail it, as I have my hands full with a problematic kde4 upgrade ATM.
Comment 1 Andrew Morton 2009-07-30 00:20:03 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

(lots of cc's added)

On Wed, 29 Jul 2009 16:45:00 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13869
> 
>            Summary: Radeon framebuffer (w/o KMS) corruption at boot.
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.31-rc4-198-g7d3e91b
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Bluetooth
>         AssignedTo: drivers_bluetooth@kernel-bugs.osdl.org
>         ReportedBy: 1i5t5.duncan@cox.net
>         Regression: No
> 
> 
> I have an older rv280 Radeon 9200 SE AGP, dual CRTC, DVI + VGA (plus
> unconnected TV-Out).  To it I have connected dual 1920x1200 monitors, one
> each to the DVI and VGA out ports.
> 
> I run the radeon framebuffer in native 1920x1200 mode at the text console,
> and haven't yet enabled KMS.
> 
> I've noted that for most of the 2.6.31 cycle, thru rc4-198-g7d3e91b pulled
> just this morning, at boot, sometimes both monitors come up fine, sometimes
> the VGA connected monitor comes up fine in framebuffer, while the DVI
> connected monitor has the characteristic larger print stair-step scramble
> of unmatched hardware and software resolution.  Normally, they come up as
> clones of each other.
> 
> I strongly suspect that the changes introducing Radeon KMS (even tho I don't
> have it enabled) disrupted the hardware mode reset of the DVI CRTC, such that
> it stays in whatever mode grub or early-boot uses, before the framebuffer
> mode switch.  The VGA CRTC switches just fine, thus allowing me to actually
> see what I'm doing on it, login, do whatever, startx, etc.  I'm guessing the
> code now only checks for and resets one of the CRTCs instead of both of them,
> as it did before.   It's only when I startx and its mode switches kick in
> that the DVI connected monitor gets reset to normal, after which I can
> VT-switch back to a text VT, and they both come up fine.  However, before
> starting X, simply switching between text/framebuffer mode VTs doesn't
> unscramble the DVI connected one.
> 
> However, sometimes it works just fine.  I /think/ it has something to do with
> whether it's a cold startup, or a warm C-A-D based reboot, possibly with
> whatever mode it was in before the reboot as a triggering factor.  Whatever.
> I've not been able to pin that angle down specifically.  But that, combined
> with other factors including the post-hibernate load bug (bug 13750, see
> there for much more detail on my system, hardware, kernel config, compiler,
> etc) that was just fixed prior to rc4, meant that every time I was about
> ready to file a bug, it seemed to go away, only to return a bit later.
> 
> I can git bisect this if necessary, but hopefully the above is sufficient to
> nail it, as I have my hands full with a problematic kde4 upgrade ATM.
> 

I don't actually see any post-2.6.31 commit to drivers/video/aty/ which
could be attributed to KMS-related things.  Perhaps the change lay
elsewhere in the tree?

Yes, I suspect that a bisect would be useful, thanks.

I'll tentatively reassign this bugzilla report to DRI (how'd it get
assigned to bluetooth??).  I shall mark it as a regression and shall ask
Rafael to add it to his (large) list.  I assume that it's a post-2.6.30
regression.
Comment 2 Dave Airlie 2009-07-30 00:48:38 UTC
If you don't have KMS enabled the code never ever gets called.

so please bisect to find it, radeonfb has never really been useful for multi-head so I suspect it worked by luck, so it could be a timing change somewhere else that affected it.

it could also be a userspace issue, since X + radeonfb on x86 hw has always been a works by luck solution.
Comment 3 Duncan 2009-07-30 12:06:00 UTC
On Wednesday 29 July 2009 17:19:40 Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> (lots of cc's added)
>
> On Wed, 29 Jul 2009 16:45:00 GMT
>
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=13869
> >
> >            Summary: Radeon framebuffer (w/o KMS) corruption at boot.
> >            Product: Drivers
> >            Version: 2.5
> >     Kernel Version: 2.6.31-rc4-198-g7d3e91b
> >         ReportedBy: 1i5t5.duncan@cox.net
> >
> >
> > I have an older rv280 Radeon 9200 SE AGP, dual CRTC, DVI + VGA (plus
> > unconnected TV-Out).  To it I have connected dual 1920x1200 monitors, one
> > each to the DVI and VGA out ports.
> >
> > I run the radeon framebuffer in native 1920x1200 mode at the text
> > console, and haven't yet enabled KMS.
> >
> > I've noted that for most of the 2.6.31 cycle, thru rc4-198-g7d3e91b
> > pulled just this morning, at boot, sometimes both monitors come up fine,
> > sometimes the VGA connected monitor comes up fine in framebuffer, while
> > the DVI connected monitor has the characteristic larger print stair-step
> > scramble of unmatched hardware and software resolution.  Normally, they
> > come up as clones of each other.
> >
> > I strongly suspect that the changes introducing Radeon KMS (even tho I
> > don't have it enabled) disrupted the hardware mode reset of the DVI CRTC,
> > such that it stays in whatever mode grub or early-boot uses, before the
> > framebuffer mode switch. 
> >
> > I can git bisect this if necessary, but hopefully the above is sufficient
> > to nail it, as I have my hands full with a problematic kde4 upgrade ATM.
>
> I don't actually see any post-2.6.31 commit to drivers/video/aty/ which
> could be attributed to KMS-related things.  Perhaps the change lay
> elsewhere in the tree?
>
> Yes, I suspect that a bisect would be useful, thanks.
>
> I'll tentatively reassign this bugzilla report to DRI (how'd it get
> assigned to bluetooth??).  I shall mark it as a regression and shall ask
> Rafael to add it to his (large) list.  I assume that it's a post-2.6.30
> regression.

Bluetooth?  <setting="Family Matters" voice=Urkel>Did I do that?</voice> =:^P

I thought I chose console/framebuffers, but evidently the mouse slipped.

I don't quite understand your version references here.  2.6.31 isn't out yet, 
so no, it's not a post 2.6.31 commit.  Unless that was supposed to be 
post-2.6.30, which it is, in the 2.6.31 cycle, as I stated.  The  
2.6.31-rc4-198-g7d3e91b I listed as kernel version is what I'm currently 
running, where it isn't yet fixed.  But yes, I believe it started post-2.6.30.

Bisecting... I can do if I can get it sufficiently reproducible.  At this 
point, it happens regularly, but I don't have a solid trigger, which makes 
bisecting difficult.  (I stated that in the original bug filing but then 
edited it out, I guess, before hitting submit.)


... And adding a reply to this, which was on the bug, so it gets CCed to the 
same people:

--- Comment #2 from Dave Airlie <airlied@linux.ie>  2009-07-30 00:48:38 ---

> If you don't have KMS enabled the code never ever gets called.

Perhaps it isn't KMS, but whatever code, "never gets called" for the DVI port 
and presumably CRTC, appears to be the problem, as it doesn't get reset from 
whatever other mode to framebuffer mode, with the typical stair-step display 
that results when the horizontal scans are longer than the hardware mode.

> so please bisect to find it, radeonfb has never really been useful for
> multi-head so I suspect it worked by luck, so it could be a timing change
> somewhere else that affected it.

As I indicated above, "I can bisect" refers to the fact that I'm set up to do 
so.  However, unless I can make it more reliably reproducible, it'll take 
quite some time and I'm not sure it's practical.  False positives aren't an 
issue but false negatives would be as it's happening regularly but not 
reliably.  But since the largest domain area elimination steps are first, just 
reducing the commit domain some will hopefully help, and I can probably do 
that even if it takes me a couple days a step in the process, to verify a 
negative is really a negative and didn't just happen to work that time.
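
The cost of ruling out a false negative can be estimated with a little arithmetic.  A minimal sketch, assuming each boot of a bad kernel shows the corruption independently with probability p_fail.  That figure is an assumption (nothing in this report measures it), the independence may not hold if cold vs. warm boots matter, and the function names are made up for illustration:

```python
import math


def clean_boots_needed(p_fail: float, max_false_good: float) -> int:
    """Consecutive clean boots needed before calling a kernel "good".

    p_fail: assumed chance that a single boot of a bad kernel shows the bug.
    max_false_good: accepted risk of marking a bad kernel as good.
    Returns the smallest n with (1 - p_fail) ** n <= max_false_good.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


def bisect_steps(n_commits: int) -> int:
    """Approximate number of halving steps to isolate one commit."""
    if n_commits < 1:
        raise ValueError("n_commits must be at least 1")
    return math.ceil(math.log2(n_commits))


if __name__ == "__main__":
    # Both inputs below are illustrative assumptions, not measurements.
    boots = clean_boots_needed(p_fail=0.5, max_false_good=0.05)
    steps = bisect_steps(10_000)
    print(f"{boots} clean boots per 'good' verdict, about {steps} steps")
```

With those assumed inputs, a "good" verdict takes five clean boots in a row and the bisect takes about 14 steps, which is why an unreliable trigger makes the whole process slow.  A single boot showing the corruption is enough to mark a step "bad".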

As you said, the framebuffer isn't really useful for multi-head as it's 
cloned, but since the scrambled one is my "working" monitor, it gets annoying 
really fast, because I have to crane my neck to use the other one, until I do 
the first startx and trigger the first post-framebuffer-init mode-switch, 
after which everything (including the framebuffers once again) works correctly.

And since the DVI output is sharper, I want to keep it as my working display.  
If it was reversed and the VGA display scrambled until X started, it'd be 
fine, as it's clone mode until X anyway, so I could ignore the less convenient 
to look at for primary usage VGA output until I started X and the issue 
disappeared even back at the non-X text framebuffer.

> it could also be a userspace issue, since X + radeonfb on x86 hw has always
> been a works by luck solution.

The thing is, it's happening at boot, at the initial switch from whatever grub 
uses or pre-frame-buffer, to framebuffer mode, with the mode set for the VGA 
output but not the (normally cloned except in X) DVI output.  Once I login and 
start KDE, that forces the mode-switch on the DVI, and after that initial mode 
switch, everything works fine, including VT-switch from X/VT-7 to 
framebuffer/console and back, mode switches in X, etc.  It's just the initial 
switch to text mode framebuffer that screws up.  So X should have nothing to 
do with the problem since it hasn't even run at that point.  The only 
possibility is that it only triggers when the reboot was from a particular X 
mode, which does seem like it might be happening, but since it's a reboot, the 
hardware mode should be reset when entering radeon framebuffer in any case, 
and that's not happening.
Comment 4 Rafael J. Wysocki 2009-08-10 00:17:15 UTC
On Monday 10 August 2009, Duncan wrote:
> On Sunday 09 August 2009 13:44:24 Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30.  Please verify if it still should be listed and let me know
> > (either way).
> >
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=13869
> > Subject             : Radeon framebuffer (w/o KMS) corruption at boot.
> > Submitter   : Duncan <1i5t5.duncan@cox.net>
> > Date                : 2009-07-29 16:44 (12 days old)
> 
> Yes, as of today's pull, it's still there.
Comment 5 Duncan 2009-08-26 15:33:36 UTC
> The following bug entry is on the current list of known regressions
> from 2.6.30.  Please verify if it still should be listed and let me know
> (either way).

It's still there, but...

1) As mentioned earlier, it doesn't occur every boot, so if it didn't occur since my last git pull and build, that doesn't necessarily indicate it's not there.

2) Still, I DID see it with my last pull, git show says 6b9883f, BUT...

3) As of that last pull, I'm also still seeing bug #13939 (lm_sensors fail), and as described there, I'm reverting commit 7cb7f45c7feef43c8f71f5cfed (which is itself a revert), in order to get working lm_sensors.

4) Again as in #1, that it doesn't occur for a couple of boots doesn't mean much, which is why I've not bisected, but while I was testing to see if the other bug had been fixed, thus WITHOUT the revert of #3, I didn't see it.  I DID see it on doing the revert in #3.  Thus, it is POSSIBLE that the commit in #3 fixes the scrambled framebuffer issue (this bug), while killing lm_sensors (thus the other bug).

5) Since I don't reboot every day, and it doesn't occur at every boot, I may well go a week (which might be only a boot or two) without seeing it, only to see it several times in a row after that.

I really wish it was easier to pin down...  Any suggestions on what might make it reproducible enough to reliably bisect?  Maybe a boot log from when it works vs not, if I can manage, would help?
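
One way to compare such boot logs is to strip the kernel timestamps and diff what is left.  A minimal sketch using only the Python standard library; it assumes the logs carry the usual "[seconds.micros]" printk time prefix, and the function names and sample lines are placeholders rather than real radeonfb output:

```python
import difflib
import re

# Matches a leading printk timestamp such as "[   12.345678] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamps(lines):
    """Drop the timestamp prefix and trailing newline from each log line."""
    return [_TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_boot_logs(good_lines, bad_lines):
    """Unified diff of two boot logs, ignoring timestamps.

    Returns a list of diff lines; an empty list means the two logs
    differ only in their timestamps.
    """
    return list(
        difflib.unified_diff(
            strip_timestamps(good_lines),
            strip_timestamps(bad_lines),
            fromfile="good-boot",
            tofile="bad-boot",
            lineterm="",
            n=0,  # no context lines, only the differences
        )
    )


if __name__ == "__main__":
    # Placeholder log contents, purely for illustration.
    good = ["[    0.000000] example line A\n", "[    1.234567] example line B\n"]
    bad = ["[    0.000000] example line A\n", "[    1.300000] example line C\n"]
    for line in diff_boot_logs(good, bad):
        print(line)
```

In practice the two lists would come from saved dmesg output of a clean boot and a scrambled boot of the same kernel build, so that any remaining difference points at the boot itself rather than at a code change.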
Comment 6 Florian Mickler 2009-08-27 19:02:22 UTC
As far as I know, the lm_sensors issue is the following: the lm_sensors module could cause irregular readings and possibly corruption because it messes with the registers directly, and thus should use platform-dependent accessors for the sensor readings. See bug #13967.

(just quoting from memory)
Comment 7 James Cloos 2009-08-27 23:42:03 UTC
I'm not sure whether I'm seeing the same bug, so just in case:

Since sometime after the v2.6.30 tag, on every boot using either
radeonfb or radeon kms, the kernel mildly screws up the display.

It used to be that grub would print its loading message, then the
kernel's printk() output would follow that, just like one might
expect from a dumb tty.

Now, the screen flashes, returns with the same contents grub left it
with, but the kernel starts its output at the top, overwriting what
was there.

This is, of course, very short lived; the screen is properly cleared
when the fb is changed to the panel's native resolution.

Box is ancient; an R100 class, 7500 M7 LW.
Comment 8 Duncan 2009-10-06 14:44:52 UTC
I'm marking this resolved/unreproducible, as I've not seen it in quite some time now.  I can't for sure say it's fixed, as it never was regular, but it does seem so.  It's also possible a stick of memory that has gone bad since then was starting to go, and that triggered it.  Whatever, I've not seen it in a long time now, so I'll call it resolved.

Thanks, everyone.
