Bug 215001

Summary: Regression in 5.15, Firmware-initialized graphics console selects FB_VGA16, screen corruption
Product: Drivers Reporter: Kris Karas (bugs-a21)
Component: Video(Other)Assignee: drivers_video-other
Status: NEW ---    
Severity: normal CC: darktemplar, dri-devel, javier, mikes, regressions
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.15 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: [PATCH] video: vga16fb: Only probe for EGA and VGA 16 color graphic cards
[PATCH] video: vga16fb: Only probe for EGA and VGA 16 color graphic cards

Description Kris Karas 2021-11-13 06:20:39 UTC
A regression in kernel 5.15 causes FB_VGA16 (vga16fb) to fail to detect that it has been passed a firmware-initialized graphics bitmap instead of a character-mapped 80x25 display.  It takes ownership of the console, instead of passing control to FB_EFI, FB_VESA, FB_SIMPLE and so on.  This results in writing ASCII bytes into the RGB bitmap, with random bits appearing in the first few scanlines of the screen.  (The remainder of the screen is untouched, e.g. Grub's window saying it is loading the Linux kernel.)

Once udevd loads the appropriate modesetting driver (in my case, amdgpu), the graphics screen is (re)initialized properly and becomes usable.

Kernel config options CONFIG_SYSFB_SIMPLEFB, CONFIG_FB_SIMPLE, CONFIG_FB_EFI, and CONFIG_FB_VESA had no effect when toggled on and off.

Workaround: Disabling CONFIG_FB_VGA16 blocks vga16fb from grabbing the console, allowing the EFI framebuffer to properly take ownership of the console.

This bug is a duplicate of #214603.  Credit goes to that reporter for disabling FB_VGA16 as a workaround.  I would have updated that report rather than file a duplicate, except that #214603 does not have its metadata (product, component, version, regression, or even its summary fields) set correctly.  Hopefully this new report will be seen by the correct maintainers.
Comment 1 Artem S. Tashkinov 2021-11-16 09:53:59 UTC
*** Bug 214603 has been marked as a duplicate of this bug. ***
Comment 2 Artem S. Tashkinov 2021-11-16 09:57:36 UTC
CC'ing the relevant mailing list.

> A regression in kernel 5.15 causes FB_VGA16 (vga16fb) to fail to detect that
> it has been passed a firmware-initialized graphics bitmap instead of a
> character-mapped 80x25 display.  It takes ownership of the console, instead
> of passing control to FB_EFI, FB_VESA, FB_SIMPLE and so on.  This results in
> writing ASCII bytes into the RGB bitmap, with random bits appearing in the
> first few scanlines of the screen.  (The remainder of the screen is
> untouched, e.g. Grub's window saying it is loading the Linux kernel.)
> 
> Once udevd loads the appropriate modesetting driver (in my case, amdgpu), the
> graphics screen is (re)initialized properly and becomes usable.
> 
> Kernel config options CONFIG_SYSFB_SIMPLEFB, CONFIG_FB_SIMPLE, CONFIG_FB_EFI,
> and CONFIG_FB_VESA had no effect when toggled on and off.
> 
> Workaround: Disabling CONFIG_FB_VGA16 blocks vga16fb from grabbing the
> console, allowing the EFI framebuffer to properly take ownership of the
> console.
> 
> This bug is a duplicate of #214603.  Credit goes to that reporter for
> disabling FB_VGA16 as a workaround.  I would have updated that report rather
> than file a duplicate, except that #214603 does not have its metadata
> (product, component, version, regression, or even its summary fields) set
> correctly.  Hopefully this new report will be seen by the correct
> maintainers.
Comment 3 Kris Karas 2022-01-06 02:39:05 UTC
Added Javier Martinez to the CC list, as he's the patch author.

I just bisected this bug.  "First bad commit" is
d391c58271072d0b0fad93c82018d495b2633448

Author: Javier Martinez Canillas <javierm@redhat.com>
Date:   Fri Jun 25 15:09:46 2021 +0200

    drivers/firmware: move x86 Generic System Framebuffers support
    
    The x86 architecture has generic support to register a system framebuffer
    platform device. It either registers a "simple-framebuffer" if the config
    option CONFIG_X86_SYSFB is enabled, or a legacy VGA/VBE/EFI FB device.
    
    But the code is generic enough to be reused by other architectures and can
    be moved out of the arch/x86 directory.
    
    This will allow to also support the simple{fb,drm} drivers on non-x86 EFI
    platforms, such as aarch64 where these drivers are only supported with DT.
    
    Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
    Acked-by: Borislav Petkov <bp@suse.de>
    Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210625130947.1803678-2-javierm@redhat.com
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-01-06 15:16:45 UTC
(In reply to Kris Karas from comment #3)

> I just bisected this bug.  "First bad commit" is
> d391c58271072d0b0fad93c82018d495b2633448

Just to be sure: did you recheck if the issue still occurs with 5.16-rc or the latest 5.15.y release? There was another regression caused by this commit fixed a few weeks ago:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=master&id=fb561bf9abde49f7e00fdbf9ed2ccf2d86cac8ee
Comment 5 Javier Martinez Canillas 2022-01-07 02:16:50 UTC
Created attachment 300235 [details]
[PATCH] video: vga16fb: Only probe for EGA and VGA 16 color graphic cards

Can you try the attached (untested) patch?

The commit you mention in Comment 3 didn't change any logic but just moved the platform device registration that match the vesafb and efifb drivers.

As far as I can tell, the problem is that the vga16fb driver doesn't check if the video modes are one that's supported by it and just blindly attempts to probe the driver.

This probably wasn't a problem before because the sysfb.o was in arch/x86/kernel but no is in drivers/firmware. But the issue in vga16fb was present before, just moving exposed it.

I believe that the correct thing is to check if screen_info says that's a {E,V}GA 16 color mode and fail to probe the vga16fb driver otherwise.
Comment 6 Javier Martinez Canillas 2022-01-07 02:24:38 UTC
Created attachment 300236 [details]
[PATCH] video: vga16fb: Only probe for EGA and VGA 16 color graphic cards
Comment 7 Kris Karas 2022-01-07 07:52:25 UTC
Hi Javier - I tested the (updated) patch from comment 6 on three different systems, two servers with a character-graphic BIOS (expected to use VGA16), and my development system with a graphical UEFI boot (expected to use EFIFB).  I am happy to report that in all cases, the patch worked perfectly.

Thanks for having whipped up this patch!

Feel free to submit upstream to Linus, and also to Greg for inclusion in 5.15.y.
Tested-By: Kris Karas <bugs-a21@moonlit-rail.com>



In reply to Thorsten in comment 4, I did indeed test against 5.15.13 and 5.16-rc8 without success.  This makes me curious...

... I filed this bug nearly 2 months ago with the "Regression = Y" metadata clearly set; but the various kernel regression trackers haven't picked up on that fact until just days ago.  Are automated regression trackers only looking at messages on LKML?  What's the best way to get a bugzilla report for a regression tracked so that it doesn't get forgotten or lost in the shuffle?  For example, bug 199533 was filed back in 2018, with a proper bisect even, but hasn't garnered even so much as a single comment.  (Probably ought to be closed as I no longer have the hardware to test a fix :-)

Kris
Comment 8 Javier Martinez Canillas 2022-01-07 11:13:41 UTC
Hello Kris,

(In reply to Kris Karas from comment #7)
> Hi Javier - I tested the (updated) patch from comment 6 on three different
> systems, two servers with a character-graphic BIOS (expected to use VGA16),
> and my development system with a graphical UEFI boot (expected to use
> EFIFB).  I am happy to report that in all cases, the patch worked perfectly.
> 
> Thanks for having whipped up this patch!
> 
> Feel free to submit upstream to Linus, and also to Greg for inclusion in
> 5.15.y.
> Tested-By: Kris Karas <bugs-a21@moonlit-rail.com>
> 

Thanks for testing the patch!

I've just posted it along with another fix for a bug I noticed by reading the code (EGA is never used and the driver always assume to be VGA):

https://lkml.org/lkml/2022/1/7/270
Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-01-07 11:31:49 UTC
(In reply to Kris Karas from comment #7)
> 
> ... I filed this bug nearly 2 months ago with the "Regression = Y" metadata
> clearly set; but the various kernel regression trackers

Which trackers? Humans or software? I'm running a bot, but that is still in the early stages -- and has no bugzilla integration as of now (but it's on the todo list, but only in the middle), as the kernel's docs discourage from using it most of the time. See https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html (written by yours truly). See also the last section of that document.

> What's the best way to get a bugzilla report
> for a regression tracked so that it doesn't get forgotten or lost in the
> shuffle?

Docs on that are in the work:
https://lore.kernel.org/linux-doc/cover.1641203216.git.linux@leemhuis.info/

>  For example, bug 199533 was filed back in 2018, with a proper
> bisect even, but hasn't garnered even so much as a single comment. 
> (Probably ought to be closed as I no longer have the hardware to test a fix
> :-)

There are many things wrt kernel bug/regression reporting that IMHO need to be fixed or improved. I'm working on this, but I'm mostly doing it in my spare time (except for the regzbot, for which I found external funding).
Comment 10 Kris Karas 2022-01-07 23:12:41 UTC
Hi Thorsten -

Thanks for the helpful pointer on bug reporting (comment 9).  I've been so used to dealing with LKML and Bugzilla (since 1993) that I'd rather forgotten about Lore.  :-)

I didn't know whether the regression trackers were a real "bot" sort of thing, or just folks pitching in to try to keep track by hand.  Good to know that it's still human-powered.

I've got another bug to file now, multiple flavors of OOPSen showing up on boot in _kmalloc due to creation/manipulation of network devices (bridges, veth), perhaps as a result of the "igb" patches in 5.15 as it seems hardware-specific.  I'll try to cc those lists when I have it bisected or narrowed down a bit...

Kris
Comment 11 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-01-08 11:16:04 UTC
(In reply to Kris Karas from comment #10)

> I didn't know whether the regression trackers were a real "bot" sort of
> thing,

Well, I'm working towards a bot that does the heavy lifting ;-)

> or just folks pitching in to try to keep track by hand.  Good to know
> that it's still human-powered.

"Still"? there was none most of the time in the past ten years, expcept when I did it in 2017 and since I started again end of October.

> I've got another bug to file now, multiple flavors of OOPSen showing up on
> boot in _kmalloc due to creation/manipulation of network devices (bridges,
> veth), perhaps as a result of the "igb" patches in 5.15 as it seems
> hardware-specific.

Be sure to try the latest version (and give mainline a try, too), there was a regression that was fixed ten days ago. Good luck!.
Comment 12 Javier Martinez Canillas 2022-01-11 09:50:45 UTC
Hello Kris,

Could you please also test the patches in v2? 

https://lore.kernel.org/dri-devel/20220110095625.278836-1-javierm@redhat.com/

There shouldn't be any changes with respect to v1, but just wanted to be safe before pushing to drm-misc since I can't test them.

Thanks a lot and best regards,
Javier
Comment 13 Kris Karas 2022-01-12 02:14:47 UTC
Hi Javier, et al,

I have just tested version two of the patch (from email, I don't see it listed in the attachments), on the original two BIOS/VGAC servers, one new UEFI server, and my original UEFI desktop.  Once again, I'm happy to report flawless operation on all four.

Tested-By: Kris Karas <bugs-a21@moonlit-rail.com>

Thanks again, Javier!  I hope this one also makes Geert happy, too.

(I'd still be happier if non-X86 would be patched to use orig_video_isVGA as an integer; but for expediency, this seems fine.)

Kris