Bug 198123

Summary: Console is the wrong color at boot with radeon 6670
Product: Drivers
Reporter: Deposite Pirate (dpirate)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEEDINFO
Severity: normal
CC: alexdeucher, bill.fraser, daniel, devzero, felix.schwarz, fepaw95099, harry.wentland, koct9i, shibe, tobias.pal, ttallink
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.14.3
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: test patch on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2
fix off-by-one, or what looks like one
test patch on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2, v2
ast patch to load the lut on modesets
Red console text with kernel 4.14 and ast driver
test patch for deposite pirate
dmesg output
another test patch for Deposite Pirate
dmesg log 2
radeonreg black background
radeonreg grey background
insert huge sleep into lut load path

Description Deposite Pirate 2017-12-10 16:53:23 UTC
After upgrading to kernel 4.14.3, at boot the console screen has a white background with light gray characters that are barely readable. After X starts, the problem disappears. But if you don't start X, it stays that way. Rolling back to 4.13.16 fixes the problem. Same problem with 4.14.4.

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] RS880 Host Bridge
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] RS780/RS880 PCI to PCI bridge (PCIE port 3)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 41)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
01:00.0 USB controller: Fresco Logic FL1000G USB 3.0 Host Controller (rev 01)
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks XT [Radeon HD 6670/7670]
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]
03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)

The Radeon is a 6670.

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 6
model name : AMD Athlon(tm) II X2 240e Processor
stepping : 2
microcode : 0x10000c7
cpu MHz : 800.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save
bugs : tlb_mmatch apic_c1e fxsave_leak sysret_ss_attrs null_seg amd_e400
bogomips : 5612.22
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
Comment 1 Alex Deucher 2017-12-11 03:56:58 UTC
Can you bisect?
Comment 2 Deposite Pirate 2017-12-11 23:31:47 UTC
(In reply to Alex Deucher from comment #1)
> Can you bisect?

Hi,

I'm working on it. I haven't compiled a kernel in a few years so I had to set up all this stuff. This PC is quite slow, so it might take me a few days before I find the commit with the problem.
Comment 3 Deposite Pirate 2017-12-12 23:11:45 UTC
Ok, I found the commit:

commit 2040c47361646d18b9832fd930d2a025da002a57 (HEAD -> master)
Merge: 3154b133711f 37899a525491
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Aug 18 05:30:53 2017 +1000

    Merge branch 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux into drm-next
    
    More features for 4.14.  Nothing too major here.  I have a few more additional
    patches for large page support in vega10 among other things, but they require
    some resevation object patches from drm-misc-next, so I'll send that request
    once you've pulled the latest drm-misc-next.  Highlights:
    - Fixes for ACP audio on stoney
    - SR-IOV fixes for vega10
    - various powerplay fixes
    - lots of code clean up
    
    * 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux: (62 commits)
      drm/amdgpu/gfx7: fix function name
      drm/amd/amdgpu: Disabling Power Gating for Stoney platform
      drm/amd/amdgpu: Added a quirk for Stoney platform
      drm/amdgpu: jt_size was wrongly counted twice
      drm/amdgpu: fix missing endian-safe guard
      drm/amdgpu: ignore digest_size when loading sdma fw for raven
      drm/amdgpu: Uninitialized variable in amdgpu_ttm_backend_bind()
      drm/amd/powerplay: fix coding style in hwmgr.c
      drm/amd/powerplay: refine dmesg info under powerplay.
      drm/amdgpu: don't finish the ring if not initialized
      drm/radeon: Fix preferred typo
      drm/amdgpu: Fix preferred typo
      drm/radeon: Fix stolen typo
      drm/amdgpu: Fix stolen typo
      drm/amd/powerplay: fix coccinelle warnings in vega10_hwmgr.c
      drm/amdgpu: set gfx_v9_0_ip_funcs as static
      drm/radeon: switch to drm_*{get,put} helpers
      drm/amdgpu: switch to drm_*{get,put} helpers
      drm/amd/powerplay: add CZ profile support
      drm/amd/powerplay: fix PSI not enabled by kmd
Comment 4 Felix Schwarz 2017-12-12 23:26:00 UTC
Unfortunately a merge commit is almost never the right bisection result. Did you use git for bisecting? Can you share your bisect log? Probably you need to go on, marking commit 2040c473 as "bad" and git bisect will give you more commits to try.
Comment 5 Deposite Pirate 2017-12-12 23:33:50 UTC
I just cloned linux-stable, did git log, searched for "radeon" and built the revision before this commit, tested it and then I built the revision with this commit and tested it. Is there another linux repository I need to clone where this commit is broken into smaller pieces? I don't have a bisect log.
Comment 6 Felix Schwarz 2017-12-12 23:48:51 UTC
Ideally you use Linus' git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

As you mentioned in your first post you experienced problems when upgrading from 4.13.16 -> 4.14.3 so most likely something changed in the main development period of 4.14 for you. The merge commit 2040c473 you found supports that theory.

Now you need to start actually bisecting. Just google for "git bisect". There are a lot of blog posts, tutorials, etc. for it. (Maybe https://stackoverflow.com/a/4714297/138526 helps you)

As a start you can mark the merge commit and its predecessor as bad/good and git will take you from there. You will need to compile quite a few kernels and test them (way more than 2, usually 10-20 kernels).

A common mistake is to mark a non-bootable or otherwise broken version as "bad". Only mark kernels as "bad" if they clearly have the problem you described in your report. If you can't test a specific version, use "git bisect skip". Use "good" if you are sure that a version is ok.

Oh, btw: I recommend testing the most recent 4.15 rc (or even git master) to check whether the problem is still present. If it is fixed in the current development version, probably the easiest way for you is to ride the leading edge. Of course you can bisect anyway; maybe the patch can be backported to 4.14.

Happy compiling.
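
To illustrate why a bisection typically needs 10-20 builds rather than 2, here is a minimal Python sketch of the underlying binary search. It is a simplification under stated assumptions: real git history is a graph with merges, while this treats it as a linear list, and `is_bad` is a hypothetical stand-in for "build this kernel, boot it, and look at the console". The history size and the position of the regression are made up.

```python
import math


def bisect_first_bad(commits, is_bad):
    """Return (first_bad_commit, builds) for a linear history.

    Assumes commits[0] is known good and commits[-1] is known bad.
    is_bad(commit) is a hypothetical callback standing in for
    "build, boot and check whether the console colors are wrong".
    """
    good, bad = 0, len(commits) - 1
    builds = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        builds += 1  # one kernel build and test boot per step
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad], builds


# Made-up history: 10,000 commits with the regression introduced at index 6543.
history = list(range(10_000))
first_bad, builds = bisect_first_bad(history, lambda c: c >= 6543)
upper_bound = math.ceil(math.log2(len(history)))
print(f"first bad: {first_bad}, builds: {builds}, upper bound: {upper_bound}")
```

Each test halves the remaining range, so the number of builds grows with the base-2 logarithm of the number of commits between the known-good and known-bad versions. Marking a broken-but-unrelated kernel as "bad" sends the search into the wrong half, which is why "git bisect skip" exists.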
Comment 7 Deposite Pirate 2017-12-13 00:59:10 UTC
Thanks for the explanation. I think I understand how this works now. I was afraid I would not be able to use the Arch PKGBUILD for the kernel together with git bisect. But it turns out I can, which makes things way simpler for me.
Comment 8 Deposite Pirate 2017-12-16 01:41:26 UTC
Ok, I went through the whole git bisect process. Here are the results:

git bisect start
# bad: [a638349bf6c29433b938141f99225b160551ff48] Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
git bisect bad a638349bf6c29433b938141f99225b160551ff48
# good: [47b4a457e4cc816b3fdd2ee55c65fda8ea6de051] alarmtimer: Fix unavailable wake-up source in sysfs
git bisect good 47b4a457e4cc816b3fdd2ee55c65fda8ea6de051
# bad: [2040c47361646d18b9832fd930d2a025da002a57] Merge branch 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad 2040c47361646d18b9832fd930d2a025da002a57
# good: [5771a8c08880cdca3bfb4a3fc6d309d6bba20877] Linux v4.13-rc1
git bisect good 5771a8c08880cdca3bfb4a3fc6d309d6bba20877
# bad: [37899a5254917e17418bbb23086d55e38faaa659] drm/amdgpu/gfx7: fix function name
git bisect bad 37899a5254917e17418bbb23086d55e38faaa659
# good: [520eccdfe187591a51ea9ab4c1a024ae4d0f68d9] Linux 4.13-rc2
git bisect good 520eccdfe187591a51ea9ab4c1a024ae4d0f68d9
# bad: [9a61c54b9bff88e692ac7b1245546539ac5274a1] drm/rockchip: vop: group vop registers
git bisect bad 9a61c54b9bff88e692ac7b1245546539ac5274a1
# bad: [8572636e45167adbdb997fe5c43122bf77fd2291] drm/nouveau: Handle drm_atomic_helper_swap_state failure
git bisect bad 8572636e45167adbdb997fe5c43122bf77fd2291
# good: [05ccf211efbb6c8b5da2b5fda4f9399a7bc0db2e] drm: ttm: virtio-gpu: dma-buf: Constify ttm_place structures.
git bisect good 05ccf211efbb6c8b5da2b5fda4f9399a7bc0db2e
# bad: [30ea752146e147c5a1f0367aa5303929f7bfd697] drm/imx: Use atomic iterator macros
git bisect bad 30ea752146e147c5a1f0367aa5303929f7bfd697
# bad: [dd2adf743bc47ac14999bb375fed390af6524f29] drm/bridge: analogix-anx78xx: clean up drm_bridge_add call
git bisect bad dd2adf743bc47ac14999bb375fed390af6524f29
# good: [bdac4a052a47920eeae22441ab608612dc0ef4e5] drm/fb-helper: Push locking in fb_is_bound
git bisect good bdac4a052a47920eeae22441ab608612dc0ef4e5
# good: [6b7dc6e9f82615836b389cb5f806914048b132cd] drm/fb-helper: Split dpms handling into legacy and atomic paths
git bisect good 6b7dc6e9f82615836b389cb5f806914048b132cd
# bad: [a3562a0e471df02234f74ab4e0625042f44a76e9] drm/fb-helper: keep the .gamma_store updated in drm_fb_helper_setcmap
git bisect bad a3562a0e471df02234f74ab4e0625042f44a76e9
# bad: [b8e2b0199cc377617dc238f5106352c06dcd3fa2] drm/fb-helper: factor out pseudo-palette
git bisect bad b8e2b0199cc377617dc238f5106352c06dcd3fa2
# first bad commit: [b8e2b0199cc377617dc238f5106352c06dcd3fa2] drm/fb-helper: factor out pseudo-palette

b8e2b0199cc377617dc238f5106352c06dcd3fa2 is the first bad commit
commit b8e2b0199cc377617dc238f5106352c06dcd3fa2
Author: Peter Rosin <peda@axentia.se>
Date:   Tue Jul 4 12:36:57 2017 +0200

    drm/fb-helper: factor out pseudo-palette
    
    The pseudo-palette has nothing to do with the crtc, so move it
    out of the crtc loop and update the palette once, then break out
    early.
    
    Signed-off-by: Peter Rosin <peda@axenita.se>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Link: http://patchwork.freedesktop.org/patch/msgid/1499164632-5582-2-git-send-email-peda@axentia.se

:040000 040000 a8c2650554e199fee994ac63c2700c73ba2ecffe 7f72ed414efadd77ef1d718e7477475c4ba1127d M      drivers
Comment 9 Bill Fraser 2017-12-28 09:35:51 UTC
I'm seeing a very similar problem with the 'ast' driver: a black-on-black text console, with only red-colored text visible -- luckily I have a systemd unit that fails on boot with a red message, which I can see scrolling up the screen as it boots, but nothing else.

I've bisected it back to the same commit as Deposite Pirate above, [b8e2b0199cc377617dc238f5106352c06dcd3fa2].
Comment 10 Daniel Vetter 2018-01-11 10:11:14 UTC
Created attachment 273531 [details]
test patch on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2

- is the red message correct, or maybe also inverted/funny?
- on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2 please try the attached diff
Comment 11 Bill Fraser 2018-01-12 08:46:58 UTC
- The red color looks to be the right shade, but it's hard to say without any other colors to reference it against.

- With that patch, loading the 'ast' driver results in screen corruption and the kernel hard locks up, with not even any panic message on the serial console. :(
Comment 12 Daniel Vetter 2018-01-12 09:50:46 UTC
Created attachment 273553 [details]
fix off-by-one, or what looks like one

This is another theory that crossed my mind. Please test both this patch here (on latest upstream) and the previous patch on the first bad commit, I'd like to know the outcome of both experiments.
Comment 13 Daniel Vetter 2018-01-12 21:12:04 UTC
Created attachment 273575 [details]
test patch on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2, v2

Sry my brain must have been offline, first test patch on top of b8e2b was crap, and the 2nd patch is ... nonsense.

I hope this one has a bit better chances of giving us something to work with.
Comment 14 Bill Fraser 2018-01-13 06:50:45 UTC
(In reply to Daniel Vetter from comment #13)
> Created attachment 273575 [details]
> test patch on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2, v2

That patch fixes it for me!
Comment 15 Deposite Pirate 2018-01-13 13:34:59 UTC
As for me, the latest patch on top of HEAD doesn't fix the issue. But maybe I ought to have applied Michel Dänzer's first patch too?
Comment 16 Daniel Vetter 2018-01-14 18:32:19 UTC
Deposite Pirate, the latest patch I attached (attachment 273575 [details]) doesn't even apply on latest kernels, so I'm not sure what exactly you tested. That latest patch should be tested on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2. Can you pls link to the exact patch you tested and the git sha1 you applied it to? Or if possible, upload your entire resulting git tree, with the patch committed, to github or some place.
Comment 17 Daniel Vetter 2018-01-14 20:40:01 UTC
Created attachment 273609 [details]
ast patch to load the lut on modesets

This only patches ast, since I'm still not clear what exactly is going on on the radeon driver. The code looks the same, but the reported test results are different.

Bill Fraser, can you pls test this?
Comment 18 Bill Fraser 2018-01-16 16:54:12 UTC
Assuming that patch is meant to be applied directly to b8e2b019 without any other patches, yep, that works for me.

I tried rebasing onto 4.15-rc8, and it did so without any merge conflicts, so I'm going to try testing that as well next, and get back to you.

Thanks for looking at this, by the way!
Comment 19 Daniel Vetter 2018-01-17 13:48:27 UTC
Yeah test result on latest -rc8 is needed, since maybe there's some other bug somewhere ...
Comment 20 Bill Fraser 2018-01-18 05:15:55 UTC
Rebased on top of -rc8 also works for me.
Comment 21 Paul Tobias 2018-01-22 08:39:07 UTC
I'm having the red console text problem with the ast driver too. It worked fine with kernel 4.12.12, the problem appeared after upgrading the kernel to 4.14.14. 

The boot process starts with proper console text colours, but right after kernel modesetting kicks in the white text changes to red. I've used the "nomodeset" kernel parameter to work around the problem for a while. I'll attach a picture which illustrates the problem.

I've bisected it to the same commit b8e2b0199cc377617dc238f5106352c06dcd3fa2 as in comment #8.

Applying the patch from comment #13 on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2 fixes it for me.

And the patch from comment #17 on top of stable 4.14.14 fixes it too (without the patch from comment #13).

Unfortunately I don't have a Radeon card, can't test with that.
Comment 22 Paul Tobias 2018-01-22 08:40:18 UTC
Created attachment 273783 [details]
Red console text with kernel 4.14 and ast driver
Comment 23 Konstantin Khlebnikov 2018-01-26 06:50:34 UTC
Patch from comment #17 helped for us too.
Comment 24 Michel Dänzer 2018-01-26 17:54:45 UTC
The radeon driver already calls load_lut when the CRTC dpms hook is called with DRM_MODE_DPMS_ON. Does the ast patch call load_lut even when DPMS is already on? If so, I wonder if it doesn't just paper over the real issue, which is that something changes the LUT but then doesn't call load_lut.
Comment 25 Deposite Pirate 2018-01-28 22:55:08 UTC
I've recompiled attachment 273575 [details] on top of b8e2b0199cc377617dc238f5106352c06dcd3fa2 and it doesn't fix the problem either. Earlier I had compiled this same patch on top of some revision of 4.15rc and it did apply cleanly.
Comment 26 Daniel Vetter 2018-01-31 10:36:01 UTC
Re comment #24: crtc_commit is for modesets, the legacy helpers do _not_ call the DPMS functions in that case. radeon does what every reasonable legacy kms driver did and calls the dpms function from the prepare/commit hooks, but ast didn't do that. Hence why my patch fixed ast (but a similar patch for radeon doesn't make sense).
Comment 27 Daniel Vetter 2018-01-31 10:44:09 UTC
Ok I've reviewed all the drivers where Peter Rosin's patch series removed the load_lut hook:
- ast: fixed with my patch
- mgag200: already has a callchain like crtc_commit -> dpms -> load_lut, so works correctly already
- cirrus: same bug as ast, I'm typing a patch for that now
- radeon: head-scratcher, I still have no idea what's going on
- amdgpu: should work, but might have the same bug as radeon due to heritage
Comment 28 Daniel Vetter 2018-01-31 11:14:41 UTC
Created attachment 273941 [details]
test patch for deposite pirate

Should apply on any recent-ish kernel. Please apply, boot, and grab the complete dmesg (from kernel loading up to latest message), there should be plenty of backtraces all around.
Comment 29 Deposite Pirate 2018-02-01 15:39:11 UTC
Created attachment 273963 [details]
dmesg output

Here's the dmesg output with the patch applied on top of v4.15 (compiling HEAD fails).
Comment 30 Daniel Vetter 2018-02-19 08:20:16 UTC
There was not a single backtrace in that attachment. Was this only up to fbcon, or did you start X too? Were the colors correct afterwards?

It's also strange that it fails to compile on HEAD, that shouldn't happen (and would indicate some other leftovers or something).

I'll attach another test patch, this time around please make sure you boot until X (or whatever ends up fixing your colors).
Comment 31 Daniel Vetter 2018-02-19 08:35:56 UTC
Created attachment 274229 [details]
another test patch for Deposite Pirate
Comment 32 Deposite Pirate 2018-02-19 15:19:27 UTC
Hi,

Yeah, I wasn't sure because I never enabled backtraces before. But it didn't seem to me that there were many backtraces either. Though I did double check the patch was applied.

I have X automatically started by systemd/lightdm at boot but the booting process is slow enough that I can see the wrong colors while the kernel prints its stuff for a few seconds. And yes, the colors are correct after X starts when I switch back to consoles with CTRL+ALT+F?.

The failure to compile on HEAD back then had nothing to do with the patch. I don't remember exactly where it failed to compile, but it was a completely unrelated source file. It's not as if regressions are rare, even with stable kernel releases. Right now, I have another Athlon 64 system with an onboard GeForce card that won't even start X because it can't find /dev/dri/card0 with kernel 4.15.4, and another, more recent AMD A8-something ThinkPad that oopses at boot because of something related to SATA with 4.15.2.
Comment 33 Deposite Pirate 2018-02-26 02:57:51 UTC
Created attachment 274465 [details]
dmesg log 2

Applied on top of 4.15.6.
Comment 34 Deposite Pirate 2018-02-26 02:59:31 UTC
Ok, I've had some time to apply the new patch. The tracing works and with this patch applied on top of 4.15.6 the issue has vanished for some reason.
Comment 35 shibe 2018-03-13 18:53:53 UTC
Has the cause of this issue been identified/confirmed?

On boot I have grey console background instead of black.

Linux 4.15.8-1-ARCH x86_64

1002:6758 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks XT [Radeon HD 6670/7670]
Comment 36 Michel Dänzer 2018-03-14 09:10:44 UTC
(In reply to shibe from comment #35)
> Has the cause of this issue been identified/confirmed?

Not yet, unfortunately.

> On boot I have grey console background instead of black.

Can you try the latest test patch attached here, and attach the dmesg output from running with it?
Comment 37 shibe 2018-03-14 23:48:07 UTC
(In reply to Michel Dänzer from comment #36)
> Can you try the latest test patch attached here, and attach the dmesg output
> from running with it?

With the patch, the background is always black. I guess there is a race, and the patch affects the outcome.
Comment 38 Deposite Pirate 2018-03-15 09:20:12 UTC
That's what I think as well.
Comment 39 shibe 2018-03-15 22:18:43 UTC
I booted the stock kernel with nosmp a few times. One time the background was grey, the other times black. I wonder if the initialization code could be preempted so that the race still occurs on a single CPU. Otherwise, it may be a race with GPU. Maybe we need to wait for a previous command to complete.

What to try next?
Comment 40 Michel Dänzer 2018-03-16 09:29:15 UTC
(In reply to shibe from comment #39)
> Otherwise, it may be a race with GPU.

Yeah, things are pointing to something like that.

Harry, does anything jump out in dce5_crtc_load_lut, or can you think of anything else that might affect this, possibly specific to DCE5 or Turks?
Comment 41 Alex Deucher 2018-03-16 19:20:19 UTC
Maybe something will jump out if we look at the register differences?  You can use radeonreg (https://cgit.freedesktop.org/~airlied/radeontool/) to dump the display registers. e.g., radeonreg regs dce5 > working.out.  And then diff the working and non-working cases.  At least if we can narrow down the register(s), we can trace that back to the code.
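
The "dump, then diff" step above can be sketched in Python as follows. This is only an illustration under an assumption: it expects each dump line to be a hexadecimal register address followed by a hexadecimal value, and the real radeonreg output may be laid out differently. The helper names `parse_dump` and `diff_dumps` and the sample dump text are made up for the example.

```python
def parse_dump(text):
    """Parse lines of '<hex address> <hex value>' into {address: value}.

    The two-column layout is an assumption about the dump format.
    """
    regs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unrecognised lines
        try:
            regs[int(parts[0], 16)] = int(parts[1], 16)
        except ValueError:
            continue  # skip lines that are not plain hex pairs
    return regs


def diff_dumps(working, broken):
    """Return sorted (address, working_value, broken_value) for differing registers."""
    a, b = parse_dump(working), parse_dump(broken)
    return [
        (addr, a.get(addr), b.get(addr))
        for addr in sorted(a.keys() | b.keys())
        if a.get(addr) != b.get(addr)
    ]


def fmt(value):
    """Format a register value, or dashes if the register is missing."""
    return "--------" if value is None else f"{value:08X}"


# Illustrative sample text, not real dump output.
working = "000069F0 00401004\n000069FC 00000002\n"
broken = "000069F0 00401004\n000069FC 00000000\n"
for addr, w, g in diff_dumps(working, broken):
    print(f"{addr:08X}  working={fmt(w)}  broken={fmt(g)}")
```

Because some registers change from boot to boot regardless of the symptom, a single pairwise diff is noisy; comparing several dumps of each case helps separate registers that track the symptom from ones that merely fluctuate.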
Comment 42 shibe 2018-03-17 18:57:38 UTC
Some values from register dumps. All kernel modules are the same.

Black background:

000069FC        00000002

Grey background:

000003D8        00000000

000064C8        7F007F00

000069FC        00000000

00006BBC        00000001

00006BCC        00000000

00006D54        00000001

Other values are either different between dumps of the same case or common to both cases.
Comment 43 Alex Deucher 2018-03-18 14:33:35 UTC
Please attach the full dump for each case.
Comment 44 shibe 2018-03-19 09:40:33 UTC
Created attachment 274813 [details]
radeonreg black background

Linux 4.15.10-1-ARCH SMP PREEMPT x86_64

I made 8 dumps from different boots and selected one that has the most bits in common with other dumps. The console background is black on all these boots. The kernel, modules, and the command line are the same.
Comment 45 shibe 2018-03-19 09:43:33 UTC
Created attachment 274815 [details]
radeonreg grey background

Linux 4.15.10-1-ARCH SMP PREEMPT x86_64

I made 11 dumps from different boots and selected one that has the most bits in common with other dumps. The console background is light grey on most of these boots, but there were a couple of dark grey. The kernel, modules, and the command line are the same as for the black dumps.
Comment 46 Harry Wentland 2018-03-19 13:55:28 UTC
Michel, nothing jumps out at me in dce5_crtc_load_lut but I have only cursory familiarity with those registers.
Comment 47 shibe 2018-03-20 04:37:13 UTC
I analysed individual bits in register dumps. Some bits are set in all 8 dumps with black background and unset in most of the dumps with grey background.

The value at 0x69F0 is 0x00401004 in all black dumps, but varies in grey dumps.

The value at 0x69FC is 0x00000002 in all black dumps, 0x00000000 in all grey dumps.

Other registers, least significant bit first:

6BB0  0 + ? ? ? ? ? ?  ? ? ? 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
6E98  ? ? + ? ? ? ? ?  ? 1 0 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
6E9C  ? ? + ? ? ? ? ?  ? 1 0 0 0 0 0 0  ? ? ? ? ? ? ? ?  ? ? ? 0 0 0 0 0

Legend:
0 always 0 in black dumps
1 always 1 in black dumps
? vary in black dumps
+ always 1 in black dumps, usually 0 in grey dumps
Comment 48 shibe 2018-03-20 06:03:02 UTC
Turns out, I missed some interesting bits. Here, least significant bit first:

69E4  - - 0 0 - - - 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
69E8  0 0 0 0 0 0 - -  0 0 - - - 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
69EC  0 0 0 0 0 0 - -  - - - - - - 0 0  0 0 0 0 0 0 1 -  0 0 - - - 0 0 0
69F0  0 0 + - 0 0 - -  - 0 0 0 + - 0 0  - - - 0 0 0 + -  0 0 - - - 0 0 0
69FC  0 + 0 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
6E68  - 0 0 - 0 - 0 0  0 - - 0 0 - 0 -  0 0 0 - - 0 0 -  0 - 0 0 0 - 0 0

Legend:
0 always 0 in black dumps
1 always 1 in black dumps
+ always 1 in black dumps, usually 0 in grey dumps
- always 0 in black dumps, usually 1 in grey dumps

I've found that 0x69?? addresses are defined as constants with the prefix "EVERGREEN_DC_LUT_". Why could these registers be messed up?
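
The per-bit comparison described by the legends can be sketched in Python as below. This is a hedged illustration, not the script actually used: `classify_bits` and `format_row` are hypothetical names, "usually" is interpreted as a strict majority of the grey dumps, and a "?" symbol is used for bits that vary between black dumps. The sample data uses only what the thread states: 8 black dumps with 0x69FC equal to 0x00000002 and 11 grey dumps with 0x69FC equal to 0x00000000.

```python
def classify_bits(black, grey, width=32):
    """Return one legend symbol per bit, least significant bit first.

    black, grey: lists of values of one register, one entry per boot.
    "Usually" is taken to mean a strict majority (an assumption).
    """
    symbols = []
    for bit in range(width):
        b = [(v >> bit) & 1 for v in black]
        g = [(v >> bit) & 1 for v in grey]
        if len(set(b)) > 1:
            symbols.append("?")  # varies between black dumps
        elif b[0] == 1:
            # always 1 in black; "+" if mostly 0 in grey
            symbols.append("+" if 2 * sum(g) < len(g) else "1")
        else:
            # always 0 in black; "-" if mostly 1 in grey
            symbols.append("-" if 2 * sum(g) > len(g) else "0")
    return symbols


def format_row(addr, symbols):
    """Format symbols in groups of eight, matching the layout used above."""
    groups = [" ".join(symbols[i:i + 8]) for i in range(0, len(symbols), 8)]
    return f"{addr:04X}  " + "  ".join(groups)


black_69fc = [0x00000002] * 8   # same value in all black dumps
grey_69fc = [0x00000000] * 11   # same value in all grey dumps
print(format_row(0x69FC, classify_bits(black_69fc, grey_69fc)))
# → 69FC  0 + 0 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0
```

With that input the sketch reproduces the 69FC row: only bit 1 is flagged "+". The same reading applies to 0x69F0, where 0x00401004 corresponds to bits 2, 12 and 22, the three "+" positions in that row.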
Comment 49 Daniel Vetter 2018-08-09 07:39:38 UTC
Created attachment 277775 [details]
insert huge sleep into lut load path

Maybe this helps in blowing up the race and perhaps sheds some light here. Please test and report what happens.
Comment 50 Peter Gazer 2018-11-26 18:21:15 UTC
Simply adding a 10ms msleep call in radeon_crtc_load_lut completely fixes the issue for me, so yes, race condition for sure (HD 6850 card).

Let me know if there's anything else I can try to nail down the actual issue, I'd like to find a proper patch for this.
Comment 51 anonymous 2019-03-05 01:12:39 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=202739 might be a fix.
Comment 52 Roland Kletzing 2023-10-01 21:32:16 UTC
i have some similar broken console colour issue with rx300 s6 since kernel 6.1.  not sure if it's related, but wanted to let you know.  

for most of the time, text is barely visible and contrast totally sucks, so i always need to use nomodeset for boot as a workaround

and - same here:  after starting X and switching back to console, the problem disappears

https://gitlab.freedesktop.org/drm/misc/-/issues/31

still busy with git bisecting