Bug 73041

Summary: radeon: not responding, "atombios stuck in loop"
Product: Drivers
Reporter: Bjorn Helgaas (bjorn)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, alexdeucher, luto, szg00000
Priority: P1
Hardware: All
OS: Linux
URL: http://lkml.kernel.org/r/CALCETrVG6uRysbXSbyPfKzG7uBHu9PX6SKvDbUQWG1TOfc04zQ@mail.gmail.com
Kernel Version: 3.14-rc7
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: v3.13 dmesg
v3.14-rc7 dmesg
v3.13 lspci
v3.14-rc7 lspci
v3.14-rc7 Xorg.0.log
dmesg, 3.14-rc7, NR_CPUS=12
lspci, 3.14, NR_CPUS=12, as root
Bad config
Good config

Description Bjorn Helgaas 2014-03-27 17:21:53 UTC
Andy Lutomirski reported (see URL above):

My system works on a 3.13 Fedora kernel.  It does not work on a
more-or-less identically configured 3.14-rc7+ kernel.  The symptom is
that the Plymouth password prompt flashes and then the screen goes
blank.  Hitting escape brings back the text console, and all is well
until X tries to start.  Then I get a blank screen.  killall -9 Xorg
from ssh causes these errors to be logged:


[  226.239747] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  226.239751] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD34 (len 55, WS 0, PS 0) @ 0xCD57
[  231.241492] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  231.241496] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
[  236.243111] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  236.243115] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
[  241.244625] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  241.244628] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88

lspci -vvvxxxnn on 3.14-rc7+ says:

09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779] (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: radeon
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
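
(Editorial sketch, not part of the original report.) Every byte in the dump above reads 0xff, which usually means config-space reads to the device are not completing; the "header type 7f" that lspci prints is consistent with 0xff after lspci masks off the top bit. A minimal Python sketch for spotting this pattern in saved `lspci -x` output follows. The helper names `parse_config_dump` and `looks_unresponsive` are hypothetical, and the parser assumes the `NN: xx xx ...` row layout shown above.

```python
import re

# One hex-dump row as printed by `lspci -x`, e.g. "00: ff ff ... ff".
# Assumption: an offset of 2-3 hex digits, then up to 16 space-separated bytes.
_ROW = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2}){1,16})$")


def parse_config_dump(text):
    """Collect the config-space bytes from lspci -x style hex rows."""
    data = bytearray()
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match:
            data.extend(int(byte, 16) for byte in match.group(2).split())
    return bytes(data)


def looks_unresponsive(text):
    """True if the dump contains bytes and every one of them is 0xff."""
    data = parse_config_dump(text)
    return len(data) > 0 and all(byte == 0xFF for byte in data)
```

Running `looks_unresponsive()` on the v3.13 and v3.14-rc7 lspci attachments would be one quick way to confirm which boots lost the device, assuming both were captured with `-x` or `-xxx`.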
Comment 1 Bjorn Helgaas 2014-03-27 17:22:30 UTC
Created attachment 130811 [details]
v3.13 dmesg
Comment 2 Bjorn Helgaas 2014-03-27 17:22:52 UTC
Created attachment 130821 [details]
v3.14-rc7 dmesg
Comment 3 Bjorn Helgaas 2014-03-27 17:23:13 UTC
Created attachment 130831 [details]
v3.13 lspci
Comment 4 Bjorn Helgaas 2014-03-27 17:23:31 UTC
Created attachment 130841 [details]
v3.14-rc7 lspci
Comment 5 Bjorn Helgaas 2014-03-27 17:23:57 UTC
Created attachment 130851 [details]
v3.14-rc7 Xorg.0.log
Comment 6 Alex Deucher 2014-03-27 17:44:13 UTC
Can you bisect?
Comment 7 Andy Lutomirski 2014-03-27 20:16:32 UTC
I apologize for the bad bug report.  There is, indeed, a change in 3.14 that sort of caused this, but it's not a real regression.  Somehow make oldconfig on Fedora's 3.13 config results in NR_CPUS=8, and NR_CPUS=8 seems to break radeon.  ISTR that there was at least one PCI-related issue when NR_CPUS was too low -- am I hitting that?

On my current boot, I have X working, although I still had an issue with Plymouth flashing a graphical prompt and then going blank.

I can try to do some explicit tests with NR_CPUS and/or maxcpus later today.
Comment 8 Andy Lutomirski 2014-03-27 20:17:30 UTC
Created attachment 130871 [details]
dmesg, 3.14-rc7, NR_CPUS=12
Comment 9 Andy Lutomirski 2014-03-27 20:17:55 UTC
Created attachment 130881 [details]
lspci, 3.14, NR_CPUS=12, as root
Comment 10 Bjorn Helgaas 2014-03-27 20:33:29 UTC
Hm, I don't remember a PCI issue related to NR_CPUS; do you remember any more details about that?

If you can narrow it down, e.g., NR_CPUS=8 fails and NR_CPUS=12 works on the same kernel, that might give a place to start, although I still don't know where I would look unless there was some hint in dmesg.  I'm trying to avoid the hassle of you bisecting it, but I'm afraid I don't have any better ideas.

As far as the lspci output, I was just grasping at straws and comparing the v3.13 and v3.14 output because I didn't have any better ideas.
Comment 11 Andy Lutomirski 2014-03-27 21:26:27 UTC
Created attachment 130891 [details]
Bad config
Comment 12 Andy Lutomirski 2014-03-27 21:26:45 UTC
Created attachment 130901 [details]
Good config
Comment 13 Andy Lutomirski 2014-03-27 21:29:51 UTC
It's a config issue.  The differences are (- = bad, + = good):

+CONFIG_USER_NS=y
+CONFIG_X86_UV=y
-CONFIG_GART_IOMMU=y
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTPLUG_SPARSE=y
-CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
+CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
+CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
+CONFIG_ACPI_HOTPLUG_MEMORY=y
-CONFIG_ACPI_EXTLOG=m
+CONFIG_IPV6_VTI=m
-CONFIG_NF_TABLES_INET=m
-CONFIG_NFT_QUEUE=m
-CONFIG_NFT_REJECT=m
-CONFIG_NFT_REJECT_INET=m
+CONFIG_IP_SET_HASH_NETPORTNET=m
+CONFIG_IP_SET_HASH_NETNET=m
-CONFIG_NFT_REJECT_IPV4=m
-CONFIG_NFT_REJECT_IPV6=m
-CONFIG_NET_SCH_HHF=m
-CONFIG_NET_SCH_PIE=m
+CONFIG_NFC_DIGITAL=m
+CONFIG_NFC_PORT100=m
+CONFIG_BLK_DEV_NULL_BLK=m
+CONFIG_BLK_DEV_SKD=m
-CONFIG_VIRTIO_BLK=m
+CONFIG_VIRTIO_BLK=y
+CONFIG_SGI_XP=m
+CONFIG_SGI_GRU=m
+CONFIG_INTEL_MIC_HOST=m
+CONFIG_INTEL_MIC_CARD=m
-CONFIG_VIRTIO_NET=m
+CONFIG_VIRTIO_NET=y
+CONFIG_USB_NET_HUAWEI_CDC_NCM=m
-CONFIG_USB_NET_SR9800=m
+CONFIG_WCN36XX=m
+CONFIG_TOUCHSCREEN_ZFORCE=m
+CONFIG_UV_MMTIMER=m
-CONFIG_TCG_TIS_I2C_ATMEL=m
-CONFIG_TCG_TIS_I2C_NUVOTON=m
+CONFIG_DRM_BOCHS=m
+CONFIG_SND_DICE=m
+CONFIG_SONY_FF=y
-CONFIG_VIRT_DRIVERS=y
-CONFIG_VIRTIO_BALLOON=m
-CONFIG_VIRTIO_MMIO=m
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_CHROME_PLATFORMS=y
+CONFIG_CHROMEOS_LAPTOP=m
-CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
+CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x0
+CONFIG_EARLY_PRINTK_EFI=y

I'm running ea1cd65a648bd98ff9d028a647462d28313aadfd.  Does anything stand out?  If not, I can try to narrow it down.  CONFIG_EARLY_PRINTK_EFI and CONFIG_GART_IOMMU sound like the most relevant.
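
(Editorial sketch, not part of the original comment.) A list like the one above can be produced by comparing the two attached configs option by option. The Python below is a minimal sketch of that comparison, not necessarily how the list in this comment was generated; `parse_config` and `diff_configs` are hypothetical helper names, and `# CONFIG_X is not set` lines are treated as the option being absent. The kernel tree also ships `scripts/diffconfig`, which serves the same purpose.

```python
def parse_config(text):
    """Map CONFIG_* option names to their values from .config text.

    Comment lines such as '# CONFIG_FOO is not set' are skipped, so an
    unset option is simply missing from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(bad, good):
    """Return '-NAME=value' (bad) and '+NAME=value' (good) lines for every
    option whose value differs between the two configs."""
    lines = []
    for name in sorted(set(bad) | set(good)):
        bad_value, good_value = bad.get(name), good.get(name)
        if bad_value == good_value:
            continue
        if bad_value is not None:
            lines.append(f"-{name}={bad_value}")
        if good_value is not None:
            lines.append(f"+{name}={good_value}")
    return lines
```

Usage would be along the lines of `diff_configs(parse_config(open("bad.config").read()), parse_config(open("good.config").read()))`, where the two file names are placeholders for the attached "Bad config" and "Good config". Halving the resulting list and rebuilding is one way to narrow down which option matters.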
Comment 14 Bjorn Helgaas 2017-03-02 22:51:26 UTC
I'm closing this as obsolete.  If it still happens, please reopen with any additional information you have.