Bug 14054 - Toshiba MTRR issues making radeon driver(s) crash
Status: RESOLVED OBSOLETE
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 high
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL: http://bugs.freedesktop.org/show_bug....
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-25 11:20 UTC by Mehmet Giritli
Modified: 2012-01-18 18:03 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.30
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (38.88 KB, text/plain)
2009-08-26 08:57 UTC, Mehmet Giritli
/proc/mtrr when 4GB installed (351 bytes, text/plain)
2009-11-03 12:57 UTC, Mehmet Giritli
/proc/mtrr when 2GB installed (213 bytes, text/plain)
2009-11-03 12:58 UTC, Mehmet Giritli
output of lspci -vv (36.07 KB, text/plain)
2009-11-03 12:59 UTC, Mehmet Giritli
/proc/iomem when 4GB installed (2.33 KB, text/plain)
2009-11-03 19:10 UTC, Mehmet Giritli
/proc/iomem when 2GB installed (2.61 KB, text/plain)
2009-11-03 19:10 UTC, Mehmet Giritli
dmesg with 4GB installed and no framebuffer (34.00 KB, text/plain)
2009-11-05 10:42 UTC, Mehmet Giritli
/proc/mtrr when 4GB installed and no framebuffer (279 bytes, text/plain)
2009-11-05 10:44 UTC, Mehmet Giritli
Xorg crash log when 4GB installed (20.55 KB, text/plain)
2009-11-05 11:07 UTC, Mehmet Giritli
dmesg with 4GB installed and 2.6.32-rc6-git3 (31.84 KB, text/plain)
2009-11-06 11:05 UTC, Mehmet Giritli
dmesg with 4GB installed, 2.6.32-rc6-git3, PCI_DEBUG (46.67 KB, text/plain)
2009-11-06 20:32 UTC, Mehmet Giritli
Xserver log with radeonHD 2GB (150.69 KB, text/plain)
2009-11-07 16:16 UTC, Mehmet Giritli
Xserver log with radeonHD 4GB (62.77 KB, text/plain)
2009-11-07 16:17 UTC, Mehmet Giritli
Xserver log with radeon 2GB (47.65 KB, text/plain)
2009-11-07 16:19 UTC, Mehmet Giritli
Xserver log with radeon 4GB (26.37 KB, text/plain)
2009-11-07 16:19 UTC, Mehmet Giritli
dmesg with 4GB installed and 2.6.30-git22 (32.07 KB, text/plain)
2010-01-19 20:46 UTC, Mehmet Giritli
/proc/mtrr when 4GB installed on 2.6.30-git22 (353 bytes, text/plain)
2010-01-19 20:48 UTC, Mehmet Giritli

Description Mehmet Giritli 2009-08-25 11:20:26 UTC
This bug seems to occur mostly on Toshiba laptops with ATI graphics cards. For some people, CONFIG_MTRR_SANITIZER seemed to cure the issue, but it made no difference for others like me.

The RadeonHD people seem to have trouble grasping the nature of this bug. Please have a look at the thread:

http://lists.opensuse.org/radeonhd/2009-05/msg00040.html

and also 

http://bugs.freedesktop.org/show_bug.cgi?id=20645

It seems that the graphics card (ATI HD 3650) wants 4GB of video RAM when only 512MB is available, which makes the radeonhd driver crash. The radeonhd people suggested that this is a BIOS issue that must be fixed in the kernel (see the freedesktop bug above).

Right now, the only way I can use my computer is by reducing the RAM to 2GB.
Comment 1 Andrew Morton 2009-08-25 20:06:43 UTC
Reassigning this to x86.

I guess we'd need to see the bootup logs (dmesg -s 1000000) from an affected machine.

It should be possible for you to manually configure the machine via /proc/mtrr to make X work.  Once that's done and is known to work, we perhaps could arrange for that model machine to be automatically fixed up by the kernel.
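To illustrate the manual configuration suggested above: the kernel's MTRR documentation describes adding a region by writing a line of the form `base=... size=... type=...` to /proc/mtrr, and removing one by writing `disable=N`. The sketch below only builds those strings. The base and size are hypothetical placeholder values, the helper names are made up for this example, and nothing here has been tried on the affected machine.

```python
def mtrr_add_command(base: int, size: int, mem_type: str = "write-combining") -> str:
    """Build the line that would be written to /proc/mtrr to add a region."""
    # MTRR regions are normally a power of two in size and aligned to that size.
    if size <= 0 or size & (size - 1):
        raise ValueError("size should be a power of two")
    if base % size:
        raise ValueError("base should be aligned to size")
    return f"base=0x{base:x} size=0x{size:x} type={mem_type}"


def mtrr_disable_command(reg: int) -> str:
    """Build the line that would be written to /proc/mtrr to drop register `reg`."""
    return f"disable={reg}"


# Hypothetical example: a 512MB write-combining region at 0xc0000000.
line = mtrr_add_command(0xC0000000, 0x20000000)
print(line)
print(mtrr_disable_command(3))
```

As root, the resulting string would be written to /proc/mtrr (for example with `echo "..." > /proc/mtrr`). Whether such a region actually helps here is exactly what the experiment would have to show.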
Comment 2 Mehmet Giritli 2009-08-26 08:57:02 UTC
Created attachment 22856 [details]
dmesg

dmesg when 4GB is installed and X fails to start
Comment 3 Mehmet Giritli 2009-08-26 09:04:13 UTC
I feel like I have to add the following remark: This computer has the following feature in its product specifications concerning the graphics adapter:

memory amount : 512 MB dedicated VRAM (up to 2,302 MB total available graphics memory using HyperMemory™ technology with 4 GB system memory)

memory type : GDDR2 (500 MHz) Video RAM (resp. Video RAM and system memory combined) 

I am not sure if this helps at all, but it felt like they could be related.
Comment 4 Mehmet Giritli 2009-09-05 11:54:34 UTC
Andrew, any developments on this bug? Is there anything I can do or provide you with? I'm kinda stuck with 2GBs and it really sucks...
Comment 5 Mehmet Giritli 2009-09-23 20:27:42 UTC
I tried 2.6.31 and the bug is still there...
Comment 6 Mehmet Giritli 2009-11-03 12:57:53 UTC
Created attachment 23632 [details]
/proc/mtrr when 4GB installed
Comment 7 Mehmet Giritli 2009-11-03 12:58:32 UTC
Created attachment 23633 [details]
/proc/mtrr when 2GB installed
Comment 8 Mehmet Giritli 2009-11-03 12:59:31 UTC
Created attachment 23634 [details]
output of lspci -vv
Comment 9 Daniel J Blueman 2009-11-03 15:04:35 UTC
should be mapped as write-combining:
Region 0: Memory at c0000000 (32-bit, prefetchable) [size=512M]

should be mapped as uncacheable:
Region 2: Memory at bfef0000 (32-bit, non-prefetchable) [size=64K]

This is an interesting surprise:

reg03: base=0x0ffe00000 ( 4094MB), size=  512KB, count=1: write-protect

Can you 'cat /proc/iomaps' and check which device this maps to?
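For reference, a line from /proc/mtrr can be turned into a start/end range with a few lines of Python, which makes it easier to compare against other address maps. This is a minimal sketch: the regular expression is an assumption based only on the line format quoted in this comment, and `parse_mtrr_line` is a name made up for the example.

```python
import re

# Pattern inferred from the quoted /proc/mtrr line; other kernels may format it differently.
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size>\d+)(?P<unit>[KM])B, count=\d+: (?P<type>[\w-]+)"
)


def parse_mtrr_line(line):
    """Return register number, base, inclusive end, size in bytes and type."""
    m = MTRR_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised /proc/mtrr line: {line!r}")
    multiplier = 1024 if m["unit"] == "K" else 1024 * 1024
    size = int(m["size"]) * multiplier
    base = int(m["base"], 16)
    return {
        "reg": int(m["reg"]),
        "base": base,
        "end": base + size - 1,
        "size": size,
        "type": m["type"],
    }


entry = parse_mtrr_line(
    "reg03: base=0x0ffe00000 ( 4094MB), size=  512KB, count=1: write-protect"
)
print(hex(entry["base"]), hex(entry["end"]), entry["type"])
```

For the reg03 line quoted above, this gives the range 0xffe00000 to 0xffe7ffff.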
Comment 10 Mehmet Giritli 2009-11-03 17:05:13 UTC
Hi Daniel,

I don't have /proc/iomaps! Under /proc I have iomem and ioports.

Would you like me to post any of those instead?
Comment 11 Daniel J Blueman 2009-11-03 17:25:27 UTC
I'm sorry - /proc/iomem is what I was looking for.
Comment 12 Mehmet Giritli 2009-11-03 19:10:12 UTC
Created attachment 23640 [details]
/proc/iomem when 4GB installed
Comment 13 Mehmet Giritli 2009-11-03 19:10:51 UTC
Created attachment 23641 [details]
/proc/iomem when 2GB installed
Comment 14 Daniel J Blueman 2009-11-03 19:25:12 UTC
My concern was that the BIOS was preventing write access to the local APIC, but this is clearly below the readonly region in the 4GB case:

fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved

So, no problem here. It's worthwhile ensuring you're on the current BIOS (even for Vista/Win7) with all defaults loaded (since this is what the vendor validated).
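The comparison made in this comment can be checked mechanically. The sketch below parses the quoted /proc/iomem line and tests it against the write-protect region from the earlier /proc/mtrr output (base 0xffe00000, size 512KB). The helper names are made up for the example.

```python
def parse_iomem_line(line):
    """Split a /proc/iomem line such as 'fee00000-fee00fff : Local APIC'."""
    span, name = line.strip().split(" : ", 1)
    start, end = (int(part, 16) for part in span.split("-"))
    return start, end, name


def overlaps(a_start, a_end, b_start, b_end):
    """True if the two inclusive address ranges share at least one address."""
    return a_start <= b_end and b_start <= a_end


apic_start, apic_end, name = parse_iomem_line("fee00000-fee00fff : Local APIC")

# Write-protect MTRR reported earlier: base=0xffe00000, size=512KB.
wp_start = 0xFFE00000
wp_end = wp_start + 512 * 1024 - 1

print(name, "overlaps write-protect region:",
      overlaps(apic_start, apic_end, wp_start, wp_end))
```

The Local APIC range ends at 0xfee00fff, which is below 0xffe00000, so the two ranges do not overlap.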
Comment 15 Mehmet Giritli 2009-11-03 19:51:58 UTC
Yes, I have the most recent version of the BIOS (I frequently check the Toshiba website for new versions). This laptop actually has very few BIOS settings, and none of them are "interesting". Although I had not changed any of the settings, I did try reverting to the factory settings in the past, with no result :-(.
Comment 16 Yinghai Lu 2009-11-05 08:18:17 UTC
(In reply to comment #2)
> Created an attachment (id=22856) [details]
> dmesg
> 
> dmesg when 4GB is installed and X fails to start

Please try disabling vesafb: just remove vga= from your /boot/grub/menu.lst.

It looks like it creates one write-through MTRR covering 32M from 0xc0000000, and your X driver may want to set that range as WC (write-combining).
Comment 17 Yinghai Lu 2009-11-05 08:24:06 UTC
remove 
video=vesafb:ywrap,mtrr:4 vga=0x323 splash=silent,theme:natural_gentoo
Comment 18 Mehmet Giritli 2009-11-05 10:39:59 UTC
I did that and it made no difference at all, unfortunately. Attaching logs..
Comment 19 Mehmet Giritli 2009-11-05 10:42:37 UTC
Created attachment 23660 [details]
dmesg with 4GB installed and no framebuffer
Comment 20 Mehmet Giritli 2009-11-05 10:44:12 UTC
Created attachment 23661 [details]
/proc/mtrr when 4GB installed and no framebuffer
Comment 21 Mehmet Giritli 2009-11-05 10:58:58 UTC
I'm also attaching the log file from xorg, just in case it might give someone a clue...
Comment 22 Mehmet Giritli 2009-11-05 11:07:56 UTC
Created attachment 23662 [details]
Xorg crash log when 4GB installed
Comment 23 Yinghai Lu 2009-11-06 05:59:36 UTC
Can you try the current Linus git tree?

Also, please compile it with CONFIG_PCI_DEBUG.

It looks like the BIOS doesn't assign resources to some devices, and I'm not sure whether the kernel assigns resources to them:

pci 0000:00:1a.7: reg 10 32bit mmio: [0xf2504800-0xf2504bff]
pci 0000:00:1b.0: reg 10 64bit mmio: [0xf2500000-0xf2503fff]
pci 0000:00:1d.7: reg 10 32bit mmio: [0xf2504c00-0xf2504fff]
pci 0000:00:1f.2: reg 24 32bit mmio: [0xf2504000-0xf25047ff]
pci 0000:00:1f.3: reg 10 64bit mmio: [0x000000-0x0000ff]
pci 0000:01:00.0: reg 10 32bit mmio: [0xc0000000-0xdfffffff]
pci 0000:01:00.0: reg 18 32bit mmio: [0xbfef0000-0xbfefffff]
pci 0000:01:00.0: reg 30 32bit mmio: [0x000000-0x01ffff]
pci 0000:01:00.1: reg 10 32bit mmio: [0xbfeec000-0xbfeeffff]
pci 0000:00:01.0: bridge 32bit mmio: [0xbfe00000-0xbfefffff]
pci 0000:00:01.0: bridge 64bit mmio pref: [0xc0000000-0xdfffffff]
pci 0000:00:1c.0: bridge 32bit mmio: [0xf4000000-0xf5ffffff]
pci 0000:00:1c.0: bridge 64bit mmio pref: [0xf0000000-0xf1ffffff]
pci 0000:0e:00.0: reg 18 64bit mmio: [0xf2010000-0xf2010fff]
pci 0000:0e:00.0: reg 20 64bit mmio: [0xf2000000-0xf200ffff]
pci 0000:0e:00.0: reg 30 32bit mmio: [0x000000-0x01ffff]
pci 0000:00:1c.2: bridge 64bit mmio pref: [0xf2000000-0xf20fffff]
pci 0000:14:00.0: reg 10 64bit mmio: [0xf2100000-0xf2101fff]
pci 0000:00:1c.3: bridge 32bit mmio: [0xf2100000-0xf21fffff]
pci 0000:20:00.0: reg 10 32bit mmio: [0xf2200000-0xf22007ff]
pci 0000:20:00.0: reg 14 32bit mmio: [0xf2201000-0xf220107f]
pci 0000:20:00.0: reg 20 32bit mmio: [0xf2200c00-0xf2200c7f]
pci 0000:20:00.0: reg 24 32bit mmio: [0xf2200800-0xf220087f]
pci 0000:20:00.1: reg 10 32bit mmio: [0xf2201400-0xf22014ff]
pci 0000:20:00.2: reg 10 32bit mmio: [0xf2201800-0xf22018ff]
pci 0000:20:00.3: reg 10 32bit mmio: [0xf2201c00-0xf2201cff]
pci 0000:20:00.4: reg 10 32bit mmio: [0xf2202000-0xf22020ff]
pci 0000:00:1c.5: bridge 32bit mmio: [0xf2200000-0xf22fffff]
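One way to pick out the entries this comment is probably referring to is to look for device resources whose range starts at address 0. That is only a heuristic assumed for this sketch, not something the kernel log states. The regular expression is inferred from the lines quoted above, `unassigned_resources` is a hypothetical helper, and the sample is a small subset of the log.

```python
import re

# Pattern inferred from the quoted dmesg lines.
RESOURCE = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): (?P<kind>reg [0-9a-f]+|bridge) .*"
    r"\[0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)\]"
)


def unassigned_resources(lines):
    """Return (device, register) pairs whose range starts at 0 (heuristic)."""
    found = []
    for line in lines:
        m = RESOURCE.search(line)
        if m and m["kind"] != "bridge" and int(m["start"], 16) == 0:
            found.append((m["dev"], m["kind"]))
    return found


sample = [
    "pci 0000:00:1f.3: reg 10 64bit mmio: [0x000000-0x0000ff]",
    "pci 0000:01:00.0: reg 10 32bit mmio: [0xc0000000-0xdfffffff]",
    "pci 0000:01:00.0: reg 30 32bit mmio: [0x000000-0x01ffff]",
    "pci 0000:00:01.0: bridge 64bit mmio pref: [0xc0000000-0xdfffffff]",
    "pci 0000:0e:00.0: reg 30 32bit mmio: [0x000000-0x01ffff]",
]
result = unassigned_resources(sample)
for dev, reg in result:
    print(dev, reg)
```

In the quoted log, the `reg 30` entries are at config offset 0x30, the expansion ROM BAR, which firmware often leaves unprogrammed, so not every hit necessarily indicates a problem.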
Comment 24 Mehmet Giritli 2009-11-06 11:05:51 UTC
Created attachment 23678 [details]
dmesg with 4GB installed and 2.6.32-rc6-git3

I used 2.6.32-rc6-git3, but it made no difference regarding X.

I was not able to find any such DEBUG option. I got only these DEBUG options:

CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_X86_DEBUGCTLMSR=y
# CONFIG_X86_CPU_DEBUG is not set
# CONFIG_PM_DEBUG is not set
# CONFIG_ACPI_DEBUG is not set
# CONFIG_CPU_FREQ_DEBUG is not set
# CONFIG_PCIEASPM_DEBUG is not set
# CONFIG_CFG80211_REG_DEBUG is not set
# CONFIG_LIB80211_DEBUG is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
# CONFIG_PNP_DEBUG_MESSAGES is not set
CONFIG_FIREWIRE_OHCI_DEBUG=y
# CONFIG_IWLWIFI_DEBUG is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_SND_DEBUG is not set
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_MMC_DEBUG is not set
# CONFIG_MEMSTICK_DEBUG is not set
# CONFIG_RTC_DEBUG is not set
# CONFIG_EXT4_DEBUG is not set
# CONFIG_DEBUG_FS is not set
# CONFIG_DEBUG_KERNEL is not set
# CONFIG_SLUB_DEBUG_ON is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DMA_API_DEBUG is not set

Would you like any of those enabled instead?
Comment 25 Yinghai Lu 2009-11-06 19:16:22 UTC
CONFIG_PCI_DEBUG=y
Comment 26 Mehmet Giritli 2009-11-06 20:32:26 UTC
Created attachment 23684 [details]
dmesg with 4GB installed, 2.6.32-rc6-git3, PCI_DEBUG

Apologies for the annoyance, CONFIG_PCI_DEBUG apparently depends on CONFIG_DEBUG_KERNEL, which I had turned off. That's why there was no CONFIG_PCI_DEBUG option.

The attached dmesg was taken with PCI debug enabled.
Comment 27 Yinghai Lu 2009-11-06 22:09:55 UTC
It seems the resources get allocated correctly.

I'm not sure about:
vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded

Also, can you check whether your X server needs DRM?
Comment 28 Mehmet Giritli 2009-11-06 22:25:52 UTC
I'm sorry, I don't understand exactly what you want me to do. I also attached the X server log file above, which shows DRM opened and working correctly. Yes, the radeonhd driver requires DRM.

This problem happens with a specific kind of Toshiba laptop; it is not an X configuration problem.

Or, I don't understand what you are asking of me...
Comment 29 Mehmet Giritli 2009-11-07 16:14:04 UTC
Hello again,

I found some time to investigate X server logs further and got the diffs between two log files (one with 2GB and the other with 4GB installed) for each of the two radeon drivers, radeon and radeonhd. Log files are attached below.

With the radeonhd driver, the interesting bits are as follows. However, it could be useful if someone viewed these two pairs of files with vimdiff.


Isn't this a bit weird, since the start and the size of the mapped IO are the same while the end is different?
---------------------------------------------------------------------
In 2GB log file:

(II) RADEONHD(0): Mapped IO @ 0xbfef0000 to 0x7fe214600000 (size 0x00010000)

In 4GB log file: 

(II) RADEONHD(0): Mapped IO @ 0xbfef0000 to 0x7f06cb6e7000 (size 0x00010000)
---------------------------------------------------------------------



With the radeon driver, when 4GB is installed it detects 4GB of total video RAM. When 2GB is installed, only the dedicated video RAM is detected:
---------------------------------------------------------------------
2GB

(II) RADEON(0): Detected total video RAM=524288K, accessible=524288K (PCI BAR=524288K)
(--) RADEON(0): Mapped VideoRAM: 524288 kByte (128 bit DDR SDRAM)

4GB

(II) RADEON(0): Detected total video RAM=4194303K, accessible=524288K (PCI BAR=524288K)
(--) RADEON(0): Mapped VideoRAM: 524288 kByte (32 bit DDR SDRAM)
-----------------------------------------------------------------------



There are address differences again, and as a result it fails to access the VGA framebuffer with 4GB installed:
-------------------------------------------------------------------
2GB

(II) RADEONHD(0): Mapped IO @ 0xbfef0000 to 0x7fe214600000 (size 0x00010000)
(II) RADEONHD(0): Mapped FB @ 0xc0000000 to 0x7fe1f04f2000 (size 0x20000000)
(II) RADEONHD(0): rhdVGASaveFB: VGA FB Offset 0x00000000 [0x00040000]

4GB

(II) RADEONHD(0): Mapped IO @ 0xbfef0000 to 0x7f06cb6e7000 (size 0x00010000)
(II) RADEONHD(0): Mapped FB @ 0xc0000000 to 0x7f06a75d9000 (size 0x20000000)
(WW) RADEONHD(0): rhdVGASaveFB: Unable to access the VGA framebuffer (0xFFFFFFFF)
--------------------------------------------------------------------------


There are other errors, such as "Query for AtomBIOS Get Panel EDID: failed", which can be found in the attached log files in case they look useful.
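A small arithmetic check on the figures quoted in this comment: 4194303K is exactly what 0xFFFFFFFF bytes comes to when expressed in whole kilobytes, and 0xFFFFFFFF is also the value shown in the rhdVGASaveFB warning. One possible reading, offered here as an assumption rather than something the logs confirm, is that both drivers got all-ones back when reading a hardware register in the 4GB case. The variable names are made up for this sketch.

```python
ALL_ONES_32 = 0xFFFFFFFF          # value shown in the rhdVGASaveFB warning

reported_total_kib = 4194303      # "Detected total video RAM=4194303K" (4GB case)
bar_kib = 524288                  # "PCI BAR=524288K" (both cases)

# 0xFFFFFFFF bytes, rounded down to whole KiB, equals the reported total.
print(ALL_ONES_32 // 1024 == reported_total_kib)

# The BAR size corresponds to the 512MB (0x20000000) framebuffer mapping.
print(bar_kib * 1024 == 0x20000000)
```

Both comparisons hold, so the "4GB of video RAM" figure looks more like a failed register read than a real memory size, though only the driver source or a developer could confirm that.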
Comment 30 Mehmet Giritli 2009-11-07 16:16:49 UTC
Created attachment 23691 [details]
Xserver log with radeonHD 2GB
Comment 31 Mehmet Giritli 2009-11-07 16:17:35 UTC
Created attachment 23692 [details]
Xserver log with radeonHD 4GB
Comment 32 Mehmet Giritli 2009-11-07 16:19:00 UTC
Created attachment 23693 [details]
Xserver log with radeon 2GB
Comment 33 Mehmet Giritli 2009-11-07 16:19:48 UTC
Created attachment 23694 [details]
Xserver log with radeon 4GB
Comment 34 sypky 2010-01-19 16:11:40 UTC
On the unstable git snapshots of kernel 2.6.30 after git18 (2.6.30-git19 through 2.6.30-git22), the ATI driver (fglrx) and the open drivers (radeon/radeonhd) work fine.
Comment 35 Mehmet Giritli 2010-01-19 20:44:52 UTC
(In reply to comment #34)
> On unstable git versions kernel 2.6.30 above git18 - (2.6.30-git19 -
> 2.6.30-git22) ATI driver (fglrx) and open driver (radeon/radeonhd) work fine.

Thanks for the heads up! I just tried it and it works (git22). I hope this helps the kernel people to figure out what is actually going on. Attaching dmesg..
Comment 36 Mehmet Giritli 2010-01-19 20:46:12 UTC
Created attachment 24641 [details]
dmesg with 4GB installed and 2.6.30-git22
Comment 37 Mehmet Giritli 2010-01-19 20:48:22 UTC
Created attachment 24642 [details]
/proc/mtrr when 4GB installed on 2.6.30-git22
Comment 38 sypky 2010-01-19 21:14:05 UTC
My kernel is compiled with mtrr options:

CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=0

bash-3.1# cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x100000000 ( 4096MB), size= 1024MB, count=1: write-back
Comment 39 sypky 2010-01-23 22:26:23 UTC
I found a workaround. It may not address the root cause of the problem, but it works on newer kernels. I currently use 2.6.31.12 with fglrx, and I patched the file linux-2.6.31.12/arch/x86/pci/acpi.c:

bash-3.1$ diff -Naur acpi.c.orig acpi.c
--- acpi.c.orig 2010-01-18 19:30:45.000000000 +0100
+++ acpi.c      2010-01-23 23:22:13.000000000 +0100
@@ -209,7 +209,7 @@
        } else {
                bus = pci_create_bus(NULL, busnum, &pci_root_ops, sd);
                if (bus) {
-                       if (pci_probe & PCI_USE__CRS)
+                       if (!(pci_probe & PCI_USE__CRS))
                                get_current_resources(device, busnum, domain,
                                                        bus);
                        bus->subordinate = pci_scan_child_bus(bus);


Tomasz Piasecki
Comment 40 sypky 2010-01-24 13:05:51 UTC
I found a better option that does not require modifying the kernel source: everything works after adding "pci=use_crs" to the boot options.
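When comparing results between kernels, it can help to confirm that the option really reached the running kernel. The sketch below checks a command-line string for a given `pci=` sub-option. `has_pci_option` and the example command line are hypothetical; on a real system the string would come from /proc/cmdline.

```python
def has_pci_option(cmdline: str, option: str) -> bool:
    """Return True if `option` appears in any pci=... parameter of `cmdline`."""
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= accepts a comma-separated list of sub-options.
            if option in token[len("pci="):].split(","):
                return True
    return False


# Hypothetical command line for illustration only.
example = "root=/dev/sda3 pci=use_crs quiet"
print(has_pci_option(example, "use_crs"))
```

On the machine itself, `has_pci_option(open("/proc/cmdline").read(), "use_crs")` would perform the same check against the booted kernel.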
Comment 41 Mehmet Giritli 2010-01-25 10:23:38 UTC
Okay, that boot option works for me too, up to 2.6.32 stable, but I tried it with the recent 2.6.33-rc5 and it didn't work.

So, I think you should write this to the kernel mailing list and explain your findings.
Comment 42 sypky 2010-01-25 16:10:25 UTC
Sorry, but I think you are mistaken. I just tried 2.6.33-rc5, and it works fine with the pci=use_crs option. Maybe you forgot the option, or your kernel was not compiled properly.

I will report my suggestion to LKML when I have time.

Thanks 

Tomasz Piasecki
