Created attachment 26441
dmesg output

When booting with 2.6.34, initialisation of the Radeon HD3470 GPU in my laptop fails with this trace (I'll attach the full dmesg output):

[ 19.332986] WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x162/0x2fc()
[ 19.332988] Hardware name: M51SE
[ 19.332989] Modules linked in: radeon(+) ttm drm_kms_helper drm i2c_algo_bit ipv6 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep tpm_infineon joydev snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi btusb snd_rawmidi arc4 snd_seq_midi_event uvcvideo bluetooth snd_seq ecb videodev v4l1_compat v4l2_compat_ioctl32 iwlagn i2c_core video rtc_cmos snd_timer psmouse tpm_tis iwlcore snd_seq_device mac80211 snd cfg80211 asus_laptop output tpm rtc_core soundcore evdev sparse_keymap rtc_lib tpm_bios ac battery rfkill pcspkr button serio_raw processor snd_page_alloc ext4 mbcache jbd2 crc16 dm_mod ide_cd_mod cdrom sd_mod ata_generic ata_piix ide_pci_generic sdhci_pci sdhci uhci_hcd mmc_core ahci thermal piix atl1 ohci1394 led_class ieee1394 libata thermal_sys scsi_mod mii ide_core ehci_hcd intel_agp [last unloaded: scsi_wait_scan]
[ 19.333057] Pid: 1481, comm: modprobe Not tainted 2.6.34 #1
[ 19.333058] Call Trace:
[ 19.333066] [<ffffffff81030f1b>] ? warn_slowpath_common+0x76/0x8c
[ 19.333068] [<ffffffff8101f1fe>] ? __ioremap_caller+0x162/0x2fc
[ 19.333119] [<ffffffffa0450514>] ? igp_read_bios_from_vram+0x2c/0x85 [radeon]
[ 19.333140] [<ffffffffa045060c>] ? radeon_get_bios+0x16/0x16f2 [radeon]
[ 19.333159] [<ffffffffa0472b29>] ? r600_init+0x4d/0x2c8 [radeon]
[ 19.333175] [<ffffffffa042f046>] ? radeon_device_init+0x2c0/0x33b [radeon]
[ 19.333191] [<ffffffffa042fdf6>] ? radeon_driver_load_kms+0xb2/0x134 [radeon]
[ 19.333203] [<ffffffffa03e966b>] ? drm_get_dev+0x341/0x44e [drm]
[ 19.333209] [<ffffffff81155889>] ? local_pci_probe+0x12/0x16
[ 19.333212] [<ffffffff811564e3>] ? pci_device_probe+0xbf/0xec
[ 19.333218] [<ffffffff811c06db>] ? driver_sysfs_add+0x42/0x69
[ 19.333218] [<ffffffff811c07fd>] ? driver_probe_device+0x8e/0x10e
[ 19.333221] [<ffffffff811c08cc>] ? __driver_attach+0x4f/0x6f
[ 19.333223] [<ffffffff811c087d>] ? __driver_attach+0x0/0x6f
[ 19.333228] [<ffffffff811c00f3>] ? bus_for_each_dev+0x44/0x78
[ 19.333231] [<ffffffff811bfad8>] ? bus_add_driver+0xaf/0x1f7
[ 19.333233] [<ffffffff811c0b61>] ? driver_register+0x90/0xf8
[ 19.333236] [<ffffffff8115672b>] ? __pci_register_driver+0x4e/0xc0
[ 19.333249] [<ffffffffa04bb000>] ? radeon_init+0x0/0xc0 [radeon]
[ 19.333261] [<ffffffffa04bb000>] ? radeon_init+0x0/0xc0 [radeon]
[ 19.333264] [<ffffffff810001e0>] ? do_one_initcall+0x4f/0x13e
[ 19.333269] [<ffffffff81057905>] ? sys_init_module+0xc6/0x221
[ 19.333272] [<ffffffff810028ab>] ? system_call_fastpath+0x16/0x1b

Please note that this laptop may have buggy hardware. In 2008, I reported bug #11103 (which has since been solved): I couldn't use my laptop's video card with two 2GiB memory modules. Here again, it works fine with one module and fails with two.

Note that there is no problem with kernel 2.6.33.4 (hence I marked this as a regression). I can test patches or run a git bisect if you think it would be useful.

Sincerely,
Yannick

PS: Yinghai, I CCed you because you solved the problem last time.
Yinghai, is this a DRM bug in the radeon driver? Thanks.
Reply-To: yinghai.lu@oracle.com

Looks like a kernel problem.

[ 0.203628] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 0.204028] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.204058] pci_root PNP0A08:00: host bridge window [io 0x0000-0x0cf7]
[ 0.204058] pci_root PNP0A08:00: host bridge window [io 0x0d00-0xffff]
[ 0.204058] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
[ 0.204058] pci_root PNP0A08:00: host bridge window [mem 0x000d0000-0x000dffff]
[ 0.204058] pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff]
...
[ 0.207053] pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff pref]
[ 0.207058] pci 0000:01:00.0: reg 14: [io 0x9000-0x90ff]
[ 0.207063] pci 0000:01:00.0: reg 18: [mem 0xfddf0000-0xfddfffff]
[ 0.207078] pci 0000:01:00.0: reg 30: [mem 0xfddc0000-0xfdddffff pref]
[ 0.207095] pci 0000:01:00.0: supports D1 D2
[ 0.207104] pci 0000:00:01.0: PCI bridge to [bus 01-01]
[ 0.207107] pci 0000:00:01.0: bridge window [io 0x7000-0x9fff]
[ 0.207109] pci 0000:00:01.0: bridge window [mem 0xfdd00000-0xfddfffff]
[ 0.207113] pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
[ 0.230095] pci 0000:00:01.0: no compatible bridge window for [mem 0xbdf00000-0xddefffff 64bit pref]
[ 0.230099] pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]
...
[ 0.240916] pci 0000:00:01.0: BAR 9: can't assign mem pref (size 0x20000000)
...
[ 0.240922] pci 0000:01:00.0: BAR 0: can't assign mem pref (size 0x10000000)
...
[ 19.141094] [drm] Initialized drm 1.1.0 20060810
[ 19.330812] [drm] radeon kernel modesetting enabled.
[ 19.330883] radeon 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 19.330888] radeon 0000:01:00.0: setting latency timer to 64
[ 19.332730] [drm] initializing kernel modesetting (RV620 0x1002:0x95C4).
[ 19.332945] [drm] register mmio base: 0xFDDF0000
[ 19.332946] [drm] register mmio size: 65536
[ 19.332975] ------------[ cut here ]------------
[ ...

Please try booting with pci=nocrs.

Thanks
I assume this is the Asus M51Se laptop, T8300, ATI Radeon HD3470 environment mentioned in https://bugzilla.kernel.org/show_bug.cgi?id=11103

I'm pretty sure this will work with "pci=nocrs", but that doesn't really tell us anything about how we should fix this.

The BIOS programmed the 00:01.0 bridge to a range that starts before the upstream host bridge window. We tried to reassign the 00:01.0 window, but there wasn't enough space for its original size (0x20000000), so we just disabled it:

  pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff]
  pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
  pci 0000:00:01.0: no compatible bridge window for [mem 0xbdf00000-0xddefffff 64bit pref]
  pci 0000:00:01.0: BAR 9: can't assign mem pref (size 0x20000000)
  pci 0000:00:01.0: bridge window [mem pref disabled]

That took away the window leading to the Radeon BAR, so we disabled that too, even though it was within the host bridge window:

  pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff pref]
  pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]
  pci 0000:01:00.0: BAR 0: can't assign mem pref (size 0x10000000)

I can imagine doing something like adjusting the host bridge window if we find a P2P bridge window that doesn't fit, or trying to reallocate the P2P bridge window based on what we actually need (we only need 0x10000000, and it would have worked to allocate that). But I would *really* like to know how Windows handles this. If we can figure that out, we can make Linux work the same way, which is much more likely to work across many machines.

Yannick, is there any way you can boot Windows on this config and collect the memory map using the "System Information" (a.k.a. msinfo32.exe) tool? See https://bugzilla.kernel.org/attachment.cgi?id=26066 for an example of what I'm looking for.
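The two questions in the analysis above (does the BIOS-programmed bridge window fit inside the host bridge window, and would a window sized to what the Radeon actually needs fit) are plain range-containment tests. The C sketch below is illustrative only: `struct mem_range`, `range_from()` and `range_contains()` are hypothetical helpers, not kernel code, and ranges are treated as inclusive, like the `[mem 0x...-0x...]` log format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive [start, end] range, matching the "[mem 0x...-0x...]" log format.
 * Hypothetical illustration type, not the kernel's struct resource. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Build an inclusive range from a start address and a size in bytes. */
struct mem_range range_from(uint64_t start, uint64_t size)
{
    struct mem_range r = { start, start + size - 1 };
    return r;
}

/* True if 'inner' lies entirely inside 'outer'. */
bool range_contains(struct mem_range outer, struct mem_range inner)
{
    return inner.start >= outer.start && inner.end <= outer.end;
}

/* Values copied from the dmesg lines quoted in this report. */
const struct mem_range host_window     = { 0xc0000000ULL, 0xffffffffULL }; /* PNP0A08:00 */
const struct mem_range bios_p2p_window = { 0xbdf00000ULL, 0xddefffffULL }; /* 00:01.0 pref */
const struct mem_range radeon_bar0     = { 0xc0000000ULL, 0xcfffffffULL }; /* 01:00.0 BAR 0 */
```

The BIOS window fails the check because it starts at 0xbdf00000, below the host window's 0xc0000000, while both the Radeon BAR and a window of just 0x10000000 bytes at 0xc0000000 (`range_from(0xc0000000, 0x10000000)`) pass it.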
Hi Bjorn,

Thanks for considering the problem (thanks to Andrew and Yinghai too).

> I assume this is the Asus M51Se laptop, T8300, ATI Radeon HD3470

Yes, it's the same machine.

> I'm pretty sure this will work with "pci=nocrs", but that doesn't
> really tell us anything about how we should fix this.

Alas, it doesn't work with "pci=nocrs" either. I'm attaching the dmesg output from booting with "pci=nocrs".

> Yannick, is there any way you can boot Windows on this config and
> collect the memory map using the "System Information"

For once my Windows 7 RC install is useful. ;-) I'm also attaching the memory and display information from msinfo32.exe. Sorry, it's in French.

Note that when I ran into bug #11103, I noticed that FreeBSD worked with 4GiB of memory (with the VESA driver). If needed, I can run tests with it (preferably with the FreeSBIE live CD, as I'm not at ease with BSD slice partitioning).

Yannick
Created attachment 26445
dmesg output booting with "pci=nocrs"
Created attachment 26446
Memory and display info from msinfo32.txt
Reply-To: yinghai.lu@oracle.com

[ 0.230216] pci 0000:00:01.0: address space collision: [mem 0xbdf00000-0xddefffff 64bit pref] conflicts with System RAM [mem 0x00100000-0xbff9ffff]
[ 0.230216] pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]

The BIOS looks crazy: the bridge's MMIO window overlaps system RAM.
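The "address space collision" above is an interval-overlap test between the e820 System RAM range and the bridge window. A minimal C sketch, assuming inclusive ranges as the kernel prints them; `struct mem_range`, `ranges_overlap()` and `overlap_bytes()` are hypothetical helpers, not kernel APIs, and the addresses are copied from the log lines in this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive [start, end] range (hypothetical illustration type). */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
bool ranges_overlap(struct mem_range a, struct mem_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Number of bytes shared by a and b; 0 when they are disjoint. */
uint64_t overlap_bytes(struct mem_range a, struct mem_range b)
{
    if (!ranges_overlap(a, b))
        return 0;
    uint64_t lo = a.start > b.start ? a.start : b.start;
    uint64_t hi = a.end < b.end ? a.end : b.end;
    return hi - lo + 1;
}

/* Values from the log lines in this report. */
const struct mem_range system_ram      = { 0x00100000ULL, 0xbff9ffffULL };
const struct mem_range bios_p2p_window = { 0xbdf00000ULL, 0xddefffffULL };
const struct mem_range radeon_bar0     = { 0xc0000000ULL, 0xcfffffffULL };
```

With these values the prefetchable window shares 0x020a0000 bytes (32.625 MiB, from 0xbdf00000 to 0xbff9ffff) with System RAM, while the Radeon BAR at 0xc0000000 lies entirely above RAM.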
Reply-To: yinghai.lu@oracle.com

[ 0.207901] pci 0000:00:1c.4: PCI bridge to [bus 06-07]
[ 0.207905] pci 0000:00:1c.4: bridge window [io 0xb000-0xbfff]
[ 0.207908] pci 0000:00:1c.4: bridge window [mem 0xfe100000-0xfe8fffff]
[ 0.207913] pci 0000:00:1c.4: bridge window [mem 0xddf00000-0xdfefffff 64bit pref]

It looks like you could clear the 00:1c.4 bridge windows and then kexec the current kernel; 00:01.0 would then get the resources it needs, and 00:1c.4 could be pushed to 0xf0000000.
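The suggestion above can be illustrated with a toy first-fit search. This is a simplified model, not the kernel's allocator: `first_fit()` is hypothetical, it only knows about the few ranges quoted in this report, and it assumes a window must be naturally aligned to its own size. Under those assumptions, a 0x20000000-byte window has nowhere to go while 00:1c.4's prefetchable window sits at 0xddf00000, but it fits at 0xc0000000 once that window is moved away.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range (hypothetical illustration type). */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical first-fit search: find the lowest window of 'size' bytes
 * (a power of two, aligned to its own size) inside 'host' that overlaps
 * none of the first 'nbusy' entries of 'busy'. Fills *out on success.
 */
bool first_fit(struct mem_range host, const struct mem_range *busy,
               size_t nbusy, uint64_t size, struct mem_range *out)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return false;
    /* Round host.start up to a multiple of size. */
    uint64_t start = (host.start + size - 1) & ~(size - 1);
    for (; start + size - 1 <= host.end; start += size) {
        struct mem_range cand = { start, start + size - 1 };
        bool clash = false;
        for (size_t i = 0; i < nbusy; i++) {
            if (cand.start <= busy[i].end && busy[i].start <= cand.end) {
                clash = true;
                break;
            }
        }
        if (!clash) {
            *out = cand;
            return true;
        }
    }
    return false;
}

/* Ranges from the logs in this report. 00:1c.4's prefetchable window is
 * listed last, so nbusy = 2 models moving only that window out of the way. */
const struct mem_range host_window = { 0xc0000000ULL, 0xffffffffULL };
const struct mem_range busy_ranges[] = {
    { 0xfdd00000ULL, 0xfddfffffULL }, /* 00:01.0 non-prefetchable window */
    { 0xfe100000ULL, 0xfe8fffffULL }, /* 00:1c.4 non-prefetchable window */
    { 0xddf00000ULL, 0xdfefffffULL }, /* 00:1c.4 prefetchable window */
};
```

In the same model, a 0x10000000-byte request fits at 0xc0000000 even with all three ranges busy, consistent with the earlier observation that allocating only what the Radeon needs would have worked.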
Handled-By : Yinghai Lu <yinghai.lu@oracle.com>
I think these msinfo32 entries:

  0xC0000000-0xFFFFFFFF  PCI bus  OK
  0xC0000000-0xFFFFFFFF  Mobile Intel(R) PM965/GM965/GL960/GS965 Express PCI Express Root Port - 2A01  OK

correspond to these Linux messages:

  pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff]
  pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]

So my preliminary theory is that Windows changed the 00:01.0 P2P bridge window to conform to the upstream host bridge window.

Could you find that bridge in Device Manager (look under "System devices" for a Root Port at PCI bus 0, device 1, function 0) and write down or take a screenshot of its "Resources" tab? I want to make sure we're comparing the same device.
Hi Bjorn,

I couldn't find a Root Port at PCI bus 0, device 1, function 0 in the Windows 7 Device Manager. Nevertheless, listing the devices "by connection" shows the Radeon card attached to

  Mobile Intel(R) PM965/GM965/GL960/GS965 Express PCI Express Root Port - 2A01

Its resources are:

  Memory range: 00000000FDD00000 - 00000000FDDFFFFF
  Memory range: 00000000C0000000 - 00000000DFFFFFFF
  I/O range: 7000 - 9FFF
  IRQ: 0xFFFFFFFE (-2)
  Memory range: 00000000000A0000 - 00000000000BFFFF
  I/O range: 03B0 - 03BB
  I/O range: 03C0 - 03DF

Sincerely,
Yannick
I don't think this problem is related to _CRS. The system used to work without _CRS. 2.6.34 automatically turns on "pci=use_crs", but the system fails the same way when booted with "pci=nocrs". I think we should concentrate on getting things to work again with "pci=nocrs", and then we can worry about whether enabling _CRS makes any difference.

The problem is the 00:01.0 bridge prefetchable memory aperture that overlaps system memory.

In bug 11103, with kernel 2.6.30-rc1, the dmesg in attachment 20984 shows that we reduced the size of that bridge aperture and reassigned it so it no longer overlaps system memory:

  Linux version 2.6.30-rc1-tip (yannick@tardis) ...
  BIOS-e820: 0000000000100000 - 00000000bffa0000 (usable)
  pci 0000:01:00.0: reg 10 32bit mmio: [0xc0000000-0xcfffffff]
  pci 0000:00:01.0: bridge 64bit mmio pref: [0xbdf00000-0xddefffff] (size 0x20000000)
  pci 0000:00:01.0: BAR 9: can't allocate resource
  pci 0000:01:00.0: BAR 0: can't allocate resource
  pci 0000:01:00.0: BAR 0: got res [0xc0000000-0xcfffffff] bus [0xc0000000-0xcfffffff] flags 0x21208
  pci 0000:01:00.0: BAR 0: moved to bus [0xc0000000-0xcfffffff] flags 0x21208
  pci 0000:00:01.0: PREFETCH window: 0xc0000000-0xcfffffff (size 0x10000000)

In 2.6.34 (attachment 26445), we start with the same overlap, but for some reason we can't reassign the 00:01.0 bridge aperture:

  Linux version 2.6.34 (yannick@tardis) ...
  BIOS-e820: 0000000000100000 - 00000000bffa0000 (usable)
  pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff pref]
  pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
  pci 0000:00:01.0: address space collision: [mem 0xbdf00000-0xddefffff 64bit pref] conflicts with System RAM [mem 0x00100000-0xbff9ffff]
  pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]
  pci 0000:00:01.0: BAR 9: can't assign mem pref (size 0x20000000)
  pci 0000:01:00.0: BAR 0: can't assign mem pref (size 0x10000000)
  pci 0000:00:01.0: bridge window [mem pref disabled]

I looked through the drivers/pci changes between 2.6.33 and 2.6.34, and this one:

  d65245c PCI: don't shrink bridge resources

sounds like a possibility. In 2.6.30, we reduced the size of the aperture from 0x20000000 to 0x10000000. If d65245c prevents us from reducing the size, the allocation will fail.

Yannick, can you try checking out cd81e1ea1a4c (the parent of d65245c) and d65245c itself, and booting each to see whether that commit introduced the problem? If d65245c isn't it, I'm afraid the quickest way forward will be to bisect.
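The theory above can be stated as a one-line sizing rule. The function below is a hypothetical model inferred only from the commit title "PCI: don't shrink bridge resources"; it is not the code in d65245c, and `bridge_window_size()` is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the sizing policy suggested by the title of
 * d65245c ("PCI: don't shrink bridge resources"); not the kernel's code.
 *   bios_size:   size of the window the BIOS programmed into the bridge
 *   child_need:  size the devices behind the bridge actually require
 *   dont_shrink: whether the window may be made smaller than bios_size
 */
uint64_t bridge_window_size(uint64_t bios_size, uint64_t child_need,
                            bool dont_shrink)
{
    if (dont_shrink && bios_size > child_need)
        return bios_size;   /* keep the (possibly bogus) BIOS size */
    return child_need;      /* size the window to what the children need */
}
```

With the numbers from this report, the shrinking rule yields 0x10000000, which is what 2.6.30 assigned, while the non-shrinking rule keeps 0x20000000, the size the 2.6.34 log reports it could not assign ("BAR 9: can't assign mem pref (size 0x20000000)").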
Reply-To: yinghai.lu@oracle.com

On 06/02/2010 02:36 PM, Bjorn Helgaas wrote:
> I looked through the drivers/pci changes between 2.6.33 and .34, and
> this one:
>
>   d65245c PCI: don't shrink bridge resources
>
> sounds like a possibility. In 2.6.30, we reduced the size of the
> aperture from 0x20000000 to 0x10000000. If d65245c prevents us from
> reducing the size, the allocation will fail.

Your analysis is right. Please check whether the following patch fixes the problem.

[PATCH] x86, pci: clear bridge resource size if BIOS assigns a bad one

Make sure we can reject a wrong size from the BIOS.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/i386.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -136,6 +136,7 @@ static void __init pcibios_allocate_bus_
 					 * child resource allocations in this
 					 * range.
 					 */
+					r->start = r->end = 0;
 					r->flags = 0;
 				}
 			}
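To see what the one-line patch changes, compare the resource state after invalidation with and without it. `struct fake_resource` and the two invalidate functions are hypothetical stand-ins; only the size formula (end - start + 1) mirrors the kernel's resource_size(). One plausible reading, not confirmed in this thread, is that later sizing code still derives the old window size from start/end, so zeroing them keeps the bogus 0x20000000 BIOS size from being carried forward.

```c
#include <stdint.h>

/* Cut-down stand-in for the kernel's struct resource (hypothetical). */
struct fake_resource {
    uint64_t start;
    uint64_t end;
    unsigned long flags;
};

/* Same formula as the kernel's resource_size(): end - start + 1. */
uint64_t fake_resource_size(const struct fake_resource *r)
{
    return r->end - r->start + 1;
}

/* Invalidation before the patch: only the flags are cleared,
 * so start/end still describe the BIOS-assigned range. */
void invalidate_before_patch(struct fake_resource *r)
{
    r->flags = 0;
}

/* Invalidation with the patch: the BIOS-assigned range is dropped too. */
void invalidate_with_patch(struct fake_resource *r)
{
    r->start = r->end = 0;
    r->flags = 0;
}
```

Starting from the BIOS window 0xbdf00000-0xddefffff, the unpatched invalidation still reports 0x20000000 bytes, whereas the patched one reports 1 byte, which a max()-style "don't shrink" rule would not prefer over the 0x10000000 the Radeon needs.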
Hi everybody!

On Thursday 03 June 2010 04:09:14, Yinghai Lu wrote:
> please check if following patch is fixing the problem.

Yes, Yinghai, your patch solves the problem. Kudos!

Bjorn, do you want me to test the git versions you mentioned earlier to pin down the problem, or is this working patch enough to make it clear?

Yannick
Yannick, would you mind attaching your dmesg log when using Yinghai's patch? I'd like to understand how that fix works, and maybe there's a clue in the log. Yinghai, if we use your patch, the changelog needs to include the URL of this bug report. Your patch affects x86, but there are several similar uses of pci_claim_resource() in other architectures. I think most of this code is actually generic and should not be architecture-specific, but until that is cleaned up, we should at least audit the other uses to see whether they need the same fix.
Created attachment 26636
dmesg log with Yinghai's patch applied.

Here is the log, Bjorn.
Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>
Handled-By : Yinghai Lu <yinghai@kernel.org>
Patch : https://patchwork.kernel.org/patch/104169/
Fixed by commit 837c4ef13c44296bb763a0ca0e84a076592474cf.