Bug 16009

Summary: ioremap error with radeon (KMS)
Product: Drivers
Component: PCI
Reporter: Yannick Roehlly (yannick.roehlly)
Assignee: drivers_pci (drivers_pci)
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
CC: maciej.rutecki, rjw, yinghai
Hardware: All
OS: Linux
Kernel Version: 2.6.34
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 15310
Attachments:
  dmesg output
  dmesg output booting with "pci=nocrs"
  Memory and display info from msinfo32.txt
  dmesg log with Yinghai's patch applied.

Description Yannick Roehlly 2010-05-19 21:09:08 UTC
Created attachment 26441
dmesg output

When booting 2.6.34, initialisation of my laptop's Radeon HD3470 GPU fails with this trace (I'll attach the full dmesg output):

[   19.332986] WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x162/0x2fc()
[   19.332988] Hardware name: M51SE               
[   19.332989] Modules linked in: radeon(+) ttm drm_kms_helper drm i2c_algo_bit ipv6 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep tpm_infineon joydev snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi btusb snd_rawmidi arc4 snd_seq_midi_event uvcvideo bluetooth snd_seq ecb videodev v4l1_compat v4l2_compat_ioctl32 iwlagn i2c_core video rtc_cmos snd_timer psmouse tpm_tis iwlcore snd_seq_device mac80211 snd cfg80211 asus_laptop output tpm rtc_core soundcore evdev sparse_keymap rtc_lib tpm_bios ac battery rfkill pcspkr button serio_raw processor snd_page_alloc ext4 mbcache jbd2 crc16 dm_mod ide_cd_mod cdrom sd_mod ata_generic ata_piix ide_pci_generic sdhci_pci sdhci uhci_hcd mmc_core ahci thermal piix atl1 ohci1394 led_class ieee1394 libata thermal_sys scsi_mod mii ide_core ehci_hcd intel_agp [last unloaded: scsi_wait_scan]
[   19.333057] Pid: 1481, comm: modprobe Not tainted 2.6.34 #1
[   19.333058] Call Trace:
[   19.333066]  [<ffffffff81030f1b>] ? warn_slowpath_common+0x76/0x8c
[   19.333068]  [<ffffffff8101f1fe>] ? __ioremap_caller+0x162/0x2fc
[   19.333119]  [<ffffffffa0450514>] ? igp_read_bios_from_vram+0x2c/0x85 [radeon]
[   19.333140]  [<ffffffffa045060c>] ? radeon_get_bios+0x16/0x16f2 [radeon]
[   19.333159]  [<ffffffffa0472b29>] ? r600_init+0x4d/0x2c8 [radeon]
[   19.333175]  [<ffffffffa042f046>] ? radeon_device_init+0x2c0/0x33b [radeon]
[   19.333191]  [<ffffffffa042fdf6>] ? radeon_driver_load_kms+0xb2/0x134 [radeon]
[   19.333203]  [<ffffffffa03e966b>] ? drm_get_dev+0x341/0x44e [drm]
[   19.333209]  [<ffffffff81155889>] ? local_pci_probe+0x12/0x16
[   19.333212]  [<ffffffff811564e3>] ? pci_device_probe+0xbf/0xec
[   19.333218]  [<ffffffff811c06db>] ? driver_sysfs_add+0x42/0x69
[   19.333218]  [<ffffffff811c07fd>] ? driver_probe_device+0x8e/0x10e
[   19.333221]  [<ffffffff811c08cc>] ? __driver_attach+0x4f/0x6f
[   19.333223]  [<ffffffff811c087d>] ? __driver_attach+0x0/0x6f
[   19.333228]  [<ffffffff811c00f3>] ? bus_for_each_dev+0x44/0x78
[   19.333231]  [<ffffffff811bfad8>] ? bus_add_driver+0xaf/0x1f7
[   19.333233]  [<ffffffff811c0b61>] ? driver_register+0x90/0xf8
[   19.333236]  [<ffffffff8115672b>] ? __pci_register_driver+0x4e/0xc0
[   19.333249]  [<ffffffffa04bb000>] ? radeon_init+0x0/0xc0 [radeon]
[   19.333261]  [<ffffffffa04bb000>] ? radeon_init+0x0/0xc0 [radeon]
[   19.333264]  [<ffffffff810001e0>] ? do_one_initcall+0x4f/0x13e
[   19.333269]  [<ffffffff81057905>] ? sys_init_module+0xc6/0x221
[   19.333272]  [<ffffffff810028ab>] ? system_call_fastpath+0x16/0x1b

Please note that this laptop may have buggy hardware. In 2008, I reported bug #11103 (since solved): I couldn't use my laptop's video card with two 2GiB memory modules installed.
Here again, it works fine with one module and fails with two.

Note that there is no problem with kernel 2.6.33.4 (hence I marked it as a regression).

I can test patches or run a git bisect if you think it would be useful.

Sincerely,

Yannick

PS: Yinghai, I CCed you as you solved the problem last time.
Comment 1 Andrew Morton 2010-05-19 21:22:08 UTC
yinghai, is this a DRM bug in the radeon driver?

Thanks.
Comment 2 Anonymous Emailer 2010-05-19 21:42:02 UTC
Reply-To: yinghai.lu@oracle.com

Looks like a kernel problem.

[    0.203628] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.204028] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    0.204058] pci_root PNP0A08:00: host bridge window [io  0x0000-0x0cf7]
[    0.204058] pci_root PNP0A08:00: host bridge window [io  0x0d00-0xffff]
[    0.204058] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
[    0.204058] pci_root PNP0A08:00: host bridge window [mem 0x000d0000-0x000dffff]
[    0.204058] pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff]

...

[    0.207053] pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff pref]
[    0.207058] pci 0000:01:00.0: reg 14: [io  0x9000-0x90ff]
[    0.207063] pci 0000:01:00.0: reg 18: [mem 0xfddf0000-0xfddfffff]
[    0.207078] pci 0000:01:00.0: reg 30: [mem 0xfddc0000-0xfdddffff pref]
[    0.207095] pci 0000:01:00.0: supports D1 D2
[    0.207104] pci 0000:00:01.0: PCI bridge to [bus 01-01]
[    0.207107] pci 0000:00:01.0:   bridge window [io  0x7000-0x9fff]
[    0.207109] pci 0000:00:01.0:   bridge window [mem 0xfdd00000-0xfddfffff]
[    0.207113] pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit pref]


[    0.230095] pci 0000:00:01.0: no compatible bridge window for [mem 0xbdf00000-0xddefffff 64bit pref]
[    0.230099] pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]

...
[    0.240916] pci 0000:00:01.0: BAR 9: can't assign mem pref (size 0x20000000)
...
[    0.240922] pci 0000:01:00.0: BAR 0: can't assign mem pref (size 0x10000000)
...
[   19.141094] [drm] Initialized drm 1.1.0 20060810
[   19.330812] [drm] radeon kernel modesetting enabled.
[   19.330883] radeon 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   19.330888] radeon 0000:01:00.0: setting latency timer to 64
[   19.332730] [drm] initializing kernel modesetting (RV620 0x1002:0x95C4).
[   19.332945] [drm] register mmio base: 0xFDDF0000
[   19.332946] [drm] register mmio size: 65536
[   19.332975] ------------[ cut here ]------------

...

Please try booting with pci=nocrs.

Thanks
Comment 3 Bjorn Helgaas 2010-05-19 22:38:00 UTC
I assume this is the Asus M51Se laptop, T8300, ATI Radeon HD3470
environment mentioned in https://bugzilla.kernel.org/show_bug.cgi?id=11103

I'm pretty sure this will work with "pci=nocrs", but that doesn't
really tell us anything about how we should fix this.

The BIOS programmed the 00:01.0 bridge to a range that starts before
the upstream host bridge window.  We tried to reassign the 00:01.0
window, but there wasn't enough space for its original size (0x20000000),
so we just disabled it:

  pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff]
  pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
  pci 0000:00:01.0: no compatible bridge window for [mem 0xbdf00000-0xddefffff 64bit pref]
  pci 0000:00:01.0: BAR 9: can't assign mem pref (size 0x20000000)
  pci 0000:00:01.0:   bridge window [mem pref disabled]

That took away the window leading to the Radeon BAR, so we disabled
that, too, even though it was within the host bridge window:

  pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff pref]
  pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]
  pci 0000:01:00.0: BAR 0: can't assign mem pref (size 0x10000000)

I can imagine doing something like adjusting the host bridge window if
we find a P2P bridge window that doesn't fit, or trying to reallocate
the P2P bridge window based on what we actually need (we only need
0x10000000, and it would have worked to allocate that).
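
The containment relationships described above can be checked with a little arithmetic. This is a minimal sketch in Python, not kernel code: the `Range` class and its method names are made up for illustration, and the addresses are copied from the log lines quoted in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive address range, as printed in dmesg ([mem start-end])."""
    start: int
    end: int

    @property
    def size(self) -> int:
        return self.end - self.start + 1

    def contains(self, other: "Range") -> bool:
        return self.start <= other.start and other.end <= self.end


# Values copied from the dmesg lines quoted in this comment.
host_window = Range(0xC0000000, 0xFFFFFFFF)    # pci_root PNP0A08:00
bridge_window = Range(0xBDF00000, 0xDDEFFFFF)  # 00:01.0 prefetchable window
radeon_bar0 = Range(0xC0000000, 0xCFFFFFFF)    # 01:00.0 reg 10

print(hex(bridge_window.size))              # size the BIOS programmed
print(host_window.contains(bridge_window))  # bridge window starts too low
print(host_window.contains(radeon_bar0))    # the BAR itself lies inside
print(hex(radeon_bar0.size))                # what the device actually needs
```

The bridge window fails the containment check only because its start lies below the host bridge window, while the Radeon BAR passes it, which matches the log messages.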

But I would *really* like to know how Windows handles this.  If we
can figure that out, we can make Linux work the same way, which is
much more likely to work across many machines.

Yannick, is there any way you can boot Windows on this config and
collect the memory map using the "System Information", a.k.a.
msinfo32.exe, tool?  See https://bugzilla.kernel.org/attachment.cgi?id=26066
for an example of what I'm looking for.
Comment 4 Yannick Roehlly 2010-05-19 23:15:06 UTC
Hi Bjorn, thanks for considering the problem (thanks to Andrew and Yinghai too).

> I assume this is the Asus M51Se laptop, T8300, ATI Radeon HD3470

Yes, it's the same machine.

> I'm pretty sure this will work with "pci=nocrs", but that doesn't
> really tell us anything about how we should fix this.

Alas, no. I attach the dmesg output of booting with "pci=nocrs".

> Yannick, is there any way you can boot Windows on this config and
> collect the memory map using the "System Information"

For once my Windows 7 RC install is useful. ;-)

I am also attaching the memory and display information from msinfo32.exe. Sorry, it's in French.

Note that when I faced bug #11103, I noticed that FreeBSD worked with 4GiB of memory (using the VESA driver). If needed, I can run tests with it (preferably from the FreeSBIE live CD, as I'm not at ease with BSD slice partitioning).

Yannick
Comment 5 Yannick Roehlly 2010-05-19 23:16:34 UTC
Created attachment 26445
dmesg output booting with "pci=nocrs"
Comment 6 Yannick Roehlly 2010-05-19 23:17:14 UTC
Created attachment 26446
Memory and display info from msinfo32.txt
Comment 7 Anonymous Emailer 2010-05-19 23:56:10 UTC
Reply-To: yinghai.lu@oracle.com

[    0.230216] pci 0000:00:01.0: address space collision: [mem 0xbdf00000-0xddefffff 64bit pref] conflicts with System RAM [mem 0x00100000-0xbff9ffff]
[    0.230216] pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]

The BIOS looks crazy:
system memory overlaps the MMIO window.
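
To make the collision concrete, the overlapping region can be computed directly. This is a minimal sketch in Python; the `overlap` helper is hypothetical, and the addresses are copied from the two log lines above plus the Radeon BAR quoted earlier in the thread.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return the inclusive intersection of two inclusive ranges, or None."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return (start, end) if start <= end else None


# Values copied from the "address space collision" line above.
bridge_pref = (0xBDF00000, 0xDDEFFFFF)  # 00:01.0 prefetchable window
system_ram = (0x00100000, 0xBFF9FFFF)   # System RAM reported in the message
radeon_bar0 = (0xC0000000, 0xCFFFFFFF)  # 01:00.0 BAR 0

clash = overlap(*bridge_pref, *system_ram)
if clash is not None:
    start, end = clash
    print(f"window overlaps RAM at {start:#x}-{end:#x} ({end - start + 1:#x} bytes)")

print("BAR 0 overlaps RAM:", overlap(*radeon_bar0, *system_ram) is not None)
```

Only the low end of the bridge window collides with RAM; the Radeon BAR itself sits entirely above it.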
Comment 8 Anonymous Emailer 2010-05-20 00:37:33 UTC
Reply-To: yinghai.lu@oracle.com

[    0.207901] pci 0000:00:1c.4: PCI bridge to [bus 06-07]
[    0.207905] pci 0000:00:1c.4:   bridge window [io  0xb000-0xbfff]
[    0.207908] pci 0000:00:1c.4:   bridge window [mem 0xfe100000-0xfe8fffff]
[    0.207913] pci 0000:00:1c.4:   bridge window [mem 0xddf00000-0xdfefffff 64bit pref]

It looks like you could clear the bridge window of 00:1c.4 and then kexec the current kernel;
00:01.0 would then get the resource it needs,
and 00:1c.4 could be pushed to 0xf0000000.
Comment 9 Rafael J. Wysocki 2010-05-20 21:49:05 UTC
Handled-By : Yinghai Lu <yinghai.lu@oracle.com>
Comment 10 Bjorn Helgaas 2010-05-21 04:36:54 UTC
I think these:
  0xC0000000-0xFFFFFFFF   PCI bus   OK
  0xC0000000-0xFFFFFFFF   Mobile Intel(R) PM965/GM965/GL960/GS965 Express PCI Express Root Port - 2A01   OK

correspond to these Linux messages:
  pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff]
  pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit pref]

So my preliminary theory is that Windows changed that 00:01.0 P2P
bridge window to conform to the upstream host bridge window.  But
I wonder if you could find that bridge in the Device Manager (look
under "System Devices" for a Root Port at PCI bus 0, device 1,
function 0), and write down or take a screenshot of the "Resources"
tab?  I want to make sure we're comparing the same device.
Comment 11 Yannick Roehlly 2010-05-21 20:26:32 UTC
Hi Bjorn,

I couldn't find a Root Port at PCI bus 0, device 1, function 0 in the
Windows 7 Device Manager. Nevertheless, displaying the devices "by
connection" shows the Radeon card attached to

Mobile Intel(R) PM965/GM965/GL960/GS965 Express PCI Express Root Port - 2A01

Its resources are:

Memory range: 00000000FDD00000 - 00000000FDDFFFFF
Memory range: 00000000C0000000 - 00000000DFFFFFFF
I/O range:    7000 - 9FFF
IRQ:          0xFFFFFFFE(-2)
Memory range: 00000000000A0000 - 00000000000BFFFF
I/O range:    03B0 - 03BB
I/O range:    03C0 - 03DF

Sincerely,

Yannick
Comment 12 Bjorn Helgaas 2010-06-02 21:37:44 UTC
I don't think this problem is related to _CRS.  The system used to
work without _CRS.  2.6.34 automatically turns on "pci=use_crs", but
the system fails the same way when booted with "pci=nocrs".  I think
we should concentrate on getting things to work again with "pci=nocrs",
and then we can worry about whether enabling _CRS makes any difference.

The problem is the 00:01.0 bridge prefetchable memory aperture that
overlaps system memory.

In bug 11103, with kernel 2.6.30-rc1, the dmesg in attachment 20984
shows that we reduced the size of that bridge aperture and reassigned
it so it no longer overlaps system memory:

  Linux version 2.6.30-rc1-tip (yannick@tardis) ...
  BIOS-e820: 0000000000100000 - 00000000bffa0000 (usable)
  pci 0000:01:00.0: reg 10 32bit mmio: [0xc0000000-0xcfffffff]
  pci 0000:00:01.0: bridge 64bit mmio pref: [0xbdf00000-0xddefffff] (size 0x20000000)
  pci 0000:00:01.0: BAR 9: can't allocate resource
  pci 0000:01:00.0: BAR 0: can't allocate resource
  pci 0000:01:00.0: BAR 0: got res [0xc0000000-0xcfffffff] bus [0xc0000000-0xcfffffff] flags 0x21208
  pci 0000:01:00.0: BAR 0: moved to bus [0xc0000000-0xcfffffff] flags 0x21208
  pci 0000:00:01.0:   PREFETCH window: 0xc0000000-0xcfffffff (size 0x10000000)

In 2.6.34 (attachment 26445), we start with the same overlap, but for
some reason, we can't reassign the 00:01.0 bridge aperture:

  Linux version 2.6.34 (yannick@tardis) ...
  BIOS-e820: 0000000000100000 - 00000000bffa0000 (usable)
  pci 0000:01:00.0: reg 10: [mem 0xc0000000-0xcfffffff pref]
  pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
  pci 0000:00:01.0: address space collision: [mem 0xbdf00000-0xddefffff 64bit pref] conflicts with System RAM [mem 0x00100000-0xbff9ffff]
  pci 0000:01:00.0: no compatible bridge window for [mem 0xc0000000-0xcfffffff pref]
  pci 0000:00:01.0: BAR 9: can't assign mem pref (size 0x20000000)
  pci 0000:01:00.0: BAR 0: can't assign mem pref (size 0x10000000)
  pci 0000:00:01.0:   bridge window [mem pref disabled]

I looked through the drivers/pci changes between 2.6.33 and .34, and
this one:

  d65245c PCI: don't shrink bridge resources

sounds like a possibility.  In 2.6.30, we reduced the size of the
aperture from 0x20000000 to 0x10000000.  If d65245c prevents us from
reducing the size, the allocation will fail.

Yannick, can you try checking out cd81e1ea1a4c (the parent of
d65245c) and d65245c itself, and booting them to see whether that's
what introduced the problem?  If d65245c isn't it, I'm afraid the
quickest way forward will be to bisect.
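
The size argument can be illustrated with a toy placement check. This is only a sketch in Python under stated assumptions, not the kernel's allocator: the `first_fit` helper is hypothetical, the free gap is assumed to run from the end of usable RAM in the e820 line (0xbffa0000) up to the 00:1c.4 prefetchable window quoted in comment 8 (0xddf00000), and a 1 MiB alignment is assumed. The real code walks the whole resource tree.

```python
MIB = 1 << 20


def first_fit(gap_start, gap_end, size, align=MIB):
    """Lowest aligned start such that [start, start + size - 1] fits in the
    inclusive gap, or None if the request cannot fit."""
    start = (gap_start + align - 1) // align * align  # round up to alignment
    if start + size - 1 <= gap_end:
        return start
    return None


# Assumed free gap (see the paragraph above for where the bounds come from).
GAP_START = 0xBFFA0000
GAP_END = 0xDDF00000 - 1

for size in (0x20000000, 0x10000000):
    where = first_fit(GAP_START, GAP_END, size)
    if where is None:
        print(f"size {size:#x}: does not fit")
    else:
        print(f"size {size:#x}: fits at {where:#x}-{where + size - 1:#x}")
```

Under these assumptions the original 0x20000000 aperture has nowhere to go, while the reduced 0x10000000 aperture lands on the range the 2.6.30 log reports for the PREFETCH window.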
Comment 13 Anonymous Emailer 2010-06-03 02:11:55 UTC
Reply-To: yinghai.lu@oracle.com

On 06/02/2010 02:36 PM, Bjorn Helgaas wrote:
> I looked through the drivers/pci changes between 2.6.33 and .34, and
> this one:
> 
>   d65245c PCI: don't shrink bridge resources
> 
> sounds like a possibility.  In 2.6.30, we reduced the size of the
> aperture from 0x20000000 to 0x10000000.  If d65245c prevents us from
> reducing the size, the allocation will fail.

Your analysis is right.

Please check whether the following patch fixes the problem.

[PATCH] x86, pci: clear bridge resource size if BIOS assign bad one

Make sure we can reject a wrong size from the BIOS.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/i386.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -136,6 +136,7 @@ static void __init pcibios_allocate_bus_
 					 * child resource allocations in this
 					 * range.
 					 */
+					r->start = r->end = 0;
 					r->flags = 0;
 				}
 			}
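
One possible reading of this one-line change, following Bjorn's hypothesis about d65245c: if the sizing code refuses to request less than the size already recorded in the resource, then wiping the recorded range removes that floor. The sketch below is a toy model of that reading in Python, not the kernel's code; `recorded_size`, `requested_size`, and the `max(needed, old_size)` rule are assumptions made for illustration.

```python
def recorded_size(start, end):
    """Size of an inclusive range; a cleared range (start == end == 0)
    is treated as empty in this toy model."""
    if start == 0 and end == 0:
        return 0
    return end - start + 1


def requested_size(needed_by_children, start, end):
    """Toy 'never shrink' rule: request at least what the resource records."""
    return max(needed_by_children, recorded_size(start, end))


NEEDED = 0x10000000  # size of Radeon BAR 0, the figure cited as actually needed

# BIOS range left in place: the 0x20000000 floor survives.
print(hex(requested_size(NEEDED, 0xBDF00000, 0xDDEFFFFF)))

# Range cleared as in the patch (r->start = r->end = 0): only the real need remains.
print(hex(requested_size(NEEDED, 0, 0)))
```

Whether the kernel behaves exactly this way is not established in this thread; the dmesg log with the patch applied is the authoritative evidence.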
Comment 14 Yannick Roehlly 2010-06-03 09:06:58 UTC
Hi everybody!

On Thursday 03 June 2010 04:09:14, Yinghai Lu wrote:
> please check if following patch is fixing the problem.

Yes, Yinghai, your patch solves the problem. Kudos!

Bjorn, do you want me to test the git versions you mentioned earlier to
pin down the problem, or is this working patch enough to make it clear?

Yannick
Comment 15 Bjorn Helgaas 2010-06-03 16:25:44 UTC
Yannick, would you mind attaching your dmesg log when using Yinghai's
patch?  I'd like to understand how that fix works, and maybe there's
a clue in the log.

Yinghai, if we use your patch, the changelog needs to include the URL
of this bug report.

Your patch affects x86, but there are several similar uses of
pci_claim_resource() in other architectures.  I think most of this
code is actually generic and should not be architecture-specific,
but until that is cleaned up, we should at least audit the other
uses to see whether they need the same fix.
Comment 16 Yannick Roehlly 2010-06-03 18:22:04 UTC
Created attachment 26636
dmesg log with Yinghai's patch applied.

Here is the log, Bjorn.
Comment 17 Rafael J. Wysocki 2010-06-13 13:53:19 UTC
Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>
Handled-By : Yinghai Lu <yinghai@kernel.org>
Patch : https://patchwork.kernel.org/patch/104169/
Comment 18 Rafael J. Wysocki 2010-06-13 13:54:13 UTC
Fixed by commit 837c4ef13c44296bb763a0ca0e84a076592474cf.