My HP nx6325 doesn't boot any more with 2.6.37-rc2 and later. It boots normally with 2.6.36 and an analogous .config. The last thing printed by the kernel is that the TSC is unstable, with an insanely huge delta. After that it only displays vertical stripes and hangs hard, but I don't really think the problem is related to the radeon driver, because it's still reproducible with radeon.modeset=0. I'll try to bisect this over the weekend.
Bisection leads to the following commit:

commit 1af3c2e45e7a641e774bbb84fa428f2f0bf2d9c9
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Tue Oct 26 15:41:54 2010 -0600

    x86: allocate space within a region top-down

    Request that allocate_resource() use available space from high addresses
    first, rather than the default of using low addresses first.

    The most common place this makes a difference is when we move or assign
    new PCI device resources. Low addresses are generally scarce, so it's
    better to use high addresses when possible. This follows Windows practice
    for PCI allocation.

    Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42
    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Reverting it from current mainline (HEAD=76db8ac45fc738f7d7664fe9b56d15c594a45228) causes the box to boot successfully again.

First-Bad-Commit : 1af3c2e45e7a641e774bbb84fa428f2f0bf2d9c9
Created attachment 37752 [details]
test patch (avoid last 64K)

Thanks for the report and bisection! Matthew Garrett reported a similar problem on his HP 2530p, and I sent him this patch to test. Maybe you could try it, too? If you could also attach the dmesg log, that'd be useful.

In Matthew's case, we allocated the very last page before the 4GB boundary to a PCI device. The E820 map probably should have reserved that area, but it didn't, so maybe it's a common HP BIOS bug.
Created attachment 37772 [details] dmesg output from HP nx6325 The patch doesn't help and dmesg output from the current mainline kernel with commit 1af3c2e45e7a641e774bbb84fa428f2f0bf2d9c9 reverted is attached.
It seems like a good idea to reserve 0xff000000..0xffffffff for BIOS as a general policy; 0xfexxxxxx tends to be used by things like APIC.
> It seems like a good idea to reserve 0xff000000..0xffffffff for BIOS as a
> general policy; 0xfexxxxxx tends to be used by things like APIC.

You're certainly right that APICs and other things often live in that area. The question is how we decide on the right boundary. If we choose a boundary higher than Windows, there will be cases where Windows works but we don't. If we choose a boundary lower than Windows, we're "safer," but we may throw away perfectly usable space. I chose 0xffff0000 for the debug patch because that's what Windows used (in one test of Windows 7 on qemu, YMMV).

Rafael, can you try a boot with "pci=use_crs"? Your box is old enough that we don't turn it on automatically, and amd_bus.c thinks the host bridge window leading to bus 00 is [mem 0x80000000-0xfcffffffff] (note that the end of that window has *ten* digits, not eight). I'm dubious about relying on that region above 4GB, especially since ACPI is telling us the window is only [mem 0x80000000-0xfedfffff].

There is also a window [mem 0xfee01000-0xffffffff] that goes right to the 4GB boundary, but in this case the BIOS does reserve [mem 0xfff00000-0xffffffff] via a motherboard device, so I don't think the debug patch will make a difference.
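To make the "ten digits, not eight" point concrete, here is a minimal Python sketch (not kernel code; the function name and the regular expression are illustrative assumptions) that parses a range in the "[mem 0xSTART-0xEND]" form printed in dmesg and checks whether it ends above the 4GB boundary:

```python
import re

FOUR_GB = 1 << 32  # first address that no longer fits in 32 bits


def extends_above_4g(resource: str) -> bool:
    """Return True if a '[mem 0xSTART-0xEND]' range ends at or above 4GB."""
    m = re.search(r"\[mem 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)", resource)
    if m is None:
        raise ValueError(f"not a mem resource: {resource!r}")
    end = int(m.group(2), 16)
    return end >= FOUR_GB


# Window reported by amd_bus.c versus the window reported by ACPI.
print(extends_above_4g("[mem 0x80000000-0xfcffffffff]"))
print(extends_above_4g("[mem 0x80000000-0xfedfffff]"))
```

The amd_bus.c window ends far above 4GB, while the ACPI window stays below it, which is why the two sources disagree about how much space is usable.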
(In reply to comment #5)
> > It seems like a good idea to reserve 0xff000000..0xffffffff for BIOS as a
> > general policy; 0xfexxxxxx tends to be used by things like APIC.
>
> You're certainly right that APICs and other things often live in that area.
> The question is how we decide on the right boundary. If we choose a boundary
> higher than Windows, there will be cases where Windows works but we don't.
> If we choose a boundary lower than Windows, we're "safer," but we may throw
> away perfectly usable space. I chose 0xffff0000 for the debug patch because
> that's what Windows used (in one test of Windows 7 on qemu, YMMV).
>
> Rafael, can you try a boot with "pci=use_crs"?

That helps.

> Your box is old enough that we don't turn it on automatically, and amd_bus.c
> thinks the host bridge window leading to bus 00 is
> [mem 0x80000000-0xfcffffffff] (note that the
> end of that window has *ten* digits, not eight). I'm dubious about
> relying on that region above 4GB, especially since ACPI is telling us
> the window is only [mem 0x80000000-0xfedfffff].

Moreover, the box was shipped with 32-bit Windows XP and I run 64-bit Linux on it.

> There is also a window [mem 0xfee01000-0xffffffff] that goes right to the 4GB
> boundary, but in this case the BIOS does reserve [mem 0xfff00000-0xffffffff]
> via a motherboard device, so I don't think the debug patch will make a
> difference.

Right.
(In reply to comment #6)
> (In reply to comment #5)
> > > It seems like a good idea to reserve 0xff000000..0xffffffff for BIOS as a
> > > general policy; 0xfexxxxxx tends to be used by things like APIC.
> >
> > You're certainly right that APICs and other things often live in that area.
> > The question is how we decide on the right boundary. If we choose a boundary
> > higher than Windows, there will be cases where Windows works but we don't.
> > If we choose a boundary lower than Windows, we're "safer," but we may throw
> > away perfectly usable space. I chose 0xffff0000 for the debug patch because
> > that's what Windows used (in one test of Windows 7 on qemu, YMMV).
> >
> > Rafael, can you try a boot with "pci=use_crs"?
>
> That helps.

No, it doesn't. Sorry, I used a wrong kernel for that test.
Huh, I really expected that to be the problem. I'll puzzle over it some more. If you still have the ability to boot Windows, an Everest report (http://lavalys.com) might have a hint. If it's possible to get any kind of serial or netconsole log of the failed boots (especially with "pci=use_crs", since I just don't see how we can use the top of the region amd_bus.c reported), that would help, too.
Windows is not installed on this box any more and there's no serial port in it. I'll try to play with netconsole, but I'm afraid it crashes too early for that to work.
Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>
(In reply to comment #9)
> Windows is not installed on this box any more and there's no serial port in
> it.
>
> I'll try to play with netconsole, but I'm afraid it crashes too early for
> that to work.

I've tried a serial console over a USB-to-serial converter, but it also crashes too early for that. I also tried to take some pictures, but the kernel messages scroll by too fast ;)
Maybe the boot_delay kernel parameter will allow you to catch it:

Documentation/kernel-parameters.txt:
    boot_delay=  Milliseconds to delay each printk during boot.
                 Values larger than 10 seconds (10000) are changed to
                 no delay (0).
(In reply to comment #12)
> Maybe the boot_delay kernel parameter will allow you to catch it:
>
> Documentation/kernel-parameters.txt:
>     boot_delay=  Milliseconds to delay each printk during boot.
>                  Values larger than 10 seconds (10000) are changed to
>                  no delay (0).

Yes, I hit on this idea too after writing my previous post. I'll post the photos in a few hours.
Here is my "dmesg": http://ftp.retis.net.pl/dmesg.jpeg

I'll try to capture a dump over FireWire if I can find my FireWire cable.
Created attachment 38322 [details]
failing log

(In reply to comment #14)
> Here is my "dmesg" http://ftp.retis.net.pl/dmesg.jpeg

Wow, nice. That is the coolest log I've ever seen :-)

I did find an nx6325 and a docking station, so I was able to collect this serial log easily.
Created attachment 38332 [details]
nx6325 Everest report

I also have Windows XP on this system. Here's an Everest report showing Windows resource usage. Differences I see:

  0000:01:05.0 Linux assigns ROM at [mem 0xd43e0000-0xd43fffff pref]
  0000:00:12.0 WinXP moved BAR 5 to [mem 0xffeffe00-0xffefffff]
  0000:00:12.0 Linux assigned ROM at [mem 0xffe80000-0xffefffff pref]
  0000:00:14.2 Linux moved BAR 0 to [mem 0xfed7c000-0xfed7ffff 64bit]
  0000:00:14.4 Linux assigned window at [mem 0xf8000000-0xfbffffff pref]

The Linux option ROM assignments are typical; Windows doesn't assign resources for ROMs, but Linux does.

The 12.0 and 14.2 changes are because of this collision:

  pci 0000:00:14.2: address space collision: [mem 0xd4408000-0xd440bfff 64bit] conflicts with 0000:00:12.0 [mem 0xd4409000-0xd44091ff]

WinXP moved 12.0 and Linux moved 14.2; both resolve the collision.

I used a test patch to keep Linux from assigning the mem pref window to the 14.4 subtractive decode bridge, and it avoided the problem, so my working theory is that there's something in the [mem 0xf8000000-0xfbffffff] region we should be avoiding.
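For readers who want to check the collision arithmetic, here is a minimal Python sketch (illustrative only; the helper name and the tuple representation are assumptions, not how the kernel tracks resources). It tests the two original BARs for overlap and confirms that either relocation removes the conflict:

```python
def overlaps(a, b):
    """Two inclusive (start, end) ranges overlap if each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


# Original BIOS assignments taken from the collision message above.
bar_14_2 = (0xd4408000, 0xd440bfff)   # 0000:00:14.2 BAR 0
bar_12_0 = (0xd4409000, 0xd44091ff)   # 0000:00:12.0 BAR 5

# The two different ways the collision was resolved.
linux_14_2 = (0xfed7c000, 0xfed7ffff)  # Linux moved 14.2
winxp_12_0 = (0xffeffe00, 0xffefffff)  # WinXP moved 12.0

print(overlaps(bar_14_2, bar_12_0))     # original layout
print(overlaps(linux_14_2, bar_12_0))   # after the Linux move
print(overlaps(bar_14_2, winxp_12_0))   # after the WinXP move
```

The original pair overlaps because the 12.0 BAR lies entirely inside the 14.2 BAR; moving either device out of the way is sufficient.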
However, the BIOS doesn't seem to tell us what it is?
Well, the E820 memory map doesn't mention anything in the [mem 0xf8000000-0xfbffffff] range, and I don't see any ACPI devices there either. I can't find it right now, but ISTR a recent problem where we placed a device on top of a uvesafb frame buffer, so I wonder whether we're supposed to use the VESA BIOS extensions in addition to the E820 map and ACPI namespace. But I don't know anything about VESA BIOS; it's just on my list to look into.
Created attachment 39272 [details] patch to avoid allocating PNP resources I did a lot of experimentation with this, and as far as I can tell, this is just a BIOS bug -- the BIOS forgot to tell us about a couple devices in the address space. The VESA framebuffer issue I was thinking of is bug 22132. In that case, the framebuffer was marked "reserved" in E820, but we put another PCI device there anyway because we don't do a very good job avoiding those reserved areas. In any case, that's not the issue here. Here's a quirk to avoid the hazards in the nx6325 address space. It works for me, but it'd be good if you could try it, too, because I only tested it as far as booting to the point of mounting the root filesystem.
Created attachment 39302 [details]
debugging patch

Here's the patch I used to explore the address space. I used boot arguments like "pci=cbmemsize=1M pci_top=0xf83fffff" to force the 00:14.4 bridge prefetchable memory window to be allocated at various sizes and addresses.

Here are the results (note that we allocate 64M for CardBus bridge windows by default, so the 64M allocation is the default behavior that caused Rafael's hang):

  64M [mem 0xf8000000-0xfbffffff] HANG
  32M [mem 0xf8000000-0xf9ffffff] HANG
  16M [mem 0xf8000000-0xf8ffffff] HANG
   8M [mem 0xf8000000-0xf87fffff] HANG
   4M [mem 0xf8000000-0xf83fffff] HANG
   2M [mem 0xf8000000-0xf81fffff] OK
   2M [mem 0xf8200000-0xf83fffff] HANG
   1M [mem 0xf8200000-0xf82fffff] OK
   1M [mem 0xf8300000-0xf83fffff] HANG
   4M [mem 0xf8400000-0xf87fffff] HANG
   2M [mem 0xf8400000-0xf85fffff] HANG
   1M [mem 0xf8400000-0xf84fffff] OK
   1M [mem 0xf8500000-0xf85fffff] HANG
   2M [mem 0xf8600000-0xf87fffff] OK
   8M [mem 0xf8800000-0xf8ffffff] OK
  16M [mem 0xf9000000-0xf9ffffff] HANG
   8M [mem 0xf9000000-0xf97fffff] HANG
   4M [mem 0xf9000000-0xf93fffff] HANG
   2M [mem 0xf9000000-0xf91fffff] HANG
   1M [mem 0xf9000000-0xf90fffff] OK
   1M [mem 0xf9100000-0xf91fffff] HANG
   2M [mem 0xf9200000-0xf93fffff] OK
   4M [mem 0xf9400000-0xf97fffff] OK
   8M [mem 0xf9800000-0xf9ffffff] OK
  32M [mem 0xfa000000-0xfbffffff] OK

Based on the above, I think the nx6325 has the following unreported areas in use:

   1M [mem 0xf8300000-0xf83fffff] HANG
   1M [mem 0xf8500000-0xf85fffff] HANG
   1M [mem 0xf9100000-0xf91fffff] HANG

My quirk in the previous patch combined the first two areas because I hadn't been quite so systematic when I wrote the patch. I should probably have just used these three areas as-is.
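The deduction from the results to the three 1M areas can be checked mechanically. The following Python sketch is only an illustration (the function name and data layout are made up here, and it assumes that hazards sit at 1M granularity, the smallest size tested, and that a window hangs exactly when it contains at least one hazardous 1M block). It marks every 1M block covered by an OK window as safe and reports the blocks that appear only in HANG windows:

```python
MB = 1 << 20

# (start, end, outcome) for each window tried, copied from the results above.
RESULTS = [
    (0xf8000000, 0xfbffffff, "HANG"), (0xf8000000, 0xf9ffffff, "HANG"),
    (0xf8000000, 0xf8ffffff, "HANG"), (0xf8000000, 0xf87fffff, "HANG"),
    (0xf8000000, 0xf83fffff, "HANG"), (0xf8000000, 0xf81fffff, "OK"),
    (0xf8200000, 0xf83fffff, "HANG"), (0xf8200000, 0xf82fffff, "OK"),
    (0xf8300000, 0xf83fffff, "HANG"), (0xf8400000, 0xf87fffff, "HANG"),
    (0xf8400000, 0xf85fffff, "HANG"), (0xf8400000, 0xf84fffff, "OK"),
    (0xf8500000, 0xf85fffff, "HANG"), (0xf8600000, 0xf87fffff, "OK"),
    (0xf8800000, 0xf8ffffff, "OK"),   (0xf9000000, 0xf9ffffff, "HANG"),
    (0xf9000000, 0xf97fffff, "HANG"), (0xf9000000, 0xf93fffff, "HANG"),
    (0xf9000000, 0xf91fffff, "HANG"), (0xf9000000, 0xf90fffff, "OK"),
    (0xf9100000, 0xf91fffff, "HANG"), (0xf9200000, 0xf93fffff, "OK"),
    (0xf9400000, 0xf97fffff, "OK"),   (0xf9800000, 0xf9ffffff, "OK"),
    (0xfa000000, 0xfbffffff, "OK"),
]


def suspect_areas(results):
    """Return inclusive (start, end) ranges covered by HANG windows but by no OK window."""
    hang, ok = set(), set()
    for start, end, outcome in results:
        blocks = range(start // MB, (end + 1) // MB)  # 1M block indices in the window
        (ok if outcome == "OK" else hang).update(blocks)
    areas = []
    for block in sorted(hang - ok):
        if areas and areas[-1][1] + 1 == block * MB:
            # Contiguous with the previous suspect block: extend that area.
            areas[-1] = (areas[-1][0], (block + 1) * MB - 1)
        else:
            areas.append((block * MB, (block + 1) * MB - 1))
    return areas


for start, end in suspect_areas(RESULTS):
    print(f"[mem {start:#010x}-{end:#010x}]")
```

Under those assumptions the result is exactly the three 1M areas listed above, and none of them are adjacent, so no merging occurs.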
Created attachment 39312 [details] v2 patch to avoid allocating PNP resources Updated patch to reserve the three specific areas mentioned in comment 20.
Created attachment 40212 [details]
patch to avoid opening windows on subtractive decode bridges

This is a different approach. This system has a subtractive decode bridge leading to a CardBus bridge:

  pci 0000:00:14.4: PCI bridge to [bus 02-03] (subtractive decode)
  pci 0000:02:04.0: CardBus bridge to [bus 03-06]

Windows leaves the subtractive decode bridge alone and programs a 64MB window on the CardBus bridge. This 64MB window relies on subtractive decode.

Linux programs two 64MB windows on the CardBus bridge, *and* opens a window on the 00:14.4 bridge, so it positively decodes at least part of the CardBus space.

This patch makes Linux use the BIOS setup of subtractive decode bridges, without assigning new explicit windows to them.
My HP nx6325 boots correctly with this patch applied.
Patch : https://bugzilla.kernel.org/attachment.cgi?id=40212
Fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=46bdfe6a50b88942f5323f837a3afd93a1c86e60 .