Bug 10231
Summary: | On H12Y-based notebooks, 8139too (MMIO) or sdhci freezes the system | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Arne Fitzenreiter (arne) |
Component: | i386 | Assignee: | Alan (alan) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | alan, bjorn.helgaas, dominik.bodi, florian, jbarnes, kernel.bugs, yahgrp |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.4-rc6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Hack to force linux to use other mmio addresses
iomem without patch
iomem with patch
dmesg without patch
dmesg with patch
lspci without patch
lspci with patch
new version of the patch with more hardwarechecks
H12Y mmio patch
lspci -s1f.0 -xxx
grep . /sys/bus/pnp/devices/*/*
patch for FW/PCI bridge overlap debug
dmesg with bridgewindowpatch
lspci -v with bridgewindowpatch
debug patch to disable FW decoding
Dmesg output with fw disable
Output of dmidecode
Test patch to reserve the problem region
working patch |
Description
Arne Fitzenreiter
2008-03-12 06:17:51 UTC
Just a note: this bug can be worked around by using a kernel compiled with PIO off in the sdhci module, as noted in the bug report above. The latest Ubuntu (8.04) ships a kernel compiled that way because of this bug.

Sorry, that should be PIO off in the 8139too module, not sdhci. See also Bug #9905 for more information about the sdhci module on this laptop (or its clone).

Hi Jennifer, I only linked to this report because the lspci log was in it. We have the same laptop. (I think Twinhead H12Y is the correct name, because that is what is printed on the PCB and in the BIOS data.) Both modules crash for the same reason: some MMIO read or write accesses in the assigned memory area. The sdhci and 8139too MMIO areas are assigned very close together in the address space. I have seen that Windows reconfigures both devices. Maybe there is an addressing issue in a mainboard chip. The other devices around them work without problems. Arne

Arne, any updates on this one? Does the problem persist with recent kernel versions?

Created attachment 17681 [details]
Hack to force linux to use other mmio addresses
Yes, the problem is still present in 2.6.27-rc5. I think there is another device in the laptop that also uses the MMIO address area ffbfe800-ffbfecff, or maybe up to ffbfefff. If I apply the attached hack, which changes the size of the MMIO areas of the card reader and the network card so that Linux has to move them to another area, both components work.

In both this bug and bug #9905, the system hangs when we attempt MMIO to at least some of the devices behind the PCI bridge at 00:1e.0. Arne, you mention hangs when you use 8139too or sdhci. There's also a FireWire device behind the bridge. Does the system hang when you load the FireWire driver, too? You can get 8139too and sdhci to work with the quirk you attached, but we still use the 00:1e.0 bridge, which makes me think this might be an address conflict with another device rather than a problem with the bridge itself. Can you please turn on CONFIG_PCI_DEBUG and CONFIG_PNP_DEBUG and attach the complete dmesg log and the contents of /proc/iomem? Please do this both with and without your patch.

Created attachment 17939 [details]
iomem without patch
Created attachment 17940 [details]
iomem with patch
Created attachment 17941 [details]
dmesg without patch
Created attachment 17942 [details]
dmesg with patch
Created attachment 17943 [details]
lspci without patch
Created attachment 17944 [details]
lspci with patch
Created attachment 17945 [details]
new version of the patch with more hardwarechecks
I can't say whether the other devices (FireWire and Memory Stick) are affected. Memory Stick was not supported in my kernel version, and the OHCI driver does not seem to use the area at ffbfxxxx.

Created attachment 18179 [details]
H12Y mmio patch
Here is a new version that also moves the FireWire and Memory Stick devices out of the problematic memory area (lspci now shows no devices at 0xFFB00000-0xFFBFFFFF).
Arne - I tried your patch, and it definitely allows me to do a modprobe sdhci without crashing (which in itself is useful), but I don't see any evidence that the SD card reader is working. I applied the patch to 2.6.27.7 (from the linux.org source) and am otherwise running Ubuntu 8.10 apart from my custom kernel. After doing "modprobe sdhci", I inserted an SD card into the card reader, but I don't see any dmesg output indicating it was recognized, and it isn't in any of the /dev/sd* devices, much less automounted via Nautilus. Is there something else I need to do to actually use the SD card reader? And by the way, thanks for all your work! I never would have been able to use Linux on this PC in the first place if it weren't for the install disk you posted on your web site.

Hi Jennifer, since kernel 2.6.27 the sdhci module is split into two modules, sdhci and sdhci_pci. The module for the H12Y card reader is sdhci_pci; with it, the card reader should work. I have applied this patch to the Ubuntu kernel and created a new package. The SD card showed up on the desktop like other mass storage.

Arne - thanks, that worked. I hadn't realized the module was split, and now for the first time I have been able to use the card reader on my laptop. I have a USB card reader I could plug in, so it wasn't a huge problem, but it's nice to get the hardware actually working under Linux. Now if we can just get the kernel.org or Ubuntu folks to put in your patch, we won't have to compile our own kernels any more...

Please attach the output of "lspci -s1f.0 -xxx" and "grep . /sys/bus/pnp/devices/*/*". We think there's an issue with something in the ffbfe800-ffbfecff address range (comment #6), and the ICH7 spec mentions ffb80000-ffbfffff as part of a Firmware Hub range. So I want to double-check that Firmware Hub decoding is disabled (see FWH_DEC_EN1 at 0xd8 in 1f.0 config space).

Created attachment 19356 [details]
lspci -s1f.0 -xxx
Created attachment 19357 [details]
grep . /sys/bus/pnp/devices/*/*
The value of FWH_DEC_EN1 (at D8h) is C0C0h (bits 15, 14, 7, and 6 set):
- bit 15: FFF80000h-FFFFFFFFh and FFB80000h-FFBFFFFFh enabled for Firmware Hub
- bit 14: FFF00000h-FFF7FFFFh and FFB00000h-FFB7FFFFh enabled for Firmware Hub
- bit 7: F0000h-FFFFFh enabled for Firmware Hub
- bit 6: E0000h-EFFFFh enabled for Firmware Hub
The PCI bridge at 1e.0 also claims the FFB00000h-FFBFFFFFh range:
  00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge
  Bus: primary=00, secondary=03, subordinate=04, sec-latency=32
  Memory behind bridge: ff300000-ffbfffff
That looks wrong to me, although I'm not an ICH expert. I think the bridge window should be reduced to ff300000-ffafffff. I don't know the best way to do this, but I'll poke around and try to come up with a patch.

Created attachment 19364 [details]
patch for FW/PCI bridge overlap debug
Can you try this patch? The idea is to reduce the size of the bridge aperture so it no longer overlaps the firmware range. Then we have to reassign any MMIO resources for downstream devices. This is a gross patch, but maybe we can at least figure out if we're on the right track.
Hi Bjorn, the patch works after I adapted it to 2.6.27. I think you have a newer development version; in my kernel the PCI quirks are not configurable.

Created attachment 19374 [details]
dmesg with bridgewindowpatch
Created attachment 19375 [details]
lspci -v with bridgewindowpatch
Great! Thanks for testing this, Arne. I think this is pretty good evidence that this is just a BIOS defect: the BIOS should have either disabled Firmware Hub decoding or reduced the size of the bridge aperture. I'll be on vacation for the next two weeks, but after the holidays we can figure out a cleaner way to work around this.

Created attachment 19725 [details]
debug patch to disable FW decoding

Arne, can I trouble you to try another patch? It occurred to me that my patch from comment #28 doesn't actually confirm that the conflict is with the firmware range. It's just a very complicated way to move the devices, which you already did with your patch in comment #16. This patch just disables the firmware range. If that range is really the source of the conflict, this patch should make the devices work even without moving them. My DL320 booted and seems to work with the firmware range disabled, but it's possible that SMM or other runtime BIOS code depends on it, so I'm not sure it's really safe to disable it. But if this patch makes those devices work, even for a little while, I'll be more confident that we've identified the conflict.

I don't think the patch in comment #29 is good. I did the following:
- used git to get the 2.6.28 source from kernel.org
- set the 8139too module to use the standard MMIO/PIO setting that triggers the problem on this laptop
- applied your patch to the quirks.c file
- compiled it into a deb package and installed it (I'm running Ubuntu)
The laptop will not boot: it gets stuck right after it loads the 8139too module, just as the unpatched kernel would. Just to make sure, I am going to apply the patch from comment #28 again and verify that everything works. I will report back in a couple of hours.

Thanks for testing this, Jennifer. I was sure the conflict was with the firmware range (and maybe it still is), but I guess I don't quite understand this yet. Would you mind attaching the dmesg log from the comment #29 patch when booted with "pci=earlydump"?
One other idea: could you modify the comment #29 patch to clear only the upper bits of FWH_DEC_EN1? Arne reported this value:
pci 0000:00:1e.0: FWH_DEC_EN1: 0xc0c0
so maybe you could write back 0xc0. Per comment #23, only the ranges enabled by bits 14 and 15 conflict with these devices, so we should be able to leave the E0000h-FFFFFh range enabled.

Just to make sure we were on the same page, I went back to the patch from comment #24, rebuilt the 2.6.28 kernel, and was able to boot successfully, use the network card (8139too module), and use the SD card reader (sdhci-pci module). I want to run a few more tests on this kernel while I have a working one :), such as checking whether suspend works (there are suspend issues on this laptop with the 2.6.27 kernel), but after that I'll try the non-working patch again, run the other tests you suggested in comment #31, and report back, probably within the next six hours.

OK. I am back to the patch from comment #29 and trying to get you some dmesg output. But even with pci=earlydump added to my command line, the boot apparently doesn't get far enough for the dmesg log to be saved to the hard drive before the bad 8139too module causes a problem. I also tried adding "blacklist 8139too" to my /etc/modprobe.d/blacklist file, but it seems to be ignored. I am not sure why, as the directives in that file have always been respected before (especially for the sdhci modules). I can set the 8139too PIO/MMIO option in my .config and at least have a bootable 8139too module, but will that change the output you are looking for? I'm happy to try whatever you think will be useful. Sorry, I am not all that knowledgeable about the kernel or device drivers; I'm just an old C programmer (these days mostly doing web programming) with a laptop that matches Arne's... :) Any other suggestions for getting the dmesg output you want?
You also asked about changing the range in your patch from comment #29. I assume you mean changing the line
rc = raw_pci_write(0, 0, PCI_DEVFN(31, 0), 0xd8, 2, 0);
to read
rc = raw_pci_write(0, 0, PCI_DEVFN(31, 0), 0xc0, 2, 0);
I will try making that change, rebuild the kernel, and see what happens. Give me a little while to get the build done, and let me know if I have misunderstood.

With the change noted in comment #34, I still cannot boot (it still hangs just after loading the 8139too module), and I still don't get any dmesg output saved from the partial boot, even with pci=earlydump on the boot command line. This time I also tried blacklisting both 8139too and 8139cp, but my directive in /etc/modprobe.d/blacklist is being ignored and they are still loaded. I can see that on the screen right above where everything freezes, and when I boot into a different kernel (one with the PIO=MMIO fix or the comment #29 patch) where the 8139too module works, they do get loaded during boot. Let me know if there is some other way I can test this that would be useful to you.

Wow, thanks for doing all this work. I'm sorry it's been so difficult and time-consuming. Changing the PIO/MMIO .config option won't change any of the debug output. Since blacklisting isn't working for you, I think the easiest thing would be to rename the module so the loader won't find it. That should be enough to get dmesg output. I did notice today that the pci=earlydump output is missing some devices, but I haven't looked into it yet. If we're lucky, it will include the ones we're interested in. I think you mentioned you're running Ubuntu. In that case, it's fairly easy to skip building the .deb and just build a plain bzImage ("make bzImage") without an initrd. You do have to be careful to build in statically whatever you need for your root filesystem (the right disk driver, filesystem, etc.).
The advantage is that building a bzImage is usually much quicker than building and installing a .deb. Then you can build the single 8139too module you care about and modprobe it by hand. It is a little fiddly to get started; I only mention it because you seem pretty willing to jump in. If you do build a bzImage from a git tree, it won't match the module versions of your installed Ubuntu kernel, so I don't think any of them will load.

As for the comment #34 change, the value to write is actually the last parameter, so it would look like this:
rc = raw_pci_write(0, 0, PCI_DEVFN(31, 0), 0xd8, 2, 0xc0);
That's just a shot in the dark, though, because I don't think it's likely that we're using the firmware area at E0000h. I still think this firmware/bridge aperture overlap is the problem. Perhaps the quirk doesn't work on your system because it doesn't match the PCI IDs or something; the dmesg output will tell us. Or you could use lspci to verify that you have a 0x8086 0x2448 device, and change the patch if yours is different.

My lspci output is identical to Arne's as listed above, except that his wireless adapter is Intel and mine is a Realtek rt73usb. But that's not one of the devices causing the problem. I have the identical SD card reader and 8139 network card, and everything else except the wireless card is the same. My lspci output is here (attached to a different bug; it appears the wireless switch was off when it was run): http://bugzilla.kernel.org/attachment.cgi?id=16294 We also have different names silk-screened on the front of our computers (his is Averatec and mine is Everex). :) Anyway, I am doing a kernel build now with your corrected quirk line and CONFIG_8139TOO_PIO=y in my .config, so that I can boot (that seems easier than doing all the bzImage stuff), and I'll (a) get the dmesg output and (b) see whether SDHCI works with this latest patch. I'll report back later today. This is Jennifer in Seattle, over and out.
Created attachment 19738 [details]
Dmesg output with fw disable
OK. I rebuilt the kernel with the corrected firmware range change from comment #36 and with CONFIG_8139TOO_PIO=y. This allows me to boot, as the 8139too module loads. I booted with pci=earlydump; the dmesg output is attached. Then I tried modprobe sdhci_pci, and that hung the system, so this patch does not fix the underlying conflict. Bjorn, you and Arne obviously know much more about these device driver things than I do, but I somehow thought the 8139too and sdhci_pci devices (i.e. the Ethernet card and the SD card reader) were conflicting with each other, not (or not just) with some firmware range? Anyway, I'll let you sort that out. Hope this helps... --Jennifer

Sorry for the late answer; I had no time last month. I hope to find time to test it in the next few days and add an additional read after the write. Maybe the register can't be written correctly. @Jennifer: the PCI devices are not conflicting with other PCI devices, but all of them are in this firmware area...

Bit 15 cannot be cleared. After writing 0 to it I get
pci 0000:00:1e.0: FWH_DEC_EN1: 0x8000
Arne

Hi, at last I have checked the ICH7 datasheet and found that bit 15 of FWH_DEC_EN1 is always on and can't be disabled. But there are other configuration bits to disable Firmware Hub address decoding as a whole: RCBA + 3410 bits 11:10 have to be set to 10 to access PCI and not the FWH. These bits are set correctly, so I think the Firmware Hub is not the conflicting device. http://www.intel.com/assets/pdf/datasheet/307013.pdf I have also checked bit 3 of RCBA+3401. It is also set to zero, but I think this is a typo in the datasheet (footnote at the memory map table). Is there a way to reserve this memory if we detect the Twinhead H12Y at boot? If I use reserve=0xFFB00000,100000 at the boot prompt, any kernel boots without problems.

Confirmed still present in 3.4-rc6.

Can you attach the output of dmidecode? Thanks, Alan

Created attachment 73279 [details]
Output of dmidecode
Created attachment 73289 [details]
Test patch to reserve the problem region
Test patch
The patch does not fix the problem. The hardware detection works (the Reserve memory on H12Y message is shown), but the PCI devices are not moved out of this area.

Created attachment 73299 [details]
working patch
Now I've got it to work.
I changed request_region to request_mem_region, and changed the matched device to the LPC device, because older kernels (2.6.32) overwrite the subvendor information of the bridge with zero.
OK, updated and submitted to the x86 maintainers.

(In reply to comment #48)
> Ok updated and submitted to x86 maintainers
Will this also be migrated to the 64-bit kernel? I have been following this bug with some interest after recently inheriting a "Phillips Freevents x53" laptop, which is also a rebadged Twinhead H12Y. The machine originally came with an Intel T2050 CPU; unfortunately this is a non-PAE-capable chip, which causes some problems with Ubuntu 12.04. I have successfully upgraded the BIOS to the latest version (1.08), which supports Merom stepping B2 processors, and I have replaced the CPU with a T5200, which is a 64-bit processor. I have successfully installed Ubuntu 12.04 in both 32-bit (using the mini ISO) and 64-bit versions, using Arne's memory-reserving boot parameter. However, the final problem is the SD card reader (in fact, the main reason for swapping processors was reading somewhere that the SD card reader had been fixed in the 64-bit kernel). I'm not familiar with the kernel bug-fixing process, hence the initial question.

It'll cover both 32- and 64-bit.

A patch referencing this bug report has been merged in Linux v3.5-rc1:
commit 80b3e557371205566a71e569fbfcce5b11f92dbe
Author: Alan Cox <alan@linux.intel.com>
Date: Tue May 15 18:44:15 2012 +0100
x86: Fix boot on Twinhead H12Y |