Bug 30552
Summary: | kernel oops with snd-hda-intel: BUG: unable to handle kernel paging request at ffffc90011c08000 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Paul Menzel (paulepanter) |
Component: | PCI | Assignee: | Bjorn Helgaas (bjorn) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | bjorn, florian, hpa, jbarnes, jrnieder, lewscarroll, paulepanter, tiwai, yinghai |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34-rc1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
output of `dmesg` from Debian Linux kernel 3.0.0 with option `ignore_loglevel`
dmesg for failing kernel with 'debug ignore_loglevel'
boot log with kernel 'debug ignore_loglevel pci=use_crs' options added.
output of `dmidecode` (from Debian Sid/unstable 2.9-1.2)
[PATCH] x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
output of `dmidecode` of board ASUS M2V-MX SE running coreboot |
Description
Paul Menzel
2011-03-06 13:40:12 UTC
This is #613979 in the Debian BTS.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613979

I also sent a message to alsa-devel [1], but got no replies.

[1] http://mailman.alsa-project.org/pipermail/alsa-devel/2011-February/037063.html

Try to replace sound/pci/hda/* files with the ones in 2.6.32.x tree. Run "make oldconfig" and add #include <linux/slab.h> if a build error occurs.

If this works, the problem is somewhere in HD-audio side. If the problem persists even after replacing with the old code, it must be in a lower level, e.g. PCI core.

$ git bisect start 0f2cc4ecd81dc1917a041dc93db0ada28f8356fa 8724fdb53d27d7b59b60c8a399cc67f9abfabb33

points to the following commit as the culprit [1].

$ git show --stat 3e3da00c
commit 3e3da00c01d050307e753fb7b3e84aefc16da0d0
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Wed Feb 10 01:20:09 2010 -0800

    x86/pci: AMD one chain system to use pci read out res

    Found MSI amd k8 based laptops is hiding [0x70000000, 0x80000000) RAM from e820.

    enable amd one chain even for all.

    -v2: use bool for found, according to Andrew

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    LKML-Reference: <1265793639-15071-6-git-send-email-yinghai@kernel.org>
    Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>

 arch/x86/pci/amd_bus.c  | 7 ++++---
 arch/x86/pci/bus_numa.c | 5 -----
 arch/x86/pci/bus_numa.h | 1 -
 3 files changed, 4 insertions(+), 9 deletions(-)

In `Makefile` it says this is version 2.6.33-rc7.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613979#122

I am not sure who to assign that report to and who to put into CC. Is that even allowed? I am just putting the patch author and myself into the CC list. Should the mailing lists be added too?

It would be great if somebody else could update the other fields (Assigned to, Product and Component).

I found the following entries in `MAINTAINERS`.
PCI SUBSYSTEM
M: Jesse Barnes <jbarnes@virtuousgeek.org>
L: linux-pci@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6.git
S: Supported
F: Documentation/PCI/
F: drivers/pci/
F: include/linux/pci*

X86 ARCHITECTURE (32-BIT AND 64-BIT)
M: Thomas Gleixner <tglx@linutronix.de>
M: Ingo Molnar <mingo@redhat.com>
M: "H. Peter Anvin" <hpa@zytor.com>
M: x86@kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
S: Maintained
F: Documentation/x86/
F: arch/x86/

Does reverting the commit fix the problem?

Regarding the people involved: at least, you can put people listed in the sign-off of the commit log. If this commit is really the culprit, change the bug category/component appropriately.

(BTW, the commit was merged to upstream first in 2.6.34-rc1. You can check it via "git describe --contains $ID")

(In reply to comment #6)
> Does reverting the commit fix the problem?

Can I safely revert that commit from Linux kernel 3.x?

> Regarding the people involved: at least, you can put people listed in the
> sign-off of the commit log.

Understood.

> If this commit is really the culprit, change the bug category/component
> appropriately.

I could not find PCI looking at category/component. Could you help me there please?

> (BTW, the commit was merged to upstream first in 2.6.34-rc1. You can check it
> via "git describe --contains $ID")

Strange. Your command finds the following tag as you wrote.

$ git describe --contains b74fd238a9
v2.6.34-rc1~218^2~27

But looking at `Makefile` there are other versions.

$ git grep -C3 SUBLEVEL b74fd238a9 -- Makefile
b74fd238a9:Makefile-VERSION = 2
b74fd238a9:Makefile-PATCHLEVEL = 6
b74fd238a9:Makefile:SUBLEVEL = 33
b74fd238a9:Makefile-EXTRAVERSION = -rc7
b74fd238a9:Makefile-NAME = Man-Eating Seals of Antiquity
b74fd238a9:Makefile-
[…]

Adding the Debian developer helping with fixing this bug.

(In reply to comment #7)
> (In reply to comment #6)
> > Does reverting the commit fix the problem?
>
> Can I safely revert that commit from Linux kernel 3.x?

I suppose so. Just give it a try.

> > If this commit is really the culprit, change the bug category/component
> > appropriately.
>
> I could not find PCI looking at category/component. Could you help me there
> please?

I meant the bugzilla "Products" and "Component" tags.

> > (BTW, the commit was merged to upstream first in 2.6.34-rc1. You can check it
> > via "git describe --contains $ID")
>
> Strange. Your command finds the following tag as you wrote.
>
> $ git describe --contains b74fd238a9
> v2.6.34-rc1~218^2~27
>
> But looking at `Makefile` there are other versions.
>
> $ git grep -C3 SUBLEVEL b74fd238a9 -- Makefile
> b74fd238a9:Makefile-VERSION = 2
> b74fd238a9:Makefile-PATCHLEVEL = 6
> b74fd238a9:Makefile:SUBLEVEL = 33
> b74fd238a9:Makefile-EXTRAVERSION = -rc7
> b74fd238a9:Makefile-NAME = Man-Eating Seals of Antiquity
> b74fd238a9:Makefile-
> […]

It means that the patch was written based on 2.6.33-rc7, but merged to the upstream in 2.6.34-rc1.

(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Does reverting the commit fix the problem?
> >
> > Can I safely revert that commit from Linux kernel 3.x?
>
> I suppose so. Just give it a try.

All right. I will try v3.0.

> > > If this commit is really the culprit, change the bug category/component
> > > appropriately.
> >
> > I could not find PCI looking at category/component. Could you help me there
> > please?
>
> I meant the bugzilla "Products" and "Component" tags.

Me too. ;-) But I do not know what entries to choose when it concerns x86/pci.

> > > (BTW, the commit was merged to upstream first in 2.6.34-rc1. You can check it
> > > via "git describe --contains $ID")
> >
> > Strange. Your command finds the following tag as you wrote.
> >
> > $ git describe --contains b74fd238a9
> > v2.6.34-rc1~218^2~27
> >
> > But looking at `Makefile` there are other versions.
> >
> > $ git grep -C3 SUBLEVEL b74fd238a9 -- Makefile
> > b74fd238a9:Makefile-VERSION = 2
> > b74fd238a9:Makefile-PATCHLEVEL = 6
> > b74fd238a9:Makefile:SUBLEVEL = 33
> > b74fd238a9:Makefile-EXTRAVERSION = -rc7
> > b74fd238a9:Makefile-NAME = Man-Eating Seals of Antiquity
> > b74fd238a9:Makefile-
> > […]
>
> It means that the patch was written based on 2.6.33-rc7, but merged to the
> upstream in 2.6.34-rc1.

Thank you for the explanation. That would explain why, looking at my notes, 2.6.33-rc8 worked in my tests. But I had forgotten about that during bisection.

(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #7)
> > > (In reply to comment #6)
> > > > Does reverting the commit fix the problem?
> > >
> > > Can I safely revert that commit from Linux kernel 3.x?
> >
> > I suppose so. Just give it a try.
>
> All right. I will try v3.0.

Well that did not work well and did not even compile.

$ git checkout v3.0
$ git revert 3e3da00c
$ cp -a /boot/config-3.0.0-1-686-pae .config
$ make oldconfig
$ make localmodconfig
$ time make -j2 deb-pkg
  CC arch/x86/lib/memmove_64.o
gcc: error: arch/x86/lib/memmove_64.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
make[3]: *** [arch/x86/lib/memmove_64.o] Error 1
make[2]: *** [arch/x86/lib] Error 2
make[1]: *** [deb-pkg] Error 2
make: *** [deb-pkg] Error 2

So I tried 2.6.34. This built fine, but booting stopped with the following error.

[0xc80000… - 0xcf000000… pref] conflicts with GART
request_module: runaway loop modprobe binfmt-464c

So I am not able to revert that commit, guessing that other later changes depend on it.

Yinghai, do you have any ideas? I am going to add the maintainers to CC.

[…]

On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
>
> Yinghai, do you have any ideas? I am going to add the maintainers to CC.
can you post whole bootlog with "debug ignore_loglevel"

Thanks

Yinghai

(In reply to comment #12)
> On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> >
> > Yinghai, do you have any ideas? I am going to add the maintainers to CC.
>
> can you post whole bootlog with "debug ignore_loglevel"

I tried, but even adding only `debug` to the kernel command line the boot stops after two seconds and I do not even get the cryptsetup LUKS passphrase dialog.

Thank you for looking into this,

Paul

(In reply to comment #11)
> $ git checkout v3.0
> $ git revert 3e3da00c
> $ cp -a /boot/config-3.0.0-1-686-pae .config
> $ make oldconfig
> $ make localmodconfig
> $ time make -j2 deb-pkg
>   CC arch/x86/lib/memmove_64.o
> gcc: error: arch/x86/lib/memmove_64.c: No such file or directory

From "git log -- arch/x86/lib/memmove_64.c", I learn that it was replaced by an assembler file. Hopefully

  rm arch/x86/lib/.memmove_64.o.cmd

would get the build working again --- sorry for the fuss.

Created attachment 69022 [details]
output of `dmesg` from Debian Linux kernel 3.0.0 with option `ignore_loglevel`
(In reply to comment #13)
> (In reply to comment #12)
> > On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> > >
> > > Yinghai, do you have any ideas? I am going to add the maintainers to CC.
> >
> > can you post whole bootlog with "debug ignore_loglevel"
>
> I tried, but even adding only `debug` to the kernel command line the boot stops
> after two seconds and I do not even get the cryptsetup LUKS passphrase
> dialog.

This is also reproducible with Linux v3.0.0 from Debian. Only adding `ignore_loglevel` works though, so maybe the dmesg output I attach has enough information. (And this has nothing to do with

Please note that Clemens already pointed out what went wrong from the stack trace [1], but I guess you still need more information.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613979#51

(In reply to comment #14)
> (In reply to comment #11)
>
> > $ git checkout v3.0
> > $ git revert 3e3da00c
> > $ cp -a /boot/config-3.0.0-1-686-pae .config
> > $ make oldconfig
> > $ make localmodconfig
> > $ time make -j2 deb-pkg
> >   CC arch/x86/lib/memmove_64.o
> > gcc: error: arch/x86/lib/memmove_64.c: No such file or directory
>
> From "git log -- arch/x86/lib/memmove_64.c", I learn that it
> was replaced by an assembler file. Hopefully
>
> rm arch/x86/lib/.memmove_64.o.cmd
>
> would get the build working again --- sorry for the fuss.

Jonathan, thank you for clearing that up and explaining the issue and how you figured it out. That helps a lot. I will try another build, but since the revert did not work with v2.6.34 (unbootable) I doubt that it will work with v3.0.

(In reply to comment #16)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > >
> > > > Yinghai, do you have any ideas? I am going to add the maintainers to CC.
> > >
> > > can you post whole bootlog with "debug ignore_loglevel"
> >
> > I tried, but even adding only `debug` to the kernel command line the boot stops
> > after two seconds and I do not even get the cryptsetup LUKS passphrase
> > dialog.
>
> This is also reproducible with Linux v3.0.0 from Debian. Only adding
> `ignore_loglevel` works though, so maybe the dmesg output I attach has enough
> information. (And this has nothing to do with

… this bug since this is also reproducible with 2.6.3{2,3}.) So this must be another problem that some buffer is too small. But I do not know how `debug` works.

[…]

Created attachment 69042 [details]
dmesg for failing kernel with 'debug ignore_loglevel'
Attached is the output of the failing kernel 2.6.33-rc7-12 with debug and ignore_loglevel options set. Hope this helps to finally solve the regression for my AMD Athlon box.
It seems to be the same problem as

http://us.generation-nt.com/answer/x86-pci-oops-config-snd-hda-intel-help-198326691.html

and you could use pci=use_crs or just revert that commit.

Thanks

The similar bug Yinghai mentioned is https://bugzilla.kernel.org/show_bug.cgi?id=16007 which is handled using a quirk to enable use_crs (v2.6.36-rc1~518^2~8, 2010-07-23).

I wonder if it is safe to enable pci=use_crs by default on old machines nowadays. Cc-ing Bjorn Helgaas in case he has thoughts on this.

(Context: we were discussing [1].)

[1] https://bugzilla.kernel.org/show_bug.cgi?id=30552

(In reply to comment #20)
> It seems to be the same problem as
>
> http://us.generation-nt.com/answer/x86-pci-oops-config-snd-hda-intel-help-198326691.html

I am sorry that I did not find that bug report #16007 [1] myself. The search engines must have a hard time finding it because `dmesg` output was just attached and not pasted.

> and you could use
>
> pci=use_crs

That indeed fixes the problem. I am changing the component therefore to PCI, which I somehow overlooked beforehand. … And I am not allowed to do so because I do not have the permission. Only Jaroslav or “administrators” are allowed to do this.

> or just revert that commit.

I will try that soon.

I have two more questions.

1. Why was #16007 marked as resolved? I could not find when this was done and what commit fixed it.

   git log origin/master -- arch/x86/pci/amd_bus*

2. What is the correct upstream fix? `Documentation/kernel-parameters.txt` says to report it when `pci=use_crs` is needed.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=16007
[2] http://packages.debian.org/changelogs/pool/main/l/linux-2.6/linux-2.6_3.0.0-2/changelog

Thanks,

Paul

I'm really sorry this issue has been languishing so long. Thanks for pointing it out to me, Jonathan.

Bug #16007 was resolved by 2491762cfb475dbd (found by "git log | less +/bugzilla.*16007"). This looks like exactly the same problem.
amd_bus.c reads HyperTransport routing registers, but that doesn't tell us about PCI host bridges. Looking at attachment #69022 [details], this:

  bus: [00, ff] on node 0 link 0
  bus: 00 index 1 [mem 0x80000000-0xfcffffffff]

tells us [mem 0x80000000-0xfcffffffff] is routed to PCI buses 00-ff. We have to assume a host bridge leading to bus 00. PCI buses are hierarchical, so there would have to be a PCI-to-PCI bridge that leads from bus 00 to the sound device on bus 20. But there is no such bridge, so all we can do is assume bus 20 is in another PCI hierarchy with its own MMIO space that's separate from the [mem 0x80000000-0xfcffffffff] found by amd_bus.c. That leads to this incorrect reassignment:

  pci 0000:20:01.0: reg 10: [mem 0xfbffc000-0xfbffffff 64bit]
  pci 0000:20:01.0: address space collision: [mem 0xfbffc000-0xfbffffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff]
  pci 0000:20:01.0: BAR 0: assigned [mem 0xfd00000000-0xfd00003fff 64bit]

The original [mem 0xfbffc000-0xfbffffff 64bit] setting was done by the BIOS and is just fine.

If we pay attention to the ACPI _CRS information (as with "pci=use_crs"), we learn about a second host bridge that leads to the sound device on bus 20:

  ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 20-ff])
  pci_root PNP0A08:00: host bridge window [mem 0xfbf00000-0xfbffffff] (ignored)
  pci 0000:20:01.0: reg 10: [mem 0xfbffc000-0xfbffffff 64bit]

This is probably on the same HT chain (node 0 link 0), so it's consistent with what amd_bus found. It's just that amd_bus can't tell us about *host bridges* on that chain.

The fix? We could easily add another quirk to turn on "pci=use_crs" automatically. I hesitate a little because we have two of those quirks already, and we could end up adding many more as we trip over the same issue again and again.
I'm not aware of any current problems with the PCI _CRS code, and I'm pretty confident that Windows has been using PCI _CRS for a long time, so we might consider moving the date cutoff (currently we disable it for BIOS dates before 2008) or removing it altogether.

I'll research this a bit and come back with a proposal. Again, I'm sorry that it took so long to figure this out.

(In reply to comment #23)
[…]
> Bug #16007 was resolved by 2491762cfb475dbd (found by "git log | less
> +/bugzilla.*16007").

Thank you. I should have figured this out myself. Git even has an option `--grep`.

$ git log --grep=zilla.*16007 origin/master

[…]
> I'll research this a bit and come back with a proposal. Again, I'm sorry that
> it took so long to figure this out.

Thank you very much for looking into this issue!

Created attachment 69372 [details]
boot log with kernel 'debug ignore_loglevel pci=use_crs' options added.
Yippee!
The computer boots properly with the added kernel option pci=use_crs. Attached is a complete boot log. There are still warnings about address space collisions, though. Hopefully you can come up with a permanent solution for this regression soon.
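The address arithmetic behind the collision explained in comment #23 can be checked with a small sketch. It is illustrative Python, not kernel code: the helper names `contains` and `overlaps` are made up for this example, and the ranges are copied from the log lines quoted in that comment.

```python
# Illustrative only: helper names are hypothetical, the ranges come from
# the dmesg lines quoted in comment #23.

def contains(outer, inner):
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if the inclusive ranges `a` and `b` share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


BUS00_WINDOW = (0x80000000, 0xfcffffffff)       # found by amd_bus.c for bus 00
PCI1_CRS_WINDOW = (0xfbf00000, 0xfbffffff)      # _CRS window of root bridge PCI1 (bus 20-ff)
BIOS_BAR0 = (0xfbffc000, 0xfbffffff)            # BAR 0 of 0000:20:01.0 as set by the BIOS
REASSIGNED_BAR0 = (0xfd00000000, 0xfd00003fff)  # BAR 0 after the kernel moved it

# Without _CRS, bus 20 is treated as a separate hierarchy, so a BAR that
# overlaps the bus 00 window is reported as a collision.
print("BIOS BAR overlaps bus 00 window:", overlaps(BIOS_BAR0, BUS00_WINDOW))

# With _CRS, the BIOS setting falls inside the window of the second host bridge.
print("BIOS BAR inside PCI1 _CRS window:", contains(PCI1_CRS_WINDOW, BIOS_BAR0))

# The reassigned address lies in neither window.
print("Reassigned BAR overlaps either window:",
      overlaps(REASSIGNED_BAR0, BUS00_WINDOW) or overlaps(REASSIGNED_BAR0, PCI1_CRS_WINDOW))
```

The sketch only restates the ranges from the logs; it says nothing about how the kernel's resource code is actually structured.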
Thanks everyone!
Finally running 3.0.0-1 on Debian Testing. Thank you for taking the time to research this issue.

(In reply to comment #26)
> Finally running 3.0.0-1 on Debian Testing. Thank you for taking the time to
> research this issue.

Have you reported your issues somewhere already? If you have a different system than already reported, could you open a new report and attach there the information about your system? `dmesg` and relevant sections of `dmidecode` should be enough.

Created attachment 71152 [details]
output of `dmidecode` (from Debian Sid/unstable 2.9-1.2)
I am attaching the output of `dmidecode`. Unfortunately ASUS does not seem to fill out all tables.
$ dmesg | grep -i m2v
[ 0.000000] DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304 10/30/2007
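For anyone collecting the same details from other machines, the `DMI:` line above can be picked apart with a short script. This is a minimal sketch in Python: `parse_dmi_line` and its regular expression are hypothetical helpers written for the one line format shown here, and the 2008 cutoff is the one mentioned in comment #23.

```python
import re
from datetime import date

# Hypothetical pattern; it only targets the "DMI:" line format shown above.
DMI_LINE = re.compile(
    r"DMI: (?P<system>.*)/(?P<board>[^/,]+), BIOS (?P<bios_version>\S+) "
    r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})"
)


def parse_dmi_line(line):
    """Return system, board, BIOS version and BIOS date, or None if no match."""
    match = DMI_LINE.search(line)
    if match is None:
        return None
    return {
        "system": match.group("system"),
        "board": match.group("board"),
        "bios_version": match.group("bios_version"),
        "bios_date": date(
            int(match.group("year")),
            int(match.group("month")),
            int(match.group("day")),
        ),
    }


line = "[ 0.000000] DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304 10/30/2007"
info = parse_dmi_line(line)
print(info["board"], info["bios_date"].isoformat())

# Comment #23 mentions that _CRS is disabled for BIOS dates before 2008.
print("BIOS date before the 2008 cutoff:", info["bios_date"].year < 2008)
```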
Created attachment 71172 [details]
[PATCH] x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
Bjorn is working on a real fix. Until he has time and that fix is merged, it would be great to get this quirk in.
Since the ASUS M2V-MX SE is also supported by coreboot [1], I checked with the developer and a user.

1. There is no problem with boards running coreboot since there is only one root bus.

14:46 < PaulePanter> ruik: On ASUS M2V-MX SE did you ever experience <https://bugzilla.kernel.org/show_bug.cgi?id=30552>?
14:46 < PaulePanter> Or did you never run Linux kernels versions greater than 2.6.32?
14:58 < ruik> PaulePanter: let me check it
14:59 < mue_> PaulePanter: works fine here with >2.6.39
15:00 < ruik> PaulePanter: i can try evening with 3.0.1
15:00 < mue_> 3.0/3.1 both work
15:01 < ruik> PaulePanter: comment #23 is nice one
15:02 < ruik> the bus 20h is really new bus
15:03 < ruik> PaulePanter: this would not be seen with coreboot, where I configured the chipset to add a bus bridge
15:03 < ruik> to the sound card
15:03 < ruik> i dont know why they choose to do it this way
15:08 < PaulePanter> ruik, mue_: Thank you for checking.
15:09 < PaulePanter> ruik: So this is a BIOS bug?
15:09 < PaulePanter> ruik: Adding `pci=use_crs` to the Linux kernel command line will not harm boards running coreboot, will it?
15:10 < ruik> PaulePanter: no it is linux bug because it does not expect that same HT chain
15:10 < ruik> has two pci root busses
15:10 < ruik> (i guess0
15:10 < ruik> )
15:10 < PaulePanter> I see.
15:10 < ruik> PaulePanter: the pci=use_crs should work fine I can test in the evening
15:10 < PaulePanter> ruik: Thank you.
15:11 < PaulePanter> ruik, mue_: Can I paste this conversation into the bugzilla entry?
15:11 < mue_> PaulePanter: i am running coreboot on the board
15:11 < ruik> well if you think it helps...
15:12 < ruik> also the quirk will not match coreboot DMI
15:12 < ruik> or maybe it will
15:12 < ruik> it depends if it puts there Asus or Asustek
15:12 < PaulePanter> Ah you are right.
15:13 < PaulePanter> In the latter case the patch does not impact boards running coreboot at all.
15:13 < ruik> but coreboot does not need that
15:13 < ruik> because i have there only one root bus
15:13 < ruik> the audio is behind a bridge
15:14 < PaulePanter> I will add that to the commit message.
15:14 < PaulePanter> mue_: Could you paste the
15:14 < PaulePanter> … dmidecode output somewhere, please?
15:15 < PaulePanter> bbl and thank you all.
15:16 < ruik> ok
15:16 < mue_> PaulePanter: http://dpaste.com/609869/
15:16 < ruik> mue_: if you use more recent coreboot it will have
15:16 < ruik> a more fancy one
15:16 < ruik> (plus you need master of seabios)

2. There should not be a problem with the patch/quirk and boards using coreboot because the output of `dmidecode` differs.

[1] http://www.coreboot.org/

Created attachment 71842 [details]
output of `dmidecode` of board ASUS M2V-MX SE running coreboot
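The point made in the IRC log, that a DMI-keyed quirk only affects firmware whose DMI strings match, can be illustrated with a sketch. This is Python rather than the kernel's C DMI table, and every string in it except the board name "M2V-MX SE" is a placeholder: this report does not quote the vendor strings used by the patch or reported by coreboot. Substring comparison is used on the assumption that it mirrors how non-exact DMI matches behave.

```python
# Placeholder values: only the board name "M2V-MX SE" appears in this report.
QUIRK = {
    "ident": "ASUS M2V-MX SE",
    "matches": {
        "board_vendor": "EXAMPLE-VENDOR",  # hypothetical vendor string
        "board_name": "M2V-MX SE",
    },
}


def quirk_applies(quirk, dmi):
    """True if every quirk field is a substring of the machine's DMI field."""
    return all(
        needle in dmi.get(field, "")
        for field, needle in quirk["matches"].items()
    )


# Two hypothetical machines with the same board but different firmware strings.
vendor_bios = {"board_vendor": "EXAMPLE-VENDOR Inc.", "board_name": "M2V-MX SE"}
other_firmware = {"board_vendor": "SOMETHING-ELSE", "board_name": "M2V-MX SE"}

print("vendor BIOS matches quirk:", quirk_applies(QUIRK, vendor_bios))
print("other firmware matches quirk:", quirk_applies(QUIRK, other_firmware))
```

Because all listed fields must match, a firmware that reports a different vendor string is left alone even though the board name is identical.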
A patch referencing this bug report has been merged in Linux v3.1-rc10:

commit 29cf7a30f8a0ce4af2406d93d5a332099be26923
Author: Paul Menzel <paulepanter@users.sourceforge.net>
Date:   Wed Aug 31 17:07:10 2011 +0200

    x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE

Paul, can you confirm that 3.1 or later boots correctly without having to supply "pci=use_crs"? If so, I think we can close this.

I suspect this is fixed for Paul but not for Svante. Svante, can you confirm? If so, please provide DMI information (using "dmesg | grep DMI", "grep . /sys/class/dmi/id/*", or "dmidecode").

Erm, Svante doesn't seem to be on the cc list. I'll ping him; sorry for the noise.

(In reply to comment #33)
> Paul, can you confirm that 3.1 or later boots correctly without having to
> supply "pci=use_crs"? If so, I think we can close this.

Yes, everything works fine for me now. Thank you again for your work and I am sorry for not updating this report myself.

(In reply to comment #35)
> Erm, Svante doesn't seem to be on the cc list. I'll ping him; sorry for the
> noise.

Oh, I closed this bug now. :/ I am sorry. I submitted a new report #42619 dedicated to Svante’s board MS-7253.

I will try to update the forwarded to address in the Debian BTS [1] tomorrow.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=619034

A patch referencing a commit referencing this bug report has been merged in Linux v3.3-rc7:

commit 8411371709610c826bf65684f886bfdfb5780ca1
Author: Jonathan Nieder <jrnieder@gmail.com>
Date:   Tue Feb 28 11:51:10 2012 -0700

    x86/PCI: use host bridge _CRS info on MSI MS-7253 |
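The DMI details requested in this thread via "grep . /sys/class/dmi/id/*" can also be gathered with a few lines of Python. This is a minimal sketch: `read_dmi_ids` is a hypothetical helper, and it assumes the usual sysfs layout of one attribute per file under that directory.

```python
import os


def read_dmi_ids(directory="/sys/class/dmi/id"):
    """Collect readable DMI attributes from a sysfs-style directory."""
    values = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and links to directories
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                values[name] = handle.read().strip()
        except OSError:
            continue  # some attributes are readable by root only
    return values


if __name__ == "__main__":
    # Only print on systems that expose the DMI directory.
    if os.path.isdir("/sys/class/dmi/id"):
        for key, value in read_dmi_ids().items():
            print(f"{key}:{value}")
```

Attributes that need root privileges are silently skipped, so the output of an unprivileged run may be shorter than what `dmidecode` reports.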