Bug 30552

Summary: kernel oops with snd-hda-intel: BUG: unable to handle kernel paging request at ffffc90011c08000
Product: Drivers    Reporter: Paul Menzel (paulepanter)
Component: PCI    Assignee: Bjorn Helgaas (bjorn)
Status: RESOLVED CODE_FIX
Severity: high    CC: bjorn, florian, hpa, jbarnes, jrnieder, lewscarroll, paulepanter, tiwai, yinghai
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.34-rc1    Subsystem:
Regression: Yes    Bisected commit-id:
Attachments: output of `dmesg` from Debian Linux kernel 3.0.0 with option `ignore_loglevel`
dmesg for failing kernel with 'debug ignore_loglevel'
boot log with kernel 'debug ignore_loglevel pci=use_crs' options added.
output of `dmidecode` (from Debian Sid/unstable 2.9-1.2)
[PATCH] x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
output of `dmidecode` of board ASUS M2V-MX SE running coreboot

Description Paul Menzel 2011-03-06 13:40:12 UTC
After upgrading from 2.6.32.28 to 2.6.37.1, sound no longer works. The module `snd-hda-intel` triggers an oops. I can also reproduce this with Linux 2.6.37.2.

** Kernel log:
[   41.265636] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[   41.680357] HDA Intel 0000:20:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[   41.680444] HDA Intel 0000:20:01.0: setting latency timer to 64
[   41.680462] BUG: unable to handle kernel paging request at ffffc90011c08000
[   41.680617] IP: [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
[   41.680728] PGD 13781a067 PUD 13781b067 PMD 1300ba067 PTE 800000fd00000173
[   41.680956] Oops: 0009 [#1] SMP 
[   41.681098] last sysfs file: /sys/module/snd_pcm/initstate
[   41.681159] CPU 0 
[   41.681203] Modules linked in: snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event tpm_tis tpm snd_seq tpm_bios psmouse parport_pc snd_timer snd_seq_device parport processor evdev snd i2c_viapro thermal_sys amd64_edac_mod k8temp i2c_core soundcore shpchp pcspkr serio_raw asus_atk0110 pci_hotplug edac_core button snd_page_alloc edac_mce_amd ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod raid1 md_mod usbhid hid sg sd_mod crc_t10dif sr_mod cdrom ata_generic uhci_hcd sata_via pata_via libata ehci_hcd usbcore scsi_mod via_rhine mii nls_base [last unloaded: scsi_wait_scan]
[   41.684180] 
[   41.684180] Pid: 1153, comm: work_for_cpu Not tainted 2.6.37-1-amd64 #1 M2V-MX SE/System Product Name
[   41.684180] RIP: 0010:[<ffffffffa0578402>]  [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
[   41.684180] RSP: 0018:ffff88013153fe50  EFLAGS: 00010286
[   41.684180] RAX: ffffc90011c08000 RBX: ffff88013029ec00 RCX: 0000000000000006
[   41.684180] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[   41.684180] RBP: ffff88013341d000 R08: 0000000000000000 R09: 0000000000000040
[   41.684180] R10: 0000000000000286 R11: 0000000000003731 R12: ffff88013029c400
[   41.684180] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88013341d090
[   41.684180] FS:  0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:00000000f7610ab0
[   41.684180] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   41.684180] CR2: ffffc90011c08000 CR3: 0000000132f57000 CR4: 00000000000006f0
[   41.684180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   41.684180] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   41.684180] Process work_for_cpu (pid: 1153, threadinfo ffff88013153e000, task ffff8801303c86c0)
[   41.684180] Stack:
[   41.684180]  0000000000000005 ffffffff8123ad65 00000000000136c0 ffff88013029c400
[   41.684180]  ffff8801303c8998 ffff88013341d000 ffff88013341d090 ffff8801322d9dc8
[   41.684180]  ffff88013341d208 0000000000000000 0000000000000000 ffffffff811ad232
[   41.684180] Call Trace:
[   41.684180]  [<ffffffff8123ad65>] ? __pm_runtime_set_status+0x162/0x186
[   41.684180]  [<ffffffff811ad232>] ? local_pci_probe+0x49/0x92
[   41.684180]  [<ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b
[   41.684180]  [<ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b
[   41.684180]  [<ffffffff8105afd0>] ? do_work_for_cpu+0xb/0x1b
[   41.684180]  [<ffffffff8105fd3f>] ? kthread+0x7a/0x82
[   41.684180]  [<ffffffff8100a824>] ? kernel_thread_helper+0x4/0x10
[   41.684180]  [<ffffffff8105fcc5>] ? kthread+0x0/0x82
[   41.684180]  [<ffffffff8100a820>] ? kernel_thread_helper+0x0/0x10
[   41.684180] Code: f4 01 00 00 ef 31 f6 48 89 df e8 29 dd ff ff 85 c0 0f 88 2b 03 00 00 48 89 ef e8 b4 39 c3 e0 8b 7b 40 e8 fc 9d b1 e0 48 8b 43 38 <66> 8b 10 66 89 14 24 8b 43 14 83 e8 03 83 f8 01 77 32 31 d2 be 
[   41.684180] RIP  [<ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
[   41.684180]  RSP <ffff88013153fe50>
[   41.684180] CR2: ffffc90011c08000
[   41.684180] ---[ end trace 8d1f3ebc136437fd ]---
[   42.127750] input: ImPS/2 Logitech Wheel Mouse as /devices/platform/i8042/serio1/input/input5
[   63.455908] EXT3-fs (dm-1): using internal journal
[   63.800394] loop: module loaded
[   65.413037] Adding 4194300k swap on /dev/mapper/speicher-swap.  Priority:-1 extents:1 across:4194300k 
[   67.008724] fuse init (API version 7.15)
[   67.140105] EXT3-fs: barriers not enabled
[   67.142596] kjournald starting.  Commit interval 5 seconds
[   67.147077] EXT3-fs (md0): using internal journal
[   67.147221] EXT3-fs (md0): mounted filesystem with ordered data mode
[   67.217115] EXT3-fs: barriers not enabled
[   67.220248] kjournald starting.  Commit interval 5 seconds
[   67.247088] EXT3-fs (dm-6): using internal journal
[   67.247236] EXT3-fs (dm-6): mounted filesystem with ordered data mode
[   67.357741] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[   67.360958] SGI XFS Quota Management subsystem
[   67.411476] XFS mounting filesystem dm-7
[   70.259895] Ending clean XFS mount for filesystem: dm-7
[   70.309239] XFS mounting filesystem dm-8
[   70.603227] Ending clean XFS mount for filesystem: dm-8
[   70.676209] REISERFS (device dm-5): found reiserfs format "3.6" with standard journal
[   70.676322] REISERFS (device dm-5): using ordered data mode
[   70.684321] REISERFS (device dm-5): journal params: device dm-5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
[   70.685757] REISERFS (device dm-5): checking transaction log (dm-5)
[   70.708207] REISERFS (device dm-5): Using r5 hash to sort names
[   70.774065] EXT3-fs: barriers not enabled
[   70.779309] kjournald starting.  Commit interval 5 seconds
[   70.779606] EXT3-fs (dm-3): using internal journal
[   70.779753] EXT3-fs (dm-3): mounted filesystem with ordered data mode
[   70.822113] EXT3-fs: barriers not enabled
[   70.824247] kjournald starting.  Commit interval 5 seconds
[   70.824511] EXT3-fs (dm-4): using internal journal
[   70.824658] EXT3-fs (dm-4): mounted filesystem with ordered data mode
[   90.491792] powernow-k8: Found 1 AMD Athlon(tm) X2 Dual Core Processor BE-2350 (2 cpu cores) (version 2.20.00)
[   90.491843] powernow-k8:    0 : fid 0xd (2100 MHz), vid 0xe
[   90.491846] powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xf
[   90.491849] powernow-k8:    2 : fid 0xa (1800 MHz), vid 0x11
[   90.491851] powernow-k8:    3 : fid 0x2 (1000 MHz), vid 0x16
[   91.004026] Clocksource tsc unstable (delta = -238711064 ns)
[  245.332887] kvm: Nested Virtualization enabled
[  249.841785] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[  251.559202] ip_tables: (C) 2000-2006 Netfilter Core Team
[  251.825350] ip6_tables: (C) 2000-2006 Netfilter Core Team
[  257.927528] lo: Disabled Privacy Extensions
[  259.959208] [drm] Initialized drm 1.1.0 20060810
[  260.017917] pci 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  260.020451] [drm] Initialized via 2.11.1 20070202 for 0000:01:00.0 on minor 0
[  260.021056] ioctl32(Xorg:2800): Unknown cmd fd(11) cmd(c0106407){t:'d';sz:16} arg(ffa55520) on /dev/dri/card0
[  260.021104] ioctl32(Xorg:2800): Unknown cmd fd(11) cmd(c0106407){t:'d';sz:16} arg(ffa55520) on /dev/dri/card0
[  260.021121] ioctl32(Xorg:2800): Unknown cmd fd(11) cmd(c0086401){t:'d';sz:8} arg(ffa55518) on /dev/dri/card0
[  260.089858] ioctl32(Xorg:2800): Unknown cmd fd(11) cmd(c0246400){t:'d';sz:36} arg(097ad450) on /dev/dri/card0
[  260.174797] ioctl32(Xorg:2800): Unknown cmd fd(11) cmd(c0246400){t:'d';sz:36} arg(097ad450) on /dev/dri/card0
[  260.400043] eth0: no IPv6 routers present
[  637.451384] JFS: nTxBlock = 8192, nTxLock = 65536
[  637.495381] NTFS driver 2.1.29 [Flags: R/W MODULE].
[  637.568054] QNX4 filesystem 0.2.3 registered.
[  637.642781] Btrfs loaded
Comment 1 Paul Menzel 2011-03-06 13:42:35 UTC
This is #613979 in the Debian BTS [1].

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613979
Comment 2 Paul Menzel 2011-03-06 13:53:15 UTC
I also sent a message to alsa-devel [1], but got no replies.

[1] http://mailman.alsa-project.org/pipermail/alsa-devel/2011-February/037063.html
Comment 3 Takashi Iwai 2011-03-07 10:27:40 UTC
Try replacing the sound/pci/hda/* files with the ones from the 2.6.32.x tree. Run "make oldconfig", and add #include <linux/slab.h> if a build error occurs.

If this works, the problem is somewhere on the HD-audio side. If the problem persists even with the old code, it must be at a lower level, e.g. the PCI core.
Comment 4 Paul Menzel 2011-08-16 06:29:33 UTC
         $ git bisect start 0f2cc4ecd81dc1917a041dc93db0ada28f8356fa 8724fdb53d27d7b59b60c8a399cc67f9abfabb33

Bisecting between these two commits identifies the following commit as the culprit [1].

        $ git show --stat 3e3da00c
        commit 3e3da00c01d050307e753fb7b3e84aefc16da0d0
        Author: Yinghai Lu <yinghai@kernel.org>
        Date:   Wed Feb 10 01:20:09 2010 -0800

            x86/pci: AMD one chain system to use pci read out res
            
            Found MSI amd k8 based laptops is hiding [0x70000000, 0x80000000) RAM
            from e820.
            
            enable amd one chain even for all.
            
            -v2: use bool for found, according to Andrew
            
            Signed-off-by: Yinghai Lu <yinghai@kernel.org>
            LKML-Reference: <1265793639-15071-6-git-send-email-yinghai@kernel.org>
            Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
            Signed-off-by: H. Peter Anvin <hpa@zytor.com>

         arch/x86/pci/amd_bus.c  |    7 ++++---
         arch/x86/pci/bus_numa.c |    5 -----
         arch/x86/pci/bus_numa.h |    1 -
         3 files changed, 4 insertions(+), 9 deletions(-)

According to the `Makefile` at that commit, this is version 2.6.33-rc7.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613979#122
Comment 5 Paul Menzel 2011-08-16 06:41:17 UTC
I am not sure whom to assign this report to and whom to put in CC. Is that even allowed? For now I am just adding the patch author and myself to the CC list. Should the mailing lists be added too?

It would be great if somebody else could update the other fields (Assigned to, Product and Component). I found the following entries in `MAINTAINERS`.

        PCI SUBSYSTEM
        M:	Jesse Barnes <jbarnes@virtuousgeek.org>
        L:	linux-pci@vger.kernel.org
        T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6.git
        S:	Supported
        F:	Documentation/PCI/
        F:	drivers/pci/
        F:	include/linux/pci*

        X86 ARCHITECTURE (32-BIT AND 64-BIT)
        M:	Thomas Gleixner <tglx@linutronix.de>
        M:	Ingo Molnar <mingo@redhat.com>
        M:	"H. Peter Anvin" <hpa@zytor.com>
        M:	x86@kernel.org
        T:	git git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
        S:	Maintained
        F:	Documentation/x86/
        F:	arch/x86/
Comment 6 Takashi Iwai 2011-08-16 07:07:48 UTC
Does reverting the commit fix the problem?

Regarding the people involved: at least, you can put people listed in the sign-off of the commit log.  If this commit is really the culprit, change the bug category/component appropriately.

(BTW, the commit was merged to upstream first in 2.6.34-rc1.  You can check it via "git describe --contains $ID")
Comment 7 Paul Menzel 2011-08-16 07:32:06 UTC
(In reply to comment #6)
> Does reverting the commit fix the problem?

Can I safely revert that commit from Linux kernel 3.x?

> Regarding the people involved: at least, you can put people listed in the
> sign-off of the commit log.

Understood.

> If this commit is really the culprit, change the bug category/component
> appropriately.

I could not find a PCI entry under category/component. Could you help me with that, please?

> (BTW, the commit was merged to upstream first in 2.6.34-rc1.  You can check
> it via "git describe --contains $ID")

Strange. Your command does find the tag you mentioned.

        $ git describe --contains b74fd238a9
        v2.6.34-rc1~218^2~27

But the `Makefile` at that commit shows a different version.

        $ git grep -C3 SUBLEVEL  b74fd238a9 -- Makefile
        b74fd238a9:Makefile-VERSION = 2
        b74fd238a9:Makefile-PATCHLEVEL = 6
        b74fd238a9:Makefile:SUBLEVEL = 33
        b74fd238a9:Makefile-EXTRAVERSION = -rc7
        b74fd238a9:Makefile-NAME = Man-Eating Seals of Antiquity
        b74fd238a9:Makefile-
        […]
Comment 8 Paul Menzel 2011-08-16 07:34:12 UTC
Adding the Debian developer helping with fixing this bug.
Comment 9 Takashi Iwai 2011-08-16 07:46:44 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Does reverting the commit fix the problem?
> 
> Can I safely revert that commit from Linux kernel 3.x?

I suppose so.  Just give it a try.

> > If this commit is really the culprit, change the bug category/component
> > appropriately.
> 
> I could not find PCI looking at category/component. Could you help me there
> please?

I meant the bugzilla "Products" and "Component" tags.

> > (BTW, the commit was merged to upstream first in 2.6.34-rc1.  You can check
> > it via "git describe --contains $ID")
> 
> Strange. Your command finds the following tag as you wrote.
> 
>         $ git describe --contains b74fd238a9
>         v2.6.34-rc1~218^2~27
> 
> But looking at `Makefile` there are other versions.
> 
>         $ git grep -C3 SUBLEVEL  b74fd238a9 -- Makefile
>         b74fd238a9:Makefile-VERSION = 2
>         b74fd238a9:Makefile-PATCHLEVEL = 6
>         b74fd238a9:Makefile:SUBLEVEL = 33
>         b74fd238a9:Makefile-EXTRAVERSION = -rc7
>         b74fd238a9:Makefile-NAME = Man-Eating Seals of Antiquity
>         b74fd238a9:Makefile-
>         […]

It means that the patch was written based on 2.6.33-rc7, but merged to the upstream in 2.6.34-rc1.
Comment 10 Paul Menzel 2011-08-16 08:04:04 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > Does reverting the commit fix the problem?
> > 
> > Can I safely revert that commit from Linux kernel 3.x?
> 
> I suppose so.  Just give it a try.

All right. I will try v3.0.

> > > If this commit is really the culprit, change the bug category/component
> > > appropriately.
> > 
> > I could not find PCI looking at category/component. Could you help me there
> > please?
> 
> I meant the bugzilla "Products" and "Component" tags.

Me too. ;-) But I do not know what entries to choose when it concerns x86/pci.

> > > (BTW, the commit was merged to upstream first in 2.6.34-rc1.  You can
> > > check it via "git describe --contains $ID")
> > 
> > Strange. Your command finds the following tag as you wrote.
> > 
> >         $ git describe --contains b74fd238a9
> >         v2.6.34-rc1~218^2~27
> > 
> > But looking at `Makefile` there are other versions.
> > 
> >         $ git grep -C3 SUBLEVEL  b74fd238a9 -- Makefile
> >         b74fd238a9:Makefile-VERSION = 2
> >         b74fd238a9:Makefile-PATCHLEVEL = 6
> >         b74fd238a9:Makefile:SUBLEVEL = 33
> >         b74fd238a9:Makefile-EXTRAVERSION = -rc7
> >         b74fd238a9:Makefile-NAME = Man-Eating Seals of Antiquity
> >         b74fd238a9:Makefile-
> >         […]
> 
> It means that the patch was written based on 2.6.33-rc7, but merged to the
> upstream in 2.6.34-rc1.

Thank you for the explanation. That explains why, according to my notes, 2.6.33-rc8 worked in my tests. I had forgotten about that during bisection.
Comment 11 Paul Menzel 2011-08-16 11:04:51 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #7)
> > > (In reply to comment #6)
> > > > Does reverting the commit fix the problem?
> > > 
> > > Can I safely revert that commit from Linux kernel 3.x?
> > 
> > I suppose so.  Just give it a try.
> 
> All right. I will try v3.0.

Well, that did not work: it did not even compile.

        $ git checkout v3.0
        $ git revert 3e3da00c
        $ cp -a /boot/config-3.0.0-1-686-pae .config
        $ make oldconfig
        $ make localmodconfig
        $ time make -j2 deb-pkg
          CC      arch/x86/lib/memmove_64.o
        gcc: error: arch/x86/lib/memmove_64.c: No such file or directory
        gcc: fatal error: no input files
        compilation terminated.
        make[3]: *** [arch/x86/lib/memmove_64.o] Error 1
        make[2]: *** [arch/x86/lib] Error 2

        make[1]: *** [deb-pkg] Error 2
        make: *** [deb-pkg] Error 2

So I tried 2.6.34. It built fine, but booting stopped with the following error.

        [0xc80000… - 0xcf000000… pref] conflicts with GART
        request_module: runaway loop modprobe binfmt-464c

So I am not able to revert that commit; I guess later changes depend on it.

Yinghai, do you have any ideas? I am going to add the maintainers to CC.

[…]
Comment 12 Yinghai Lu 2011-08-16 17:17:06 UTC
On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> Yinghai, do you have any ideas? I am going to add the maintainers to CC.

Can you post the whole boot log with "debug ignore_loglevel"?

Thanks

Yinghai
Comment 13 Paul Menzel 2011-08-16 21:07:41 UTC
(In reply to comment #12)
> On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > Yinghai, do you have any ideas? I am going to add the maintainers to CC.
> 
> can you post whole bootlog with "debug ignore_loglevel"

I tried, but even when adding only `debug` to the kernel command line, the boot stops after two seconds and I do not even get the cryptsetup LUKS passphrase dialog.


Thank you for looking into this,

Paul
Comment 14 Jonathan Nieder 2011-08-16 21:23:54 UTC
(In reply to comment #11)

>         $ git checkout v3.0
>         $ git revert 3e3da00c
>         $ cp -a /boot/config-3.0.0-1-686-pae .config
>         $ make oldconfig
>         $ make localmodconfig
>         $ time make -j2 deb-pkg
>           CC      arch/x86/lib/memmove_64.o
>         gcc: error: arch/x86/lib/memmove_64.c: No such file or directory

From "git log -- arch/x86/lib/memmove_64.c", I learn that it
was replaced by an assembler file. Hopefully

         rm arch/x86/lib/.memmove_64.o.cmd

would get the build working again --- sorry for the fuss.
Comment 15 Paul Menzel 2011-08-16 21:43:43 UTC
Created attachment 69022 [details]
output of `dmesg` from Debian Linux kernel 3.0.0 with option `ignore_loglevel`
Comment 16 Paul Menzel 2011-08-16 21:45:09 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > 
> > > Yinghai, do you have any ideas? I am going to add the maintainers to CC.
> > 
> > can you post whole bootlog with "debug ignore_loglevel"
> 
> I tried, but even adding only `debug`to the kernel command line the boot
> stops
> after two seconds and I do not even get the the cryptsetup LUKS passphrase
> dialog.

This is also reproducible with Linux v3.0.0 from Debian. Adding only `ignore_loglevel` works, though, so maybe the dmesg output I attached has enough information. (And this has nothing to do with 

Please note that Clemens already pointed out what went wrong from the stack trace [1], but I guess you still need more information.


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613979#51
Comment 17 Paul Menzel 2011-08-16 21:48:02 UTC
(In reply to comment #14)
> (In reply to comment #11)
> 
> >         $ git checkout v3.0
> >         $ git revert 3e3da00c
> >         $ cp -a /boot/config-3.0.0-1-686-pae .config
> >         $ make oldconfig
> >         $ make localmodconfig
> >         $ time make -j2 deb-pkg
> >           CC      arch/x86/lib/memmove_64.o
> >         gcc: error: arch/x86/lib/memmove_64.c: No such file or directory
> 
> From "git log -- arch/x86/lib/memmove_64.c", I learn that it
> was replaced by an assembler file. Hopefully
> 
>          rm arch/x86/lib/.memmove_64.o.cmd
> 
> would get the build working again --- sorry for the fuss.

Jonathan, thank you for clearing that up and explaining the issue and how you figured it out. That helps a lot. I will try another build, but since the revert did not work with v2.6.34 (unbootable) I doubt that it will work with v3.0.
Comment 18 Paul Menzel 2011-08-16 22:23:35 UTC
(In reply to comment #16)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > On 08/16/2011 04:04 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > > 
> > > > Yinghai, do you have any ideas? I am going to add the maintainers to
> CC.
> > > 
> > > can you post whole bootlog with "debug ignore_loglevel"
> > 
> > I tried, but even adding only `debug`to the kernel command line the boot
> stops
> > after two seconds and I do not even get the the cryptsetup LUKS passphrase
> > dialog.
> 
> This is also reproducible with Linux v3.0.0 from Debian. Only adding
> `ignore_loglevel` works though, so maybe the dmesg output I attach has enough
> information. (And this has nothing to do with

… this bug, since it is also reproducible with 2.6.3{2,3}.) So this must be a different problem, perhaps some buffer being too small. But I do not know how `debug` works.

[…]
Comment 19 Svante Signell 2011-08-17 09:14:51 UTC
Created attachment 69042 [details]
dmesg for failing kernel with 'debug ignore_loglevel'

Attached is the output of the failing kernel 2.6.33-rc7-12 with debug and ignore_loglevel options set. Hope this helps to finally solve the regression for my AMD Athlon box.
Comment 20 Yinghai Lu 2011-08-17 23:15:28 UTC
It seems to be the same problem as
http://us.generation-nt.com/answer/x86-pci-oops-config-snd-hda-intel-help-198326691.html

and you could use

pci=use_crs

or just revert that commit.

Thanks
Comment 21 Jonathan Nieder 2011-08-17 23:35:54 UTC
The similar bug Yinghai mentioned is

 https://bugzilla.kernel.org/show_bug.cgi?id=16007

which is handled using a quirk to enable use_crs (v2.6.36-rc1~518^2~8,
2010-07-23).  I wonder if it is safe to enable pci=use_crs by default
on old machines nowadays.  Cc-ing Bjorn Helgaas in case he has
thoughts on this.

(Context: we were discussing [1].)
[1] https://bugzilla.kernel.org/show_bug.cgi?id=30552
Comment 22 Paul Menzel 2011-08-18 12:17:28 UTC
(In reply to comment #20)
> it seems it is the same problem to 
>
> http://us.generation-nt.com/answer/x86-pci-oops-config-snd-hda-intel-help-198326691.html

I am sorry that I did not find that bug report #16007 [1] myself. The search engines must have a hard time finding it because `dmesg` output was just attached and not pasted.

> and you could use
> 
> pci=use_crs

That indeed fixes the problem. I am therefore changing the component to PCI, which I somehow overlooked before. … But I am not allowed to do so because I do not have permission. Only Jaroslav or “administrators” are allowed to do this.

> or just revert that commit.

I will try that soon.

I have two more questions.

1. Why was #16007 marked as resolved? I could not find out when this was done or which commit fixed it.

    git log origin/master -- arch/x86/pci/amd_bus*

2. What is the correct upstream fix? `Documentation/kernel-parameters.txt` says to report it when `pci=use_crs` is needed.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=16007
[2] http://packages.debian.org/changelogs/pool/main/l/linux-2.6/linux-2.6_3.0.0-2/changelog


Thanks,

Paul
Comment 23 Bjorn Helgaas 2011-08-18 16:28:39 UTC
I'm really sorry this issue has been languishing so long.  Thanks for pointing it out to me, Jonathan.

Bug #16007 was resolved by 2491762cfb475dbd (found by "git log | less +/bugzilla.*16007").

This looks like exactly the same problem.  amd_bus.c reads HyperTransport routing registers, but that doesn't tell us about PCI host bridges.  Looking at attachment #69022 [details], this:

  bus: [00, ff] on node 0 link 0
  bus: 00 index 1 [mem 0x80000000-0xfcffffffff]

tells us [mem 0x80000000-0xfcffffffff] is routed to PCI buses 00-ff.  We have to assume a host bridge leading to bus 00.  PCI buses are hierarchical, so there would have to be a PCI-to-PCI bridge that leads from bus 00 to the sound device on bus 20.  But there is no such bridge, so all we can do is assume bus 20 is in another PCI hierarchy with its own MMIO space that's separate from the [mem 0x80000000-0xfcffffffff] found by amd_bus.c.
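A minimal C sketch of the hierarchy argument above, under stated assumptions: `struct p2p_bridge` and `bus_reachable()` are hypothetical names, not kernel APIs, and each PCI-to-PCI bridge is reduced to its primary, secondary and subordinate bus numbers. A bus counts as reachable from a root bus only if a chain of bridges leads down to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a PCI-to-PCI bridge. */
struct p2p_bridge {
    unsigned char primary;     /* bus the bridge sits on */
    unsigned char secondary;   /* bus directly behind the bridge */
    unsigned char subordinate; /* highest bus number behind the bridge */
};

/*
 * Return true if bus `target` can be reached from bus `from` by walking
 * down through bridges. Only bridges whose secondary bus is above `from`
 * are followed, so every step increases the bus number and the recursion
 * terminates.
 */
bool bus_reachable(unsigned char from, unsigned char target,
                   const struct p2p_bridge *bridges, size_t n)
{
    assert(bridges != NULL || n == 0);
    if (from == target)
        return true;
    for (size_t i = 0; i < n; i++) {
        const struct p2p_bridge *b = &bridges[i];
        if (b->primary == from && b->secondary > from &&
            target >= b->secondary && target <= b->subordinate &&
            bus_reachable(b->secondary, target, bridges, n))
            return true;
    }
    return false;
}
```

With no bridge from bus 00 to bus 20, as in attachment #69022, `bus_reachable(0x00, 0x20, NULL, 0)` returns false, which is why bus 20 has to be treated as a separate hierarchy. A hypothetical bridge entry `{0x00, 0x20, 0x20}`, roughly the coreboot layout where the audio device sits behind a bridge, makes it return true.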

That leads to this incorrect reassignment:

  pci 0000:20:01.0: reg 10: [mem 0xfbffc000-0xfbffffff 64bit]
  pci 0000:20:01.0: address space collision: [mem 0xfbffc000-0xfbffffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff]
  pci 0000:20:01.0: BAR 0: assigned [mem 0xfd00000000-0xfd00003fff 64bit]

The original [mem 0xfbffc000-0xfbffffff 64bit] setting was done by the BIOS and is just fine.
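The collision message boils down to an interval overlap test. A sketch in C, assuming inclusive `[start, end]` ranges; `struct range` and `ranges_overlap()` are illustrative names, loosely modelled on the kernel's `struct resource` rather than taken from it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range; a hypothetical stand-in for struct resource. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap iff each starts no later than the other ends. */
bool ranges_overlap(struct range a, struct range b)
{
    assert(a.start <= a.end && b.start <= b.end);
    return a.start <= b.end && b.start <= a.end;
}
```

Plugging in the values from the log: the BIOS assignment `{0xfbffc000, 0xfbffffff}` overlaps the bus 00 window `{0x80000000, 0xfcffffffff}`, so it was reported as a collision; the reassigned `{0xfd00000000, 0xfd00003fff}` starts just past the end of that window and does not overlap it.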

If we pay attention to the ACPI _CRS information (as with "pci=use_crs"), we learn about a second host bridge that leads to the sound device on bus 20:

  ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 20-ff])
  pci_root PNP0A08:00: host bridge window [mem 0xfbf00000-0xfbffffff] (ignored)
  pci 0000:20:01.0: reg 10: [mem 0xfbffc000-0xfbffffff 64bit]

This is probably on the same HT chain (node 0 link 0), so it's consistent with what amd_bus found.  It's just that amd_bus can't tell us about *host bridges* on that chain.
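The _CRS information resolves this because the BAR fits entirely inside the window that [PCI1] reports. A sketch of that containment test, again with hypothetical names (`struct range`, `range_contains()`, `range_size()`) and inclusive ranges:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range; a hypothetical stand-in for struct resource. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Number of bytes covered by an inclusive range. */
uint64_t range_size(struct range r)
{
    assert(r.start <= r.end);
    return r.end - r.start + 1;
}

/* True if `inner` lies entirely within `outer`. */
bool range_contains(struct range outer, struct range inner)
{
    assert(outer.start <= outer.end && inner.start <= inner.end);
    return inner.start >= outer.start && inner.end <= outer.end;
}
```

Both the BIOS address and the reassigned one are 16 KiB (`0x4000` bytes) long, but only the BIOS address lies inside the [PCI1] window `{0xfbf00000, 0xfbffffff}`.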

The fix?  We could easily add another quirk to turn on "pci=use_crs" automatically.  I hesitate a little because we have two of those quirks already, and we could end up adding many more as we trip over the same issue again and again.

I'm not aware of any current problems with the PCI _CRS code, and I'm pretty confident that Windows has been using PCI _CRS for a long time, so we might consider moving the date cutoff (currently we disable it for BIOS dates before 2008) or removing it altogether.
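The date cutoff can be sketched as follows. This is a simplified stand-in, not the kernel's code: it assumes the DMI BIOS date has the `mm/dd/yyyy` shape shown in comment 28 (`10/30/2007`), it does not handle two-digit years, and `bios_year()` and `use_crs_by_default()` are hypothetical helpers. The 2008 cutoff is the one described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define CRS_CUTOFF_YEAR 2008 /* BIOS dates before this year disable _CRS */

/*
 * Parse a DMI BIOS date of the form "mm/dd/yyyy" and return the year,
 * or -1 if the string does not have that shape.
 */
int bios_year(const char *date)
{
    int month, day, year;

    assert(date != NULL);
    if (sscanf(date, "%d/%d/%d", &month, &day, &year) != 3)
        return -1;
    return year;
}

/* Decide whether host bridge _CRS windows would be used by default. */
bool use_crs_by_default(const char *bios_date)
{
    int year = bios_year(bios_date);

    /* In this sketch an unparseable date does not trigger the cutoff. */
    return year < 0 || year >= CRS_CUTOFF_YEAR;
}
```

For this board's `10/30/2007` BIOS the function returns false, matching the "(ignored)" host bridge window in the log: _CRS is not used unless `pci=use_crs` or a DMI quirk turns it on.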

I'll research this a bit and come back with a proposal.  Again, I'm sorry that it took so long to figure this out.
Comment 24 Paul Menzel 2011-08-18 16:56:14 UTC
(In reply to comment #23)

[…]

> Bug #16007 was resolved by 2491762cfb475dbd (found by "git log | less
> +/bugzilla.*16007").

Thank you. I should have figured this out myself. Git even has an option `--grep`.

    $ git log --grep=zilla.*16007 origin/master

[…]

> I'll research this a bit and come back with a proposal.  Again, I'm sorry
> that
> it took so long to figure this out.

Thank you very much for looking into this issue!
Comment 25 Svante Signell 2011-08-19 09:36:24 UTC
Created attachment 69372 [details]
boot log with kernel 'debug ignore_loglevel pci=use_crs' options added.

Yippee!

The computer boots properly with the added kernel option pci=use_crs. Attached is a complete boot log. There are still warnings about address space collisions, though. Hopefully you can come up with a permanent solution for this regression soon.

Thanks everyone!
Comment 26 Lews Carroll 2011-09-01 17:29:26 UTC
Finally running 3.0.0-1 on Debian Testing. Thank you for taking the time to research this issue.
Comment 27 Paul Menzel 2011-09-01 17:39:41 UTC
(In reply to comment #26)
> Finally running 3.0.0-1 on Debian Testing. Thank-you for taking the time to
> research this issue.

Have you reported your issues somewhere already?

If you have a different system from the ones already reported, could you open a new report and attach the information about your system there? `dmesg` and the relevant sections of `dmidecode` output should be enough.
Comment 28 Paul Menzel 2011-09-01 18:09:24 UTC
Created attachment 71152 [details]
output of `dmidecode` (from Debian Sid/unstable 2.9-1.2)

I am attaching the output of `dmidecode`. Unfortunately ASUS does not seem to fill out all tables.

    $ dmesg | grep -i m2v
    [    0.000000] DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304    10/30/2007
Comment 29 Paul Menzel 2011-09-01 18:13:07 UTC
Created attachment 71172 [details]
[PATCH] x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE

Bjorn is working on a real fix, but until that is done and merged, it would be great to get this quirk in.
Comment 30 Paul Menzel 2011-09-07 09:24:03 UTC
Since the ASUS M2V-MX SE is also supported by coreboot [1], I checked with the developer and a user.

1. There is no problem with boards running coreboot since there is only one root bus.

14:46 < PaulePanter> ruik: On ASUS M2V-MX SE did you ever experience <https://bugzilla.kernel.org/show_bug.cgi?id=30552>?
14:46 < PaulePanter> Or did you never run Linux kernels versions greater than 2.6.32?
14:58 < ruik> PaulePanter: let me check it
14:59 < mue_> PaulePanter: works fine here with >2.6.39
15:00 < ruik> PaulePanter: i can try evening with 3.0.1
15:00 < mue_> 3.0/3.1 both work
15:01 < ruik> PaulePanter: comment #23 is nice one
15:02 < ruik> the bus 20h is really new bus
15:03 < ruik> PaulePanter: this would not be seen with coreboot, where I configured the chipset to add a bus bridge
15:03 < ruik> to the sound card
15:03 < ruik> i dont know why they choose to do it this way
15:08 < PaulePanter> ruik, mue_: Thank you for checking.
15:09 < PaulePanter> ruik: So this is a BIOS bug?
15:09 < PaulePanter> ruik: Adding  `pci=use_crs` to the Linux kernel command line will not harm boards running coreboot, it?
15:10 < ruik> PaulePanter: no it is linux bug because it does not expect that same HT chain
15:10 < ruik> has two pci root busses
15:10 < ruik> (i guess0
15:10 < ruik> )
15:10 < PaulePanter> I see.
15:10 < ruik> PaulePanter: the pci=use_crs should work fine I can test in the evening
15:10 < PaulePanter> ruik: Thank you.
15:11 < PaulePanter> ruik, mue_: Can I paste this conversation into the bugzilla entry?
15:11 < mue_> PaulePanter: i am running coreboot on the board
15:11 < ruik> well if you think it helps...
15:12 < ruik> also the quirk will not match coreboot DMI
15:12 < ruik> or maybe it will
15:12 < ruik> it depends if it puts there Asus or Asustek
15:12 < PaulePanter> Ah you are right.
15:13 < PaulePanter> In the latter case the patch does not impact boards running coreboot at all.
15:13 < ruik> but coreboot does not need that
15:13 < ruik> because i have there only one root bus
15:13 < ruik> the audio is behind a bridge
15:14 < PaulePanter> I will add that to the commit message.
15:14 < PaulePanter> mue_: Could you paste the 
15:14 < PaulePanter> … dmidecode output somewhere, please?
15:15 < PaulePanter> bbl and thank you all.
15:16 < ruik> ok
15:16 < mue_> PaulePanter: http://dpaste.com/609869/
15:16 < ruik> mue_: if you use more recent coreboot it will have
15:16 < ruik> a more fancy one
15:16 < ruik> (plus you need master  of seabios)

2. There should not be a problem with the patch/quirk and boards using coreboot because the output of `dmidecode` differs.
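Whether the quirk also fires on coreboot depends on how DMI strings are compared. A sketch assuming substring matching in the style of the kernel's `DMI_MATCH()` entries; the `struct dmi_ids` type, the helper names and all vendor strings used with it are hypothetical examples, not the values from the actual patch or from either `dmidecode` attachment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the DMI fields a quirk might check. */
struct dmi_ids {
    const char *board_vendor;
    const char *board_name;
};

/* A NULL pattern means "don't care"; otherwise it must occur as a substring. */
static bool field_matches(const char *actual, const char *wanted)
{
    if (wanted == NULL)
        return true;
    return actual != NULL && strstr(actual, wanted) != NULL;
}

/* The quirk applies only if every field it specifies matches. */
bool quirk_matches(const struct dmi_ids *sys, const struct dmi_ids *quirk)
{
    assert(sys != NULL && quirk != NULL);
    return field_matches(sys->board_vendor, quirk->board_vendor) &&
           field_matches(sys->board_name, quirk->board_name);
}
```

With a hypothetical quirk pattern `"ASUSTeK"`, a board reporting `"ASUSTeK Computer INC."` matches while one reporting just `"ASUS"` does not, which is the "Asus or Asustek" distinction raised in the IRC log.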


[1] http://www.coreboot.org/
Comment 31 Paul Menzel 2011-09-07 09:27:59 UTC
Created attachment 71842 [details]
output of `dmidecode` of board ASUS M2V-MX SE running coreboot
Comment 32 Florian Mickler 2012-01-12 21:28:09 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc10:

commit 29cf7a30f8a0ce4af2406d93d5a332099be26923
Author: Paul Menzel <paulepanter@users.sourceforge.net>
Date:   Wed Aug 31 17:07:10 2011 +0200

    x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
Comment 33 Bjorn Helgaas 2012-01-13 23:11:11 UTC
Paul, can you confirm that 3.1 or later boots correctly without having to supply "pci=use_crs"?  If so, I think we can close this.
Comment 34 Jonathan Nieder 2012-01-19 01:17:12 UTC
I suspect this is fixed for Paul but not for Svante.  Svante, can you confirm? If so, please provide DMI information (using "dmesg | grep DMI", "grep . /sys/class/dmi/id/*", or "dmidecode").
Comment 35 Jonathan Nieder 2012-01-19 01:18:16 UTC
Erm, Svante doesn't seem to be on the cc list.  I'll ping him; sorry for the noise.
Comment 36 Paul Menzel 2012-01-20 22:01:28 UTC
(In reply to comment #33)
> Paul, can you confirm that 3.1 or later boots correctly without having to
> supply "pci=use_crs"?  If so, I think we can close this.

Yes, everything works fine for me now.

Thank you again for your work and I am sorry for not updating this report myself.
Comment 37 Paul Menzel 2012-01-20 22:52:25 UTC
(In reply to comment #35)
> Erm, Svante doesn't seem to be on the cc list.  I'll ping him; sorry for the
> noise.

Oh, I have already closed this bug. :/ I am sorry. I submitted a new report, #42619, dedicated to Svante’s board MS-7253. I will try to update the forwarded-to address in the Debian BTS [1] tomorrow.


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=619034
Comment 38 Florian Mickler 2012-03-12 22:22:46 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.3-rc7:

commit 8411371709610c826bf65684f886bfdfb5780ca1
Author: Jonathan Nieder <jrnieder@gmail.com>
Date:   Tue Feb 28 11:51:10 2012 -0700

    x86/PCI: use host bridge _CRS info on MSI MS-7253