Bug 42002 - kernel BUG at kernel/resource.c:499
Summary: kernel BUG at kernel/resource.c:499
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 high
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-30 01:40 UTC by mludvig
Modified: 2012-08-30 13:28 UTC
CC List: 4 users

See Also:
Kernel Version: 3.1-rc4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Boot log of a working kernel (38.16 KB, text/plain)
2011-09-01 00:08 UTC, mludvig
Boot log of a crashing kernel (28.36 KB, text/plain)
2011-09-01 00:10 UTC, mludvig
lspci -vv output (30.46 KB, text/plain)
2011-09-02 00:01 UTC, mludvig
Boot log of a working kernel (55.15 KB, text/plain)
2011-09-04 09:40 UTC, mludvig
Boot log of a crashing kernel (37.23 KB, text/plain)
2011-09-04 09:41 UTC, mludvig

Description mludvig 2011-08-30 01:40:57 UTC
Kernels 3.1-rc3 and 3.1-rc4 don't boot on my Jetway NC9C-550-LF board (Intel Atom N550). I'm using the same config that works for 3.0, plus the defaults for the new options. Here's the captured crash:

[...]
[    7.027487] pci 0000:00:1f.2: BAR 5: assigned [mem 0xfed98400-0xfed987ff]
[    7.108769] pci 0000:00:1f.2: BAR 5: set to [mem 0xfed98400-0xfed987ff] (PCI address [0xfed98400-0xfed987ff])
[    7.227488] pci 0000:00:1c.3: BAR 14: assigned [mem 0xff400000-0xff6fffff]
[    7.309808] pci 0000:00:1c.3: BAR 15: assigned [mem 0xff700000-0xff8fffff 64bit pref]
[    7.403544] pci 0000:00:1c.2: BAR 14: assigned [mem 0xff900000-0xffafffff]
[    7.485843] ------------[ cut here ]------------
[    7.495817] kernel BUG at kernel/resource.c:499!
[    7.495817] invalid opcode: 0000 [#1] SMP
[    7.495817] Modules linked in:
[    7.495817]
[    7.495817] Pid: 1, comm: swapper Not tainted 3.1.0-rc4-hafanek+ #15 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
[    7.495817] EIP: 0060:[<c102fa46>] EFLAGS: 00010282 CPU: 1
[    7.495817] EIP is at reallocate_resource+0xc9/0xe0
[    7.495817] EAX: f654bd20 EBX: f5c14b4c ECX: f654bd20 EDX: f5c14b4c
[    7.495817] ESI: f5c5fe64 EDI: f5c14b68 EBP: 00000000 ESP: f5c5fe44
[    7.495817]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[    7.495817] Process swapper (pid: 1, ti=f5c5e000 task=f5c60000 task.ti=f5c5e000)
[    7.495817] Stack:
[    7.495817]  f5d48038 80000000 802fffff f5d64052 00182201 f5d48038 f5c129a8 00000000
[    7.495817]  f5c14b4c f5d48038 f5c14b4c 00182001 c102faae f5c5fe80 c119534c 80000000
[    7.495817]  ffffffff 00100000 c119534c f5c14800 00000006 fffffff4 f5c14b4c c112d68f
[    7.495817] Call Trace:
[    7.495817]  [<c102faae>] ? allocate_resource+0x51/0xa2
[    7.495817]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7
[    7.495817]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7
[    7.495817]  [<c112d68f>] ? pci_bus_alloc_resource+0x5f/0x87
[    7.495817]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7
[    7.495817]  [<c1133261>] ? _pci_assign_resource+0x99/0x101
[    7.495817]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7
[    7.495817]  [<c1133526>] ? pci_reassign_resource+0x4e/0x84
[    7.495817]  [<c113a1b8>] ? __assign_resources_sorted+0x119/0x19b
[    7.495817]  [<c121d0da>] ? __pci_bus_assign_resources+0x3e/0xb7
[    7.495817]  [<c1229306>] ? printk+0xe/0x11
[    7.495817]  [<c135596e>] ? pci_assign_unassigned_resources+0x7a/0x1b0
[    7.495817]  [<c10a4ec9>] ? kfree+0x3e/0xba
[    7.495817]  [<c1131bef>] ? pci_get_subsys+0x48/0x50
[    7.495817]  [<c1131bef>] ? pci_get_subsys+0x48/0x50
[    7.495817]  [<c135cdeb>] ? pcibios_allocate_bus_resources+0x80/0x80
[    7.495817]  [<c135ce49>] ? pcibios_assign_resources+0x5e/0x62
[    7.495817]  [<c1001066>] ? do_one_initcall+0x66/0x104
[    7.495817]  [<c133a752>] ? kernel_init+0x9f/0x111
[    7.495817]  [<c133a6b3>] ? start_kernel+0x306/0x306
[    7.495817]  [<c122f23e>] ? kernel_thread_helper+0x6/0xd
[    7.495817] Code: 89 10 c7 43 10 00 00 00 00 eb 05 8d 42 14 eb e3 8d 74 24 04 b9 07 00 00 00 89 df 89 da f3 a5 8b 04 24 e8 06 f7 ff ff 85 c0 74 07 <0f> 0b bd f0 ff ff ff e8 69 f9 ff ff 8d 64 24 20 89 e8 5b 5e 5f
[    7.495817] EIP: [<c102fa46>] reallocate_resource+0xc9/0xe0 SS:ESP 0068:f5c5fe44
[   10.271091] ---[ end trace 44593438a59a9533 ]---
[   10.326314] Kernel panic - not syncing: Attempted to kill init!
[   10.397196] Pid: 1, comm: swapper Tainted: G      D     3.1.0-rc4-hafanek+ #15
[   10.483653] Call Trace:
[   10.512892]  [<c1229217>] ? panic+0x4d/0x12e
[   10.563963]  [<c102cdc0>] ? do_exit+0x70/0x62c
[   10.617146]  [<c1002553>] ? do_bounds+0x4c/0x4c
[   10.671341]  [<c102b968>] ? kmsg_dump+0x35/0xad
[   10.725523]  [<c1002553>] ? do_bounds+0x4c/0x4c
[   10.779710]  [<c10044e8>] ? oops_end+0x72/0x75
[   10.832854]  [<c10025bd>] ? do_invalid_op+0x6a/0x77
[   10.891238]  [<c102fa46>] ? reallocate_resource+0xc9/0xe0
[   10.955844]  [<c104a922>] ? tick_dev_program_event+0x1e/0xfd
[   11.023540]  [<c104aa15>] ? tick_program_event+0x14/0x17
[   11.087086]  [<c104237b>] ? hrtimer_interrupt+0x120/0x1b9
[   11.151710]  [<c102f934>] ? __find_resource+0xd9/0x122
[   11.213187]  [<c122ea47>] ? error_code+0x67/0x6c
[   11.268406]  [<c102fa46>] ? reallocate_resource+0xc9/0xe0
[   11.332992]  [<c102faae>] ? allocate_resource+0x51/0xa2
[   11.395540]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7
[   11.458053]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7
[   11.520559]  [<c112d68f>] ? pci_bus_alloc_resource+0x5f/0x87 
[   11.588262]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7 
[   11.650807]  [<c1133261>] ? _pci_assign_resource+0x99/0x101 
[   11.717484]  [<c119534c>] ? free_dmar_iommu+0x1f7/0x1f7 
[   11.779983]  [<c1133526>] ? pci_reassign_resource+0x4e/0x84 
[   11.846653]  [<c113a1b8>] ? __assign_resources_sorted+0x119/0x19b 
[   11.919594]  [<c121d0da>] ? __pci_bus_assign_resources+0x3e/0xb7 
[   11.991478]  [<c1229306>] ? printk+0xe/0x11 
[   12.041489]  [<c135596e>] ? pci_assign_unassigned_resources+0x7a/0x1b0 
[   12.119602]  [<c10a4ec9>] ? kfree+0x3e/0xba 
[   12.169658]  [<c1131bef>] ? pci_get_subsys+0x48/0x50 
[   12.229055]  [<c1131bef>] ? pci_get_subsys+0x48/0x50 
[   12.288444]  [<c135cdeb>] ? pcibios_allocate_bus_resources+0x80/0x80 
[   12.364471]  [<c135ce49>] ? pcibios_assign_resources+0x5e/0x62 
[   12.434283]  [<c1001066>] ? do_one_initcall+0x66/0x104 
[   12.495772]  [<c133a752>] ? kernel_init+0x9f/0x111 
[   12.553069]  [<c133a6b3>] ? start_kernel+0x306/0x306 
[   12.612449]  [<c122f23e>] ? kernel_thread_helper+0x6/0xd 

That's it.
Comment 1 mludvig 2011-08-31 10:11:43 UTC
Bisection found the culprit:

| From 2bbc6942273b5b3097bd265d82227bdd84b351b2 Mon Sep 17 00:00:00 2001
| From: Ram Pai <linuxram@us.ibm.com>
| Date: Mon, 25 Jul 2011 13:08:39 -0700
| Subject: [PATCH] PCI : ability to relocate assigned pci-resources
Comment 2 mludvig 2011-09-01 00:08:54 UTC
Created attachment 71052
Boot log of a working kernel

GIT revision just before the breaking commit.
Comment 3 mludvig 2011-09-01 00:10:18 UTC
Created attachment 71062
Boot log of a crashing kernel

GIT rev 2bbc694 - first commit that doesn't boot.
Comment 4 mludvig 2011-09-02 00:01:27 UTC
Created attachment 71292
lspci -vv output
Comment 5 mludvig 2011-09-04 09:40:25 UTC
Created attachment 71642
Boot log of a working kernel

Increased verbosity with the ignore_loglevel boot parameter.
Comment 6 mludvig 2011-09-04 09:41:17 UTC
Created attachment 71652
Boot log of a crashing kernel

Increased verbosity with the ignore_loglevel boot parameter.
Comment 7 Bjorn Helgaas 2011-09-06 05:23:22 UTC
[I sent this in email earlier, intending it to be attached in bugzilla, but that didn't work.]

I see two things wrong so far.

1)  I think we are reassigning PCI resources when we shouldn't.

 pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfed8ffff]
 pci_root PNP0A08:00: host bridge window [mem 0x00000000-0xffffffff]
 pci_root PNP0A08:00: host bridge window expanded to [mem 0x00000000-0xffffffff]; [mem 0x00000000-0xffffffff] ignored
 pci 0000:00:1c.1: address space collision: [mem 0xfde00000-0xfdefffff 64bit pref] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff]
 pci 0000:00:1c.2: address space collision: [mem 0xfdf00000-0xfdffffff 64bit pref] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff]
 ...

These "collisions" are not actually collisions -- [mem
0xfde00000-0xfdefffff 64bit pref] is a perfectly legal assignment
inside the [mem 0xf0000000-0xfed8ffff] host bridge window.

The supposed host bridge window [mem 0x00000000-0xffffffff] is clearly
bogus, but Linux doesn't handle it well.  The kernel resource code
doesn't allow overlapping resources at the same level, so we have a hack
that coalesces overlapping host bridge windows.  That leads to these
"collisions," which in turn cause unnecessary PCI resource
reassignments.

2) The reassignment fails when it shouldn't.

It looks like when we fail, we're assigning more space to the 1c.3 and
1c.2 bridge windows than we did before, but beyond that, I think Ram
will have more insight than I do right now.
