Bug 9961
Summary: | Oops with docking and undocking, device after docking not connected | ||
---|---|---|---|
Product: | Drivers | Reporter: | Pavel Kysilka (goldenfish) |
Component: | PCI | Assignee: | Gary Hade (garyhade) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | acpi-bugzilla, garyhade, greg, kristen.c.accardi, pm, zdenek.kabelac |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.24, 2.6.25-rc1-00075-g10270d4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | dmesg - oops docking, dmesg - oops - undocking, Warnings on docking, possible fix | ||
Description
Pavel Kysilka
2008-02-13 13:55:09 UTC
Created attachment 14799 [details]
dmesg - oops docking
Boot the laptop on battery power, then dock it.
Created attachment 14800 [details]
dmesg - oops - undocking
Boot the laptop on AC power while docked, then press the docking button on the dock and undock.
Will you please attach the output of acpidump? Thanks.

*** Bug 10047 has been marked as a duplicate of this bug. ***

By using git-bisect, I've found the culprit. Reverting it cures the oops.

commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f
Author: Gary Hade <garyhade@us.ibm.com>
Date:   Wed Oct 3 15:55:51 2007 -0700

    PCI: remove transparent bridge sizing

    Remove transparent bridge sizing. Due to code in pci_read_bridge_bases()
    [drivers/pci/probe.c] the child bus of a transparent bridge already has
    access to the parent bus resources so transparent bridge sizing appears
    unnecessary. The bridge sizing includes alignment and granularity
    adjustments that can cause significantly more memory to be reserved from
    the parant bus than required by devices on the child bus and allotted
    by _CRS.

    Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 5e5191e..401e03c 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -472,7 +472,12 @@ void pci_bus_size_bridges(struct pci_bus *bus)
                 break;
 
         case PCI_CLASS_BRIDGE_PCI:
+                /* don't size subtractive decoding (transparent)
+                 * PCI-to-PCI bridges */
+                if (bus->self->transparent)
+                        break;
                 pci_bridge_check_ranges(bus);
+                /* fall through */
         default:
                 pbus_size_io(bus);
                 /* If the bridge supports prefetchable range, size it

Created attachment 14917 [details]
Warnings on docking
The above works for 2.6.24, but 2.6.25-rc2 (git head) gives many warnings about double registering objects. So, it looks like something needs to be prepared, but not registered, whereas before it was being registered without being prepared, hence the null pointer dereference.
Interesting. Do you think that my transparent bridge sizing removal change may have exposed a problem that is absent in 2.6.24 and later?

Maybe something like:

    if (!bus || !bus->self || bus->self->transparent)
            break;

or

    if (bus && bus->self && bus->self->transparent)
            break;

would at least help a little bit with the NULL oops? Though there will be a bigger problem, I guess...

It's not there that it crashes. Because the bits below don't get executed, it crashes elsewhere due to something not being initialised. Also, if you boot up with the dock connected, DMA (and possibly IRQs) from the PCI card in the dock doesn't seem to work properly. This is a regression from 2.6.23. I'll try to work out roughly where things start to go wrong, but it'll take a while.

Hi, here is the ACPI table of my laptop:
http://acpi.sourceforge.net/dsdt/view.php?id=689

My laptop's acpidump (T30) is already attached to bug #10047.

Comment #6 seems to indicate that my transparent bridge sizing patch is no longer the prime suspect, but if I can reproduce the problem with my T41p I will try to figure out why the Oops disappears when my change is removed. Unfortunately, I am having trouble locating a Dock II.

My attempts to find a Dock II have been unsuccessful so far but we are still searching...

Gary has found a dock, and has reproduced the problem. He's looking into it now.

Yes, it was a bit of a challenge but someone finally located a Dock II for me to use. I received it yesterday. After reproducing the problem on my T41p with 2.6.25-rc2, 2.6.24, and 2.6.24.3, I tried 2.6.24-rc1, where the above mentioned "remove transparent bridge sizing" change had entered mainline. The problem reproduced again with 2.6.24-rc1 but disappeared after reverting only the "remove transparent bridge sizing" change.

I may have misunderstood Paul's Comment #6. When he said "The above works for 2.6.24" I thought he meant that 2.6.24 worked without reverting the "remove transparent bridge sizing" change.
He must have meant that 2.6.24 worked after reverting the change. Paul, is this what you intended or is there a chance that 2.6.24 is behaving differently on your T30 than it is on my T41p?

So, it appears to me that the problem was introduced or exposed by my change. <sigh> I am working to find a solution ASAP.

2.6.23 works. 2.6.24 works after reverting the change. Isn't English such an ambiguous language? <grin>

2.6.25-rc3 and later do not see the dock at all as a dock. You can use it if you boot up whilst the laptop is docked, but then you cannot undock other than by rebooting. Also, in 2.6.25-rc2 and later some PCI bus interrupts and possibly DMA are not being correctly handled, resulting in the V4L2 stack consuming almost all the CPU.

(In reply to comment #16)
> 2.6.23 works. 2.6.24 works after reverting the change. Isn't English such an
> ambiguous language? <grin>

Especially when idiots like me are trying to understand it. :)
Thanks, I think we're on the same page now.

Vanilla 2.6.25-rc6, with PCI debug switched on:

ACPI: \_SB_.PCI0.PCI1.DOCK - docking
acpiphp_glue: handle_hotplug_event_func: Bus check notify on \_SB_.PCI0.PCI1.DOCK
PCI: Found 0000:02:03.0 [104c/ac22] 000604 01
PCI: Scanning behind PCI bridge 0000:02:03.0, config 000000, pass 0
PCI: Scanning behind PCI bridge 0000:02:03.0, config 000000, pass 1
PCI: Scanning bus 0000:08
PCI: Found 0000:08:01.0 [1095/0648] 000101 00
PCI: Found 0000:08:02.0 [104c/ac51] 000607 02
PCI: Found 0000:08:02.1 [104c/ac51] 000607 02
PCI: Fixups for bus 0000:08
PCI: Transparent bridge - 0000:02:03.0
PCI: Scanning behind PCI bridge 0000:08:02.0, config 000000, pass 0
PCI: Scanning behind PCI bridge 0000:08:02.1, config 000000, pass 0
PCI: Scanning behind PCI bridge 0000:08:02.0, config 000000, pass 1
PCI: Bus #09 (-#0c) is partially hidden behind transparent bridge #02 (-#08)
PCI: Scanning behind PCI bridge 0000:08:02.1, config 000000, pass 1
PCI: Bus #0d (-#10) is partially hidden behind transparent bridge #02 (-#08)
PCI: Bus scan for 0000:08 returning with max=10
PCI: Bus #08 (-#10) is partially hidden behind transparent bridge #02 (-#08)
acpiphp_glue: bus exists... trim
acpiphp_glue: acpi_bus_trim return 0
ath0: no IPv6 routers present
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1.DOCK._PRT]
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c01dd7fe>] pdev_sort_resources+0x8a/0x116
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: tun aes_i586 aes_generic arc4 ecb ath5k mac80211 cfg80211 radeon drm rfcomm l2cap bluetooth ppdev lp ipv6 microcode ext3 jbd mbcache fuse dm_crypt crypto_blkcipher dm_mod cpufreq_stats speedstep_ich speedstep_lib thinkpad_acpi nvram acpiphp bay dock pcmcia firmware_class joydev battery yenta_socket rsrc_nonstatic pcmcia_core snd_intel8x0 ac snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss irtty_sir sir_dev snd_pcm snd_timer nsc_ircc button psmouse irda i2c_i801 snd intel_agp agpgart crc_ccitt serio_raw evdev parport_pc parport shpchp pci_hotplug iTCO_wdt soundcore snd_page_alloc rtc pcspkr xfs floppy e100 mii sg uhci_hcd usbcore sr_mod cdrom sd_mod thermal processor fan ata_piix libata scsi_mod radeonfb fb_ddc i2c_algo_bit i2c_core

Pid: 39, comm: kacpi_notify Not tainted (2.6.25-rc6 #14)
EIP: 0060:[<c01dd7fe>] EFLAGS: 00010246 CPU: 0
EIP is at pdev_sort_resources+0x8a/0x116
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000fff
ESI: f529bcc4 EDI: f74d1e70 EBP: f74d1e58 ESP: f74d1e38
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kacpi_notify (pid: 39, ti=f74d0000 task=f74a6120 task.ti=f74d0000)
Stack: 00000000 f529bcf0 f74d1e70 f529bc00 00000007 f529bc00 f74a3414 f74a3400
       f74d1e8c c02a9d9b f529bc08 f7422f90 f7422978 dd3bb400 00000000 f7d28488
       f74a3400 f74d1e8c f7d28480 f7d4a348 f74a3400 f74d1ec4 f9b161ed f74a340c
Call Trace:
 [<c02a9d9b>] ? pci_bus_assign_resources+0x59/0x341
 [<f9b161ed>] ? enable_device+0x20d/0x2d7 [acpiphp]
 [<f9b15707>] ? acpiphp_enable_slot+0x9e/0xe7 [acpiphp]
 [<f9b1585d>] ? handle_hotplug_event_func+0x63/0x101 [acpiphp]
 [<f9b15167>] ? post_dock_fixups+0x6c/0x79 [acpiphp]
 [<f9b157fa>] ? handle_hotplug_event_func+0x0/0x101 [acpiphp]
 [<f9b10168>] ? hotplug_dock_devices+0x39/0xe1 [dock]
 [<f9b10441>] ? dock_notify+0x75/0xc0 [dock]
 [<c01f8529>] ? acpi_ev_notify_dispatch+0x4f/0x5a
 [<c01f3400>] ? acpi_os_execute_deferred+0x20/0x2c
 [<c012eedd>] ? run_workqueue+0x78/0xfb
 [<c01f33e0>] ? acpi_os_execute_deferred+0x0/0x2c
 [<c012f739>] ? worker_thread+0xb6/0xc2
 [<c0131ac1>] ? autoremove_wake_function+0x0/0x30
 [<c012f683>] ? worker_thread+0x0/0xc2
 [<c01319ee>] ? kthread+0x3b/0x61
 [<c01319b3>] ? kthread+0x0/0x61
 [<c0105683>] ? kernel_thread_helper+0x7/0x10
 =======================
Code: 50 52 51 ff 75 f0 68 1e c5 34 c0 e8 e8 4c f4 ff 83 c4 1c e9 87 00 00 00 8b 7d e8 8d 42 01 83 7d f0 06 89 7d e0 0f 4e c8 8b 45 e0 <8b> 18 31 c0 85 db 74 29 8b 53 04 8b 43 08 89 d7 05 8c 01 00 00
EIP: [<c01dd7fe>] pdev_sort_resources+0x8a/0x116 SS:ESP 0068:f74d1e38
---[ end trace b8b5e3b5c2b6062d ]---

I am still struggling with this. I do not fully understand the root cause yet, but I'm betting that the hot-add of a "transparent" p2p bridge (on Dock II) immediately below yet another "transparent" p2p bridge (on ThinkPad), which creates a pretty complex cobweb of resource references, has something to do with it.

bus 00 --------------
         |  00:1e.0 - ThinkPad resident Intel 82801 Mobile PCI Bridge
         |  "transparent" via pci_fixup_transparent_bridge()
         |
bus 02 --------------
         |  02:03.0 - Dock II resident Texas Instruments PCI2032 PCI
         |  Docking Bridge
         |  "transparent" due to programming interface == 0x01
         |
bus 09 --------------

I have determined why we are not seeing the Oops during boot when the Dock II is already attached to the laptop. It is due to pcibios_allocate_bus_resources() only being visited during boot. The boot-time visit to ...
    if (!r->start || !pr || request_resource(pr, r) < 0) {
            printk(KERN_ERR "PCI: Cannot allocate "
                    "resource region %d "
                    "of bridge %s\n", idx, pci_name(dev));
            /*
             * Something is wrong with the region.
             * Invalidate the resource to prevent
             * child resource allocations in this
             * range.
             */
            r->flags = 0;
    }

... in pcibios_allocate_bus_resources() clears the Dock II resource flags for regions 7, 8, and 9, as evidenced by the following messages, which appear both with and without transparent bridge sizing removed.

PCI: Cannot allocate resource region 7 of bridge 0000:02:03.0
PCI: Cannot allocate resource region 8 of bridge 0000:02:03.0
PCI: Cannot allocate resource region 9 of bridge 0000:02:03.0

With the Dock II bridge resource flags for these regions cleared, the check:

    if (!(r->flags) || r->parent)
            continue;

in pdev_sort_resources() prevents execution from reaching the list->next in the later:

    struct resource_list *ln = list->next;

where the NULL pointer dereference occurs in the hot-add case after one trip through the body of the enclosing for loop. In the hot-add case, pcibios_allocate_bus_resources() is not visited, so the Dock II bridge resource flags are not cleared prior to the call to pdev_sort_resources().

If I am unable to come up with a solution for this that retains the transparent bridge non-sizing in the next couple of days, it is likely that I will provide a patch that simply restores the transparent bridge sizing. I believe the resource shortage on some of our systems that motivated the transparent bridge non-sizing change no longer shows up when space is not allocated by default for expansion ROMs. A later change that removes default allocation for expansion ROMs is already in mainline.

Created attachment 15357 [details]
possible fix
Pavel/Paul,
Please give this a try and let me know what you think.
Note that the proposed fix has the side effect of eliminating the possibly confusing "PCI: Cannot allocate resource region ..." messages that show up during boot when the ThinkPad is jointed to the Dock II. These messages were seen both with and without the transparent bridge sizing removal change.

(In reply to comment #21)
> Note that the proposed fix has the side effect of eliminating the
> possibly confusing "PCI: Cannot allocate resource region ..." messages
> that show up during boot when the ThinkPad is jointed to the Dock II.
                                                ^^^^^^^
                                                joined

Gary,
Please e-mail the patch in comment #20 to greg-kh if you have not already, as I'm not sure he scans bugzilla regularly.

On the assumption that your patch is in the right spot, I'm moving this sighting to the drivers/pci category while marking it RESOLVED to indicate a patch is available to test.

(In reply to comment #23)
> Gary,
> Please e-mail the patch in comment #20 to greg-kh
> if you have not already, as I'm not sure he scans bugzilla regularly.
> On the assumption that your patch is in the right spot,

Len, I have not sent the patch to Greg yet but had planned to do so today. However, I just noticed a discussion concerning another more serious problem which is being blamed on my transparent bridge sizing removal change:
http://lkml.org/lkml/2008/3/26/94

> I'm moving this sighting to the drivers/pci category

Correct.

> while marking it RESOLVED to indicate a patch is available to test.

Also correct, but the fix is now the patch that reverts the transparent bridge sizing removal change:
http://lkml.org/lkml/2008/3/26/176