Bug 100031 - "BUG: scheduling while atomic" in pci-imx6.c if PCIe switch is attached
Summary: "BUG: scheduling while atomic" in pci-imx6.c if PCIe switch is attached
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: ARM Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-17 12:28 UTC by dave.mueller
Modified: 2016-01-06 02:08 UTC
CC List: 3 users

See Also:
Kernel Version: 4.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Proposed patch for pci-imx6.c (499 bytes, patch)
2015-06-17 12:28 UTC, dave.mueller

Description dave.mueller 2015-06-17 12:28:31 UTC
Created attachment 180171
Proposed patch for pci-imx6.c

I have an i.MX6Q-based custom board with an external PCIe NIC, running
Linux kernel 4.0.

As long as the NIC is connected directly to the i.MX6 PCIe host,
everything is fine.

But if I insert an additional PCIe switch (PEX8603) between the host and
the NIC, the kernel crashes with "BUG: scheduling while atomic:" as
shown below.

The attached patch seems to fix the problem.



imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xffff]
pci_bus 0000:00: root bus resource [mem 0x01000000-0x01efffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:00.0: [16c3:abcd] type 01 class 0x060400
pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x0000ffff pref]
pci 0000:00:00.0: supports D1
pci 0000:00:00.0: PME# supported from D0 D1 D3hot D3cold
PCI: bus0: Fast back to back transfers disabled
BUG: scheduling while atomic: swapper/0/1/0x00000002
3 locks held by swapper/0/1:
 #0:  (&dev->mutex){......}, at: [<803493cc>] __driver_attach+0x3c/0x9c
 #1:  (&dev->mutex){......}, at: [<803493f0>] __driver_attach+0x60/0x9c
 #2:  (pci_lock){......}, at: [<802fabe8>] pci_bus_read_config_dword+0x3c/0x8c
Modules linked in:
irq event stamp: 167572
hardirqs last  enabled at (167571): [<804a1d44>] _raw_spin_unlock_irqrestore+0x44/0x7c
hardirqs last disabled at (167572): [<804a1b70>] _raw_spin_lock_irqsave+0x24/0x60
softirqs last  enabled at (166114): [<8011bd64>] __do_softirq+0x2a0/0x348
softirqs last disabled at (166109): [<8011c198>] irq_exit+0xd4/0x1bc
Preemption disabled at:[<802fabe8>] pci_bus_read_config_dword+0x3c/0x8c

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0+ #277
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<801092dc>] (dump_backtrace) from [<801094f8>] (show_stack+0x18/0x1c)
 r6:80738780 r5:8081614c r4:00000000 r3:00200140
[<801094e0>] (show_stack) from [<8049bc24>] (dump_stack+0x7c/0xc8)
[<8049bba8>] (dump_stack) from [<80135dd0>] (__schedule_bug+0xb0/0xd4)
 r5:af070000 r4:af070000
[<80135d20>] (__schedule_bug) from [<8049c69c>] (__schedule+0x68/0x4a8)
 r4:af725780 r3:00000002
[<8049c634>] (__schedule) from [<8049cb80>] (schedule+0xa4/0xb4)
 r9:00000000 r8:00000004 r7:00000000 r6:000f4240 r5:af055a80 r4:af054000
[<8049cadc>] (schedule) from [<804a138c>] (schedule_hrtimeout_range_clock+0xd8/0x114)
 r4:00000001 r3:af070000
[<804a12b4>] (schedule_hrtimeout_range_clock) from [<804a13dc>] (schedule_hrtimeout_range+0x14/0x18)
 r7:af2f5c00 r6:af29da20 r5:00000005 r4:af29da20
[<804a13c8>] (schedule_hrtimeout_range) from [<80168834>] (usleep_range+0x50/0x58)
[<801687e4>] (usleep_range) from [<8031658c>] (imx6_pcie_link_up+0x48/0x138)
[<80316544>] (imx6_pcie_link_up) from [<80315028>] (dw_pcie_link_up+0x20/0x2c)
 r6:af29da20 r5:af2f5c00 r4:00000000
[<80315008>] (dw_pcie_link_up) from [<8031508c>] (dw_pcie_valid_config+0x58/0x80)
[<80315034>] (dw_pcie_valid_config) from [<80315244>] (dw_pcie_rd_conf+0x50/0x148)
 r6:00000000 r5:af055b3c r4:af29da20 r3:00000000
[<803151f4>] (dw_pcie_rd_conf) from [<802fac10>] (pci_bus_read_config_dword+0x64/0x8c)
 r10:00000000 r9:00000000 r8:00000000 r7:af055b9c r6:60000113 r5:af2f5c00
 r4:00000000
[<802fabac>] (pci_bus_read_config_dword) from [<802fc414>] (pci_bus_read_dev_vendor_id+0x2c/0xe8)
 r8:af055b9c r7:0000ea60 r6:00000000 r5:af2f5c00 r4:af2f5c00
[<802fc3e8>] (pci_bus_read_dev_vendor_id) from [<802fd8c4>] (pci_scan_single_device+0x40/0xb8)
 r9:00000000 r8:00000001 r7:00000000 r6:00000000 r5:af2f5c00 r4:af2f5c00
[<802fd884>] (pci_scan_single_device) from [<802fd99c>] (pci_scan_slot+0x60/0xf8)
 r7:00000000 r6:00000001 r5:af2f5c00 r4:af2f5c00
[<802fd93c>] (pci_scan_slot) from [<802fe70c>] (pci_scan_child_bus+0x28/0xa8)
 r7:00000000 r6:00000001 r5:00000008 r4:af2f5c00
[<802fe6e4>] (pci_scan_child_bus) from [<802fe500>] (pci_scan_bridge+0x31c/0x500)
 r8:00000001 r7:00000000 r6:af2f5c00 r5:af2f5400 r4:af2ae000 r3:00000001
[<802fe1e4>] (pci_scan_bridge) from [<802fe76c>] (pci_scan_child_bus+0x88/0xa8)
 r10:af367ba0 r9:af367b80 r8:af2f5414 r7:00000001 r6:00000000 r5:af2ae000
 r4:af2f5400
[<802fe6e4>] (pci_scan_child_bus) from [<80314a80>] (dw_pcie_scan_bus+0x50/0x78)
 r8:00000000 r7:af055cc0 r6:81025628 r5:af2f5400 r4:af29da20 r3:80816b98
[<80314a30>] (dw_pcie_scan_bus) from [<8010b2bc>] (pci_common_init_dev+0x1a0/0x2f8)
 r5:00000000 r4:808175a4
[<8010b11c>] (pci_common_init_dev) from [<8031593c>] (dw_pcie_host_init+0x600/0x654)
 r10:00000000 r9:00000000 r8:805fd162 r7:00000000 r6:00000000 r5:00000020
 r4:00000000
[<8031533c>] (dw_pcie_host_init) from [<8071c600>] (imx6_pcie_probe+0x1f0/0x2a4)
 r10:00000000 r9:8082ba00 r8:af29da20 r7:af1da800 r6:af1da810 r5:af29da10
 r4:00000000
[<8071c410>] (imx6_pcie_probe) from [<8034aa2c>] (platform_drv_probe+0x50/0xa0)
 r9:8082ba00 r8:808175e4 r7:00000000 r6:808175e4 r5:af1da810 r4:ffffffed
[<8034a9dc>] (platform_drv_probe) from [<803491fc>] (driver_probe_device+0xc0/0x208)
 r6:00000000 r5:81028f98 r4:af1da810 r3:8034a9dc
[<8034913c>] (driver_probe_device) from [<80349408>] (__driver_attach+0x78/0x9c)
 r9:8082ba00 r8:80804a20 r7:8081c608 r6:808175e4 r5:af1da844 r4:af1da810
[<80349390>] (__driver_attach) from [<8034787c>] (bus_for_each_dev+0x74/0x98)
 r6:80349390 r5:808175e4 r4:00000000 r3:00000001
[<80347808>] (bus_for_each_dev) from [<80348d44>] (driver_attach+0x20/0x28)
 r6:af367a00 r5:00000000 r4:808175e4
[<80348d24>] (driver_attach) from [<803489d4>] (bus_add_driver+0xe8/0x1d0)
[<803488ec>] (bus_add_driver) from [<80349a98>] (driver_register+0xa4/0xe8)
 r7:80804a20 r6:00000000 r5:8071c3e8 r4:808175e4
[<803499f4>] (driver_register) from [<8034a960>] (__platform_driver_register+0x50/0x64)
 r5:8071c3e8 r4:808175d0
[<8034a910>] (__platform_driver_register) from [<8034aabc>] (__platform_driver_probe+0x28/0x9c)
[<8034aa94>] (__platform_driver_probe) from [<8071c404>] (imx6_pcie_init+0x1c/0x28)
 r6:00000000 r5:8071c3e8 r4:af27b200 r3:00000000
[<8071c3e8>] (imx6_pcie_init) from [<801007d8>] (do_one_initcall+0x108/0x1bc)
[<801006d0>] (do_one_initcall) from [<80700e1c>] (kernel_init_freeable+0x120/0x1e8)
 r9:8082ba00 r8:8082ba00 r7:80735e7c r6:8072daa0 r5:0000005f r4:00000006
[<80700cfc>] (kernel_init_freeable) from [<80499690>] (kernel_init+0x14/0xf0)
 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:8049967c r4:8082ba00
[<8049967c>] (kernel_init) from [<80106020>] (ret_from_fork+0x14/0x34)
 r4:00000000 r3:00000000
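
Reading the trace above: pci_bus_read_config_dword() holds pci_lock
(lock #2, where preemption was disabled) and calls down through
dw_pcie_rd_conf() and imx6_pcie_link_up() into usleep_range(), which
tries to schedule. The following is only a userspace model of that
pattern, not kernel code and not the attached patch (which is not
reproduced here). Every model_* name is a hypothetical stand-in. The
busy-wait variant is shown as one conventional way to avoid sleeping
under a lock; it is an assumption, not a statement of what the patch does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins only; none of these names are kernel APIs. */
int model_preempt_depth;   /* 0 means sleeping is allowed */
int model_sched_bugs;      /* count of "scheduling while atomic" events */

/* Stands in for taking/releasing pci_lock around a config access. */
void model_lock(void)   { model_preempt_depth++; }
void model_unlock(void) { model_preempt_depth--; }

/* Stands in for usleep_range(): only legal when nothing atomic is held. */
void model_sleep(void)
{
    if (model_preempt_depth > 0) {
        model_sched_bugs++;
        printf("BUG: scheduling while atomic (depth=%d)\n",
               model_preempt_depth);
    }
}

/* Stands in for a busy-wait delay: burns time but never schedules. */
void model_busy_wait(void)
{
}

/* Stands in for a link_up() callback that waits before reporting. */
bool model_link_up(bool sleeps)
{
    if (sleeps)
        model_sleep();
    else
        model_busy_wait();
    return true;
}

/* Stands in for a config read: the callback runs with the lock held. */
bool model_config_read(bool sleeping_link_up)
{
    bool up;

    model_lock();
    up = model_link_up(sleeping_link_up);
    model_unlock();
    return up;
}
```

Calling model_config_read(true) records one "scheduling while atomic"
event, while model_config_read(false) records none. In this model the
problem is the combination of the lock and the sleep, not either one
alone.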
Comment 1 Bjorn Helgaas 2015-06-17 14:54:08 UTC
Thanks, Dave.

Please post the patch to linux-pci@vger.kernel.org (referencing this bugzilla), and copy the imx6 maintainers:

09:49:13 ~/linux (next)$ ./scripts/get_maintainer.pl -f drivers/pci/host/pci-imx6.c
Richard Zhu <Richard.Zhu@freescale.com> (maintainer:PCI DRIVER FOR IMX6)
Lucas Stach <l.stach@pengutronix.de> (maintainer:PCI DRIVER FOR IMX6)
Bjorn Helgaas <bhelgaas@google.com> (supporter:PCI SUBSYSTEM)
linux-pci@vger.kernel.org (open list:PCI DRIVER FOR IMX6)
linux-arm-kernel@lists.infradead.org (moderated list:PCI DRIVER FOR IMX6)
linux-kernel@vger.kernel.org (open list)

My first question on the list will be: Besides imx6, there are five other DesignWare-based drivers, and imx6_pcie_link_up() looks nothing like the other pcie_host_ops.link_up() methods.  We need to explain why imx6 is special, and whether all the .link_up() methods can be made similar.
Comment 2 Lucas Stach 2015-06-18 08:33:34 UTC
Bjorn,

the i.MX6 link-up routine is a bit more complex than those of the other DesignWare drivers because of a hardware bug where some FIFOs can get out of sync when the link speed changes. We need to start the link at Gen1, wait for it to be stable, then start a directed link speed change if the EP is Gen2, and wait for the link to be stable again. If the link doesn't come back, we need to start over.

I don't think any of the other DW PCIe implementations need this workaround.
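
The sequence described in this comment can be sketched as a small
userspace model. It is not the driver's code: the struct, the MODEL_*
constants, the model_* functions and the max_retries parameter are all
hypothetical, and the hardware behaviour is reduced to a counter of how
many speed changes will drop the link.

```c
#include <stdbool.h>

enum model_speed { MODEL_GEN1 = 1, MODEL_GEN2 = 2 };

struct model_link {
    bool ep_supports_gen2;       /* endpoint is Gen2-capable */
    int  speed_change_failures;  /* speed changes that will drop the link */
    enum model_speed speed;      /* current negotiated speed */
    bool up;                     /* link currently stable */
    int  attempts;               /* training attempts made so far */
};

/* Step 1: bring the link up at Gen1 (always succeeds in this model). */
void model_start_gen1(struct model_link *l)
{
    l->speed = MODEL_GEN1;
    l->up = true;
}

/* Step 3: directed speed change; the modelled bug may drop the link. */
void model_speed_change(struct model_link *l)
{
    if (l->speed_change_failures > 0) {
        l->speed_change_failures--;
        l->up = false;
    } else {
        l->speed = MODEL_GEN2;
    }
}

/* Returns 0 once the link is stable, -1 if max_retries is exhausted. */
int model_establish_link(struct model_link *l, int max_retries)
{
    for (l->attempts = 1; l->attempts <= max_retries; l->attempts++) {
        model_start_gen1(l);
        if (!l->up)
            continue;            /* step 2: Gen1 link never became stable */
        if (!l->ep_supports_gen2)
            return 0;            /* Gen1-only endpoint: nothing more to do */
        model_speed_change(l);
        if (l->up)
            return 0;            /* step 4: link stable again at Gen2 */
        /* Link did not come back: start over from Gen1. */
    }
    return -1;
}
```

With speed_change_failures set to 2 and a Gen2-capable endpoint, the
model succeeds on the third attempt. A Gen1-only endpoint succeeds on
the first attempt without any speed change.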
Comment 3 sanjeevsharma 2015-11-16 09:39:06 UTC
A detailed explanation can be found here: https://lkml.org/lkml/2015/11/9/208
Comment 4 Bjorn Helgaas 2016-01-06 02:08:02 UTC
Re comment #2, it sounds like imx6_pcie_link_up() is doing things that should be done in imx6_pcie_establish_link() instead.

Re comment #3, that link is only a patch.  It has nothing like a detailed explanation.
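
The split suggested for comment #2 can be illustrated with a rough
userspace sketch: the slow, possibly sleeping work happens once in an
establish-link step, and the link-up check becomes a plain status read
that is safe to call with a lock held. The sketch_* names, the struct
and SKETCH_LINK_UP_BIT are hypothetical and do not correspond to real
driver symbols or register layouts.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LINK_UP_BIT (1u << 0)  /* hypothetical status bit */

struct sketch_pcie {
    uint32_t status_reg;  /* stands in for a hardware status register */
    int      sleeps;      /* how many times this model "slept" */
};

/*
 * Probe-time path, process context: waiting and retrying is allowed here.
 * The increment stands in for the sleeps done while the link trains.
 */
int sketch_establish_link(struct sketch_pcie *p)
{
    p->sleeps++;
    p->status_reg |= SKETCH_LINK_UP_BIT;
    return 0;
}

/*
 * Config-access path, possibly called with a lock held: a single
 * register read with no waiting, so it can never schedule.
 */
bool sketch_link_up(const struct sketch_pcie *p)
{
    return (p->status_reg & SKETCH_LINK_UP_BIT) != 0;
}
```

In this arrangement sketch_link_up() never changes the sleeps counter,
which is the property the config accessors need.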
