Created attachment 300612 [details] full dmesg log with the warning I have this kernel trace often while PC booting: DMA-API: xhci_hcd 0000:09:00.3: cacheline tracking EEXIST, overlapping mappings aren't supported WARNING: CPU: 4 PID: 69 at kernel/dma/debug.c:570 add_dma_entry+0x1c7/0x250 Modules linked in: bpf_preload CPU: 4 PID: 69 Comm: kworker/4:1 Not tainted 5.17.0 #1 Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS, BIOS 2423 08/10/2021 Workqueue: usb_hub_wq hub_event RIP: 0010:add_dma_entry+0x1c7/0x250 Code: ff 0f 84 97 00 00 00 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 4b 53 80 00 48 89 c6 48 89 da 48 c7 c7 58 b3 84 94 e8 27 21 ca 00 <0f> 0b 48 85 ed 0f 85 69 d2 ca 00 8b 05 88 a0 30 02 85 c0 0f 85 f4 RSP: 0018:ffff9cadc0c9ba88 EFLAGS: 00010292 RAX: 0000000000000060 RBX: ffff9173d2c49ea0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff RBP: ffff9173d11a5e80 R08: 0000000000000000 R09: ffff9cadc0c9b8c0 R10: ffff9cadc0c9b8b8 R11: ffffffff95165128 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000206 R15: 00000000046882ee FS: 0000000000000000(0000) GS:ffff9182a8200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fce5d52a130 CR3: 0000000119b28000 CR4: 0000000000350ee0 Call Trace: <TASK> dma_map_page_attrs+0x82/0x250 ? _raw_spin_unlock_irqrestore+0x30/0x60 usb_hcd_map_urb_for_dma+0x3eb/0x4f0 usb_hcd_submit_urb+0x9b/0xb90 ? lockdep_init_map_type+0x62/0x260 usb_start_wait_urb+0x55/0x130 usb_control_msg+0xc9/0x110 hub_ext_port_status+0x89/0x110 hub_event+0x1e4/0x1920 ? lock_acquire+0xe0/0x2d0 ? lock_is_held_type+0xea/0x140 process_one_work+0x29b/0x5f0 worker_thread+0x4f/0x390 ? process_one_work+0x5f0/0x5f0 kthread+0xf5/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> My usb devices: $ lsusb -t /: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M |__ Port 4: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 5000M |__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/4p, 5000M /: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M |__ Port 2: Dev 2, If 3, Class=Application Specific Interface, Driver=, 12M |__ Port 2: Dev 2, If 1, Class=Vendor Specific Class, Driver=btusb, 12M |__ Port 2: Dev 2, If 2, Class=Vendor Specific Class, Driver=btusb, 12M |__ Port 2: Dev 2, If 0, Class=Vendor Specific Class, Driver=btusb, 12M |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 1: Dev 12, If 0, Class=Communications, Driver=cdc_ether, 480M |__ Port 1: Dev 12, If 1, Class=CDC Data, Driver=cdc_ether, 480M |__ Port 4: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 3: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M |__ Port 3: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M |__ Port 4: Dev 7, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 2: Dev 8, If 0, Class=Human Interface Device, Driver=usbfs, 1.5M |__ Port 3: Dev 9, If 0, Class=Vendor Specific Class, Driver=, 480M |__ Port 4: Dev 10, If 0, Class=Imaging, Driver=, 480M /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 480M |__ Port 6: Dev 2, If 2, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 6: Dev 2, If 0, Class=Vendor Specific Class, Driver=, 12M |__ Port 7: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M $ lsusb Bus 004 Device 006: ID 2109:0812 VIA Labs, Inc. VL812 Hub Bus 004 Device 005: ID 2109:0812 VIA Labs, Inc. VL812 Hub Bus 004 Device 004: ID 0578:0578 Intrinsix Corp. KingSpec Bus 004 Device 003: ID 2109:0812 VIA Labs, Inc. VL812 Hub Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 010: ID 1004:633e LG Electronics, Inc. LM-X420xxx/G2/G3 Android Phone (MTP/download mode) Bus 003 Device 009: ID 0bda:8812 Realtek Semiconductor Corp. RTL8812AU 802.11a/b/g/n/ac 2T2R DB WLAN Adapter Bus 003 Device 008: ID 0463:ffff MGE UPS Systems UPS Bus 003 Device 007: ID 2109:2812 VIA Labs, Inc. VL812 Hub Bus 003 Device 006: ID 046d:c30e Logitech, Inc. UltraX Keyboard (Y-BL49) Bus 003 Device 005: ID 2109:2812 VIA Labs, Inc. VL812 Hub Bus 003 Device 004: ID 2109:2812 VIA Labs, Inc. VL812 Hub Bus 003 Device 012: ID 0525:a4a2 Netchip Technology, Inc. Linux-USB Ethernet/RNDIS Gadget Bus 003 Device 003: ID 2109:2812 VIA Labs, Inc. VL812 Hub Bus 003 Device 002: ID 0b05:17cb ASUSTek Computer, Inc. Broadcom BCM20702A0 Bluetooth Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 003: ID 05e3:0610 Genesys Logic, Inc. Hub Bus 001 Device 002: ID 0b05:1939 ASUSTek Computer, Inc. AURA LED Controller Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Kernel is built from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git at v5.17 tag (f443e374ae131c168a065ea1748feac6b2e76613) Also reproduces for my distro kernel: https://bugzilla.redhat.com/show_bug.cgi?id=2021242
Am I right in assuming this didn't happen on earlier Linux versions like 5.16?
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1) > Am I right in assuming this didn't happen on earlier Linux versions like > 5.16? Forget that, should have looked in the redhat ticket before posting this, there you state that 5.13 seems to not be affected.
Hey Christoph, Marek, Greg! Below warning was reported to bugzilla.kernel.org more than a week ago; to quote the important parts here: > https://bugzilla.kernel.org/show_bug.cgi?id=215740 > > DMA-API: xhci_hcd 0000:09:00.3: cacheline tracking EEXIST, overlapping > mappings aren't supported > WARNING: CPU: 4 PID: 69 at kernel/dma/debug.c:570 > add_dma_entry+0x1c7/0x250 > [...] > Call Trace: > <TASK> > dma_map_page_attrs+0x82/0x250 > ? _raw_spin_unlock_irqrestore+0x30/0x60 > usb_hcd_map_urb_for_dma+0x3eb/0x4f0 > usb_hcd_submit_urb+0x9b/0xb90 > ? lockdep_init_map_type+0x62/0x260 > usb_start_wait_urb+0x55/0x130 > usb_control_msg+0xc9/0x110 > hub_ext_port_status+0x89/0x110 > hub_event+0x1e4/0x1920 See the full quote below for details. According to a bugzilla.redhat.com ticket linked in the report this started to happen with 5.14. A second user there also confirmed the problem that still seems to happen with recent 5.17 pre-releases. Searching on lore revealed that others have hit this warning as well in the past few months: https://lore.kernel.org/all/65271fd1-1c1c-f2ad-9b0f-60174e791eaa@linux.intel.com/ https://lore.kernel.org/all/5cd6fc87-0f8d-0b9b-42be-8180540a94e7@gmail.com/ https://lore.kernel.org/all/CABXGCsP-JJAUh-SV0u1ALu5+awxcWT3BsQ=FzKrqb+9HOk2EUA@mail.gmail.com/ Does anyone have a idea what might be wrong here? If not, I guess we should ask one of the reporters to bisect this. Anyway, to ensure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot: #regzbot ^introduced v5.13..v5.14 #regzbot title dma or usb: warning "DMA-API: xhci_hcd ... cacheline tracking EEXIST, overlapping mappings aren't supported" #regzbot from: Andrew <travneff@gmail.com> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215740 #regzbot https://bugzilla.redhat.com/show_bug.cgi?id=2021242 #regzbot ignore-activity Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I'm getting a lot of reports on my table. I can only look briefly into most of them and lack knowledge about most of the areas they concern. I thus unfortunately will sometimes get things wrong or miss something important. I hope that's not the case here; if you think it is, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. On 24.03.22 16:18, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=215740 > > Bug ID: 215740 > Summary: kernel warning: DMA-API: xhci_hcd: cacheline tracking > EEXIST, overlapping mappings aren't supported > Product: Drivers > Version: 2.5 > Kernel Version: 5.17 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: USB > Assignee: drivers_usb@kernel-bugs.kernel.org > Reporter: travneff@gmail.com > Regression: No > > Created attachment 300612 [details] > --> https://bugzilla.kernel.org/attachment.cgi?id=300612&action=edit > full dmesg log with the warning > > I have this kernel trace often while PC booting: > > DMA-API: xhci_hcd 0000:09:00.3: cacheline tracking EEXIST, overlapping > mappings aren't supported > WARNING: CPU: 4 PID: 69 at kernel/dma/debug.c:570 > add_dma_entry+0x1c7/0x250 > Modules linked in: bpf_preload > CPU: 4 PID: 69 Comm: kworker/4:1 Not tainted 5.17.0 #1 > Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS, BIOS 2423 > 08/10/2021 > Workqueue: usb_hub_wq hub_event > RIP: 0010:add_dma_entry+0x1c7/0x250 > Code: ff 0f 84 97 00 00 00 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 4b 53 > 80 > 00 48 89 c6 48 89 da 48 c7 c7 58 b3 84 94 e8 27 21 ca 00 <0f> 0b 48 85 ed 0f > 85 > 69 d2 ca 00 8b 05 88 a0 30 02 85 c0 0f 85 f4 > RSP: 0018:ffff9cadc0c9ba88 EFLAGS: 00010292 > RAX: 0000000000000060 RBX: ffff9173d2c49ea0 RCX: 0000000000000000 > RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff > RBP: ffff9173d11a5e80 R08: 0000000000000000 R09: ffff9cadc0c9b8c0 > R10: ffff9cadc0c9b8b8 R11: ffffffff95165128 R12: 0000000000000000 > R13: 0000000000000001 R14: 0000000000000206 R15: 00000000046882ee > FS: 0000000000000000(0000) GS:ffff9182a8200000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fce5d52a130 CR3: 0000000119b28000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > dma_map_page_attrs+0x82/0x250 > ? _raw_spin_unlock_irqrestore+0x30/0x60 > usb_hcd_map_urb_for_dma+0x3eb/0x4f0 > usb_hcd_submit_urb+0x9b/0xb90 > ? lockdep_init_map_type+0x62/0x260 > usb_start_wait_urb+0x55/0x130 > usb_control_msg+0xc9/0x110 > hub_ext_port_status+0x89/0x110 > hub_event+0x1e4/0x1920 > ? lock_acquire+0xe0/0x2d0 > ? lock_is_held_type+0xea/0x140 > process_one_work+0x29b/0x5f0 > worker_thread+0x4f/0x390 > ? process_one_work+0x5f0/0x5f0 > kthread+0xf5/0x120 > ? kthread_complete_and_exit+0x20/0x20 > ret_from_fork+0x22/0x30 > </TASK> > > My usb devices: > > $ lsusb -t > /: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M > |__ Port 4: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M > |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M > |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=uas, > 5000M > |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 5000M > |__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/4p, 5000M > /: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M > |__ Port 2: Dev 2, If 3, Class=Application Specific Interface, > Driver=, > 12M > |__ Port 2: Dev 2, If 1, Class=Vendor Specific Class, Driver=btusb, > 12M > |__ Port 2: Dev 2, If 2, Class=Vendor Specific Class, Driver=btusb, > 12M > |__ Port 2: Dev 2, If 0, Class=Vendor Specific Class, Driver=btusb, > 12M > |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M > |__ Port 1: Dev 12, If 0, Class=Communications, Driver=cdc_ether, > 480M > |__ Port 1: Dev 12, If 1, Class=CDC Data, Driver=cdc_ether, 480M > |__ Port 4: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M > |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M > |__ Port 3: Dev 6, If 1, Class=Human Interface Device, > Driver=usbhid, 1.5M > |__ Port 3: Dev 6, If 0, Class=Human Interface Device, > Driver=usbhid, 1.5M > |__ Port 4: Dev 7, If 0, Class=Hub, Driver=hub/4p, 480M > |__ Port 2: Dev 8, If 0, Class=Human Interface > Device, > Driver=usbfs, 1.5M > |__ Port 3: Dev 9, If 0, Class=Vendor Specific Class, > Driver=, 480M > |__ Port 4: Dev 10, If 0, Class=Imaging, Driver=, > 480M > /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M > /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 480M > |__ Port 6: Dev 2, If 2, Class=Human Interface Device, Driver=usbhid, > 12M > |__ Port 6: Dev 2, If 0, Class=Vendor Specific Class, Driver=, 12M > |__ Port 7: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M > > $ lsusb > Bus 004 Device 006: ID 2109:0812 VIA Labs, Inc. VL812 Hub > Bus 004 Device 005: ID 2109:0812 VIA Labs, Inc. VL812 Hub > Bus 004 Device 004: ID 0578:0578 Intrinsix Corp. KingSpec > Bus 004 Device 003: ID 2109:0812 VIA Labs, Inc. VL812 Hub > Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub > Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub > Bus 003 Device 010: ID 1004:633e LG Electronics, Inc. LM-X420xxx/G2/G3 > Android Phone (MTP/download mode) > Bus 003 Device 009: ID 0bda:8812 Realtek Semiconductor Corp. RTL8812AU > 802.11a/b/g/n/ac 2T2R DB WLAN Adapter > Bus 003 Device 008: ID 0463:ffff MGE UPS Systems UPS > Bus 003 Device 007: ID 2109:2812 VIA Labs, Inc. VL812 Hub > Bus 003 Device 006: ID 046d:c30e Logitech, Inc. UltraX Keyboard (Y-BL49) > Bus 003 Device 005: ID 2109:2812 VIA Labs, Inc. VL812 Hub > Bus 003 Device 004: ID 2109:2812 VIA Labs, Inc. VL812 Hub > Bus 003 Device 012: ID 0525:a4a2 Netchip Technology, Inc. Linux-USB > Ethernet/RNDIS Gadget > Bus 003 Device 003: ID 2109:2812 VIA Labs, Inc. VL812 Hub > Bus 003 Device 002: ID 0b05:17cb ASUSTek Computer, Inc. Broadcom > BCM20702A0 > Bluetooth > Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub > Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub > Bus 001 Device 003: ID 05e3:0610 Genesys Logic, Inc. Hub > Bus 001 Device 002: ID 0b05:1939 ASUSTek Computer, Inc. AURA LED > Controller > Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub > > Kernel is built from > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > at v5.17 tag (f443e374ae131c168a065ea1748feac6b2e76613) > > Also reproduces for my distro kernel: > https://bugzilla.redhat.com/show_bug.cgi?id=2021242 >
That warning was added in v5.14: 2b4bbc6231d7 dma-debug: report -EEXIST errors in add_dma_entry Actual issue on usb side could be a lot older.
On Thu, Mar 31, 2022 at 09:41:25AM +0200, Thorsten Leemhuis wrote: > Hey Christoph, Marek, Greg! > > Below warning was reported to bugzilla.kernel.org more than a week ago; > to quote the important parts here: The warning is from a new debug check added in 5.14 under a debug config option. So while the USB maintainers should look into fixing it I don't think it is a regression in that traditional sense of the word.
Ferry Toth reports that the warning only occurs when connecting a LAN9514 USB Ethernet (drivers/net/usb/smsc95xx.c). Other USB devices (memory sticks, WiFi, Bluetooth) do not trigger the warning: https://lore.kernel.org/netdev/2a436486-a54d-a9b3-d839-232a38653af3@gmail.com/ Curiously, the above-linked other reports seem to indicate that the warning occurs when connecting USB hubs. It's unclear what those have in common with the LAN9514.
In my case it is EVM9514, a 4p USB hub + 10/100 ethernet: https://www.microchip.com/en-us/development-tool/EVB9514 So, indeed it seems like to warning originates from hub code shared between at least VL812 and LAN9514.
I also heard from Andy Shevchenko that he had the same error using even another hub. It seems the error is not particular to a certain hub, but doesn't appear on root hubs. And indeed when I plug my hub into Ubuntu the error doesn't occur because "debug check added in 5.14 under a debug config option" (Comment 5 above) is not enabled in Ubuntu kernels.
Ok, it's different then in #6 above. smsc95xx behaves as a 5p hub, with an usb to ethernet adapter plugged into 1 port. So, 4 ports remain available to the user. The bug itself is likely located in hub.c. I can reproduce this with an ordinary 7p hub (EVB USB2517). Plugging the hub causes no issues. Then plugging an usb device (in my case I just plugged a mouse) into a port triggers the "cacheline tracking EEXIST". The warning occurs only once (regardless unplugging the device or the hub). Note that on Ubuntu 23.10 the warning does not occur, but they have most debug options not set. So, likely this is an old bug that has gone undetected for a long while.
Seems might be caused from init3: usb_submit_urb() in hub_active()> I don't see why. May be Alan knows?
I don't think it has anything to do with the init3: section. The stack trace shows that the problem stems from hub_ext_port_status(), probably called by port_event() within hub_event(). And the place where hub_ext_port_status() calls usb_control_msg() indirectly is protected by a mutex! The I/O buffer being mapped multiple times is hub->status->port, and it doesn't get used anywhere else in the hub driver AFAICS. (There is a comment about the phy driver also using that buffer; I don't know what it refers to.) [It would be easier to track this down if the DMA subsystem would cause these erroneous mappings to fail instead of pretending that they have succeeded. Grumble...] I'm stumped. I'll try to find time to reproduce it on my own system. Probably would have noticed it before now if I had tried connecting an external hub recently, but that just hasn't come up.
Ah, ok I was looking at a stack trace from a more recent kernel. So, here to add some more data points (2 different kernels): This is with EVB LAN9514 plugged in while booting: : ------------[ cut here ]------------ : DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported : WARNING: CPU: 1 PID: 32 at kernel/dma/debug.c:602 add_dma_entry+0x1eb/0x2b0 : Modules linked in: mmc_block extcon_intel_mrfld sdhci_pci cqhci sdhci led_class mmc_core intel_soc_pmic_mrfld btrfs libcrc32c xor zlib_deflate zstd_compress raid6_pq : CPU: 1 PID: 32 Comm: kworker/1:1 Not tainted 6.6.0-next-20231106-edison-acpi-standard+ #1 : Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 : Workqueue: usb_hub_wq hub_event : RIP: 0010:add_dma_entry+0x1eb/0x2b0 : Code: 0f 84 cc 00 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 48 1f 68 00 48 89 c6 4c 89 e2 48 c7 c7 80 04 3d a4 e8 76 fb f5 ff 90 <0f> 0b 90 90 48 85 ed 74 1a 48 c7 c7 68 84 35 a4 e8 f0 0c fd ff 8b : RSP: 0000:ffffa5bb0021fab0 EFLAGS: 00010286 : RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000ffffdfff : RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001 : RBP: ffff9f50c122da80 R08: ffffffffa4738d08 R09: 00000000ffffdfff : R10: ffffffffa4658d20 R11: ffffffffa4708d20 R12: ffff9f50c2879550 : R13: 0000000000000001 R14: 0000000000000202 R15: 00000000002047f8 : FS: 0000000000000000(0000) GS:ffff9f50fe300000(0000) knlGS:0000000000000000 : CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 : CR2: 00005647b6a03c0f CR3: 0000000009ce0000 CR4: 00000000001006f0 : Call Trace: : <TASK> : ? __warn+0x7e/0x130 : ? add_dma_entry+0x1eb/0x2b0 : ? report_bug+0x18e/0x1a0 : ? handle_bug+0x3d/0x70 : ? exc_invalid_op+0x18/0x70 : ? asm_exc_invalid_op+0x1a/0x20 : ? add_dma_entry+0x1eb/0x2b0 : dma_map_page_attrs+0x105/0x280 : ? __virt_addr_valid+0xe7/0x130 : usb_hcd_map_urb_for_dma+0x3c2/0x500 : usb_hcd_submit_urb+0x9b/0xb70 : ? kmem_cache_alloc+0x20d/0x230 : ? usb_hcd_link_urb_to_ep+0x72/0xa0 : ? prepare_transfer+0x104/0x140 : usb_start_wait_urb+0x60/0x160 : usb_control_msg+0xe0/0x150 : hub_ext_port_status+0x85/0x1b0 : hub_event+0x1f7/0x1ba0 : ? hub_activate+0x59b/0x8f0 : process_one_work+0x136/0x2f0 : ? __pfx_worker_thread+0x10/0x10 : worker_thread+0x2ed/0x400 : ? __pfx_worker_thread+0x10/0x10 : kthread+0xd3/0x100 : ? __pfx_kthread+0x10/0x10 : ret_from_fork+0x2f/0x50 : ? __pfx_kthread+0x10/0x10 : ret_from_fork_asm+0x1a/0x30 : </TASK> : ---[ end trace 0000000000000000 ]--- : DMA-API: Mapped at: : debug_dma_map_page+0x68/0x100 : dma_map_page_attrs+0x105/0x280 : usb_hcd_map_urb_for_dma+0x3c2/0x500 : usb_hcd_submit_urb+0x9b/0xb70 : usb_start_wait_urb+0x60/0x160 This is after boot, plugging the EVM-USB2517, then plugging a mouse: ------------[ cut here ]------------ DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported WARNING: CPU: 0 PID: 587 at kernel/dma/debug.c:568 add_dma_entry+0x1ed/0x280 Modules linked in: rfcomm bnep iptable_nat snd_sof_nocodec spi_pxa2xx_platform dw_dmac snd_sof_pci_intel_tng snd_sof_pci snd_sof_intel_atom pwm_lpss_pci pwm_lpss> CPU: 0 PID: 587 Comm: kworker/0:3 Not tainted 6.1.55-edison-acpi-standard #1 Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 Workqueue: usb_hub_wq hub_event RIP: 0010:add_dma_entry+0x1ed/0x280 Code: ff 0f 84 97 00 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 f5 a6 5a 00 48 89 c6 4c 89 e2 48 c7 c7 10 ed 62 b8 e8 e3 ce f7 ff <0f> 0b 48 85 ed 0f 85 4c d3 > RSP: 0018:ffffa3400066fae0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffffb8634309 RDI: 00000000ffffffff RBP: ffff92a981295f00 R08: ffffffffb8937828 R09: 00000000ffffdfff R10: ffffffffb8857840 R11: ffffffffb8907840 R12: ffff92a982ca37a0 R13: 0000000000000001 R14: 0000000000000206 R15: 00000000000b716f FS: 0000000000000000(0000) GS:ffff92a9be200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055bed6f3b078 CR3: 000000000e00c000 CR4: 00000000001006f0 Call Trace: <TASK> ? __warn+0x74/0xc0 ? add_dma_entry+0x1ed/0x280 ? report_bug+0xee/0x170 ? prb_read_valid+0x12/0x20 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? add_dma_entry+0x1ed/0x280 dma_map_page_attrs+0x101/0x270 ? preempt_count_add+0x6b/0xa0 ? preempt_count_add+0x6b/0xa0 usb_hcd_map_urb_for_dma+0x410/0x570 usb_hcd_submit_urb+0x97/0xb80 ? hub_activate+0x2db/0x890 ? hub_resume+0x2e/0xf0 usb_start_wait_urb+0x60/0x160 usb_control_msg+0xdc/0x140 hub_ext_port_status+0x82/0x110 hub_event+0x442/0x1840 ? _raw_spin_unlock+0xd/0x30 ? finish_task_switch.isra.0+0x89/0x260 process_one_work+0x1d7/0x3a0 worker_thread+0x48/0x3c0 ? process_one_work+0x3a0/0x3a0 kthread+0xe3/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> ---[ end trace 0000000000000000 ]--- DMA-API: Mapped at: debug_dma_map_page+0x64/0x100 dma_map_page_attrs+0x101/0x270 usb_hcd_map_urb_for_dma+0x410/0x570 usb_hcd_submit_urb+0x97/0xb80 usb_start_wait_urb+0x60/0x160
Those two stack traces are essentially the same as the first one reported. They all contain the following key sequence of function calls: dma_map_page_attrs usb_hcd_map_urb_for_dma usb_hcd_submit_urb usb_start_wait_urb usb_control_msg hub_ext_port_status hub_event process_one_work It doesn't look like they add anything new.
I see. I was misled by ? hub_activate, and hub_activate calling usb_submit_urb. In any case the traces then show the problem is likely reproducible using any hub.
Created attachment 305425 [details] Diagnostics for overlapped DMA mapping I just tried booting a kernel with CONFIG_DMA_API_DEBUG enabled. When I plugged in a hub and then plugged some devices into that hub, nothing went wrong. So since I can't do any useful testing here, maybe one of you can test the attached patch. It won't fix anything but it should help confirm what's happening and provide a starting point for further debugging.
Created attachment 305426 [details] ftrace, hub plugged at 131, mouse plugged into hub at 141 Sure. And added a ftrace see attachment trace_hub.txt. In journal I got: ... Nov 19 20:57:45 yuna kernel: ---[ end trace 0000000000000000 ]--- Nov 19 20:57:45 yuna kernel: DMA-API: Mapped at: Nov 19 20:57:45 yuna kernel: debug_dma_map_page+0x6d/0x100 Nov 19 20:57:45 yuna kernel: dma_map_page_attrs+0x10a/0x280 Nov 19 20:57:45 yuna kernel: usb_hcd_map_urb_for_dma+0x419/0x5c0 Nov 19 20:57:45 yuna kernel: usb_hcd_submit_urb+0xa0/0xaa0 Nov 19 20:57:45 yuna kernel: usb_start_wait_urb+0x65/0x100 Nov 19 20:57:45 yuna kernel: Map error: ffffd221000e02c0 ffff916fc336b4a0 Nov 19 20:57:45 yuna kernel: Map error: ffffd221000e02c0 ffff916fc336b4a0 Nov 19 20:57:45 yuna kernel: usb 1-1.1: new low-speed USB device number 3 using xhci-hcd Nov 19 20:57:45 yuna kernel: Map error: ffffd221000e02c0 ffff916fc336b4a0 ...
Created attachment 305444 [details] Revised diagnostics The ftrace doesn't help. You can't tell where the error occurs or which hub the function calls are for. The patch wasn't quite right, so here is an updated version. Be sure to include the "cacheline tracking EEXIST, overlapping mappings aren't supported" line when you post the dmesg output.
Hi Alan, would you know why your patches won't apply using git to v6.6? I fixed up manually. Results of Revised diagnostics attachment 305444 [details] below: ------------[ cut here ]------------ DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported, len 4 WARNING: CPU: 0 PID: 7 at kernel/dma/debug.c:607 add_dma_entry+0x207/0x2c0 Modules linked in: rfcomm bnep iptable_nat snd_sof_nocodec spi_pxa2xx_platform dw_dmac brcmfmac_wcc snd_sof_pci_intel_tng snd_sof_pci snd_sof_intel_atom pwm_lpss> CPU: 0 PID: 7 Comm: kworker/0:0 Not tainted 6.6.0-edison-acpi-standard+ #1 Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 Workqueue: usb_hub_wq hub_event RIP: 0010:add_dma_entry+0x207/0x2c0 Code: c8 00 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 fe a3 6a 00 48 89 c6 4c 89 e9 4c 89 e2 48 c7 c7 a8 1d 9a 99 e8 69 ac f5 ff <0f> 0b 48 85 db 74 1a 48 c7 > RSP: 0018:ffffa4de00043ab0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff9c5f01376e80 RCX: 0000000000000027 RDX: ffff9c5f3e21c5c8 RSI: 0000000000000001 RDI: ffff9c5f3e21c5c0 RBP: 00000000ffffffff R08: ffffffff99d431e8 R09: 00000000ffffdfff R10: ffffffff99c63200 R11: ffffffff99d13200 R12: ffff9c5f012375f0 R13: 0000000000000004 R14: 0000000000000206 R15: 00000000001eb8d2 FS: 0000000000000000(0000) GS:ffff9c5f3e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f57691956b0 CR3: 000000003682e000 CR4: 00000000001006f0 Call Trace: <TASK> ? __warn+0x84/0x140 ? add_dma_entry+0x207/0x2c0 ? report_bug+0x199/0x1b0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? add_dma_entry+0x207/0x2c0 dma_map_page_attrs+0x10a/0x280 usb_hcd_map_urb_for_dma+0x46f/0x650 usb_hcd_submit_urb+0xa0/0xaa0 ? kick_hub_wq+0x69/0xf0 ? hub_activate+0x2c3/0x890 ? usb_disable_remote_wakeup+0x34/0x70 usb_start_wait_urb+0x65/0x100 usb_control_msg+0xe5/0x150 hub_ext_port_status+0x8a/0x1b0 hub_event+0x1d1/0x18e0 ? trace_hardirqs_on+0x21/0x90 process_one_work+0x139/0x2f0 worker_thread+0x2f4/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> ---[ end trace 0000000000000000 ]--- DMA-API: Mapped at: debug_dma_map_page+0x6d/0x100 dma_map_page_attrs+0x10a/0x280 usb_hcd_map_urb_for_dma+0x46f/0x650 usb_hcd_submit_urb+0xa0/0xaa0 usb_start_wait_urb+0x65/0x100 Map error: 128857248 128857248 len 4 Map error: 128857248 128857248 len 4 usb 1-1.1: new low-speed USB device number 3 using xhci-hcd Map error: 128857248 128857248 len 4 usb 1-1.1: New USB device found, idVendor=046d, idProduct=c03e, bcdDevice=20.00 usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 1-1.1: Product: USB-PS/2 Optical Mouse usb 1-1.1: Manufacturer: Logitech input: Logitech USB-PS/2 Optical Mouse as /devices/pci0000:00/0000:00:11.0/dwc3.0.auto/xhci-hcd.1.auto/usb1/1-1/1-1.1/1-1.1:1.0/0003:046D:C03E.0001/input/input1 hid-generic 0003:046D:C03E.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-xhci-hcd.1.auto-1.1/input0
Created attachment 305456 [details] Diagnostic to find DMA overlap The patches are prepared against a 6.6-rc3 tree (it hasn't been updated since then), so they might not match 6.6 exactly. At least we're getting some good data from the patch: > DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings > aren't supported, len 4 ... > Map error: 128857248 128857248 len 4 The fact that the addresses and lengths agree indicate that the patch works right, and len = 4 means that the mapping error is indeed for the buffer containing the hub status data. By the way, do you have any USB hubs besides this one on your system? The next question is what other mapping already exists for that buffer? This next patch may be able to tell us. It's supposed to produce a second stack dump when the overlap occurs, for the call that created the first mapping.
It's not a conflict or so, I am getting (using qgit or cli): git am --signoff /home/ferry/Downloads/dma-dbg.patch Patch format detection failed. when I add --patch-format=mbox I get: \Applying: @@ -, +, @@ fatal: empty ident name (for <>) not allowed Never had this before. I am testing on Intel Merrifield (Edison Arduino): root@yuna:~# lsusb -t /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/7p, 480M |__ Port 1: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M But root_hub 5000M is not available due to no phy. With Diagnostic to find DMA overlap attachment 305456 [details] ------------[ cut here ]------------ DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported WARNING: CPU: 0 PID: 105 at kernel/dma/debug.c:608 add_dma_entry+0x1fc/0x370 Modules linked in: rfcomm iptable_nat bnep snd_sof_nocodec spi_pxa2xx_platform dw_dmac brcmfmac_wcc pwm_lpss_pci pwm_lpss snd_sof_pci_intel_tng intel_mrfld_pwrbtn snd_sof_pci snd_sof_intel_atom intel_mrfld_adc snd_sof dw_dmac_pci snd_sof_utils snd_sof_xtensa_dsp dw_dmac_core snd_intel_dspcfg snd_soc_acpi spi_pxa2xx_pci brcmfmac brcmutil hci_uart leds_gpio btbcm ti_ads7950 tun ledtrig_timer ledtrig_heartbeat mmc_block extcon_intel_mrfld sdhci_pci cqhci sdhci led_class mmc_core intel_soc_pmic_mrfld btrfs libcrc32c xor zlib_deflate zstd_compress raid6_pq CPU: 0 PID: 105 Comm: kworker/0:3 Not tainted 6.6.0-edison-acpi-standard+ #1 Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 Workqueue: usb_hub_wq hub_event RIP: 0010:add_dma_entry+0x1fc/0x370 Code: ff 0f 84 80 01 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 e6 9d 6a 00 48 89 c6 4c 89 e2 48 c7 c7 10 21 5a b1 e8 94 a5 f5 ff <0f> 0b 48 85 db 74 1a 48 c7 c7 44 18 53 b1 e8 01 f9 fc ff 8b 73 4c RSP: 0000:ffffb63c403fbab0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff908001374f00 RCX: 0000000000000027 RDX: ffff90803e21c5c8 RSI: 0000000000000001 RDI: ffff90803e21c5c0 RBP: 000000000019d4b5 R08: ffffffffb19431e8 R09: 00000000ffffdfff R10: ffffffffb1863200 R11: ffffffffb1913200 R12: ffff908002124860 R13: 0000000000000001 R14: 0000000000000202 R15: 000000000019d4b5 FS: 0000000000000000(0000) GS:ffff90803e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd58c003000 CR3: 0000000008096000 CR4: 00000000001006f0 Call Trace: <TASK> ? __warn+0x84/0x140 ? add_dma_entry+0x1fc/0x370 ? report_bug+0x199/0x1b0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? add_dma_entry+0x1fc/0x370 dma_map_page_attrs+0x10a/0x280 usb_hcd_map_urb_for_dma+0x46f/0x650 usb_hcd_submit_urb+0xa0/0xaa0 ? kick_hub_wq+0x69/0xf0 ? hub_activate+0x2c3/0x890 ? usb_disable_remote_wakeup+0x34/0x70 usb_start_wait_urb+0x65/0x100 usb_control_msg+0xe5/0x150 hub_ext_port_status+0x8a/0x1b0 hub_event+0x1d1/0x18e0 process_one_work+0x139/0x2f0 ? __pfx_worker_thread+0x10/0x10 worker_thread+0x2f4/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> ---[ end trace 0000000000000000 ]--- DMA-API: Mapped at: debug_dma_map_page+0x6d/0x100 dma_map_page_attrs+0x10a/0x280 usb_hcd_map_urb_for_dma+0x46f/0x650 usb_hcd_submit_urb+0xa0/0xaa0 usb_start_wait_urb+0x65/0x100 DMA-API: Overlapped entry: 108342600 size 1 DMA-API: Mapped at: debug_dma_map_page+0x6d/0x100 dma_map_page_attrs+0x10a/0x280 usb_hcd_map_urb_for_dma+0x46f/0x650 usb_hcd_submit_urb+0xa0/0xaa0 hub_activate+0x29c/0x890 Map error: 108342608 108342608 len 4 Map error: 108342608 108342608 len 4 usb 1-1.1: new low-speed USB device number 3 using xhci-hcd Map error: 108342608 108342608 len 4 usb 1-1.1: New USB device found, idVendor=046d, idProduct=c03e, bcdDevice=20.00 usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 1-1.1: Product: USB-PS/2 Optical Mouse usb 1-1.1: Manufacturer: Logitech input: Logitech USB-PS/2 Optical Mouse as /devices/pci0000:00/0000:00:11.0/dwc3.0.auto/xhci-hcd.1.auto/usb1/1-1/1-1.1/1-1.1:1.0/0003:046D:C03E.0001/input/input1 hid-generic 0003:046D:C03E.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-xhci-hcd.1.auto-1.1/input0 I also notice now on unplugging the mouse: Map error: 108342608 108342608 len 4 usb 1-1.1: USB disconnect, device number 3 Map error: 108342608 108342608 len 4 Map error: 108342608 108342608 len 4 Map error: 108342608 108342608 len 4 Map error: 108342608 108342608 len 4 Map error: 108342608 108342608 len 4
Looks like read mapping is bigger than 8, I think it might have a cache line alignment requirement. But I have knowledge of USB core code, I believe Alan knows much better what's going on there. Btw, I don't see USB using https://elixir.bootlin.com/linux/v6.7-rc2/C/ident/dma_get_cache_alignment API (except the only DWC2 case).
Also there is dma_kmalloc_size_aligned() which might help.
+Christoph Hellwig (because I am not an expert on kmalloc and cache line alignment) Ferry: Sorry about the malformed patches. The bad line was added automatically by Quilt (I don't know why). If I add any more patches, I'll try to remember to remove that bogus line first. Andy: There are two mappings in question here, both of them are reads, neither is larger than 8 bytes, and neither has a cache line alignment requirement (other than the general restriction that on architectures with non-cache-coherent DMA -- i.e., not x86 -- different allocations should belong to different cache lines). The first mapping is 1 byte long (at the start of the 8-byte-long hub->buffer allocated near the beginning of hub_configure() in drivers/usb/core/hub.c) and the second is 4 bytes long (for hub->status, allocated just afterward). The physical addresses and sizes of the mappings, as shown in the log, are: hub->buffer: 108342600 size 1 hub->status: 108342608 size 4 Obviously they lie in the same cache line, which is what triggers the DMA debugging error. But equally obviously, on x86 or other architectures with cache-coherent DMA, there is nothing wrong with them. Incidentally,
Sorry, submitted the update too soon. I meant to add: Incidentally, the check for overlapping DMA mappings does not take into account the possibility that a mapping can extend over several cache lines. Maybe this wasn't deemed important enough to check for. But in any case, it seems clear that on x86 there shouldn't be anything wrong with allowing several different mappings to refer to the same cache line.
I'm not sure I understand 'u8 (*buffer)[8]' in the data struct, sounds like 8 pointers, so the side has to be 32/64 bytes. As for the next field, status, it's a pointer and again 4/8 bytes. I believe we are talking about x86_64 case, so I fail to see how 1 and 4 map to the struct usb_hub fields. Is the len in your debug patches is in "long"s units? Then it makes sense if the status is mapped first (1 long) followed by 4 (out of 8) URB buffer pointers.
"u8 (*buffer)[8]" means that buffer is a pointer to an array of 8 u8's, NOT an array of 8 pointers to u8. The latter would be written as "u8 *(buffer[8])" or "u8 *buffer[8]". C's syntax is notoriously difficult to understand when talking about arrays of pointers and pointers to arrays; you have to parse the declaration from the inside out and you may need to know that postfix "[]" has higher binding precedence than prefix "*". 1 and 4 don't map to anything; they are the _lengths_ of the DMA mappings. That is, the first mapping is 1 byte long and the second mapping is 4 bytes long: These are lengths in bytes, not in longs.
Yeah, I got it, so the problematic one (that triggers a warning) is this line ret = get_port_status(hub->hdev, port1, &hub->status->port, type, len); where port is 8 bytes, but len = 4 as we asked for non-extended status. Which line maps the DMA? ``` } else { urb->transfer_dma = dma_map_single( hcd->self.sysdev, urb->transfer_buffer, urb->transfer_buffer_length, dir); ``` This one?
Yes, that one. The issue of "which mapping is the problematic one" is more or less irrelevant. The warning is issued when the second mapping is created. The order of the two mappings doesn't matter, just the fact that they refer to the same cache line.
(In reply to Andy Shevchenko from comment #25) > status, it's a pointer and again 4/8 bytes. I believe we are talking about > x86_64 case, so I fail to see how 1 and 4 map to the struct usb_hub fields. Yes. But I just checked and the same happens with i686 kernel (on same platform).
So I checked and: root@yuna:~# cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size 64 So I just simply added to your debug patches: diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 0ff47eeffb49..ba5864b23aed 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1420,7 +1420,7 @@ static int hub_configure(struct usb_hub *hub, unsigned full_load; unsigned maxchild; - hub->buffer = kmalloc(sizeof(*hub->buffer), GFP_KERNEL); + hub->buffer = kmalloc(64, GFP_KERNEL); if (!hub->buffer) { ret = -ENOMEM; goto fail; and the problem disappears. But then, why Comment 15 can Alan not reproduce this?
I don't know why my local test didn't reproduce the problem -- and I haven't tried to find out what's happening. My best guess is that the allocations of hub->buffer and hub->status end up (by random chance) being non-adjacent, so they aren't in the same cache line. But I haven't even run the test patches attached to this bug report. Regardless, changing the allocation size to 64 merely hides the problem. The real problem is that the DMA debugging code is wrong and needs to be fixed. It should not complain about two different mappings of the same cache line on x86, unless the mappings actually overlap.
By changing the allocation I just wanted to close the loop and prove this is the trigger. And maybe on other platforms than x86 an actual issue? What do you think of Andy's suggestion to use dma_kmalloc_size_aligned()? I still find it strange that the issue does not reproduce for you. I have built each kernel since this issue was introduced, with multiple compiler versions, x86_64/i686 and it happens 100% of the cases. Could there be another related CONFIG?
It should not be necessary to use dma_kmalloc_size_aligned(). These allocations don't need any special alignment; they just need to be in different cache lines on some architectures. Isn't this what ARCH_DMA_MINALIGN and ARCH_KMALLOC_MINALIGN are supposed to take care of (see include/linux/slab.h)? I'll try doing a little more testing to find out why the issue does not show up on my computer.
Created attachment 305473 [details] Revised (again) diagnostics I revised the test patch slightly. Now the addresses are printed in hex, as they should have been all along. Also, there's a line to print out the physical addresses pointed to by hub->buffer and hub->status when a new hub is configured. Here's the log for plugging in a hub, then plugging a mouse into the hub: [ 32.757399] usb 1-1: new high-speed USB device number 5 using xhci_hcd [ 32.886200] usb 1-1: New USB device found, idVendor=05e3, idProduct=0608, bcdDevice= 9.01 [ 32.886444] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0 [ 32.886631] usb 1-1: Product: USB2.0 Hub [ 32.889566] hub 1-1:1.0: USB hub found [ 32.889813] usb 1-1: hub buffer at 100e71818, status at 100e71610 [ 32.890499] hub 1-1:1.0: 4 ports detected [ 42.761267] usb 1-1.3: new low-speed USB device number 6 using xhci_hcd [ 42.833935] kauditd_printk_skb: 101 callbacks suppressed [ 42.844775] usb 1-1.3: New USB device found, idVendor=093a, idProduct=2510, bcdDevice= 1.00 [ 42.844841] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 42.844890] usb 1-1.3: Product: USB OPTICAL MOUSE [ 42.844933] usb 1-1.3: Manufacturer: PIXART [ 42.851150] input: PIXART USB OPTICAL MOUSE as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:093A:2510.0009/input/input10 [ 42.851585] hid-generic 0003:093A:2510.0009: input,hidraw8: USB HID v1.11 Mouse [PIXART USB OPTICAL MOUSE] on usb-0000:00:14.0-1.3/input0 The cache line size on this computer is 64 bytes, same as yours. As you can see, hub->buffer and hub->status were allocated in different cache lines, so the DMA debugging routine did not detect an overlap. That's why I didn't get any error messages.
So as most of the people subscribing to this bug have seen from the email discussion, it looks like the bug ultimately lies in the alignment of kmalloc() buffers and it probably will be fixed with a kernel patch. Equally important, however, is the question of why the people experiencing this problem turned on CONFIG_DMA_API_DEBUG in the first place. That setting is meant only for kernel testing and it drastically reduces performance. Redhat certainly should not enable it in any of their distribution kernels (and in fact it is not set in my copy of Fedora 38). Andrew, Ferry, any comments?
Your latest patch gives me: root@yuna:~# journalctl -k | grep "hub buffer" kernel: usb usb1: hub buffer at 6e1cda0, status at 6e1cda8 kernel: usb usb2: hub buffer at 6e1ce20, status at 6e1ce28 kernel: usb 1-1: hub buffer at 8001380, status at 8001388 The first 2 lines are for the root hubs. In my case on Intel Edison-Arduino the distribution is Yocto generated. The platform is especially suited to attach custom hardware, iio devices, arduino shields and the like, as you have direct access to gpios, i2c, spi, hsu. So CONFIG_DMA_API_DEBUG was enabled to help debug device drivers. I have to admit I never realised it to be an expensive option. And certainly didn't expect to find bugs in DMA_API_DEBUG itself or kmalloc(). I'll turn that into an optional thing with a comment on possible side effects.
Okay. On an Intel Edison-Arduino the slowdown won't be as dramatic as on a large system doing high-bandwidth I/O. But you'll still want to avoid it if you can. Have you tried testing the patch posted in the email thread?
With Catalin Marinas's suggestion "force the kmalloc() min align to cache_line_size()" + attachment 305473 [details] root@yuna:~# journalctl -k | grep hub kernel: usbcore: registered new interface driver hub kernel: hub 1-0:1.0: USB hub found kernel: usb usb1: hub buffer at 71c7180, status at 71c71c0 kernel: hub 1-0:1.0: 1 port detected kernel: hub 2-0:1.0: USB hub found kernel: usb usb2: hub buffer at 71c79c0, status at 71c7a00 kernel: hub 2-0:1.0: 1 port detected kernel: hub 1-1:1.0: USB hub found kernel: usb 1-1: hub buffer at 65b36c0, status at 6639340 kernel: hub 1-1:1.0: 7 ports detected and stack trace goes away. IOW the 2 root hub kmalloc() are now also aligned to the cache line size, even though these never triggered the stack trace. Strange: hub's status is aligned far away from hub buffer.
@Alan, others, so what is the status of affairs? What's the conclusion? I still get the warning almost every time I boot my boards.
Also note, the boards are not limited to Intel Merrifield based ones, I have the same stacktrace on different Intel SoCs, like Apollo Lake (Joule). [ 0.145195] smpboot: CPU0: Intel(R) Atom(TM) Processor T5700 @ 1.70GHz (family: 0x6, model: 0x5c, stepping: 0x2) ... [ 2.740381] ------------[ cut here ]------------ [ 2.741675] DMA-API: xhci_hcd 0000:00:15.0: cacheline tracking EEXIST, overlapping mappings aren't supported [ 2.744345] WARNING: CPU: 0 PID: 64 at kernel/dma/debug.c:607 add_dma_entry+0x219/0x350