Bug 215740 - kernel warning: DMA-API: xhci_hcd: cacheline tracking EEXIST, overlapping mappings aren't supported
Summary: kernel warning: DMA-API: xhci_hcd: cacheline tracking EEXIST, overlapping map...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-24 15:18 UTC by Andrew
Modified: 2023-11-30 19:35 UTC (History)
8 users (show)

See Also:
Kernel Version: 5.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments
full dmesg log with the warning (167.77 KB, text/plain)
2022-03-24 15:18 UTC, Andrew
Details
Diagnostics for overlapped DMA mapping (1.68 KB, patch)
2023-11-19 16:47 UTC, Alan Stern
Details | Diff
ftrace, hub plugged at 131, mouse plugged into hub at 141 (163.72 KB, text/plain)
2023-11-19 21:12 UTC, Ferry Toth
Details
Revised diagnostics (2.24 KB, patch)
2023-11-20 19:43 UTC, Alan Stern
Details | Diff
Diagnostic to find DMA overlap (2.74 KB, patch)
2023-11-21 20:51 UTC, Alan Stern
Details | Diff
Revised (again) diagnostics (3.27 KB, patch)
2023-11-25 22:15 UTC, Alan Stern
Details | Diff

Description Andrew 2022-03-24 15:18:12 UTC
Created attachment 300612 [details]
full dmesg log with the warning

I have this kernel trace often while PC booting:

    DMA-API: xhci_hcd 0000:09:00.3: cacheline tracking EEXIST, overlapping mappings aren't supported
    WARNING: CPU: 4 PID: 69 at kernel/dma/debug.c:570 add_dma_entry+0x1c7/0x250
    Modules linked in: bpf_preload
    CPU: 4 PID: 69 Comm: kworker/4:1 Not tainted 5.17.0 #1
    Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS, BIOS 2423 08/10/2021
    Workqueue: usb_hub_wq hub_event
    RIP: 0010:add_dma_entry+0x1c7/0x250
    Code: ff 0f 84 97 00 00 00 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 4b 53 80 00 48 89 c6 48 89 da 48 c7 c7 58 b3 84 94 e8 27 21 ca 00 <0f> 0b 48 85 ed 0f 85 69 d2 ca 00 8b 05 88 a0 30 02 85 c0 0f 85 f4
    RSP: 0018:ffff9cadc0c9ba88 EFLAGS: 00010292
    RAX: 0000000000000060 RBX: ffff9173d2c49ea0 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff
    RBP: ffff9173d11a5e80 R08: 0000000000000000 R09: ffff9cadc0c9b8c0
    R10: ffff9cadc0c9b8b8 R11: ffffffff95165128 R12: 0000000000000000
    R13: 0000000000000001 R14: 0000000000000206 R15: 00000000046882ee
    FS:  0000000000000000(0000) GS:ffff9182a8200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fce5d52a130 CR3: 0000000119b28000 CR4: 0000000000350ee0
    Call Trace:
    <TASK>
    dma_map_page_attrs+0x82/0x250
    ? _raw_spin_unlock_irqrestore+0x30/0x60
    usb_hcd_map_urb_for_dma+0x3eb/0x4f0
    usb_hcd_submit_urb+0x9b/0xb90
    ? lockdep_init_map_type+0x62/0x260
    usb_start_wait_urb+0x55/0x130
    usb_control_msg+0xc9/0x110
    hub_ext_port_status+0x89/0x110
    hub_event+0x1e4/0x1920
    ? lock_acquire+0xe0/0x2d0
    ? lock_is_held_type+0xea/0x140
    process_one_work+0x29b/0x5f0
    worker_thread+0x4f/0x390
    ? process_one_work+0x5f0/0x5f0
    kthread+0xf5/0x120
    ? kthread_complete_and_exit+0x20/0x20
    ret_from_fork+0x22/0x30
    </TASK>

My usb devices:

$ lsusb -t
    /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
        |__ Port 4: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
                |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
                |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 5000M
                    |__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/4p, 5000M
    /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
        |__ Port 2: Dev 2, If 3, Class=Application Specific Interface, Driver=, 12M
        |__ Port 2: Dev 2, If 1, Class=Vendor Specific Class, Driver=btusb, 12M
        |__ Port 2: Dev 2, If 2, Class=Vendor Specific Class, Driver=btusb, 12M
        |__ Port 2: Dev 2, If 0, Class=Vendor Specific Class, Driver=btusb, 12M
        |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 1: Dev 12, If 0, Class=Communications, Driver=cdc_ether, 480M
            |__ Port 1: Dev 12, If 1, Class=CDC Data, Driver=cdc_ether, 480M
            |__ Port 4: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
                |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
                    |__ Port 3: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M
                    |__ Port 3: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
                    |__ Port 4: Dev 7, If 0, Class=Hub, Driver=hub/4p, 480M
                        |__ Port 2: Dev 8, If 0, Class=Human Interface Device, Driver=usbfs, 1.5M
                        |__ Port 3: Dev 9, If 0, Class=Vendor Specific Class, Driver=, 480M
                        |__ Port 4: Dev 10, If 0, Class=Imaging, Driver=, 480M
    /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
    /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 480M
        |__ Port 6: Dev 2, If 2, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 6: Dev 2, If 0, Class=Vendor Specific Class, Driver=, 12M
        |__ Port 7: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M

$ lsusb
    Bus 004 Device 006: ID 2109:0812 VIA Labs, Inc. VL812 Hub
    Bus 004 Device 005: ID 2109:0812 VIA Labs, Inc. VL812 Hub
    Bus 004 Device 004: ID 0578:0578 Intrinsix Corp. KingSpec
    Bus 004 Device 003: ID 2109:0812 VIA Labs, Inc. VL812 Hub
    Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub
    Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    Bus 003 Device 010: ID 1004:633e LG Electronics, Inc. LM-X420xxx/G2/G3 Android Phone (MTP/download mode)
    Bus 003 Device 009: ID 0bda:8812 Realtek Semiconductor Corp. RTL8812AU 802.11a/b/g/n/ac 2T2R DB WLAN Adapter
    Bus 003 Device 008: ID 0463:ffff MGE UPS Systems UPS
    Bus 003 Device 007: ID 2109:2812 VIA Labs, Inc. VL812 Hub
    Bus 003 Device 006: ID 046d:c30e Logitech, Inc. UltraX Keyboard (Y-BL49)
    Bus 003 Device 005: ID 2109:2812 VIA Labs, Inc. VL812 Hub
    Bus 003 Device 004: ID 2109:2812 VIA Labs, Inc. VL812 Hub
    Bus 003 Device 012: ID 0525:a4a2 Netchip Technology, Inc. Linux-USB Ethernet/RNDIS Gadget
    Bus 003 Device 003: ID 2109:2812 VIA Labs, Inc. VL812 Hub
    Bus 003 Device 002: ID 0b05:17cb ASUSTek Computer, Inc. Broadcom BCM20702A0 Bluetooth
    Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    Bus 001 Device 003: ID 05e3:0610 Genesys Logic, Inc. Hub
    Bus 001 Device 002: ID 0b05:1939 ASUSTek Computer, Inc. AURA LED Controller
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Kernel is built from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
at v5.17 tag (f443e374ae131c168a065ea1748feac6b2e76613)

Also reproduces for my distro kernel: https://bugzilla.redhat.com/show_bug.cgi?id=2021242
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-31 06:52:50 UTC
Am I right in assuming this didn't happen on earlier Linux versions like 5.16?
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-31 06:54:12 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> Am I right in assuming this didn't happen on earlier Linux versions like
> 5.16?
Forget that, should have looked in the redhat ticket before posting this, there you state that 5.13 seems to not be affected.
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-31 07:41:32 UTC
Hey Christoph, Marek, Greg!

Below warning was reported to bugzilla.kernel.org more than a week ago;
to quote the important parts here:

> https://bugzilla.kernel.org/show_bug.cgi?id=215740
>
>     DMA-API: xhci_hcd 0000:09:00.3: cacheline tracking EEXIST, overlapping
>     mappings aren't supported
>     WARNING: CPU: 4 PID: 69 at kernel/dma/debug.c:570
>     add_dma_entry+0x1c7/0x250
> [...]
>     Call Trace:
>     <TASK>
>     dma_map_page_attrs+0x82/0x250
>     ? _raw_spin_unlock_irqrestore+0x30/0x60
>     usb_hcd_map_urb_for_dma+0x3eb/0x4f0
>     usb_hcd_submit_urb+0x9b/0xb90
>     ? lockdep_init_map_type+0x62/0x260
>     usb_start_wait_urb+0x55/0x130
>     usb_control_msg+0xc9/0x110
>     hub_ext_port_status+0x89/0x110
>     hub_event+0x1e4/0x1920

See the full quote below for details.

According to a bugzilla.redhat.com ticket linked in the report this
started to happen with 5.14. A second user there also confirmed the
problem that still seems to happen with recent 5.17 pre-releases.
Searching on lore revealed that others have hit this warning as well in
the past few months:

https://lore.kernel.org/all/65271fd1-1c1c-f2ad-9b0f-60174e791eaa@linux.intel.com/
https://lore.kernel.org/all/5cd6fc87-0f8d-0b9b-42be-8180540a94e7@gmail.com/
https://lore.kernel.org/all/CABXGCsP-JJAUh-SV0u1ALu5+awxcWT3BsQ=FzKrqb+9HOk2EUA@mail.gmail.com/

Does anyone have a idea what might be wrong here? If not, I guess we
should ask one of the reporters to bisect this.

Anyway, to ensure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced v5.13..v5.14
#regzbot title dma or usb: warning "DMA-API: xhci_hcd ... cacheline
tracking EEXIST, overlapping mappings aren't supported"
#regzbot from: Andrew  <travneff@gmail.com>
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215740
#regzbot https://bugzilla.redhat.com/show_bug.cgi?id=2021242
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

On 24.03.22 16:18, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215740
> 
>             Bug ID: 215740
>            Summary: kernel warning: DMA-API: xhci_hcd: cacheline tracking
>                     EEXIST, overlapping mappings aren't supported
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 5.17
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: USB
>           Assignee: drivers_usb@kernel-bugs.kernel.org
>           Reporter: travneff@gmail.com
>         Regression: No
> 
> Created attachment 300612 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=300612&action=edit
> full dmesg log with the warning
> 
> I have this kernel trace often while PC booting:
> 
>     DMA-API: xhci_hcd 0000:09:00.3: cacheline tracking EEXIST, overlapping
> mappings aren't supported
>     WARNING: CPU: 4 PID: 69 at kernel/dma/debug.c:570
>     add_dma_entry+0x1c7/0x250
>     Modules linked in: bpf_preload
>     CPU: 4 PID: 69 Comm: kworker/4:1 Not tainted 5.17.0 #1
>     Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS, BIOS 2423
> 08/10/2021
>     Workqueue: usb_hub_wq hub_event
>     RIP: 0010:add_dma_entry+0x1c7/0x250
>     Code: ff 0f 84 97 00 00 00 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 4b 53
>     80
> 00 48 89 c6 48 89 da 48 c7 c7 58 b3 84 94 e8 27 21 ca 00 <0f> 0b 48 85 ed 0f
> 85
> 69 d2 ca 00 8b 05 88 a0 30 02 85 c0 0f 85 f4
>     RSP: 0018:ffff9cadc0c9ba88 EFLAGS: 00010292
>     RAX: 0000000000000060 RBX: ffff9173d2c49ea0 RCX: 0000000000000000
>     RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff
>     RBP: ffff9173d11a5e80 R08: 0000000000000000 R09: ffff9cadc0c9b8c0
>     R10: ffff9cadc0c9b8b8 R11: ffffffff95165128 R12: 0000000000000000
>     R13: 0000000000000001 R14: 0000000000000206 R15: 00000000046882ee
>     FS:  0000000000000000(0000) GS:ffff9182a8200000(0000)
> knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 00007fce5d52a130 CR3: 0000000119b28000 CR4: 0000000000350ee0
>     Call Trace:
>     <TASK>
>     dma_map_page_attrs+0x82/0x250
>     ? _raw_spin_unlock_irqrestore+0x30/0x60
>     usb_hcd_map_urb_for_dma+0x3eb/0x4f0
>     usb_hcd_submit_urb+0x9b/0xb90
>     ? lockdep_init_map_type+0x62/0x260
>     usb_start_wait_urb+0x55/0x130
>     usb_control_msg+0xc9/0x110
>     hub_ext_port_status+0x89/0x110
>     hub_event+0x1e4/0x1920
>     ? lock_acquire+0xe0/0x2d0
>     ? lock_is_held_type+0xea/0x140
>     process_one_work+0x29b/0x5f0
>     worker_thread+0x4f/0x390
>     ? process_one_work+0x5f0/0x5f0
>     kthread+0xf5/0x120
>     ? kthread_complete_and_exit+0x20/0x20
>     ret_from_fork+0x22/0x30
>     </TASK>
> 
> My usb devices:
> 
> $ lsusb -t
>     /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
>         |__ Port 4: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
>             |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
>                 |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=uas,
>                 5000M
>                 |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 5000M
>                     |__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/4p, 5000M
>     /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
>         |__ Port 2: Dev 2, If 3, Class=Application Specific Interface,
>         Driver=,
> 12M
>         |__ Port 2: Dev 2, If 1, Class=Vendor Specific Class, Driver=btusb,
>         12M
>         |__ Port 2: Dev 2, If 2, Class=Vendor Specific Class, Driver=btusb,
>         12M
>         |__ Port 2: Dev 2, If 0, Class=Vendor Specific Class, Driver=btusb,
>         12M
>         |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
>             |__ Port 1: Dev 12, If 0, Class=Communications, Driver=cdc_ether,
> 480M
>             |__ Port 1: Dev 12, If 1, Class=CDC Data, Driver=cdc_ether, 480M
>             |__ Port 4: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
>                 |__ Port 4: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
>                     |__ Port 3: Dev 6, If 1, Class=Human Interface Device,
> Driver=usbhid, 1.5M
>                     |__ Port 3: Dev 6, If 0, Class=Human Interface Device,
> Driver=usbhid, 1.5M
>                     |__ Port 4: Dev 7, If 0, Class=Hub, Driver=hub/4p, 480M
>                         |__ Port 2: Dev 8, If 0, Class=Human Interface
>                         Device,
> Driver=usbfs, 1.5M
>                         |__ Port 3: Dev 9, If 0, Class=Vendor Specific Class,
> Driver=, 480M
>                         |__ Port 4: Dev 10, If 0, Class=Imaging, Driver=,
>                         480M
>     /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
>     /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 480M
>         |__ Port 6: Dev 2, If 2, Class=Human Interface Device, Driver=usbhid,
> 12M
>         |__ Port 6: Dev 2, If 0, Class=Vendor Specific Class, Driver=, 12M
>         |__ Port 7: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
> 
> $ lsusb
>     Bus 004 Device 006: ID 2109:0812 VIA Labs, Inc. VL812 Hub
>     Bus 004 Device 005: ID 2109:0812 VIA Labs, Inc. VL812 Hub
>     Bus 004 Device 004: ID 0578:0578 Intrinsix Corp. KingSpec
>     Bus 004 Device 003: ID 2109:0812 VIA Labs, Inc. VL812 Hub
>     Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub
>     Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
>     Bus 003 Device 010: ID 1004:633e LG Electronics, Inc. LM-X420xxx/G2/G3
> Android Phone (MTP/download mode)
>     Bus 003 Device 009: ID 0bda:8812 Realtek Semiconductor Corp. RTL8812AU
> 802.11a/b/g/n/ac 2T2R DB WLAN Adapter
>     Bus 003 Device 008: ID 0463:ffff MGE UPS Systems UPS
>     Bus 003 Device 007: ID 2109:2812 VIA Labs, Inc. VL812 Hub
>     Bus 003 Device 006: ID 046d:c30e Logitech, Inc. UltraX Keyboard (Y-BL49)
>     Bus 003 Device 005: ID 2109:2812 VIA Labs, Inc. VL812 Hub
>     Bus 003 Device 004: ID 2109:2812 VIA Labs, Inc. VL812 Hub
>     Bus 003 Device 012: ID 0525:a4a2 Netchip Technology, Inc. Linux-USB
> Ethernet/RNDIS Gadget
>     Bus 003 Device 003: ID 2109:2812 VIA Labs, Inc. VL812 Hub
>     Bus 003 Device 002: ID 0b05:17cb ASUSTek Computer, Inc. Broadcom
>     BCM20702A0
> Bluetooth
>     Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>     Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
>     Bus 001 Device 003: ID 05e3:0610 Genesys Logic, Inc. Hub
>     Bus 001 Device 002: ID 0b05:1939 ASUSTek Computer, Inc. AURA LED
>     Controller
>     Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> 
> Kernel is built from
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> at v5.17 tag (f443e374ae131c168a065ea1748feac6b2e76613)
> 
> Also reproduces for my distro kernel:
> https://bugzilla.redhat.com/show_bug.cgi?id=2021242
>
Comment 4 Mathias Nyman 2022-03-31 07:54:27 UTC
That warning was added in v5.14:
2b4bbc6231d7 dma-debug: report -EEXIST errors in add_dma_entry

Actual issue on usb side could be a lot older.
Comment 5 Christoph Hellwig 2022-04-05 05:19:25 UTC
On Thu, Mar 31, 2022 at 09:41:25AM +0200, Thorsten Leemhuis wrote:
> Hey Christoph, Marek, Greg!
> 
> Below warning was reported to bugzilla.kernel.org more than a week ago;
> to quote the important parts here:

The warning is from a new debug check added in 5.14 under a debug
config option.  So while the USB maintainers should look into fixing it
I don't think it is a regression in that traditional sense of the word.
Comment 6 Lukas Wunner 2022-05-04 08:10:05 UTC
Ferry Toth reports that the warning only occurs when connecting a LAN9514 USB Ethernet (drivers/net/usb/smsc95xx.c). Other USB devices (memory sticks, WiFi, Bluetooth) do not trigger the warning:

https://lore.kernel.org/netdev/2a436486-a54d-a9b3-d839-232a38653af3@gmail.com/

Curiously, the above-linked other reports seem to indicate that the warning occurs when connecting USB hubs. It's unclear what those have in common with the LAN9514.
Comment 7 Ferry Toth 2022-05-04 10:17:29 UTC
In my case it is EVM9514, a 4p USB hub + 10/100 ethernet:
https://www.microchip.com/en-us/development-tool/EVB9514

So, indeed it seems like to warning originates from hub code shared between at least VL812 and LAN9514.
Comment 8 Ferry Toth 2023-10-10 20:19:10 UTC
I also heard from Andy Shevchenko that he had the same error using even another hub. It seems the error is not particular to a certain hub, but doesn't appear on root hubs. 

And indeed when I plug my hub into Ubuntu the error doesn't occur because "debug check added in 5.14 under a debug config option" (Comment 5 above) is not enabled in Ubuntu kernels.
Comment 9 Ferry Toth 2023-11-12 20:46:08 UTC
Ok, it's different then in #6 above.

smsc95xx behaves as a 5p hub, with an usb to ethernet adapter plugged into 1 port. So, 4 ports remain available to the user. The bug itself is likely located in hub.c.

I can reproduce this with an ordinary 7p hub (EVB USB2517). Plugging the hub causes no issues. Then plugging an usb device (in my case I just plugged a mouse) into a port triggers the "cacheline tracking EEXIST".

The warning occurs only once (regardless unplugging the device or the hub).

Note that on Ubuntu 23.10 the warning does not occur, but they have most debug options not set. So, likely this is an old bug that has gone undetected for a long while.
Comment 10 Ferry Toth 2023-11-13 22:11:50 UTC
Seems might be caused from init3: usb_submit_urb() in hub_active()> I don't see why. May be Alan knows?
Comment 11 Alan Stern 2023-11-14 03:25:04 UTC
I don't think it has anything to do with the init3: section.  The stack trace shows that the problem stems from hub_ext_port_status(), probably called by port_event() within hub_event().  And the place where hub_ext_port_status() calls usb_control_msg() indirectly is protected by a mutex!

The I/O buffer being mapped multiple times is hub->status->port, and it doesn't get used anywhere else in the hub driver AFAICS.  (There is a comment about the phy driver also using that buffer; I don't know what it refers to.)

[It would be easier to track this down if the DMA subsystem would cause these erroneous mappings to fail instead of pretending that they have succeeded.  Grumble...]

I'm stumped.  I'll try to find time to reproduce it on my own system.  Probably would have noticed it before now if I had tried connecting an external hub recently, but that just hasn't come up.
Comment 12 Ferry Toth 2023-11-15 20:24:51 UTC
Ah, ok I was looking at a stack trace from a more recent kernel. So, here to add some more data points (2 different kernels):

This is with EVB LAN9514 plugged in while booting:

: ------------[ cut here ]------------
: DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported
: WARNING: CPU: 1 PID: 32 at kernel/dma/debug.c:602 add_dma_entry+0x1eb/0x2b0
: Modules linked in: mmc_block extcon_intel_mrfld sdhci_pci cqhci sdhci led_class mmc_core intel_soc_pmic_mrfld btrfs libcrc32c xor zlib_deflate zstd_compress raid6_pq
: CPU: 1 PID: 32 Comm: kworker/1:1 Not tainted 6.6.0-next-20231106-edison-acpi-standard+ #1
: Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
: Workqueue: usb_hub_wq hub_event
: RIP: 0010:add_dma_entry+0x1eb/0x2b0
: Code: 0f 84 cc 00 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 48 1f 68 00 48 89 c6 4c 89 e2 48 c7 c7 80 04 3d a4 e8 76 fb f5 ff 90 <0f> 0b 90 90 48 85 ed 74 1a 48 c7 c7 68 84 35 a4 e8 f0 0c fd ff 8b
: RSP: 0000:ffffa5bb0021fab0 EFLAGS: 00010286
: RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000ffffdfff
: RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001
: RBP: ffff9f50c122da80 R08: ffffffffa4738d08 R09: 00000000ffffdfff
: R10: ffffffffa4658d20 R11: ffffffffa4708d20 R12: ffff9f50c2879550
: R13: 0000000000000001 R14: 0000000000000202 R15: 00000000002047f8
: FS:  0000000000000000(0000) GS:ffff9f50fe300000(0000) knlGS:0000000000000000
: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
: CR2: 00005647b6a03c0f CR3: 0000000009ce0000 CR4: 00000000001006f0
: Call Trace:
:  <TASK>
:  ? __warn+0x7e/0x130
:  ? add_dma_entry+0x1eb/0x2b0
:  ? report_bug+0x18e/0x1a0
:  ? handle_bug+0x3d/0x70
:  ? exc_invalid_op+0x18/0x70
:  ? asm_exc_invalid_op+0x1a/0x20
:  ? add_dma_entry+0x1eb/0x2b0
:  dma_map_page_attrs+0x105/0x280
:  ? __virt_addr_valid+0xe7/0x130
:  usb_hcd_map_urb_for_dma+0x3c2/0x500
:  usb_hcd_submit_urb+0x9b/0xb70
:  ? kmem_cache_alloc+0x20d/0x230
:  ? usb_hcd_link_urb_to_ep+0x72/0xa0
:  ? prepare_transfer+0x104/0x140
:  usb_start_wait_urb+0x60/0x160
:  usb_control_msg+0xe0/0x150
:  hub_ext_port_status+0x85/0x1b0
:  hub_event+0x1f7/0x1ba0
:  ? hub_activate+0x59b/0x8f0
:  process_one_work+0x136/0x2f0
:  ? __pfx_worker_thread+0x10/0x10
:  worker_thread+0x2ed/0x400
:  ? __pfx_worker_thread+0x10/0x10
:  kthread+0xd3/0x100
:  ? __pfx_kthread+0x10/0x10
:  ret_from_fork+0x2f/0x50
:  ? __pfx_kthread+0x10/0x10
:  ret_from_fork_asm+0x1a/0x30
:  </TASK>
: ---[ end trace 0000000000000000 ]---
: DMA-API: Mapped at:
:  debug_dma_map_page+0x68/0x100
:  dma_map_page_attrs+0x105/0x280
:  usb_hcd_map_urb_for_dma+0x3c2/0x500
:  usb_hcd_submit_urb+0x9b/0xb70
:  usb_start_wait_urb+0x60/0x160

This is after boot, plugging the EVM-USB2517, then plugging a mouse:

------------[ cut here ]------------
DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported
WARNING: CPU: 0 PID: 587 at kernel/dma/debug.c:568 add_dma_entry+0x1ed/0x280
Modules linked in: rfcomm bnep iptable_nat snd_sof_nocodec spi_pxa2xx_platform dw_dmac snd_sof_pci_intel_tng snd_sof_pci snd_sof_intel_atom pwm_lpss_pci pwm_lpss>
CPU: 0 PID: 587 Comm: kworker/0:3 Not tainted 6.1.55-edison-acpi-standard #1
Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
Workqueue: usb_hub_wq hub_event
RIP: 0010:add_dma_entry+0x1ed/0x280
Code: ff 0f 84 97 00 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 f5 a6 5a 00 48 89 c6 4c 89 e2 48 c7 c7 10 ed 62 b8 e8 e3 ce f7 ff <0f> 0b 48 85 ed 0f 85 4c d3 >
RSP: 0018:ffffa3400066fae0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffffb8634309 RDI: 00000000ffffffff
RBP: ffff92a981295f00 R08: ffffffffb8937828 R09: 00000000ffffdfff
R10: ffffffffb8857840 R11: ffffffffb8907840 R12: ffff92a982ca37a0
R13: 0000000000000001 R14: 0000000000000206 R15: 00000000000b716f
FS:  0000000000000000(0000) GS:ffff92a9be200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055bed6f3b078 CR3: 000000000e00c000 CR4: 00000000001006f0
Call Trace:
 <TASK>
 ? __warn+0x74/0xc0
 ? add_dma_entry+0x1ed/0x280
 ? report_bug+0xee/0x170
 ? prb_read_valid+0x12/0x20
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x14/0x70
 ? asm_exc_invalid_op+0x16/0x20
 ? add_dma_entry+0x1ed/0x280
 dma_map_page_attrs+0x101/0x270
 ? preempt_count_add+0x6b/0xa0
 ? preempt_count_add+0x6b/0xa0
 usb_hcd_map_urb_for_dma+0x410/0x570
 usb_hcd_submit_urb+0x97/0xb80
 ? hub_activate+0x2db/0x890
 ? hub_resume+0x2e/0xf0
 usb_start_wait_urb+0x60/0x160
 usb_control_msg+0xdc/0x140
 hub_ext_port_status+0x82/0x110
 hub_event+0x442/0x1840
 ? _raw_spin_unlock+0xd/0x30
 ? finish_task_switch.isra.0+0x89/0x260
 process_one_work+0x1d7/0x3a0
 worker_thread+0x48/0x3c0
 ? process_one_work+0x3a0/0x3a0
 kthread+0xe3/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
DMA-API: Mapped at:
 debug_dma_map_page+0x64/0x100
 dma_map_page_attrs+0x101/0x270
 usb_hcd_map_urb_for_dma+0x410/0x570
 usb_hcd_submit_urb+0x97/0xb80
 usb_start_wait_urb+0x60/0x160
Comment 13 Alan Stern 2023-11-16 02:19:20 UTC
Those two stack traces are essentially the same as the first one reported.  They all contain the following key sequence of function calls:

  dma_map_page_attrs
  usb_hcd_map_urb_for_dma
  usb_hcd_submit_urb
  usb_start_wait_urb
  usb_control_msg
  hub_ext_port_status
  hub_event
  process_one_work

It doesn't look like they add anything new.
Comment 14 Ferry Toth 2023-11-17 22:13:13 UTC
I see. I was misled by ? hub_activate, and  hub_activate calling usb_submit_urb.

In any case the traces then show the problem is likely reproducible using any hub.
Comment 15 Alan Stern 2023-11-19 16:47:50 UTC
Created attachment 305425 [details]
Diagnostics for overlapped DMA mapping

I just tried booting a kernel with CONFIG_DMA_API_DEBUG enabled.  When I plugged in a hub and then plugged some devices into that hub, nothing went wrong.

So since I can't do any useful testing here, maybe one of you can test the attached patch.  It won't fix anything but it should help confirm what's happening and provide a starting point for further debugging.
Comment 16 Ferry Toth 2023-11-19 21:12:38 UTC
Created attachment 305426 [details]
ftrace, hub plugged at 131, mouse plugged into hub at 141

Sure. And added a ftrace see attachment trace_hub.txt.
 
In journal I got:
...
Nov 19 20:57:45 yuna kernel: ---[ end trace 0000000000000000 ]---
Nov 19 20:57:45 yuna kernel: DMA-API: Mapped at:
Nov 19 20:57:45 yuna kernel:  debug_dma_map_page+0x6d/0x100
Nov 19 20:57:45 yuna kernel:  dma_map_page_attrs+0x10a/0x280
Nov 19 20:57:45 yuna kernel:  usb_hcd_map_urb_for_dma+0x419/0x5c0
Nov 19 20:57:45 yuna kernel:  usb_hcd_submit_urb+0xa0/0xaa0
Nov 19 20:57:45 yuna kernel:  usb_start_wait_urb+0x65/0x100
Nov 19 20:57:45 yuna kernel: Map error: ffffd221000e02c0 ffff916fc336b4a0
Nov 19 20:57:45 yuna kernel: Map error: ffffd221000e02c0 ffff916fc336b4a0
Nov 19 20:57:45 yuna kernel: usb 1-1.1: new low-speed USB device number 3 using xhci-hcd
Nov 19 20:57:45 yuna kernel: Map error: ffffd221000e02c0 ffff916fc336b4a0
...
Comment 17 Alan Stern 2023-11-20 19:43:37 UTC
Created attachment 305444 [details]
Revised diagnostics

The ftrace doesn't help.  You can't tell where the error occurs or which hub the function calls are for.

The patch wasn't quite right, so here is an updated version.  Be sure to include the "cacheline tracking EEXIST, overlapping mappings aren't supported" line when you post the dmesg output.
Comment 18 Ferry Toth 2023-11-21 19:31:58 UTC
Hi Alan, would you know why your patches won't apply using git to v6.6? I fixed up manually. Results of Revised diagnostics attachment 305444 [details] below:

------------[ cut here ]------------
DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported, len 4
WARNING: CPU: 0 PID: 7 at kernel/dma/debug.c:607 add_dma_entry+0x207/0x2c0
Modules linked in: rfcomm bnep iptable_nat snd_sof_nocodec spi_pxa2xx_platform dw_dmac brcmfmac_wcc snd_sof_pci_intel_tng snd_sof_pci snd_sof_intel_atom pwm_lpss>
CPU: 0 PID: 7 Comm: kworker/0:0 Not tainted 6.6.0-edison-acpi-standard+ #1
Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
Workqueue: usb_hub_wq hub_event
RIP: 0010:add_dma_entry+0x207/0x2c0
Code: c8 00 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 fe a3 6a 00 48 89 c6 4c 89 e9 4c 89 e2 48 c7 c7 a8 1d 9a 99 e8 69 ac f5 ff <0f> 0b 48 85 db 74 1a 48 c7 >
RSP: 0018:ffffa4de00043ab0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff9c5f01376e80 RCX: 0000000000000027
RDX: ffff9c5f3e21c5c8 RSI: 0000000000000001 RDI: ffff9c5f3e21c5c0
RBP: 00000000ffffffff R08: ffffffff99d431e8 R09: 00000000ffffdfff
R10: ffffffff99c63200 R11: ffffffff99d13200 R12: ffff9c5f012375f0
R13: 0000000000000004 R14: 0000000000000206 R15: 00000000001eb8d2
FS:  0000000000000000(0000) GS:ffff9c5f3e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f57691956b0 CR3: 000000003682e000 CR4: 00000000001006f0
Call Trace:
 <TASK>
 ? __warn+0x84/0x140
 ? add_dma_entry+0x207/0x2c0
 ? report_bug+0x199/0x1b0
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? add_dma_entry+0x207/0x2c0
 dma_map_page_attrs+0x10a/0x280
 usb_hcd_map_urb_for_dma+0x46f/0x650
 usb_hcd_submit_urb+0xa0/0xaa0
 ? kick_hub_wq+0x69/0xf0
 ? hub_activate+0x2c3/0x890
 ? usb_disable_remote_wakeup+0x34/0x70
 usb_start_wait_urb+0x65/0x100
 usb_control_msg+0xe5/0x150
 hub_ext_port_status+0x8a/0x1b0
 hub_event+0x1d1/0x18e0
 ? trace_hardirqs_on+0x21/0x90
 process_one_work+0x139/0x2f0
 worker_thread+0x2f4/0x400
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe6/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
DMA-API: Mapped at:
 debug_dma_map_page+0x6d/0x100
 dma_map_page_attrs+0x10a/0x280
 usb_hcd_map_urb_for_dma+0x46f/0x650
 usb_hcd_submit_urb+0xa0/0xaa0
 usb_start_wait_urb+0x65/0x100
Map error: 128857248 128857248 len 4
Map error: 128857248 128857248 len 4
usb 1-1.1: new low-speed USB device number 3 using xhci-hcd
Map error: 128857248 128857248 len 4
usb 1-1.1: New USB device found, idVendor=046d, idProduct=c03e, bcdDevice=20.00
usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1.1: Product: USB-PS/2 Optical Mouse
usb 1-1.1: Manufacturer: Logitech
input: Logitech USB-PS/2 Optical Mouse as /devices/pci0000:00/0000:00:11.0/dwc3.0.auto/xhci-hcd.1.auto/usb1/1-1/1-1.1/1-1.1:1.0/0003:046D:C03E.0001/input/input1
hid-generic 0003:046D:C03E.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-xhci-hcd.1.auto-1.1/input0
Comment 19 Alan Stern 2023-11-21 20:51:41 UTC
Created attachment 305456 [details]
Diagnostic to find DMA overlap

The patches are prepared against a 6.6-rc3 tree (it hasn't been updated since then), so they might not match 6.6 exactly.

At least we're getting some good data from the patch:

> DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings
> aren't supported, len 4
...
> Map error: 128857248 128857248 len 4

The fact that the addresses and lengths agree indicate that the patch works right, and len = 4 means that the mapping error is indeed for the buffer containing the hub status data.

By the way, do you have any USB hubs besides this one on your system?

The next question is what other mapping already exists for that buffer?  This next patch may be able to tell us.  It's supposed to produce a second stack dump when the overlap occurs, for the call that created the first mapping.
Comment 20 Ferry Toth 2023-11-23 21:27:06 UTC
It's not a conflict or so, I am getting (using qgit or cli):
git am --signoff /home/ferry/Downloads/dma-dbg.patch
Patch format detection failed.

when I add --patch-format=mbox I get:
\Applying: @@ -, +, @@
fatal: empty ident name (for <>) not allowed

Never had this before.

I am testing on Intel Merrifield (Edison Arduino):

root@yuna:~# lsusb -t 
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/7p, 480M
        |__ Port 1: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M

But root_hub 5000M is not available due to no phy.

With Diagnostic to find DMA overlap attachment 305456 [details]

------------[ cut here ]------------
DMA-API: dwc3 dwc3.0.auto: cacheline tracking EEXIST, overlapping mappings aren't supported
WARNING: CPU: 0 PID: 105 at kernel/dma/debug.c:608 add_dma_entry+0x1fc/0x370
Modules linked in: rfcomm iptable_nat bnep snd_sof_nocodec spi_pxa2xx_platform dw_dmac brcmfmac_wcc pwm_lpss_pci pwm_lpss snd_sof_pci_intel_tng intel_mrfld_pwrbtn snd_sof_pci snd_sof_intel_atom intel_mrfld_adc snd_sof dw_dmac_pci snd_sof_utils snd_sof_xtensa_dsp dw_dmac_core snd_intel_dspcfg snd_soc_acpi spi_pxa2xx_pci brcmfmac brcmutil hci_uart leds_gpio btbcm ti_ads7950 tun ledtrig_timer ledtrig_heartbeat mmc_block extcon_intel_mrfld sdhci_pci cqhci sdhci led_class mmc_core intel_soc_pmic_mrfld btrfs libcrc32c xor zlib_deflate zstd_compress raid6_pq
CPU: 0 PID: 105 Comm: kworker/0:3 Not tainted 6.6.0-edison-acpi-standard+ #1
Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
Workqueue: usb_hub_wq hub_event
RIP: 0010:add_dma_entry+0x1fc/0x370
Code: ff 0f 84 80 01 00 00 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 e6 9d 6a 00 48 89 c6 4c 89 e2 48 c7 c7 10 21 5a b1 e8 94 a5 f5 ff <0f> 0b 48 85 db 74 1a 48 c7 c7 44 18 53 b1 e8 01 f9 fc ff 8b 73 4c
RSP: 0000:ffffb63c403fbab0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff908001374f00 RCX: 0000000000000027
RDX: ffff90803e21c5c8 RSI: 0000000000000001 RDI: ffff90803e21c5c0
RBP: 000000000019d4b5 R08: ffffffffb19431e8 R09: 00000000ffffdfff
R10: ffffffffb1863200 R11: ffffffffb1913200 R12: ffff908002124860
R13: 0000000000000001 R14: 0000000000000202 R15: 000000000019d4b5
FS:  0000000000000000(0000) GS:ffff90803e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd58c003000 CR3: 0000000008096000 CR4: 00000000001006f0
Call Trace:
 <TASK>
 ? __warn+0x84/0x140
 ? add_dma_entry+0x1fc/0x370
 ? report_bug+0x199/0x1b0
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? add_dma_entry+0x1fc/0x370
 dma_map_page_attrs+0x10a/0x280
 usb_hcd_map_urb_for_dma+0x46f/0x650
 usb_hcd_submit_urb+0xa0/0xaa0
 ? kick_hub_wq+0x69/0xf0
 ? hub_activate+0x2c3/0x890
 ? usb_disable_remote_wakeup+0x34/0x70
 usb_start_wait_urb+0x65/0x100
 usb_control_msg+0xe5/0x150
 hub_ext_port_status+0x8a/0x1b0
 hub_event+0x1d1/0x18e0
 process_one_work+0x139/0x2f0
 ? __pfx_worker_thread+0x10/0x10
 worker_thread+0x2f4/0x400
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe6/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
---[ end trace 0000000000000000 ]---
DMA-API: Mapped at:
 debug_dma_map_page+0x6d/0x100
 dma_map_page_attrs+0x10a/0x280
 usb_hcd_map_urb_for_dma+0x46f/0x650
 usb_hcd_submit_urb+0xa0/0xaa0
 usb_start_wait_urb+0x65/0x100
DMA-API: Overlapped entry: 108342600 size 1
DMA-API: Mapped at:
 debug_dma_map_page+0x6d/0x100
 dma_map_page_attrs+0x10a/0x280
 usb_hcd_map_urb_for_dma+0x46f/0x650
 usb_hcd_submit_urb+0xa0/0xaa0
 hub_activate+0x29c/0x890
Map error: 108342608 108342608 len 4
Map error: 108342608 108342608 len 4
usb 1-1.1: new low-speed USB device number 3 using xhci-hcd
Map error: 108342608 108342608 len 4
usb 1-1.1: New USB device found, idVendor=046d, idProduct=c03e, bcdDevice=20.00
usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1.1: Product: USB-PS/2 Optical Mouse
usb 1-1.1: Manufacturer: Logitech
input: Logitech USB-PS/2 Optical Mouse as /devices/pci0000:00/0000:00:11.0/dwc3.0.auto/xhci-hcd.1.auto/usb1/1-1/1-1.1/1-1.1:1.0/0003:046D:C03E.0001/input/input1
hid-generic 0003:046D:C03E.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-xhci-hcd.1.auto-1.1/input0

I also notice now on unplugging the mouse:

Map error: 108342608 108342608 len 4
usb 1-1.1: USB disconnect, device number 3
Map error: 108342608 108342608 len 4
Map error: 108342608 108342608 len 4
Map error: 108342608 108342608 len 4
Map error: 108342608 108342608 len 4
Map error: 108342608 108342608 len 4
Comment 21 Andy Shevchenko 2023-11-24 12:02:06 UTC
Looks like read mapping is bigger than 8, I think it might have a cache line alignment requirement. But I have knowledge of USB core code, I believe Alan knows much better what's going on there.

Btw, I don't see USB using https://elixir.bootlin.com/linux/v6.7-rc2/C/ident/dma_get_cache_alignment API (except the only DWC2 case).
Comment 22 Andy Shevchenko 2023-11-24 12:07:43 UTC
Also there is dma_kmalloc_size_aligned() which might help.
Comment 23 Alan Stern 2023-11-24 15:27:14 UTC
+Christoph Hellwig (because I am not an expert on kmalloc and cache line alignment)

Ferry: Sorry about the malformed patches.  The bad line was added automatically by Quilt (I don't know why).  If I add any more patches, I'll try to remember to remove that bogus line first.

Andy: There are two mappings in question here, both of them are reads, neither is larger than 8 bytes, and neither has a cache line alignment requirement (other than the general restriction that on architectures with non-cache-coherent DMA -- i.e., not x86 -- different allocations should belong to different cache lines).

The first mapping is 1 byte long (at the start of the 8-byte-long hub->buffer allocated near the beginning of hub_configure() in drivers/usb/core/hub.c) and the second is 4 bytes long (for hub->status, allocated just afterward).  The physical addresses and sizes of the mappings, as shown in the log, are:

  hub->buffer: 108342600 size 1
  hub->status: 108342608 size 4

Obviously they lie in the same cache line, which is what triggers the DMA debugging error.  But equally obviously, on x86 or other architectures with cache-coherent DMA, there is nothing wrong with them.

Incidentally,
Comment 24 Alan Stern 2023-11-24 15:30:16 UTC
Sorry, submitted the update too soon.  I meant to add:

Incidentally, the check for overlapping DMA mappings does not take into account the possibility that a mapping can extend over several cache lines.  Maybe this wasn't deemed important enough to check for.

But in any case, it seems clear that on x86 there shouldn't be anything wrong with allowing several different mappings to refer to the same cache line.
Comment 25 Andy Shevchenko 2023-11-24 15:40:29 UTC
I'm not sure I understand 'u8 (*buffer)[8]' in the data struct, sounds like 8 pointers, so the side has to be 32/64 bytes. As for the next field, status, it's a pointer and again 4/8 bytes. I believe we are talking about x86_64 case, so I fail to see how 1 and 4 map to the struct usb_hub fields. Is the len in your debug patches is in "long"s units? Then it makes sense if the status is mapped first (1 long) followed by 4 (out of 8) URB buffer pointers.
Comment 26 Alan Stern 2023-11-24 15:50:39 UTC
"u8 (*buffer)[8]" means that buffer is a pointer to an array of 8 u8's, NOT an array of 8 pointers to u8.  The latter would be written as "u8 *(buffer[8])" or "u8 *buffer[8]".  C's syntax is notoriously difficult to understand when talking about arrays of pointers and pointers to arrays; you have to parse the declaration from the inside out and you may need to know that postfix "[]" has higher binding precedence than prefix "*".

1 and 4 don't map to anything; they are the _lengths_ of the DMA mappings.  That is, the first mapping is 1 byte long and the second mapping is 4 bytes long: These are lengths in bytes, not in longs.
Comment 27 Andy Shevchenko 2023-11-24 16:09:38 UTC
Yeah, I got it, so the problematic one (that triggers a warning) is this line

  ret = get_port_status(hub->hdev, port1, &hub->status->port, type, len);

where port is 8 bytes, but len = 4 as we asked for non-extended status.

Which line maps the DMA? 

```
} else {
	urb->transfer_dma = dma_map_single(
			hcd->self.sysdev,
			urb->transfer_buffer,
			urb->transfer_buffer_length,
			dir);
```
This one?
Comment 28 Alan Stern 2023-11-24 16:21:09 UTC
Yes, that one.

The issue of "which mapping is the problematic one" is more or less irrelevant. The warning is issued when the second mapping is created.  The order of the two mappings doesn't matter, just the fact that they refer to the same cache line.
Comment 29 Ferry Toth 2023-11-24 21:05:30 UTC
(In reply to Andy Shevchenko from comment #25)
> status, it's a pointer and again 4/8 bytes. I believe we are talking about
> x86_64 case, so I fail to see how 1 and 4 map to the struct usb_hub fields.

Yes. But I just checked and the same happens with i686 kernel (on same platform).
Comment 30 Ferry Toth 2023-11-24 22:22:40 UTC
So I checked and:
root@yuna:~# cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
64

So I just simply added to your debug patches:
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 0ff47eeffb49..ba5864b23aed 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1420,7 +1420,7 @@ static int hub_configure(struct usb_hub *hub,
 	unsigned full_load;
 	unsigned maxchild;
 
-	hub->buffer = kmalloc(sizeof(*hub->buffer), GFP_KERNEL);
+	hub->buffer = kmalloc(64, GFP_KERNEL);
 	if (!hub->buffer) {
 		ret = -ENOMEM;
 		goto fail;

and the problem disappears. 
But then, why Comment 15 can Alan not reproduce this?
Comment 31 Alan Stern 2023-11-25 02:03:17 UTC
I don't know why my local test didn't reproduce the problem -- and I haven't tried to find out what's happening.  My best guess is that the allocations of hub->buffer and hub->status end up (by random chance) being non-adjacent, so they aren't in the same cache line.  But I haven't even run the test patches attached to this bug report.

Regardless, changing the allocation size to 64 merely hides the problem.  The real problem is that the DMA debugging code is wrong and needs to be fixed.  It should not complain about two different mappings of the same cache line on x86, unless the mappings actually overlap.
Comment 32 Ferry Toth 2023-11-25 20:50:57 UTC
By changing the allocation I just wanted to close the loop and prove this is the trigger. 

And maybe on other platforms than x86 an actual issue? What do you think of Andy's suggestion to use dma_kmalloc_size_aligned()?

I still find it strange that the issue does not reproduce for you. I have built each kernel since this issue was introduced, with multiple compiler versions, x86_64/i686 and it happens 100% of the cases. Could there be another related CONFIG?
Comment 33 Alan Stern 2023-11-25 21:17:07 UTC
It should not be necessary to use dma_kmalloc_size_aligned().  These allocations don't need any special alignment; they just need to be in different cache lines on some architectures.  Isn't this what ARCH_DMA_MINALIGN and ARCH_KMALLOC_MINALIGN are supposed to take care of (see include/linux/slab.h)?

I'll try doing a little more testing to find out why the issue does not show up on my computer.
Comment 34 Alan Stern 2023-11-25 22:15:18 UTC
Created attachment 305473 [details]
Revised (again) diagnostics

I revised the test patch slightly.  Now the addresses are printed in hex, as they should have been all along.  Also, there's a line to print out the physical addresses pointed to by hub->buffer and hub->status when a new hub is configured.  Here's the log for plugging in a hub, then plugging a mouse into the hub:

[   32.757399] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[   32.886200] usb 1-1: New USB device found, idVendor=05e3, idProduct=0608, bcdDevice= 9.01
[   32.886444] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[   32.886631] usb 1-1: Product: USB2.0 Hub
[   32.889566] hub 1-1:1.0: USB hub found
[   32.889813] usb 1-1: hub buffer at 100e71818, status at 100e71610
[   32.890499] hub 1-1:1.0: 4 ports detected
[   42.761267] usb 1-1.3: new low-speed USB device number 6 using xhci_hcd
[   42.833935] kauditd_printk_skb: 101 callbacks suppressed
[   42.844775] usb 1-1.3: New USB device found, idVendor=093a, idProduct=2510, bcdDevice= 1.00
[   42.844841] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   42.844890] usb 1-1.3: Product: USB OPTICAL MOUSE
[   42.844933] usb 1-1.3: Manufacturer: PIXART
[   42.851150] input: PIXART USB OPTICAL MOUSE as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:093A:2510.0009/input/input10
[   42.851585] hid-generic 0003:093A:2510.0009: input,hidraw8: USB HID v1.11 Mouse [PIXART USB OPTICAL MOUSE] on usb-0000:00:14.0-1.3/input0

The cache line size on this computer is 64 bytes, same as yours.  As you can see, hub->buffer and hub->status were allocated in different cache lines, so the DMA debugging routine did not detect an overlap.  That's why I didn't get any error messages.
Comment 35 Alan Stern 2023-11-28 19:04:20 UTC
So as most of the people subscribing to this bug have seen from the email discussion, it looks like the bug ultimately lies in the alignment of kmalloc() buffers and it probably will be fixed with a kernel patch.

Equally important, however, is the question of why the people experiencing this problem turned on CONFIG_DMA_API_DEBUG in the first place.  That setting is meant only for kernel testing and it drastically reduces performance.  Redhat certainly should not enable it in any of their distribution kernels (and in fact it is not set in my copy of Fedora 38).

Andrew, Ferry, any comments?
Comment 36 Ferry Toth 2023-11-28 21:28:55 UTC
Your latest patch gives me:
root@yuna:~# journalctl -k | grep "hub buffer"
kernel: usb usb1: hub buffer at 6e1cda0, status at 6e1cda8
kernel: usb usb2: hub buffer at 6e1ce20, status at 6e1ce28
kernel: usb 1-1: hub buffer at 8001380, status at 8001388

The first 2 lines are for the root hubs.

In my case on Intel Edison-Arduino the distribution is Yocto generated. The platform is especially suited to attach custom hardware, iio devices, arduino shields and the like, as you have direct access to gpios, i2c, spi, hsu. So CONFIG_DMA_API_DEBUG was enabled to help debug device drivers.

I have to admit I never realised it to be an expensive option. And certainly didn't expect to find bugs in DMA_API_DEBUG itself or kmalloc(). I'll turn that into an optional thing with a comment on possible side effects.
Comment 37 Alan Stern 2023-11-29 15:44:52 UTC
Okay.  On an Intel Edison-Arduino the slowdown won't be as dramatic as on a large system doing high-bandwidth I/O.  But you'll still want to avoid it if you can.

Have you tried testing the patch posted in the email thread?
Comment 38 Ferry Toth 2023-11-30 19:35:01 UTC
With Catalin Marinas's suggestion "force the kmalloc() min align to cache_line_size()" + attachment 305473 [details]

root@yuna:~# journalctl -k | grep hub  
kernel: usbcore: registered new interface driver hub
kernel: hub 1-0:1.0: USB hub found
kernel: usb usb1: hub buffer at 71c7180, status at 71c71c0
kernel: hub 1-0:1.0: 1 port detected
kernel: hub 2-0:1.0: USB hub found
kernel: usb usb2: hub buffer at 71c79c0, status at 71c7a00
kernel: hub 2-0:1.0: 1 port detected
kernel: hub 1-1:1.0: USB hub found
kernel: usb 1-1: hub buffer at 65b36c0, status at 6639340
kernel: hub 1-1:1.0: 7 ports detected

and stack trace goes away.
IOW the 2 root hub kmalloc() are now also aligned to the cache line size, even though these never triggered the stack trace. Strange: hub's status is aligned far away from hub buffer.

Note You need to log in before you can comment on or make changes to this bug.