Bug 200283
| Summary: | PCI: Data corruption happening due to a race condition | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Hari Kishore Vyas (hari.vyas) |
| Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
| Status: | NEW --- | | |
| Severity: | normal | CC: | hari.vyas |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.18 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Race condition while calling pci_set_master() and pci_bus_add_device() | | |
| | Proposed fix to avoid race condition while updating is_busmaster and is_added bits of pci_dev structure | | |
Created attachment 276879 [details]
Proposed fix to avoid race condition while updating is_busmaster and is_added bits of pci_dev structure

Proposing to use a spinlock while updating the is_busmaster and is_added bits of the pci_dev structure.

Since all of these bitfields share the same memory location, an alternative solution would be to use separate independent fields instead of bitfields in include/linux/pci.h. That would only require modifying the mentioned pci.h file, at the cost of additional memory, but with the advantage of needing no locking mechanism.
Created attachment 276877 [details]
Race condition while calling pci_set_master() and pci_bus_add_device()

After removing and re-scanning a PCIe NVMe device multiple times, we encountered a "proc_dir entry already registered" message, which keeps appearing on the console with every subsequent re-scan.

#echo 1 > /sys/bus/pci/rescan
[ 2260.042971] pci 0000:01:00.0: BAR 0: assigned [mem 0x400000000-0x400003fff 64bit]
[ 2260.050915] proc_dir_entry '0000:01/00.0' already registered
[ 2260.056765] -----------[ cut here ]-----------
[ 2260.061538] WARNING: CPU: 7 PID: 1966 at /tmp/yocto_builds/11411866_nxs_1.0_ea3_stingray-bcm958742t/poky/build/tmp/work-shared/bcm958742t/kernel-source/fs/proc/generic.c:346 proc_register+0x11c/0x158
[ 2260.079830] Modules linked in: bnxt_re(O) rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm ib_core bnxt_en(O) nvmem_bcm_ocotp nvmem_core ipv6
[ 2260.092043] CPU: 7 PID: 1966 Comm: sh Tainted: G W O 4.12.0-rc7+g2d2a228 #1
[ 2260.100209] Hardware name: Stingray SST100 (BCM958742T) (DT)
[ 2260.106047] task: ffff80a1626f8c00 task.stack: ffff80a160320000
[ 2260.112154] PC is at proc_register+0x11c/0x158
[ 2260.116737] LR is at proc_register+0x11c/0x158
[ 2260.121320] pc : [<ffff00000823138c>] lr : [<ffff00000823138c>] pstate: 00000145
[ 2260.128948] sp : ffff80a160323bb0
[ 2260.132365] x29: ffff80a160323bb0 x28: ffff80a1626f8c00
[ 2260.137845] x27: ffff0000087a2000 x26: 0000000000000040
[ 2260.143324] x25: ffff80a164173838 x24: ffff000008c18000
[ 2260.148804] x23: ffff80a15c524485 x22: ffff80a164173800
[ 2260.154283] x21: 0000000000000000 x20: ffff80a15c524400
[ 2260.159762] x19: ffff80a1640f4240 x18: 0000000000000010
[ 2260.165241] x17: 0000000000000000 x16: 0000000000000000
[ 2260.170721] x15: ffffffffffffffff x14: ffff000088be2d6f
[ 2260.176200] x13: ffff000008b28b38 x12: ffff00000846e1b8
[ 2260.181679] x11: 0000000005f5e0ff x10: 0000000000000016
[ 2260.187158] x9 : 00000000ffffffd0 x8 : 6465726574736967
[ 2260.192638] x7 : 6572207964616572 x6 : ffff80a17ffa0d38
[ 2260.198117] x5 : ffff80a17ffa0d38 x4 : 0000000000000000
[ 2260.203596] x3 : ffff80a17ffa5958 x2 : ffff80a17ffa0d38
[ 2260.209074] x1 : 000080a177490000 x0 : 0000000000000030
[ 2260.214554] --[ end trace f7b5e622386491a4 ]--
[ 2260.219315] Call trace:
[ 2260.221837] Exception stack(0xffff80a1603239e0 to 0xffff80a160323b10)
[ 2260.228480] 39e0: ffff80a1640f4240 0001000000000000 ffff80a160323bb0 ffff00000823138c
[ 2260.236557] 3a00: ffff0000087a2000 ffff80a1626f8c00 ffff80a160323a20 0000000000000000
[ 2260.244634] 3a20: ffff80a160323bb0 ffff80a160323bb0 ffff80a160323b70 00000000ffffffc8
[ 2260.252710] 3a40: ffff80a160323a70 ffff0000080e5810 ffff80a160323bb0 ffff80a160323bb0
[ 2260.260787] 3a60: ffff80a160323b70 00000000ffffffc8 ffff80a160323b20 ffff000008159920
[ 2260.268865] 3a80: 0000000000000030 000080a177490000 ffff80a17ffa0d38 ffff80a17ffa5958
[ 2260.276941] 3aa0: 0000000000000000 ffff80a17ffa0d38 ffff80a17ffa0d38 6572207964616572
[ 2260.285018] 3ac0: 6465726574736967 00000000ffffffd0 0000000000000016 0000000005f5e0ff
[ 2260.293094] 3ae0: ffff00000846e1b8 ffff000008b28b38 ffff000088be2d6f ffffffffffffffff
[ 2260.301171] 3b00: 0000000000000000 0000000000000000
[ 2260.306202] [<ffff00000823138c>] proc_register+0x11c/0x158
[ 2260.311860] [<ffff00000823166c>] proc_create_data+0x7c/0xe0
[ 2260.317610] [<ffff0000084272a8>] pci_proc_attach_device+0x78/0xf0
[ 2260.323896] [<ffff00000841565c>] pci_bus_add_device+0x34/0xa8
[ 2260.329823] [<ffff000008415704>] pci_bus_add_devices+0x34/0x80
[ 2260.335840] [<ffff00000841574c>] pci_bus_add_devices+0x7c/0x80
[ 2260.341855] [<ffff000008418528>] pci_rescan_bus+0x28/0x38
[ 2260.347423] [<ffff000008420f64>] bus_rescan_store+0x54/0x78
[ 2260.353171] [<ffff00000848f650>] bus_attr_store+0x20/0x30
[ 2260.358739] [<ffff00000823e064>] sysfs_kf_write+0x3c/0x50
[ 2260.364307] [<ffff00000823d324>] kernfs_fop_write+0x104/0x1f0
[ 2260.370235] [<ffff0000081c3ad8>] __vfs_write+0x18/0x100
[ 2260.375624] [<ffff0000081c4e7c>] vfs_write+0x9c/0x1a8
[ 2260.380833] [<ffff0000081c6490>] SyS_write+0x48/0xb0
[ 2260.385955] [<ffff000008082f8c>] __sys_trace_return+0x0/0x4
[ 2260.391807] nvme nvme0: pci function 0000:01:00.0

However, despite the errors, the device still appears to be reinitialized correctly. Once this error is observed, it recurs on every subsequent re-scan of that device.

Steps to Reproduce:

1. Insert a Samsung 960 drive into the first PCIe slot.
2. Check that it is detected:
   lspci -s 0000:01:00.0
   0000:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
3. Remove the drive via the PCIe interface:
   echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
4. Check that it is removed:
   lspci -s 0000:01:00.0
5. Rescan the PCIe bus:
   echo 1 > /sys/bus/pci/rescan
   [ 2677.645883] pci 0000:01:00.0: BAR 0: assigned [mem 0x400000000-0x400003fff 64bit]
   [ 2677.656855] nvme nvme0: pci function 0000:01:00.0
6. Repeat steps 2 through 5 until the error occurs.

Further debugging revealed a race condition while setting the is_added and is_busmaster bits of the pci_dev structure. The issue occurs after removing and re-scanning the device from sysfs multiple times; the card is never removed physically. The is_added bit is set after device attach, which probes the nvme driver. The NVMe driver starts a workqueue, and that workqueue calls pci_set_master() to set the is_busmaster bit. With repeated removal and rescan from sysfs, the race occurs and the is_added bit is overwritten to 0 by the workqueue started by the NVMe driver. Because of this, when the user later tries to remove the sysfs entry, pci_stop_dev() fails: it checks the is_added bit, which was supposed to be 1 but is observed as 0, so the proc and sysfs entries remain. This in turn causes a failure while creating new proc and sysfs entries during the next device addition, since the proc directory already exists.

Sequence 1:

pci_bus_add_device() {
        device_attach();
        ...
        dev->is_added = 1;
}

Sequence 2:

nvme_probe() {
        ...
        INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
        ...
}

nvme_reset_work() --> nvme_pci_enable() --> pci_set_master() --> __pci_set_master(true) --> dev->is_busmaster = enable