Bug 177551

Summary: SLAB freelist randomization breaks usb hub init during boot
Product: Memory Management Reporter: userwithuid (userwithuid)
Component: Slab AllocatorAssignee: Andrew Morton (akpm)
Status: RESOLVED CODE_FIX    
Severity: normal CC: darknater, dclowes1, pinaar14, sean, yjcoshc, zeus.jay
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.8.1 Subsystem:
Regression: No Bisected commit-id:

Description userwithuid 2016-10-14 01:47:59 UTC
4.8 made CONFIG_SLAB_FREELIST_RANDOM available for CONFIG_SLUB. On one of my machines, having this switched on (which it is for Arch's default kernel) causes errors during boot for my external usb hubs and makes some (but not all) attached devices unavailable. Boot still finishes and kinda works, but a kswapd0 process uses 100% on 1 core forever.

Short log summary:
usb 4-2.1.1: new SuperSpeed USB device number 10 using xhci_hcd
kworker/3:2: page allocation failure: order:0, mode:0x2604001
Workqueue: usb_hub_wq hub_event
SLUB: Unable to allocate memory on node -1, gfp=0x2408001
uas: probe of 4-2.1.1:1.0 failed with error -12

USB setup is: PCIe USB 3 controller -> 7-port hub (so really 4-port -> 4-port) -> 7 devices (mostly storage). The partially working/partially failing might be due to only the first or second 4-port failing?

I checked with a custom kernel (1 on, 1 off) to verify CONFIG_SLAB_FREELIST_RANDOM is causing this. But I guess it could still be a hardware or driver bug exposed by the option, right?

Anyway, tell me if you need more things. :-)




More log (before and after looks normal, some devices initialize, those that don't have the same error block):

usb 4-2.1.1: new SuperSpeed USB device number 10 using xhci_hcd
kworker/3:2: page allocation failure: order:0, mode:0x2604001(GFP_NOIO|GFP_DMA|__GFP_COMP|__GFP_NOTRACK)
CPU: 3 PID: 211 Comm: kworker/3:2 Tainted: G          I     4.8.1 #1
Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 1202    12/22/2010
Workqueue: usb_hub_wq hub_event
 0000000000000286 00000000aacaa45e ffff9f2768d570d8 ffffffffac80ec36
 0000000000000000 0000000000000000 ffff9f2768d57168 ffffffffac2b46d0
 02604001ac2cbcb3 0000000000000000 0000004202624001 0000000002604001
Call Trace:
 [<ffffffffac80ec36>] dump_stack+0x76/0xa0
 [<ffffffffac2b46d0>] warn_alloc_failed+0x110/0x170
 [<ffffffffac2b4f19>] __alloc_pages_nodemask+0x759/0x2040
 [<ffffffffac3221fe>] alloc_pages_current+0x10e/0x330
 [<ffffffffac32d0aa>] allocate_slab+0x7da/0xa10
 [<ffffffffac330878>] ___slab_alloc+0x518/0x650
 [<ffffffffacd1b1a5>] ? xhci_ring_alloc.constprop.18+0x1b5/0x3e0
 [<ffffffffac0ae468>] ? dma_generic_alloc_coherent+0x1f8/0x230
 [<ffffffffac3309ee>] __slab_alloc.isra.21+0x3e/0x60
 [<ffffffffac330e7f>] __kmalloc+0x1af/0x210
 [<ffffffffacd1b1a5>] xhci_ring_alloc.constprop.18+0x1b5/0x3e0
 [<ffffffffacd1db71>] xhci_alloc_stream_info+0x231/0x430
 [<ffffffffacd1d82b>] ? xhci_alloc_command+0xbb/0x1d0
 [<ffffffffacd1788d>] xhci_alloc_streams+0x58d/0xa10
 [<ffffffffacd17300>] ? xhci_check_bandwidth+0x4f0/0x4f0
 [<ffffffffacce2e9a>] usb_alloc_streams+0xca/0x120
 [<ffffffffacd2ee5c>] uas_configure_endpoints+0x1ac/0x1d0
 [<ffffffffacd2f364>] uas_probe+0x384/0x5b0
 [<ffffffffacceeb16>] usb_probe_interface+0x226/0x420
 [<ffffffffacbacd0f>] driver_probe_device+0x43f/0x640
 [<ffffffffacbad18e>] __device_attach_driver+0xee/0x1a0
 [<ffffffffacbad0a0>] ? __driver_attach+0x190/0x190
 [<ffffffffacba98d6>] bus_for_each_drv+0x76/0xc0
 [<ffffffffacbad3d5>] device_initial_probe+0x145/0x200
 [<ffffffffacbab655>] bus_probe_device+0xa5/0xb0
 [<ffffffffacba7914>] device_add+0x444/0x900
 [<ffffffffaccec0a7>] usb_set_configuration+0x9c7/0xe00
 [<ffffffffaccfc001>] generic_probe+0x41/0xa0
 [<ffffffffaccee72c>] usb_probe_device+0x4c/0xb0
 [<ffffffffacbacd0f>] driver_probe_device+0x43f/0x640
 [<ffffffffacbad18e>] __device_attach_driver+0xee/0x1a0
 [<ffffffffacbad0a0>] ? __driver_attach+0x190/0x190
 [<ffffffffacba98d6>] bus_for_each_drv+0x76/0xc0
 [<ffffffffacbad3d5>] device_initial_probe+0x145/0x200
 [<ffffffffacbab655>] bus_probe_device+0xa5/0xb0
 [<ffffffffacba7914>] device_add+0x444/0x900
 [<ffffffffac941b00>] ? init_std_data+0x280/0x2b0
 [<ffffffffaccde607>] usb_new_device+0x1d7/0x420
 [<ffffffffacce0f1d>] hub_event+0x109d/0x1ac0
 [<ffffffffaccedcf3>] ? usb_autopm_put_interface_async+0x43/0x50
 [<ffffffffac182022>] process_one_work+0x202/0x4c0
 [<ffffffffac182337>] worker_thread+0x57/0x570
 [<ffffffffac1822e0>] ? process_one_work+0x4c0/0x4c0
 [<ffffffffac18b708>] kthread+0x138/0x150
 [<ffffffffad15253f>] ret_from_fork+0x1f/0x40
 [<ffffffffac18b5d0>] ? kthread_worker_fn+0x180/0x180
Mem-Info:
active_anon:2226 inactive_anon:2100 isolated_anon:0
 active_file:1584 inactive_file:7617 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:3949 slab_unreclaimable:8009
 mapped:2627 shmem:2113 pagetables:433 bounce:0
 free:4058037 free_pcp:1109 free_cma:0
Node 0 active_anon:8904kB inactive_anon:8400kB active_file:6336kB inactive_file:30468kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:10508kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 8452kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Node 0 DMA free:52kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:15848kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 2973 15995 15995
Node 0 DMA32 free:3054188kB min:12548kB low:15684kB high:18820kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3120640kB managed:3055072kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:844kB local_pcp:96kB free_cma:0kB
lowmem_reserve[]: 0 0 13022 13022
Node 0 Normal free:13177908kB min:54968kB low:68708kB high:82448kB active_anon:8904kB inactive_anon:8400kB active_file:6336kB inactive_file:30468kB unevictable:0kB writepending:0kB present:13631488kB managed:13339304kB mlocked:0kB slab_reclaimable:15796kB slab_unreclaimable:16188kB kernel_stack:3408kB pagetables:1732kB bounce:0kB free_pcp:3592kB local_pcp:144kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 52kB
Node 0 DMA32: 1*4kB (M) 1*8kB (U) 0*16kB 1*32kB (M) 1*64kB (M) 2*128kB (M) 3*256kB (UM) 1*512kB (U) 3*1024kB (UM) 3*2048kB (UM) 743*4096kB (M) = 3054188kB
Node 0 Normal: 9*4kB (UME) 16*8kB (ME) 16*16kB (UM) 8*32kB (ME) 11*64kB (UME) 3*128kB (UM) 3*256kB (UME) 1*512kB (E) 2*1024kB (ME) 2*2048kB (ME) 3215*4096kB (M) = 13177828kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
11311 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
4192029 pages RAM
0 pages HighMem/MovableOnly
89460 pages reserved
0 pages cma reserved
0 pages hwpoisoned
SLUB: Unable to allocate memory on node -1, gfp=0x2408001(GFP_NOIO|GFP_DMA|__GFP_ZERO)
  cache: dma-kmalloc-1024, object size: 1024, buffer size: 1024, default order: 3, min order: 0
  node 0: slabs: 503, objs: 15844, free: 0
uas: probe of 4-2.1.1:1.0 failed with error -12
Comment 1 darknater 2016-11-17 22:05:03 UTC
Seeing this as well, adding kernel cmdline 
usb-storage.quirks=0x174c:0x55aa:u so that
uas doesn't grab the disks, for my specific case, works around the
problem.  device above worked fine before 4.8.

Seeing this on fedora 24 any 4.8.x kernel
Comment 2 userwithuid 2016-11-18 15:18:09 UTC
Crap, I totally forgot this bug.

I tried to figure out what hardware configuration causes the issue, but eventually gave up and also ended up with turning off uas (I disabled the module).

I'm pretty sure now that freelist randomization is not the actual cause here, I managed to get the same error (run out of memory) with it disabled on a different machine.

I can also say that it is not limited to the ASM USB-to-SATA bridge mentioned above (174c:55aa), I get the same error with Seagate disks as well (e.g. 0bc2:ab21 or 0bc2:2312, Initio chipset).

I also noticed that if I plug in only a single disk, it works even with uas. So maybe this bug is only triggered if multiple or too many uas-enabled disks are initialized or initialized at the same time?
Comment 3 Jay 2016-12-02 23:57:54 UTC
Hi I get the exactly same issues on connecting a third Seagate Drive running 16.10 ubuntu


Is there an easy fix?

Linux Pandion 4.8.0-28-generic #30-Ubuntu SMP Fri Nov 11 14:03:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux



27.026947] usb 4-2.1.4: new SuperSpeed USB device number 6 using xhci_hcd
[   27.047457] usb 4-2.1.4: New USB device found, idVendor=0bc2, idProduct=3322
[   27.047459] usb 4-2.1.4: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[   27.047461] usb 4-2.1.4: Product: Expansion Desk
[   27.047463] usb 4-2.1.4: Manufacturer: Seagate
[   27.047464] usb 4-2.1.4: SerialNumber: NA8ELXQN
[   27.052790] kworker/2:1: page allocation failure: order:0, mode:0x2604001(GFP_NOIO|GFP_DMA|__GFP_COMP|__GFP_NOTRACK)
[   27.052793] CPU: 2 PID: 122 Comm: kworker/2:1 Tainted: P           OE   4.8.0-28-generic #30-Ubuntu
[   27.052793] Hardware name: LENOVO 2349SVL/2349SVL, BIOS G1ETB1WW (2.71 ) 08/08/2016
[   27.052799] Workqueue: usb_hub_wq hub_event
[   27.052801]  0000000000000286 00000000dc0d8d0f ffff96dfe2e37150 ffffffffac62fae2
[   27.052804]  0000000000000000 0000000000000000 ffff96dfe2e371e0 ffffffffac3aa3f1
[   27.052806]  02604001ac3bbb59 ffff96dffe29cb58 0000000000000040 0000000000000012
[   27.052809] Call Trace:
[   27.052813]  [<ffffffffac62fae2>] dump_stack+0x63/0x81
[   27.052815]  [<ffffffffac3aa3f1>] warn_alloc_failed+0x101/0x160
[   27.052818]  [<ffffffffac645af8>] ? find_next_bit+0x18/0x20
[   27.052820]  [<ffffffffac3aa6d1>] __alloc_pages_slowpath+0x201/0xa60
[   27.052822]  [<ffffffffac3ab1e5>] __alloc_pages_nodemask+0x2b5/0x300
[   27.052825]  [<ffffffffac3fec25>] alloc_pages_current+0x95/0x140
[   27.052828]  [<ffffffffac408699>] new_slab+0x419/0x6e0
[   27.052830]  [<ffffffffac40a210>] ___slab_alloc+0x3a0/0x4b0
[   27.052832]  [<ffffffffac8af1db>] ? xhci_segment_alloc.isra.25+0xfb/0x140
[   27.052835]  [<ffffffffac265235>] ? x86_swiotlb_alloc_coherent+0x25/0x50
[   27.052837]  [<ffffffffac40a340>] __slab_alloc+0x20/0x40
[   27.052838]  [<ffffffffac40bdf2>] __kmalloc+0x182/0x1e0
[   27.052840]  [<ffffffffac8af1db>] ? xhci_segment_alloc.isra.25+0xfb/0x140
[   27.052842]  [<ffffffffac8af1db>] xhci_segment_alloc.isra.25+0xfb/0x140
[   27.052844]  [<ffffffffac8af263>] xhci_alloc_segments_for_ring+0x43/0x100
[   27.052846]  [<ffffffffac8af3de>] xhci_ring_alloc.constprop.32+0xbe/0x140
[   27.052848]  [<ffffffffac8b0a9f>] xhci_alloc_stream_info+0x1df/0x3e0
[   27.052850]  [<ffffffffac8b0820>] ? xhci_alloc_command+0x100/0x140
[   27.052852]  [<ffffffffac8ac0e7>] xhci_alloc_streams+0x447/0x840
[   27.052854]  [<ffffffffac8abca0>] ? xhci_check_bandwidth+0x370/0x370
[   27.052856]  [<ffffffffac866d27>] usb_alloc_streams+0xb7/0x110
[   27.052860]  [<ffffffffc0b51918>] uas_configure_endpoints+0x148/0x170 [uas]
[   27.052862]  [<ffffffffc0b52455>] uas_probe+0x3a5/0x550 [uas]
[   27.052866]  [<ffffffffac870439>] usb_probe_interface+0x159/0x2d0
[   27.052868]  [<ffffffffac79dfb3>] driver_probe_device+0x223/0x430
[   27.052870]  [<ffffffffac79e33c>] __device_attach_driver+0x8c/0x100
[   27.052871]  [<ffffffffac79e2b0>] ? __driver_attach+0xf0/0xf0
[   27.052873]  [<ffffffffac79bb67>] bus_for_each_drv+0x67/0xb0
[   27.052875]  [<ffffffffac79dc1d>] __device_attach+0xdd/0x160
[   27.052876]  [<ffffffffac79e3f3>] device_initial_probe+0x13/0x20
[   27.052878]  [<ffffffffac79cdd2>] bus_probe_device+0x92/0xa0
[   27.052879]  [<ffffffffac79a9d4>] device_add+0x494/0x660
[   27.052881]  [<ffffffffac85fd7c>] ? usb_enable_lpm+0xdc/0x100
[   27.052883]  [<ffffffffac86e37c>] usb_set_configuration+0x5ec/0x930
[   27.052886]  [<ffffffffac87954e>] generic_probe+0x2e/0x80
[   27.052888]  [<ffffffffac87029e>] usb_probe_device+0x2e/0x70
[   27.052889]  [<ffffffffac79dfb3>] driver_probe_device+0x223/0x430
[   27.052891]  [<ffffffffac79e33c>] __device_attach_driver+0x8c/0x100
[   27.052892]  [<ffffffffac79e2b0>] ? __driver_attach+0xf0/0xf0
[   27.052893]  [<ffffffffac79bb67>] bus_for_each_drv+0x67/0xb0
[   27.052895]  [<ffffffffac79dc1d>] __device_attach+0xdd/0x160
[   27.052897]  [<ffffffffac79e3f3>] device_initial_probe+0x13/0x20
[   27.052898]  [<ffffffffac79cdd2>] bus_probe_device+0x92/0xa0
[   27.052899]  [<ffffffffac79a9d4>] device_add+0x494/0x660
[   27.052902]  [<ffffffffac764e00>] ? perf_trace_urandom_read+0xb0/0x100
[   27.052903]  [<ffffffffac863465>] usb_new_device+0x275/0x490
[   27.052905]  [<ffffffffac864a3e>] hub_port_connect+0x50e/0x9d0
[   27.052907]  [<ffffffffac865858>] hub_event+0x958/0xb10
[   27.052909]  [<ffffffffac29d85c>] process_one_work+0x1fc/0x4b0
[   27.052911]  [<ffffffffac29db5b>] worker_thread+0x4b/0x500
[   27.052912]  [<ffffffffac29db10>] ? process_one_work+0x4b0/0x4b0
[   27.052915]  [<ffffffffac2a3e58>] kthread+0xd8/0xf0
[   27.052918]  [<ffffffffaca9a25f>] ret_from_fork+0x1f/0x40
[   27.052920]  [<ffffffffac2a3d80>] ? kthread_create_on_node+0x1e0/0x1e0
[   27.052921] Mem-Info:
[   27.052926] active_anon:212494 inactive_anon:7575 isolated_anon:0
                active_file:110869 inactive_file:132002 isolated_file:0
                unevictable:16 dirty:3268 writeback:0 unstable:0
                slab_reclaimable:17647 slab_unreclaimable:17609
                mapped:94066 shmem:7085 pagetables:10427 bounce:0
                free:3488049 free_pcp:437 free_cma:0
[   27.052930] Node 0 active_anon:849976kB inactive_anon:30300kB active_file:443476kB inactive_file:528008kB unevictable:64kB isolated(anon):0kB isolated(file):0kB mapped:376264kB dirty:13072kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 108544kB anon_thp: 28340kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
[   27.052931] Node 0 DMA free:60kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:15840kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   27.052935] lowmem_reserve[]: 0 2670 15684 15684 15684
[   27.052938] Node 0 DMA32 free:2779636kB min:11492kB low:14364kB high:17236kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2846280kB managed:2780712kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1040kB local_pcp:116kB free_cma:0kB
[   27.052942] lowmem_reserve[]: 0 0 13013 13013 13013
[   27.052944] Node 0 Normal free:11172500kB min:56020kB low:70024kB high:84028kB active_anon:849976kB inactive_anon:30300kB active_file:443476kB inactive_file:528008kB unevictable:64kB writepending:13072kB present:13604864kB managed:13329440kB mlocked:64kB slab_reclaimable:70588kB slab_unreclaimable:54596kB kernel_stack:12224kB pagetables:41708kB bounce:0kB free_pcp:708kB local_pcp:216kB free_cma:0kB
[   27.052948] lowmem_reserve[]: 0 0 0 0 0
[   27.052951] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 60kB
[   27.052960] Node 0 DMA32: 3*4kB (UM) 3*8kB (UM) 1*16kB (M) 2*32kB (M) 2*64kB (M) 2*128kB (UM) 2*256kB (M) 3*512kB (UM) 4*1024kB (UM) 4*2048kB (UM) 675*4096kB (M) = 2779636kB
[   27.052970] Node 0 Normal: 1*4kB (E) 51*8kB (UME) 30*16kB (UE) 2*32kB (ME) 323*64kB (U) 284*128kB (UE) 201*256kB (UE) 121*512kB (UM) 53*1024kB (U) 15*2048kB (UME) 2665*4096kB (UM) = 11172220kB
[   27.052982] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   27.052982] 249958 total pagecache pages
[   27.052983] 0 pages in swap cache
[   27.052984] Swap cache stats: add 0, delete 0, find 0/0
[   27.052985] Free swap  = 8388604kB
[   27.052985] Total swap = 8388604kB
[   27.052986] 4116782 pages RAM
[   27.052987] 0 pages HighMem/MovableOnly
[   27.052987] 85269 pages reserved
[   27.052987] 0 pages cma reserved
[   27.052988] 0 pages hwpoisoned
[   27.052990] SLUB: Unable to allocate memory on node -1, gfp=0x2408001(GFP_NOIO|GFP_DMA|__GFP_ZERO)
[   27.052991]   cache: dma-kmalloc-1024, object size: 1024, buffer size: 1024, default order: 3, min order: 0
[   27.052992]   node 0: slabs: 486, objs: 15496, free: 0
[   27.053898] uas: probe of 4-2.1.4:1.0 failed with error -12
Comment 4 Jay 2016-12-04 01:58:29 UTC
(In reply to userwithuid from comment #2)
> Crap, I totally forgot this bug.
> 
> I tried to figure out what hardware configuration causes the issue, but
> eventually gave up and also ended up with turning off uas (I disabled the
> module).
> 
> I'm pretty sure now that freelist randomization is not the actual cause
> here, I managed to get the same error (run out of memory) with it disabled
> on a different machine.
> 
> I can also say that it is not limited to the ASM USB-to-SATA bridge
> mentioned above (174c:55aa), I get the same error with Seagate disks as well
> (e.g. 0bc2:ab21 or 0bc2:2312, Initio chipset).
> 
> I also noticed that if I plug in only a single disk, it works even with uas.
> So maybe this bug is only triggered if multiple or too many uas-enabled
> disks are initialized or initialized at the same time?

I notice this happens when I connect a third disk. 2 connected work fine.
Comment 5 userwithuid 2016-12-05 20:29:58 UTC
(In reply to Jay from comment #3)
> 
> Is there an easy fix?

You can blacklist the "uas" kernel module.


In case you don't know how:

Create a new file /etc/modprobe.d/blacklist-uas.conf
Content:

blacklist uas


You should also propagate the change to the initramfs. On debian/ubuntu, I think that's "sudo update-initramfs -u".

Then reboot. Check whether it worked with "lsmod" (should not list "uas" any more).
Comment 6 yjcoshc 2016-12-31 01:25:05 UTC
(In reply to Jay from comment #4)
> (In reply to userwithuid from comment #2)
> > Crap, I totally forgot this bug.
> > 
> > I tried to figure out what hardware configuration causes the issue, but
> > eventually gave up and also ended up with turning off uas (I disabled the
> > module).
> > 
> > I'm pretty sure now that freelist randomization is not the actual cause
> > here, I managed to get the same error (run out of memory) with it disabled
> > on a different machine.
> > 
> > I can also say that it is not limited to the ASM USB-to-SATA bridge
> > mentioned above (174c:55aa), I get the same error with Seagate disks as
> well
> > (e.g. 0bc2:ab21 or 0bc2:2312, Initio chipset).
> > 
> > I also noticed that if I plug in only a single disk, it works even with
> uas.
> > So maybe this bug is only triggered if multiple or too many uas-enabled
> > disks are initialized or initialized at the same time?
> 
> I notice this happens when I connect a third disk. 2 connected work fine.

It seems if there are three disks using uas then the bug will be triggered. 2 uas disks + 1 usb-storage disk is fine.
Comment 7 Douglas Clowes 2017-01-13 03:12:36 UTC
For me kswapd0 goes high-cpu following a page allocation failure in xhci_segment_alloc and I think that this has been occurring since moving to 4.8 on Fedora 24. I don't remember experiencing it before that. Currently on 4.8.15.

I normally boot with 3 or 4 USB 3.0 disks attached and, after the upgrade to 4.8.x noticed that kswapd0 was running at 100%. I went back to 4.7.x and no problem. Searches on this issue frequently referred to USB disks so I unplugged and rebooted.

If I unplug all of my USB 3.0 devices I get a normal boot, even with a USB weather station, keyboard, mouse. Sometimes, one or two USB 3.0 disks is OK too, If I boot with all of the USB 3.0 disks included, I get a kworker page allocation failure and after boot kswapd0 is high-cpu, usually split across 2-4 cores.

If I boot with two USB 3.0 disks and get a normal boot (no page allocation failure and normal kswapd) and then plug in a hub with the rest of the disks (and a USB 3.0 card reader) I get the page allocation failure at that point and kswapd0 goes high-cpu.

I have not looked at them all, but whenever I see kswapd0 high-cpu and I do look, there is the page allocation failure in the log.

The 'perf top' command seems to show different information from time to time but the top contenders are frequently 'shrink_inactive_list', 'inactive_list_is_low', 'find_next_bit', 'shrink_none_memcg', '_raw_spin_lock' to name a few.

Makes me wonder if the xhci allocation failure is the trigger, and fails to clean up on the error exit path, and kswapd0 is just a hapless victim. There is a stack trace (on ubuntu kernel) of the page allocation failure in the dmesg attached to https://bugzilla.redhat.com/show_bug.cgi?id=1395825 on this issue but I have more if it would help.

I have 19GiB free on a 24GiB machine so there should be no memory shortage to prompt swapping or the page allocation failure.

I had also noticed frequently that not all of my USB disks were mounted after boot and that I had to remove and reinsert a disk to use it. IIRC this affected my USB 2.0 disks too and from before the upgrade to 4.8 too.

There appears to be two issues: one is that the allocation fails and the manner of its failure; the other is what kswapd0 does when it happens. If the root problem is in __kmalloc or below, there could be other causes, unrelated to xhci, which cause the same kswapd0 behaviour.

In my case, I have attempted to characterize the xhci fault by inserting and removing several disks. I have yet to have a failure with a single USB 3.0 disk, or any USB 2.0 disk. It's usually when I insert the third USB 3.0 disk or, if not then the fourth.

When kswapd0 goes high-cpu, it is usually >97% and stays that way for days until I reboot. Doing 'sudo perf top' usually yields something different each time, but this is a not-uncommon display:

  23.84%  [kernel]                            [k] shrink_inactive_list
  19.55%  [kernel]                            [k] inactive_list_is_low
   8.01%  [kernel]                            [k] shrink_node_memcg
   7.87%  [kernel]                            [k] _find_next_bit.part.0
   6.53%  [kernel]                            [k] find_next_bit
   6.19%  [kernel]                            [k] super_cache_count
   5.60%  [kernel]                            [k] _raw_spin_lock
   5.12%  [kernel]                            [k] shrink_slab.part.39
...
or
  25.79%  [kernel]                            [k] inactive_list_is_low
  16.49%  [kernel]                            [k] shrink_inactive_list
   8.82%  [kernel]                            [k] shrink_node_memcg
   7.79%  [kernel]                            [k] find_next_bit
   7.78%  [kernel]                            [k] _raw_spin_lock
   6.19%  [kernel]                            [k] __list_lru_count_one.isra.2
   6.09%  [kernel]                            [k] _find_next_bit.part.0
   5.04%  [kernel]                            [k] super_cache_count
   2.49%  [kernel]                            [k] shrink_slab.part.39
...
and this is a not-uncommon trigger event from the log:
usb 12-4.1.2: new SuperSpeed USB device number 7 using xhci_hcd
usb 12-4.1.2: New USB device found, idVendor=0bc2, idProduct=ab34
usb 12-4.1.2: New USB device strings: Mfr=2, Product=3, SerialNumber=1
usb 12-4.1.2: Product: Backup+  Desk
usb 12-4.1.2: Manufacturer: Seagate
usb 12-4.1.2: SerialNumber: NA7EHNW5
kworker/4:1: page allocation failure: order:0, mode:0x2604001(GFP_NOIO|GFP_DMA|__GFP_COMP|__GFP_NOTRACK)
CPU: 4 PID: 29964 Comm: kworker/4:1 Tainted: P          IOE   4.8.15-200.fc24.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. X58A-UD5/X58A-UD5, BIOS Fe5 03/07/2011
Workqueue: usb_hub_wq hub_event
 0000000000000286 0000000001011d75 ffff949dce7cf1e8 ffffffff953e5e1d
 0000000000000000 0000000000000000 ffff949dce7cf278 ffffffff951c2f51
 02604001951d4a18 ffff949faffd70a0 0000000000000020 0000000002624001
Call Trace:
 [<ffffffff953e5e1d>] dump_stack+0x63/0x86
 [<ffffffff951c2f51>] warn_alloc_failed+0x101/0x170
 [<ffffffff951c3583>] __alloc_pages_nodemask+0x543/0xe00
 [<ffffffff9521a0a5>] alloc_pages_current+0x95/0x140
 [<ffffffff95224891>] new_slab+0x301/0x780
 [<ffffffff95225e12>] ___slab_alloc+0x3f2/0x540
 [<ffffffff9560805b>] ? xhci_segment_alloc.isra.24+0xfb/0x140
 [<ffffffff9505e1a5>] ? x86_swiotlb_alloc_coherent+0x25/0x50
 [<ffffffff95225f80>] __slab_alloc+0x20/0x40
 [<ffffffff95227838>] __kmalloc+0x188/0x1f0
 [<ffffffff9560805b>] ? xhci_segment_alloc.isra.24+0xfb/0x140
 [<ffffffff9560805b>] xhci_segment_alloc.isra.24+0xfb/0x140
 [<ffffffff95608126>] xhci_alloc_segments_for_ring+0x86/0x100
 [<ffffffff9560825e>] xhci_ring_alloc.constprop.33+0xbe/0x140
 [<ffffffff9560992f>] xhci_alloc_stream_info+0x1df/0x3e0
 [<ffffffff956096b0>] ? xhci_alloc_command+0x100/0x140
 [<ffffffff95604fa7>] xhci_alloc_streams+0x447/0x840
 [<ffffffff95604b60>] ? xhci_check_bandwidth+0x370/0x370
 [<ffffffff955cc3e7>] usb_alloc_streams+0xb7/0x110
 [<ffffffffc154f918>] uas_configure_endpoints+0x148/0x170 [uas]
 [<ffffffffc1550455>] uas_probe+0x3a5/0x550 [uas]
 [<ffffffff955d5a19>] usb_probe_interface+0x159/0x2d0
 [<ffffffff9552e903>] driver_probe_device+0x223/0x430
 [<ffffffff9552ec8c>] __device_attach_driver+0x8c/0x100
 [<ffffffff9552ec00>] ? __driver_attach+0xf0/0xf0
 [<ffffffff9552c367>] bus_for_each_drv+0x67/0xb0
 [<ffffffff9552e56d>] __device_attach+0xdd/0x160
 [<ffffffff9552ed43>] device_initial_probe+0x13/0x20
 [<ffffffff9552d732>] bus_probe_device+0x92/0xa0
 [<ffffffff9552b384>] device_add+0x494/0x660
 [<ffffffff955d39e5>] usb_set_configuration+0x5e5/0x930
 [<ffffffff955dec1e>] generic_probe+0x2e/0x80
...
 [<ffffffff950c0b30>] ? kthread_worker_fn+0x180/0x180
Mem-Info:
active_anon:1800847 inactive_anon:479167 isolated_anon:0
 active_file:2135772 inactive_file:1191657 isolated_file:0
 unevictable:4119 dirty:120 writeback:0 unstable:0
 slab_reclaimable:131717 slab_unreclaimable:43534
 mapped:150853 shmem:33925 pagetables:22878 bounce:0
 free:291927 free_pcp:2122 free_cma:0
Node 0 active_anon:7203388kB inactive_anon:1916668kB active_file:8543088kB inactive_file:4766628kB unevictable:16476kB isolated(anon):0kB is
Node 0 DMA free:20kB min:40kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepe
lowmem_reserve[]: 0 3227 24069 24069 24069
Node 0 DMA32 free:247176kB min:9056kB low:12360kB high:15664kB active_anon:1399160kB inactive_anon:138840kB active_file:735344kB inactive_fi
lowmem_reserve[]: 0 0 20842 20842 20842
Node 0 Normal free:920512kB min:58484kB low:79824kB high:101164kB active_anon:5804228kB inactive_anon:1777828kB active_file:7807744kB inacti
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
Node 0 DMA32: 15181*4kB (UME) 6716*8kB (UME) 2109*16kB (UME) 447*32kB (UME) 197*64kB (UME) 110*128kB (UME) 79*256kB (UME) 58*512kB (UME) 8*1
Node 0 Normal: 28811*4kB (UME) 38298*8kB (UME) 12774*16kB (UMEH) 4508*32kB (UMEH) 1424*64kB (UMEH) 192*128kB (UME) 56*256kB (UME) 21*512kB (
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
3379879 total pagecache pages
16916 pages in swap cache
Swap cache stats: add 280864, delete 263948, find 699757/729550
Free swap  = 3624668kB
Total swap = 4079612kB
6291052 pages RAM
0 pages HighMem/MovableOnly
119166 pages reserved
0 pages cma reserved
0 pages hwpoisoned
SLUB: Unable to allocate memory on node -1, gfp=0x2408001(GFP_NOIO|GFP_DMA|__GFP_ZERO)
  cache: dma-kmalloc-1024, object size: 1024, buffer size: 1024, default order: 3, min order: 0
  node 0: slabs: 497, objs: 15652, free: 0
uas: probe of 12-4.1.2:1.0 failed with error -12
from plugging in a Seagate Backup+ 5T USB 3.0 drive.

The system is still usable but does get a little sluggish at times, possibly increasingly so.

The first such event in my log occurred on Nov 10 during the first boot of "Linux version 4.8.6-201.fc24.x86_64" following an update. The same hardware had been running on the previous boot of 4.7.9
Comment 8 Sean Rees 2017-02-08 09:12:32 UTC
I submitted a patch for this (& it was queued), FYI: http://marc.info/?l=linux-mm-commits&m=148650422714993&w=2
Comment 9 userwithuid 2017-02-09 14:57:42 UTC
Awesome! I just tried 4.10 git with the fix and everything works fine again for me (freelist-random enabled, 6 usb disks at boot, no workarounds like disabling uas needed any more).

Thank you very much for fixing this, much appreciated. :-)