Bug 196583
| Summary: | RCU hang | | |
|---|---|---|---|
| Product: | File System | Reporter: | Steinar H. Gunderson (steinar+kernel) |
| Component: | NFS | Assignee: | bfields |
| Status: | RESOLVED WILL_NOT_FIX | | |
| Severity: | high | CC: | bjoern, mail, trondmy |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.11.8 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Steinar H. Gunderson
2017-08-04 14:10:35 UTC
I don't understand what's going on, but I'm suspicious of the serial console and printk stuff in there. Do you have some kind of extra debug logging turned on? It seems to me I've gotten myself into this kind of problem before when frequent printk's were sent to a serial console.

There's a serial console, but no extra debug logging, and the rest of the syslog indicates the serial console should have been entirely quiet at the time (i.e., just a getty standing still). Looking at it, the timing doesn't indicate the serial console is the root issue either:

3048800.249336: First RCU hang is detected
3048800.269340 (20 ms later): Second RCU hang is detected
3048800.270374 (21 ms later): Backtrace starts

It's not so strange that the backtrace would be hitting a CPU that was still busy printing out the “RCU hang” message, is it? 20 ms at 115200 bps is only 288 bytes, after all, which is roughly the length of one of the RCU hang messages.

(In reply to Steinar H. Gunderson from comment #3)
> It's not so strange that the backtrace would be hitting a CPU that was still
> busy printing out the “RCU hang” message, is it? 20 ms at 115200 bps is only
> 288 bytes, after all, which is roughly the length of one of the RCU hang
> messages.

Yes, makes sense. Hm.

This looks like some nice documentation on diagnosing that message: https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

Nothing in the cache_read code looks obviously suspicious to me, but I haven't looked very carefully yet. I take it this has only happened once so far? How recently did you upgrade to v4.11.8?

Yes, only once. The upgrade to 4.11.8 was June 29, and it's been running since.

It hung again today (with 4.11.8). We upgraded to 4.12.8 and got pretty much immediately the same bug. Back to 4.11.8, same bug quickly again. We've gone down to 4.9.44, and it seems to be stable for now, but it's of course hard to say.

Nope, 4.9.44 also died after half an hour:

[ 1413.576799] INFO: rcu_sched self-detected stall on CPU [ 1413.582039] 12-...: (6000 ticks this GP) idle=09d/140000000000001/0 softirq=17200/17200 fqs=2222 [ 1413.586806] INFO: rcu_sched detected stalls on CPUs/tasks: [ 1413.586812] 12-...: (6000 ticks this GP) idle=09d/140000000000001/0 softirq=17200/17200 fqs=2222 [ 1413.586815] (detected by 18, t=6002 jiffies, g=37798, c=37797, q=75916) [ 1413.586817] Task dump for CPU 12: [ 1413.586820] rpc.svcgssd R running task 0 3181 1 0x00000008 [ 1413.586825] ffff88085fb97100 ffff88105b59d4c0 ffff88085c2f8000 ffff881057176cc0 [ 1413.586827] 0000000000017100 ffffc9000c60ba38 ffffffff814b3f64 00ffffff81070295 [ 1413.586828] ffff88085fb97100 0000000100000000 ffff88105b59d4c0 0000000000000000 [ 1413.586829] Call Trace: [ 1413.586839] [<ffffffff814b3f64>] ? __schedule+0x3a7/0x4a9 [ 1413.586841] [<ffffffff814b40e3>] schedule+0x7d/0x87 [ 1413.586844] [<ffffffff814b5cb5>] schedule_hrtimeout_range_clock+0x38/0xc1 [ 1413.586847] [<ffffffff81150613>] ? __pollwait+0xbe/0xc7 [ 1413.586850] [<ffffffff81142aaa>] ? fput+0x11/0x83 [ 1413.586852] [<ffffffff81150883>] ? poll_freewait+0x43/0x94 [ 1413.586853] [<ffffffff81151897>] ? do_sys_poll+0x3ec/0x46c [ 1413.586857] [<ffffffff8106756f>] ? try_to_wake_up+0x1e1/0x1f3 [ 1413.586859] [<ffffffff810675e7>] ? default_wake_function+0xd/0xf [ 1413.586861] [<ffffffff81154188>] ? __d_lookup_rcu+0xbd/0xea [ 1413.586862] [<ffffffff81154188>] ? __d_lookup_rcu+0xbd/0xea [ 1413.586867] [<ffffffff8118a80e>] ? dquot_file_open+0x14/0x32 [ 1413.586871] [<ffffffff811a3707>] ?
ext4_file_open+0x1dc/0x213 [ 1413.586874] [<ffffffff8120af98>] ? fsnotify_perm+0x60/0x6c [ 1413.586876] [<ffffffff811507f1>] ? poll_select_copy_remaining+0xfe/0xfe [ 1413.586898] [<ffffffffa02d2347>] ? qword_addhex+0x5b/0x7e [sunrpc] [ 1413.586903] [<ffffffffa0343fcc>] ? rsi_request+0x36/0x45 [auth_rpcgss] [ 1413.586917] [<ffffffffa02d3eb0>] ? cache_read.isra.21+0x104/0x2a5 [sunrpc] [ 1413.586929] [<ffffffffa02d4081>] ? cache_read_procfs+0x30/0x35 [sunrpc] [ 1413.586931] [<ffffffff81190751>] ? proc_reg_read+0x4e/0x63 [ 1413.586933] [<ffffffff811407cf>] ? __vfs_read+0x21/0x98 [ 1413.586934] [<ffffffff8120af98>] ? fsnotify_perm+0x60/0x6c [ 1413.586936] [<ffffffff8120bd3b>] ? security_file_permission+0x36/0x3d [ 1413.586938] [<ffffffff811412ec>] ? rw_verify_area+0x71/0xaa [ 1413.586940] [<ffffffff811413c4>] ? vfs_read+0x9f/0xd7 [ 1413.586942] [<ffffffff8114215f>] ? SyS_read+0x48/0x73 [ 1413.586945] [<ffffffff814b6860>] ? entry_SYSCALL_64_fastpath+0x13/0x94 [ 1413.808856] (t=6024 jiffies g=37798 c=37797 q=75967) [ 1413.814207] Task dump for CPU 12: [ 1413.817671] rpc.svcgssd R running task 0 3181 1 0x00000008 [ 1413.824977] ffff88085fb83e38 ffffffff8106947a 000000000000000c ffffffff81a44e80 [ 1413.832853] ffff88085fb83e50 ffffffff8106b3c8 ffffffff81a44e80 ffff88085fb83e88 [ 1413.840719] ffffffff810ed797 0000000c00000001 ffffffff81a44e80 ffff88085fb97e40 [ 1413.848590] Call Trace: [ 1413.851184] <IRQ> [ 1413.853159] [<ffffffff8106947a>] sched_show_task+0xca/0xd7 [ 1413.859036] [<ffffffff8106b3c8>] dump_cpu_task+0x32/0x35 [ 1413.864587] [<ffffffff810ed797>] rcu_dump_cpu_stacks+0x8d/0xa1 [ 1413.870668] [<ffffffff8108bcd5>] rcu_check_callbacks+0x234/0x5f7 [ 1413.876919] [<ffffffff8108f5fc>] update_process_times+0x2a/0x4f [ 1413.883081] [<ffffffff8109af8e>] tick_sched_handle+0x32/0x3e [ 1413.888983] [<ffffffff8109b4b3>] tick_sched_timer+0x34/0x5a [ 1413.894797] [<ffffffff8108fdba>] __hrtimer_run_queues+0xb0/0x183 [ 1413.901045] [<ffffffff8109021c>] hrtimer_interrupt+0x5f/0xff [ 1413.906941] [<ffffffff8102e8c4>] local_apic_timer_interrupt+0x46/0x49 [ 1413.913622] [<ffffffff8102ed9f>] smp_apic_timer_interrupt+0x29/0x39 [ 1413.920133] [<ffffffff814b742f>] apic_timer_interrupt+0x7f/0x90 [ 1413.926290] <EOI> [ 1413.928278] [<ffffffffa02d2347>] ? qword_addhex+0x5b/0x7e [sunrpc] [ 1413.934853] [<ffffffffa0343fcc>] rsi_request+0x36/0x45 [auth_rpcgss] [ 1413.941457] [<ffffffffa02d3eb0>] cache_read.isra.21+0x104/0x2a5 [sunrpc] [ 1413.948407] [<ffffffffa02d4081>] cache_read_procfs+0x30/0x35 [sunrpc] [ 1413.955085] [<ffffffff81190751>] proc_reg_read+0x4e/0x63 [ 1413.960640] [<ffffffff811407cf>] __vfs_read+0x21/0x98 [ 1413.965934] [<ffffffff8120af98>] ? fsnotify_perm+0x60/0x6c [ 1413.971664] [<ffffffff8120bd3b>] ? security_file_permission+0x36/0x3d [ 1413.978344] [<ffffffff811412ec>] ? rw_verify_area+0x71/0xaa [ 1413.984152] [<ffffffff811413c4>] vfs_read+0x9f/0xd7 [ 1413.989276] [<ffffffff8114215f>] SyS_read+0x48/0x73 [ 1413.994396] [<ffffffff814b6860>] entry_SYSCALL_64_fastpath+0x13/0x94 So do we have any known-good version at this point, or does it look possible that this problem could have always been there? I don't know why rsi_request() keeps showing up. It doesn't do much, and isn't called in any interesting context. That very first while loop in cache_read() is under a spinlock. In theory some bug might cause us to spend too long in that loop. stallwarn.txt says we can get these warnings if preemption is disabled too long (and I think an ordinary spinlock does disable preemption). 
I'm not sure how to test that hypothesis.... Maybe patch in a counter incremented on each pass through that loop, and WARN_ON_ONCE() if it gets too large. Might also be worth looking through stallwarn.txt to make sure none of the other cases apply there. (Does your .config have CONFIG_PREEMPT turned off? Are you doing any real-time scheduling?)

Well, 4.4 hasn't crashed yet, so perhaps it's a good version? Hard to say.

CONFIG_PREEMPT is indeed not set. There is no real-time scheduling.

(In reply to Steinar H. Gunderson from comment #9)
> Well, 4.4 hasn't crashed yet, so perhaps it's a good version? Hard to say.
>
> CONFIG_PREEMPT is indeed not set.

So, any excessive looping could trigger this. rsi_request/qword_addhex still seems like an odd place for that. In theory a corrupted rsi_in{handle,token}.len could cause it, but then it'd also be writing the whole time, and I'd think we'd start seeing further oopses resulting from the random memory scribbles. Maybe we're hitting the "goto again" case in cache_read(), and that retry loop also covers the rsi_request calls. So adding a counter and WARN might again be a way to check for that. Or just adding a schedule() somewhere in that loop and seeing whether the problem still reproduces after that.
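For reference, rsi_request() — the function that keeps showing up above the cache_read() frames — is just the formatting callback that cache_read() invokes when it turns a queued auth.rpcsec.init entry into text for rpc.svcgssd. The following is a simplified paraphrase of that path (based on net/sunrpc/cache.c and net/sunrpc/auth_gss/svcauth_gss.c of roughly this vintage; not a verbatim copy):

/* cache_read() formats a queued request into a single page on demand: */
static int cache_request(struct cache_detail *detail, struct cache_request *crq)
{
	char *bp = crq->buf;
	int len = PAGE_SIZE;

	detail->cache_request(detail, crq->item, &bp, &len);	/* rsi_request() for this cache */
	if (len < 0)
		return -EAGAIN;		/* output did not fit in one page */
	return PAGE_SIZE - len;
}

/* For auth.rpcsec.init, the callback hex-encodes the GSS handle and the
 * client's init_sec_context token: */
static void rsi_request(struct cache_detail *cd, struct cache_head *h,
			char **bpp, int *blen)
{
	struct rsi *rsii = container_of(h, struct rsi, h);

	qword_addhex(bpp, blen, rsii->in_handle.data, rsii->in_handle.len);
	qword_addhex(bpp, blen, rsii->in_token.data, rsii->in_token.len);
	(*bpp)[-1] = '\n';
}

qword_addhex() emits two hex characters per input byte and flags overflow by leaving the remaining output length negative, so an in_token (or a corrupted length) that cannot fit in the page makes the formatting fail with -EAGAIN on every attempt. Combined with the "goto again" retry in cache_read(), that is one plausible way to spin forever in exactly these functions: each retry re-runs the hex encoding, which would also explain qword_addhex dominating a CPU profile.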
> So, any excessive looping could trigger this.

Well, if so, essentially an infinite loop. The problem isn't the warnings per se; it is that all 12 cores (24 hyperthreads) of the machine start to hang, and eventually the machine and all I/O become unusable. My understanding is that if you just had excessive but not infinite looping, you'd get the warnings, but the system would eventually recover?
It's possible that CONFIG_PREEMPT could make it jump out of the loop somehow; we should probably enable it. But it sounds more like a workaround than a true bug fix.
For the record, the machine isn't heavily loaded at all. Typical load average is 1–2.
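(For reference: the preemption model is a single build-time choice, so this is decided in the kernel .config. On 4.x-era kernels exactly one of the three options below is set; enabling full preemption means the last line.)

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y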
(In reply to Steinar H. Gunderson from comment #11)
> Well, if so, essentially an infinite loop. The problem isn't the warnings
> per se; it is that all 12 cores (24 hyperthreads) of the machine start to
> hang, and eventually the machine and all I/O become unusable. My understanding
> is that if you just had excessive but not infinite looping, you'd get the
> warnings, but the system would eventually recover?

I'm not sure how many loops are required before it becomes a problem. I'm not sure why it's eventually hanging. There is probably important RCU work that is never getting done. It's also possible that a slow serial console, though not the original cause, is adding to the problem somehow.

> It's possible that CONFIG_PREEMPT could make it jump out of the loop
> somehow; we should probably enable it.

It won't affect any of those loops directly, but it will allow the looping task to be temporarily preempted for other work.

> But it sounds more like a workaround than a true bug fix.

Yes, just looking for ways to figure out where the bug is.

Could you try applying this? Should be harmless at worst:

--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -767,6 +767,7 @@ static ssize_t cache_read(struct file *filp, char __user *buf, size_t count,
 	struct cache_reader *rp = filp->private_data;
 	struct cache_request *rq;
 	struct inode *inode = file_inode(filp);
+	int loopcount = 0;
 	int err;
 
 	if (count == 0)
@@ -837,8 +838,11 @@ static ssize_t cache_read(struct file *filp, char __user *buf, size_t count,
 		} else
 			spin_unlock(&queue_lock);
 	}
-	if (err == -EAGAIN)
+	if (err == -EAGAIN) {
+		loopcount++;
+		WARN_ON_ONCE(loopcount > 100);
 		goto again;
+	}
 	inode_unlock(inode);
 	return err ? err : count;
 }

Unfortunately, it's a server with ~1000 users; I'm not at liberty to reboot it at will.

Hi,

After ~50 days on 4.4.83 with no problems, it triggered today:

[Mon Oct 9 15:49:28 2017] INFO: rcu_sched self-detected stall on CPU [Mon Oct 9 15:49:28 2017] INFO: rcu_sched detected stalls on CPUs/tasks: [Mon Oct 9 15:49:28 2017] 3-...: (6000 ticks this GP) idle=809/140000000000001/0 softirq=197166633/197166633 fqs=5370 [Mon Oct 9 15:49:28 2017] [Mon Oct 9 15:49:28 2017] (detected by 17, t=6002 jiffies, g=165187832, c=165187831, q=901055) [Mon Oct 9 15:49:28 2017] Task dump for CPU 3: [Mon Oct 9 15:49:28 2017] rpc.svcgssd R [Mon Oct 9 15:49:28 2017] running task [Mon Oct 9 15:49:28 2017] 0 28629 1 0x00000008 [Mon Oct 9 15:49:28 2017] ffffffff8147f803 [Mon Oct 9 15:49:28 2017] 0000001000000010 [Mon Oct 9 15:49:28 2017] 00ffffff8106854d [Mon Oct 9 15:49:28 2017] ffff880800000003 [Mon Oct 9 15:49:28 2017] ffff8807a2981c00 [Mon Oct 9 15:49:28 2017] ffff881036f20000 [Mon Oct 9 15:49:28 2017] 0000000000000000 [Mon Oct 9 15:49:28 2017] ffff881036f1fb74 [Mon Oct 9 15:49:28 2017] 0000000000000000 [Mon Oct 9 15:49:28 2017] 0000000000000000 [Mon Oct 9 15:49:28 2017] ffff881036f1fa50 [Mon Oct 9 15:49:28 2017] ffffffff8147f932 [Mon Oct 9 15:49:28 2017] Call Trace: [Mon Oct 9 15:49:28 2017] [<ffffffff8147f803>] ? __schedule+0x4ae/0x56a [Mon Oct 9 15:49:28 2017] [<ffffffff8147f932>] ? schedule+0x73/0x81 [Mon Oct 9 15:49:28 2017] [<ffffffff8148166f>] ? schedule_hrtimeout_range_clock+0x37/0xbf [Mon Oct 9 15:49:28 2017] [<ffffffff811399e6>] ? __pollwait+0xbe/0xc7 [Mon Oct 9 15:49:28 2017] [<ffffffff8112c5b8>] ? fput+0x11/0x83 [Mon Oct 9 15:49:28 2017] [<ffffffff81139c92>] ? poll_freewait+0x43/0x94 [Mon Oct 9 15:49:28 2017] [<ffffffff8113ac96>] ? do_sys_poll+0x3f6/0x476 [Mon Oct 9 15:49:28 2017] [<ffffffff8113d375>] ?
__d_lookup_rcu+0xae/0xd0 [Mon Oct 9 15:49:28 2017] [<ffffffff8113d375>] ? __d_lookup_rcu+0xae/0xd0 [Mon Oct 9 15:49:28 2017] [<ffffffff81171a01>] ? dquot_file_open+0x14/0x32 [Mon Oct 9 15:49:28 2017] [<ffffffff8106804b>] ? __enqueue_entity+0x67/0x69 [Mon Oct 9 15:49:28 2017] [<ffffffff8106ac25>] ? enqueue_entity+0x55d/0x6c6 [Mon Oct 9 15:49:28 2017] [<ffffffff81139bb9>] ? poll_select_copy_remaining+0xf6/0xf6 [Mon Oct 9 15:49:28 2017] [<ffffffffa02d19d1>] ? qword_addhex+0x2b/0x7e [sunrpc] [Mon Oct 9 15:49:28 2017] [<ffffffffa0320f87>] ? rsi_request+0x36/0x45 [auth_rpcgss] [Mon Oct 9 15:49:28 2017] [<ffffffffa02d3550>] ? cache_read.isra.17+0x104/0x2a5 [sunrpc] [Mon Oct 9 15:49:28 2017] [<ffffffffa02d3721>] ? cache_read_procfs+0x30/0x35 [sunrpc] [Mon Oct 9 15:49:28 2017] [<ffffffff8117733a>] ? proc_reg_read+0x4e/0x63 [Mon Oct 9 15:49:28 2017] [<ffffffff8112ab9f>] ? __vfs_read+0x21/0x98 [Mon Oct 9 15:49:28 2017] [<ffffffff811ef15c>] ? fsnotify_perm+0x60/0x6c [Mon Oct 9 15:49:28 2017] [<ffffffff811efef7>] ? security_file_permission+0x36/0x3d [Mon Oct 9 15:49:28 2017] [<ffffffff8112b2b8>] ? rw_verify_area+0x7c/0xd1 [Mon Oct 9 15:49:28 2017] [<ffffffff8112b393>] ? vfs_read+0x86/0xc2 [Mon Oct 9 15:49:28 2017] [<ffffffff8112bb7a>] ? SyS_read+0x48/0x73 [Mon Oct 9 15:49:28 2017] [<ffffffff81482017>] ? entry_SYSCALL_64_fastpath+0x12/0x66 [Mon Oct 9 15:49:31 2017] 3-...: (6000 ticks this GP) idle=809/140000000000001/0 softirq=197166633/197166633 fqs=5602 [Mon Oct 9 15:49:31 2017] (t=6275 jiffies g=165187832 c=165187831 q=947944) [Mon Oct 9 15:49:31 2017] Task dump for CPU 3: [Mon Oct 9 15:49:31 2017] rpc.svcgssd R running task 0 28629 1 0x00000008 [Mon Oct 9 15:49:31 2017] 0000000000000000 ffff88085fc63de8 ffffffff810648cb 0000000000000003 [Mon Oct 9 15:49:31 2017] ffffffff81a3f500 ffff88085fc63e00 ffffffff8106643b ffffffff81a3f500 [Mon Oct 9 15:49:31 2017] ffff88085fc63e30 ffffffff810816ad ffffffff81a3f500 ffff88085fc759c0 [Mon Oct 9 15:49:31 2017] Call Trace: [Mon Oct 9 15:49:31 2017] <IRQ> [<ffffffff810648cb>] sched_show_task+0xb9/0xbe [Mon Oct 9 15:49:31 2017] [<ffffffff8106643b>] dump_cpu_task+0x32/0x35 [Mon Oct 9 15:49:31 2017] [<ffffffff810816ad>] rcu_dump_cpu_stacks+0x71/0x8c [Mon Oct 9 15:49:32 2017] [<ffffffff810844d6>] rcu_check_callbacks+0x212/0x5a7 [Mon Oct 9 15:49:32 2017] [<ffffffff81066f65>] ? irqtime_account_process_tick.isra.3+0x100/0x10f [Mon Oct 9 15:49:32 2017] [<ffffffff810879d4>] update_process_times+0x2a/0x4f [Mon Oct 9 15:49:32 2017] [<ffffffff81092dae>] tick_sched_handle+0x32/0x3e [Mon Oct 9 15:49:32 2017] [<ffffffff810932de>] tick_sched_timer+0x34/0x5a [Mon Oct 9 15:49:32 2017] [<ffffffff810880f6>] __hrtimer_run_queues+0xbe/0x19d [Mon Oct 9 15:49:32 2017] [<ffffffff81088564>] hrtimer_interrupt+0x5f/0xff [Mon Oct 9 15:49:32 2017] [<ffffffff8102cafd>] local_apic_timer_interrupt+0x46/0x49 [Mon Oct 9 15:49:32 2017] [<ffffffff8102cf89>] smp_apic_timer_interrupt+0x29/0x39 [Mon Oct 9 15:49:32 2017] [<ffffffff81482d2f>] apic_timer_interrupt+0x7f/0x90 [Mon Oct 9 15:49:32 2017] <EOI> [<ffffffffa02d19d1>] ? qword_addhex+0x2b/0x7e [sunrpc] [Mon Oct 9 15:49:32 2017] [<ffffffffa0320f87>] rsi_request+0x36/0x45 [auth_rpcgss] [Mon Oct 9 15:49:32 2017] [<ffffffffa02d3550>] cache_read.isra.17+0x104/0x2a5 [sunrpc] [Mon Oct 9 15:49:33 2017] [<ffffffffa02d3721>] cache_read_procfs+0x30/0x35 [sunrpc] [Mon Oct 9 15:49:33 2017] [<ffffffff8117733a>] proc_reg_read+0x4e/0x63 [Mon Oct 9 15:49:33 2017] [<ffffffff8112ab9f>] __vfs_read+0x21/0x98 [Mon Oct 9 15:49:33 2017] [<ffffffff811ef15c>] ? 
fsnotify_perm+0x60/0x6c [Mon Oct 9 15:49:33 2017] [<ffffffff811efef7>] ? security_file_permission+0x36/0x3d [Mon Oct 9 15:49:33 2017] [<ffffffff8112b2b8>] ? rw_verify_area+0x7c/0xd1 [Mon Oct 9 15:49:33 2017] [<ffffffff8112b393>] vfs_read+0x86/0xc2 [Mon Oct 9 15:49:33 2017] [<ffffffff8112bb7a>] SyS_read+0x48/0x73 [Mon Oct 9 15:49:33 2017] [<ffffffff81482017>] entry_SYSCALL_64_fastpath+0x12/0x66 [Mon Oct 9 15:52:15 2017] INFO: rcu_sched self-detected stall on CPU [Mon Oct 9 15:52:15 2017] INFO: rcu_sched detected stalls on CPUs/tasks: [Mon Oct 9 15:52:15 2017] 6-...: (6000 ticks this GP) idle=64d/140000000000001/0 softirq=203638013/203638013 fqs=5905 [Mon Oct 9 15:52:15 2017] [Mon Oct 9 15:52:15 2017] (detected by 1, t=6002 jiffies, g=165187841, c=165187840, q=932166) [Mon Oct 9 15:52:15 2017] Task dump for CPU 6: [Mon Oct 9 15:52:15 2017] rpc.svcgssd R [Mon Oct 9 15:52:15 2017] running task [Mon Oct 9 15:52:15 2017] 0 28629 1 0x00000008 [Mon Oct 9 15:52:15 2017] ffff88085f8001c0 [Mon Oct 9 15:52:15 2017] 000000018010000c [Mon Oct 9 15:52:15 2017] ffff880d00000003 [Mon Oct 9 15:52:15 2017] ffff880f9deb3200 [Mon Oct 9 15:52:15 2017] 0000000100010000 [Mon Oct 9 15:52:15 2017] 0000000180190011 [Mon Oct 9 15:52:15 2017] 0011000100000001 [Mon Oct 9 15:52:15 2017] 0000000180190012 [Mon Oct 9 15:52:15 2017] 0000000a0000000f [Mon Oct 9 15:52:15 2017] 0000000000000246 [Mon Oct 9 15:52:15 2017] 0000000000000246 [Mon Oct 9 15:52:15 2017] ffff881036f1fa60 [Mon Oct 9 15:52:15 2017] Call Trace: [Mon Oct 9 15:52:15 2017] [<ffffffff81070c2e>] ? add_wait_queue+0x3f/0x44 [Mon Oct 9 15:52:15 2017] [<ffffffff811399e6>] ? __pollwait+0xbe/0xc7 [Mon Oct 9 15:52:15 2017] [<ffffffff8112c5b8>] ? fput+0x11/0x83 [Mon Oct 9 15:52:15 2017] [<ffffffff81139c92>] ? poll_freewait+0x43/0x94 [Mon Oct 9 15:52:15 2017] [<ffffffff8113ac96>] ? do_sys_poll+0x3f6/0x476 [Mon Oct 9 15:52:15 2017] [<ffffffff810e557b>] ? free_pages_prepare+0x81/0xdf [Mon Oct 9 15:52:15 2017] [<ffffffff810e76ee>] ? __free_pages+0x16/0x1f [Mon Oct 9 15:52:15 2017] [<ffffffff810e7813>] ? __free_kmem_pages+0x9/0xb [Mon Oct 9 15:52:15 2017] [<ffffffff81120036>] ? __free_slab+0x120/0x12c [Mon Oct 9 15:52:15 2017] [<ffffffff81120087>] ? discard_slab+0x45/0x47 [Mon Oct 9 15:52:15 2017] [<ffffffff811203fb>] ? unfreeze_partials.isra.65+0x10a/0x124 [Mon Oct 9 15:52:15 2017] [<ffffffff8106804b>] ? __enqueue_entity+0x67/0x69 [Mon Oct 9 15:52:15 2017] [<ffffffff8106ac25>] ? enqueue_entity+0x55d/0x6c6 [Mon Oct 9 15:52:15 2017] [<ffffffff8112054e>] ? put_cpu_partial+0x53/0xce [Mon Oct 9 15:52:15 2017] [<ffffffff81121a90>] ? __slab_free+0x1cf/0x2a3 [Mon Oct 9 15:52:15 2017] [<ffffffffa02d19fd>] ? qword_addhex+0x57/0x7e [sunrpc] [Mon Oct 9 15:52:15 2017] [<ffffffffa0320f87>] ? rsi_request+0x36/0x45 [auth_rpcgss] [Mon Oct 9 15:52:15 2017] [<ffffffffa02d3550>] ? cache_read.isra.17+0x104/0x2a5 [sunrpc] [Mon Oct 9 15:52:15 2017] [<ffffffffa02d3721>] ? cache_read_procfs+0x30/0x35 [sunrpc] [Mon Oct 9 15:52:15 2017] [<ffffffff8117733a>] ? proc_reg_read+0x4e/0x63 [Mon Oct 9 15:52:15 2017] [<ffffffff8112ab9f>] ? __vfs_read+0x21/0x98 [Mon Oct 9 15:52:15 2017] [<ffffffff811ef15c>] ? fsnotify_perm+0x60/0x6c [Mon Oct 9 15:52:15 2017] [<ffffffff811efef7>] ? security_file_permission+0x36/0x3d [Mon Oct 9 15:52:15 2017] [<ffffffff8112b2b8>] ? rw_verify_area+0x7c/0xd1 [Mon Oct 9 15:52:15 2017] [<ffffffff8112b393>] ? vfs_read+0x86/0xc2 [Mon Oct 9 15:52:15 2017] [<ffffffff8112bb7a>] ? SyS_read+0x48/0x73 [Mon Oct 9 15:52:15 2017] [<ffffffff81482017>] ? 
entry_SYSCALL_64_fastpath+0x12/0x66 [Mon Oct 9 15:52:18 2017] 6-...: (6000 ticks this GP) idle=64d/140000000000001/0 softirq=203638013/203638013 fqs=6172 [Mon Oct 9 15:52:18 2017] (t=6287 jiffies g=165187841 c=165187840 q=971357) [Mon Oct 9 15:52:18 2017] Task dump for CPU 6: [Mon Oct 9 15:52:18 2017] rpc.svcgssd R running task 0 28629 1 0x00000008 [Mon Oct 9 15:52:18 2017] 0000000000000000 ffff88107fc03de8 ffffffff810648cb 0000000000000006 [Mon Oct 9 15:52:18 2017] ffffffff81a3f500 ffff88107fc03e00 ffffffff8106643b ffffffff81a3f500 [Mon Oct 9 15:52:18 2017] ffff88107fc03e30 ffffffff810816ad ffffffff81a3f500 ffff88107fc159c0 [Mon Oct 9 15:52:18 2017] Call Trace: [Mon Oct 9 15:52:18 2017] <IRQ> [<ffffffff810648cb>] sched_show_task+0xb9/0xbe [Mon Oct 9 15:52:18 2017] [<ffffffff8106643b>] dump_cpu_task+0x32/0x35 [Mon Oct 9 15:52:18 2017] [<ffffffff810816ad>] rcu_dump_cpu_stacks+0x71/0x8c [Mon Oct 9 15:52:19 2017] [<ffffffff810844d6>] rcu_check_callbacks+0x212/0x5a7 [Mon Oct 9 15:52:19 2017] [<ffffffff81066f65>] ? irqtime_account_process_tick.isra.3+0x100/0x10f [Mon Oct 9 15:52:19 2017] [<ffffffff810879d4>] update_process_times+0x2a/0x4f [Mon Oct 9 15:52:19 2017] [<ffffffff81092dae>] tick_sched_handle+0x32/0x3e [Mon Oct 9 15:52:19 2017] [<ffffffff810932de>] tick_sched_timer+0x34/0x5a [Mon Oct 9 15:52:19 2017] [<ffffffff810880f6>] __hrtimer_run_queues+0xbe/0x19d [Mon Oct 9 15:52:19 2017] [<ffffffff81088564>] hrtimer_interrupt+0x5f/0xff [Mon Oct 9 15:52:19 2017] [<ffffffff8102cafd>] local_apic_timer_interrupt+0x46/0x49 [Mon Oct 9 15:52:19 2017] [<ffffffff8102cf89>] smp_apic_timer_interrupt+0x29/0x39 [Mon Oct 9 15:52:19 2017] [<ffffffff81482d2f>] apic_timer_interrupt+0x7f/0x90 [Mon Oct 9 15:52:19 2017] <EOI> [<ffffffffa02d19fd>] ? qword_addhex+0x57/0x7e [sunrpc] [Mon Oct 9 15:52:19 2017] [<ffffffffa0320f87>] rsi_request+0x36/0x45 [auth_rpcgss] [Mon Oct 9 15:52:19 2017] [<ffffffffa02d3550>] cache_read.isra.17+0x104/0x2a5 [sunrpc] [Mon Oct 9 15:52:20 2017] [<ffffffffa02d3721>] cache_read_procfs+0x30/0x35 [sunrpc] [Mon Oct 9 15:52:20 2017] [<ffffffff8117733a>] proc_reg_read+0x4e/0x63 [Mon Oct 9 15:52:20 2017] [<ffffffff8112ab9f>] __vfs_read+0x21/0x98 [Mon Oct 9 15:52:20 2017] [<ffffffff811ef15c>] ? fsnotify_perm+0x60/0x6c [Mon Oct 9 15:52:20 2017] [<ffffffff811efef7>] ? security_file_permission+0x36/0x3d [Mon Oct 9 15:52:20 2017] [<ffffffff8112b2b8>] ? 
rw_verify_area+0x7c/0xd1 [Mon Oct 9 15:52:20 2017] [<ffffffff8112b393>] vfs_read+0x86/0xc2 [Mon Oct 9 15:52:20 2017] [<ffffffff8112bb7a>] SyS_read+0x48/0x73 [Mon Oct 9 15:52:20 2017] [<ffffffff81482017>] entry_SYSCALL_64_fastpath+0x12/0x66 [Mon Oct 9 15:55:15 2017] INFO: rcu_sched self-detected stall on CPU [Mon Oct 9 15:55:15 2017] INFO: rcu_sched detected stalls on CPUs/tasks: [Mon Oct 9 15:55:15 2017] 6-...: (23485 ticks this GP) idle=64d/140000000000001/0 softirq=203638013/203638013 fqs=19517 [Mon Oct 9 15:55:15 2017] [Mon Oct 9 15:55:15 2017] (detected by 1, t=24007 jiffies, g=165187841, c=165187840, q=3544185) [Mon Oct 9 15:55:15 2017] Task dump for CPU 6: [Mon Oct 9 15:55:15 2017] rpc.svcgssd R [Mon Oct 9 15:55:15 2017] running task [Mon Oct 9 15:55:15 2017] 0 28629 1 0x00000008 [Mon Oct 9 15:55:15 2017] ffff88085f8001c0 [Mon Oct 9 15:55:15 2017] 000000018010000c [Mon Oct 9 15:55:15 2017] ffff880d00000003 [Mon Oct 9 15:55:15 2017] ffff880f9deb3200 [Mon Oct 9 15:55:15 2017] 0000000100010000 [Mon Oct 9 15:55:15 2017] 0000000180190011 [Mon Oct 9 15:55:15 2017] 0011000100000001 [Mon Oct 9 15:55:15 2017] 0000000180190012 [Mon Oct 9 15:55:15 2017] 0000000a0000000f [Mon Oct 9 15:55:15 2017] 0000000000000246 [Mon Oct 9 15:55:15 2017] 0000000000000246 [Mon Oct 9 15:55:15 2017] ffff881036f1fa60 [Mon Oct 9 15:55:15 2017] Call Trace: [Mon Oct 9 15:55:15 2017] [<ffffffff81070c2e>] ? add_wait_queue+0x3f/0x44 [Mon Oct 9 15:55:15 2017] [<ffffffff811399e6>] ? __pollwait+0xbe/0xc7 [Mon Oct 9 15:55:15 2017] [<ffffffff8112c5b8>] ? fput+0x11/0x83 [Mon Oct 9 15:55:15 2017] [<ffffffff81139c92>] ? poll_freewait+0x43/0x94 [Mon Oct 9 15:55:15 2017] [<ffffffff8113ac96>] ? do_sys_poll+0x3f6/0x476 [Mon Oct 9 15:55:15 2017] [<ffffffff810e557b>] ? free_pages_prepare+0x81/0xdf [Mon Oct 9 15:55:15 2017] [<ffffffff810e76ee>] ? __free_pages+0x16/0x1f [Mon Oct 9 15:55:15 2017] [<ffffffff810e7813>] ? __free_kmem_pages+0x9/0xb [Mon Oct 9 15:55:15 2017] [<ffffffff81120036>] ? __free_slab+0x120/0x12c [Mon Oct 9 15:55:15 2017] [<ffffffff81120087>] ? discard_slab+0x45/0x47 [Mon Oct 9 15:55:15 2017] [<ffffffff811203fb>] ? unfreeze_partials.isra.65+0x10a/0x124 [Mon Oct 9 15:55:15 2017] [<ffffffff8106804b>] ? __enqueue_entity+0x67/0x69 [Mon Oct 9 15:55:15 2017] [<ffffffff8106ac25>] ? enqueue_entity+0x55d/0x6c6 [Mon Oct 9 15:55:15 2017] [<ffffffff8112054e>] ? put_cpu_partial+0x53/0xce [Mon Oct 9 15:55:15 2017] [<ffffffff81121a90>] ? __slab_free+0x1cf/0x2a3 [Mon Oct 9 15:55:15 2017] [<ffffffffa02d19f2>] ? qword_addhex+0x4c/0x7e [sunrpc] [Mon Oct 9 15:55:15 2017] [<ffffffffa0320f87>] ? rsi_request+0x36/0x45 [auth_rpcgss] [Mon Oct 9 15:55:15 2017] [<ffffffffa02d3550>] ? cache_read.isra.17+0x104/0x2a5 [sunrpc] [Mon Oct 9 15:55:15 2017] [<ffffffffa02d3721>] ? cache_read_procfs+0x30/0x35 [sunrpc] [Mon Oct 9 15:55:15 2017] [<ffffffff8117733a>] ? proc_reg_read+0x4e/0x63 [Mon Oct 9 15:55:15 2017] [<ffffffff8112ab9f>] ? __vfs_read+0x21/0x98 [Mon Oct 9 15:55:15 2017] [<ffffffff811ef15c>] ? fsnotify_perm+0x60/0x6c [Mon Oct 9 15:55:15 2017] [<ffffffff811efef7>] ? security_file_permission+0x36/0x3d [Mon Oct 9 15:55:15 2017] [<ffffffff8112b2b8>] ? rw_verify_area+0x7c/0xd1 [Mon Oct 9 15:55:15 2017] [<ffffffff8112b393>] ? vfs_read+0x86/0xc2 [Mon Oct 9 15:55:15 2017] [<ffffffff8112bb7a>] ? SyS_read+0x48/0x73 [Mon Oct 9 15:55:15 2017] [<ffffffff81482017>] ? entry_SYSCALL_64_fastpath+0x12/0x66 and then many more of the same. So this is a long-standing bug, not a regression. 
It also died with 4.13.5; I thought I'd done CONFIG_PREEMPT=y, but evidently it hadn't gone through to the .config.

An interesting data point: Our Kerberos setup is Active Directory with Samba 4. We have a long-standing issue where suddenly some of the service principals are getting renewed even though they shouldn't, and yesterday, this hang coincided with such an incident. Could it be related?

Hi;

We're now running 4.13.5 with CONFIG_PREEMPT=y. Today, the WARN_ONCE you asked me to add triggered, and rpc.svcgssd is using 100% CPU. Please advise. :-)

[157277.972046] ------------[ cut here ]------------ [157277.976866] WARNING: CPU: 0 PID: 3193 at net/sunrpc/cache.c:854 cache_read.isra.19+0x27e/0x2b2 [sunrpc] [157277.991237] Modules linked in: cts rpcsec_gss_krb5 binfmt_misc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 af_packet ip6_tables xt_REDIRECT nf_nat_redirect iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd iTCO_wdt glue_helper iTCO_vendor_support tpm_tis tpm_tis_core ipmi_si ipmi_devintf tpm ipmi_msghandler button nfsd nfs_acl lockd grace auth_rpcgss oid_registry loop sunrpc ip_tables x_tables autofs4 raid10 raid0 multipath linear mpt3sas raid_class raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 sd_mod hid_generic igb i2c_i801 i2c_algo_bit ehci_pci crc32c_intel ptp i2c_core ehci_hcd [157278.062193] isci pps_core [157278.066409] CPU: 0 PID: 3193 Comm: rpc.svcgssd Not tainted 4.13.5-itk02 #4 [157278.073968] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 03/27/2013 [157278.081168] task: ffff8810596f1d00 task.stack: ffffc90008494000 [157278.095199] RIP: 0010:cache_read.isra.19+0x27e/0x2b2 [sunrpc] [157278.101165] RSP: 0018:ffffc90008497d90 EFLAGS: 00010202 [157278.107223] RAX: 0000000080000000 RBX: 00000000fffffff5 RCX: 0000000000000001 [157278.115162] RDX: 0000000000000001 RSI: ffffc90008497db4 RDI: 00000000ffffffff [157278.122753] RBP: ffffc90008497de8 R08: ffff880242af9fff R09: 0000000000000039 [157278.130828] R10: 0000000000000039 R11: ffff8810596f1d00 R12: 0000000000008000 [157278.138857] R13: ffff88085bbd6100 R14: ffff8801072f7f00 R15: ffff88085a85bae0 [157278.146876] FS: 00007f63e774cb80(0000) GS:ffff88085fa00000(0000) knlGS:0000000000000000 [157278.158522] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [157278.165037] CR2: 00007fbc6498b000 CR3: 000000105a757000 CR4: 00000000000406f0 [157278.172836] Call Trace: [157278.176126] cache_read_procfs+0x30/0x35 [sunrpc] [157278.181469] proc_reg_read+0x4e/0x63 [157278.190331] __vfs_read+0x21/0x98 [157278.194563] ? fsnotify_perm+0x60/0x6c [157278.199088] ? security_file_permission+0x36/0x3d [157278.204951] ?
rw_verify_area+0x71/0xaa [157278.209691] vfs_read+0x9f/0xd7 [157278.213918] SyS_read+0x48/0x73 [157278.217626] entry_SYSCALL_64_fastpath+0x13/0x94 [157278.222501] RIP: 0033:0x7f63e64fe700 [157278.226987] RSP: 002b:00007fff08c6f2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [157278.236390] RAX: ffffffffffffffda RBX: 0000557566a2d620 RCX: 00007f63e64fe700 [157278.249255] RDX: 0000000000008000 RSI: 00007fff08c71410 RDI: 0000000000000003 [157278.257809] RBP: 00007f63e67b9440 R08: 0000557566a1bdd0 R09: 00007f63e621ff00 [157278.266484] R10: 0000000000000020 R11: 0000000000000246 R12: 0000000000000000 [157278.275193] R13: 00007fff08c6f0c0 R14: 000000001443c51d R15: 0000000000000000 [157278.283142] Code: 85 fb e0 4c 89 f7 e8 63 85 fb e0 eb 0c 48 c7 c7 a0 25 1a a0 e8 23 f8 36 e1 83 fb f5 75 14 ff 45 bc 83 7d bc 64 0f 8e d5 fd ff ff <0f> ff e9 ce fd ff ff 48 8b 7d c0 e8 36 27 ef e0 48 63 c3 85 db [157278.303181] ---[ end trace 70b707cc726a612d ]---

This is the file it's stuck on, I suppose (according to lsof):

rpc.svcgs 3193 root 3u REG 0,4 0 4026532487 /proc/3193/net/rpc/auth.rpcsec.init/channel

I suppose the backtrace from perf isn't so interesting (I can't attach with gdb, so I thought it was an OK way to get out a stack trace), but just in case:

- 100,00%  0,00%  rpc.svcgssd  rpc.svcgssd  [.] handle_nullreq
     handle_nullreq
     __GI___libc_read
     entry_SYSCALL_64_fastpath
     sys_read
     vfs_read
     __vfs_read
     proc_reg_read
     - cache_read_procfs
        - 99,87% cache_read.isra.19
           - 98,92% rsi_request
                98,81% qword_addhex

(In reply to Steinar H. Gunderson from comment #16)
> We're now running 4.13.5 with CONFIG_PREEMPT=y. Today, the WARN_ONCE you
> asked me to add triggered, and rpc.svcgssd is using 100% CPU. Please advise.
> :-)

So you were running with the patch from Comment 12?

> So you were running with the patch from Comment 12?
Yes.
We eventually booted back into 4.4 today, since the looping kernel thread was making I/O unreasonably slow; at least it was more stable last time.
It happened again today (rpc.svcgssd going unkillable to 100% CPU, server had to be rebooted)—we've had to upgrade to 4.15.0-rc9 to get Meltdown/Spectre patches, so we know that the bug is still there. Interestingly enough, it appears to be correlated with a specific user logging in on a workstation (triggering automount mounting a given partition over Kerberized NFSv4). This user belongs to a lot of groups (214 of them!) -- could that be a clue?

Sorry, just noticed this was left dangling. Is this still happening?

(In reply to Steinar H. Gunderson from comment #20)
> Interestingly enough, it appears to be correlated with a specific user
> logging in on a workstation (triggering automount mounting a given partition
> over Kerberized NFSv4). This user belongs to a lot of groups (214 of them!)
> -- could that be a clue?

That could be making the init_sec_context tokens larger than expected, I guess; I seem to remember that could be the case depending on the KDC. (What is your KDC?)

It stopped happening after we switched to gssproxy. (It still happens if we don't use it.) Like I said in one of the first emails, the KDC is Samba 4.

Both Active Directory and FreeIPA use PACs to pass group membership information in the Kerberos ticket. That is known to blow up the size of the init_sec_context...

Samba would, of course, be providing Active Directory.

Yes, and the switch to gss-proxy was made in part to fix our inability to handle the large init_sec_context tokens generated by Active Directory for users in lots of groups. Still, this RCU warning shouldn't be the result.... Looks like a too-large token should hit the SVC_DENIED case in gss_read_verf and prevent further processing; I wonder why that's not happening.

There may still be some bug here for all I know, but perhaps it's just not a priority after the switch to gss-proxy. Reopen if it becomes an issue again.
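To illustrate the SVC_DENIED point above: the idea is that the GSS init path could reject an oversized init_sec_context token up front instead of queueing an upcall that can never be formatted into a single page. The sketch below is purely illustrative; the function and constant names (check_init_token_len, MAX_INIT_TOKEN_LEN) are made up for this report and are not the actual svcauth_gss code, while SVC_DENIED/SVC_OK and rpc_autherr_badcred are the real sunrpc status codes being referred to.

#define MAX_INIT_TOKEN_LEN	PAGE_SIZE	/* hypothetical upper bound */

static int check_init_token_len(unsigned int token_len, __be32 *authp)
{
	if (token_len > MAX_INIT_TOKEN_LEN) {
		*authp = rpc_autherr_badcred;	/* tell the client why */
		return SVC_DENIED;		/* stop before the cache upcall */
	}
	return SVC_OK;
}

If the root cause really is the oversized token, a check along these lines would turn the many-groups case into a clean authentication failure rather than an endless retry in cache_read().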