Bug 194589

Summary: loadkeys causes BUG: unable to handle kernel paging request in __memmove
Product: Other
Component: Other
Reporter: Klaus Kusche (klaus.kusche)
Assignee: other_other
Status: NEW
Severity: normal
CC: slyfox
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.9
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Additional locking for vt code

Description Klaus Kusche 2017-02-14 17:20:11 UTC
I have no idea whether this is caused by the standard kernel, the loadkeys utility,
or the grsec kernel patch (this is a grsec kernel on Gentoo;
I have no standard kernel here). Please advise where this bug should go.

On about every third boot, a loadkeys call triggers the following bug:

Feb 13 08:35:03 lap systemd-vconsole-setup[226]: /usr/bin/loadkeys terminated by signal KILL.
Feb 13 08:35:03 lap kernel: BUG: unable to handle kernel paging request at ffff88082e000000
Feb 13 08:35:03 lap kernel: IP: [<ffffffff81385874>] __memmove+0x24/0x1b0
Feb 13 08:35:03 lap kernel: PGD 8000000001f2e063 
Feb 13 08:35:03 lap kernel: PUD 2538063 
Feb 13 08:35:03 lap kernel: PMD 0 
Feb 13 08:35:03 lap kernel: 
Feb 13 08:35:03 lap kernel: Oops: 0000 [#1] PREEMPT SMP
Feb 13 08:35:03 lap kernel: CPU: 1 PID: 227 Comm: loadkeys Not tainted 4.9.8-hardened #1
Feb 13 08:35:03 lap kernel: Hardware name: Dell Inc. Precision M6700/0V2MFG, BIOS A16 10/13/2016
Feb 13 08:35:03 lap kernel: task: ffff880809b1d500 task.stack: ffffc90001144000
Feb 13 08:35:03 lap kernel: RIP: 0010:[<ffffffff81385874>]  [<ffffffff81385874>] __memmove+0x24/0x1b0
Feb 13 08:35:03 lap kernel: RSP: 0018:ffffc90001147cd0  EFLAGS: 00010246
Feb 13 08:35:03 lap kernel: RAX: ffff880807b38ad0 RBX: ffff880805c0d401 RCX: ffffffffd7f4fbfe
Feb 13 08:35:03 lap kernel: RDX: fffffffffe41712e RSI: ffff88082e000000 RDI: ffff88082e000000
Feb 13 08:35:03 lap kernel: RBP: ffff880805c0d400 R08: 0000000000000000 R09: 0000000000000212
Feb 13 08:35:03 lap kernel: R10: 0000000000000000 R11: ffff880809b1d500 R12: 8000000000000000
Feb 13 08:35:03 lap kernel: R13: ffff880807b38acb R14: 0000000000000073 R15: 0000000000000074
Feb 13 08:35:03 lap kernel: FS:  000003eb3c5fb700(0000) GS:ffff88082dc40000(0000) knlGS:0000000000000000
Feb 13 08:35:03 lap kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 13 08:35:03 lap kernel: CR2: ffff88082e000000 CR3: 0000000001f20000 CR4: 00000000001606f0
Feb 13 08:35:03 lap kernel: Stack:
Feb 13 08:35:03 lap kernel:  ffffffff8148914f ffff880807b38ad0 0000000000000000 000003f500000000
Feb 13 08:35:03 lap kernel:  ffff880807cea800 ffff88080a011000 000003f5602ccfd0 ffff880807cea800
Feb 13 08:35:03 lap kernel:  0000000000000000 0000000000004b49 ffffffff814815b6 000003f5602ccfd0
Feb 13 08:35:03 lap kernel: Call Trace:
Feb 13 08:35:03 lap kernel:  [<ffffffff8148914f>] ? vt_do_kdgkb_ioctl+0x46f/0x5f0
Feb 13 08:35:03 lap kernel:  [<ffffffff814815b6>] ? vt_ioctl+0x976/0x1d60
Feb 13 08:35:03 lap kernel:  [<ffffffff8114642b>] ? generic_file_read_iter+0x6db/0x9c0
Feb 13 08:35:03 lap kernel:  [<ffffffff8146fadf>] ? tty_ioctl+0x3bf/0x12e0
Feb 13 08:35:03 lap kernel:  [<ffffffff811b2e13>] ? __vfs_read+0x123/0x190
Feb 13 08:35:03 lap kernel:  [<ffffffff811cf00b>] ? do_vfs_ioctl+0xab/0xa60
Feb 13 08:35:03 lap kernel:  [<ffffffff81127391>] ? __seccomp_filter+0x71/0x2f0
Feb 13 08:35:03 lap kernel:  [<ffffffff811cfa0f>] ? sys_ioctl+0x4f/0xb0
Feb 13 08:35:03 lap kernel:  [<ffffffff811cfa82>] ? rap_sys_ioctl+0x12/0x40
Feb 13 08:35:03 lap kernel:  [<ffffffff81001606>] ? do_syscall_64+0x76/0x2e0
Feb 13 08:35:03 lap kernel:  [<ffffffff81a744f9>] ? entry_SYSCALL64_slow_path+0x2d/0x2d
Feb 13 08:35:03 lap kernel: Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 13 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f af 00 00 00 48 89 d1 <f3> a4 4c
Feb 13 08:35:03 lap kernel: RIP  [<ffffffff81385874>] __memmove+0x24/0x1b0
Feb 13 08:35:03 lap kernel:  RSP <ffffc90001147cd0>
Feb 13 08:35:03 lap kernel: CR2: ffff88082e000000
Feb 13 08:35:03 lap kernel: ---[ end trace ef574ab91ab8998e ]---
Feb 13 08:35:03 lap systemd-udevd[203]: Process '/usr/lib/systemd/systemd-vconsole-setup' failed with exit code 1.

The keymap is not set.
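
As a rough cross-check of the register dump above, the following Python sketch decodes the values involved in the faulting copy. It assumes the usual x86-64 calling convention (RDX carries the length argument of memmove) and reads the `Code:` bytes as `mov %rdi,%rax`, `mov %rdx,%rcx`, `rep movsb`, so that RAX holds the original destination and RCX the remaining count. The helper name `to_signed64` is made up for this illustration; this is one reading of the dump, not a confirmed diagnosis.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit register value as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


# Register values copied verbatim from the oops above.
RAX = 0xFFFF880807B38AD0  # assumed: original destination (memmove return value)
RCX = 0xFFFFFFFFD7F4FBFE  # assumed: remaining count of the faulting `rep movsb`
RDX = 0xFFFFFFFFFE41712E  # assumed: length argument passed to memmove
RDI = 0xFFFF88082E000000  # destination pointer at the time of the fault (== CR2)

# The requested length, if the caller computed it as a signed difference.
requested_signed = to_signed64(RDX)

# Bytes copied before the fault, derived two independent ways.
copied_by_pointer = RDI - RAX
copied_by_counter = (RDX - RCX) & MASK64

print(f"requested length as signed 64-bit: {requested_signed}")
print(f"bytes copied (pointer delta):      {copied_by_pointer}")
print(f"bytes copied (counter delta):      {copied_by_counter}")
```

Under these assumptions the requested length is negative when read as a signed value (-29265618), and both ways of counting agree that roughly 613 MiB (642544944 bytes) were copied before the copy ran into an unmapped page. A negative length of this kind would be consistent with a size computed from two pointers that no longer belong to the same state of the buffer.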

This can happen with any keymap; it does not seem to depend on the keymap being loaded. The same keymap sometimes works and sometimes fails.

It seems to depend on some random condition during boot (memory layout?)
that then persists for as long as the kernel is running:
during a single uptime, either all loadkeys calls succeed (60% to 80% of
all boots) or all fail as shown above (the remaining boots).

I have been hitting this problem for some months now (since 4.6 or 4.7?),
but it may be the same bug as the one observed in Fedora on 4.5:
https://bugzilla.redhat.com/show_bug.cgi?id=1334079

Comment 2 Klaus Kusche 2019-05-11 08:12:42 UTC
Created attachment 282721
Additional locking for vt code

Comment 3 Klaus Kusche 2019-05-11 08:13:03 UTC
When I initially reported this problem in 2017,
I received an almost immediate response from Brad Spengler.
He said that a race condition in the vt code causes this problem
and provided the attached patch.

That patch reliably fixed the problem for me,
and it still applies (I am still using it on 5.0,
without any observable negative consequences).

I asked Brad Spengler to send his patch to the kernel maintainers,
but he didn't.