Bug 199965

Summary: amdgpu: unbind leaves stale references
Product: Drivers
Reporter: Mateusz Lenik (mlen)
Component: Video (DRI - non Intel)
Assignee: Andrew Morton (akpm)
Status: NEW
Severity: high
CC: andrey.grodzovsky, harry.wentland
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.17.0
Tree: Mainline
Regression: No
Attachments:
  signature.asc
  dmesg output with kobject debug during amdgpu unbind and rebind

Description Mateusz Lenik 2018-06-07 18:21:24 UTC
Reboot randomly fails on 4.17.0 due to memory management issues. It worked fine on 4.16.13.

<4>[21100.397182] ------------[ cut here ]------------
<4>[21100.397185] kobject: '(null)' (0000000047d32b91): is not initialized, yet kobject_get() is being called.
<4>[21100.397209] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:593 kobject_get+0x21/0x32
<4>[21100.397211] Modules linked in:
<4>[21100.397215] CPU: 1 PID: 25848 Comm: reboot Not tainted 4.17.0-gentoo #2
<4>[21100.397217] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
<4>[21100.397219] RIP: 0010:kobject_get+0x21/0x32
<4>[21100.397220] RSP: 0018:ffffa6c6cd9d3db0 EFLAGS: 00010296
<4>[21100.397223] RAX: 0000000000000000 RBX: ffff8d6af5012da8 RCX: 0000000000000002
<4>[21100.397225] RDX: 0000000000000003 RSI: 0000000000000003 RDI: 00000000ffffffff
<4>[21100.397227] RBP: ffff8d6af3dc9800 R08: 0000baada7db872a R09: ffff8d69a1bc5cd8
<4>[21100.397228] R10: ffffa6c6cd9d3ce8 R11: ffffffffa7264f7d R12: ffff8d6af50099a0
<4>[21100.397230] R13: ffffffffa57dfb43 R14: ffff8d6af3dc8060 R15: 0000000000000000
<4>[21100.397232] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000) knlGS:0000000000000000
<4>[21100.397233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21100.397235] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4: 00000000003606e0
<4>[21100.397237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[21100.397238] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[21100.397240] Call Trace:
<4>[21100.397246]  get_device+0x16/0x1b
<4>[21100.397249]  device_shutdown+0x48/0x1a3
<4>[21100.397256]  kernel_restart+0xe/0x4d
<4>[21100.397259]  __do_sys_reboot+0x168/0x1c5
<4>[21100.397264]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.397266]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.397270]  ? cycles_2_ns+0x55/0x75
<4>[21100.397276]  ? task_work_run+0x63/0x8a
<4>[21100.397284]  ? _raw_spin_unlock_irq+0x2f/0x41
<4>[21100.397287]  ? task_work_run+0x63/0x8a
<4>[21100.397292]  do_syscall_64+0x5e/0x6c
<4>[21100.397295]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[21100.397298] RIP: 0033:0x7efef99840f7
<4>[21100.397299] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
<4>[21100.397303] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX: 00007efef99840f7
<4>[21100.397305] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
<4>[21100.397306] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09: 00007fff75e04439
<4>[21100.397308] R10: 00000000000002f4 R11: 0000000000000246 R12: 0000000000000002
<4>[21100.397310] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>[21100.397319] Code: f0 0f b1 0a 75 d5 48 89 f8 c3 48 85 ff 53 48 89 fb 74 24 f6 47 3c 01 75 14 48 8b 37 48 89 fa 48 c7 c7 98 3f 85 a5 e8 a5 70 4e ff <0f> 0b f0 ff 43 38 0f 88 03 f6 00 00 48 89 d8 5b c3 e9 88 1f 64 
<4>[21100.397399] ---[ end trace 734263fa4c996033 ]---
<3>[21100.397402] INFO: trying to register non-static key.
<3>[21100.398065] the code is fine but needs lockdep annotation.
<3>[21100.399007] turning off the locking correctness validator.
<4>[21100.399628] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W         4.17.0-gentoo #2
<4>[21100.400270] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
<4>[21100.400923] Call Trace:
<4>[21100.401573]  dump_stack+0x46/0x59
<4>[21100.402227]  register_lock_class+0x192/0x361
<4>[21100.402886]  __lock_acquire.isra.32+0x97/0x595
<4>[21100.403548]  lock_acquire+0x105/0x12e
<4>[21100.404212]  ? device_shutdown+0x89/0x1a3
<4>[21100.404879]  ? wake_up_klogd+0x4f/0x61
<4>[21100.405546]  ? device_shutdown+0x89/0x1a3
<4>[21100.406217]  __mutex_lock+0x78/0x3a9
<4>[21100.406892]  ? device_shutdown+0x89/0x1a3
<4>[21100.407571]  ? device_shutdown+0x78/0x1a3
<4>[21100.408251]  ? device_shutdown+0x89/0x1a3
<4>[21100.408928]  device_shutdown+0x89/0x1a3
<4>[21100.409612]  kernel_restart+0xe/0x4d
<4>[21100.410303]  __do_sys_reboot+0x168/0x1c5
<4>[21100.410996]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.411707]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.412398]  ? cycles_2_ns+0x55/0x75
<4>[21100.413127]  ? task_work_run+0x63/0x8a
<4>[21100.413818]  ? _raw_spin_unlock_irq+0x2f/0x41
<4>[21100.414510]  ? task_work_run+0x63/0x8a
<4>[21100.415205]  do_syscall_64+0x5e/0x6c
<4>[21100.415900]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[21100.416603] RIP: 0033:0x7efef99840f7
<4>[21100.417307] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
<4>[21100.418033] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX: 00007efef99840f7
<4>[21100.418769] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
<4>[21100.419510] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09: 00007fff75e04439
<4>[21100.420257] R10: 00000000000002f4 R11: 0000000000000246 R12: 0000000000000002
<4>[21100.421015] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>[21100.421789] ------------[ cut here ]------------
<4>[21100.422564] kobject: '(null)' (0000000047d32b91): is not initialized, yet kobject_put() is being called.
<4>[21100.423378] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:688 kobject_put+0x28/0x8f
<4>[21100.424186] Modules linked in:
<4>[21100.424995] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W         4.17.0-gentoo #2
<4>[21100.425829] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
<4>[21100.426683] RIP: 0010:kobject_put+0x28/0x8f
<4>[21100.427536] RSP: 0018:ffffa6c6cd9d3da8 EFLAGS: 00010286
<4>[21100.428399] RAX: 0000000000000000 RBX: ffff8d6af5012da8 RCX: 0000000000000001
<4>[21100.429278] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI: 00000000ffffffff
<4>[21100.430165] RBP: ffff8d6af3dc9800 R08: 0000000000000001 R09: 0000000000000000
<4>[21100.431057] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12: ffff8d6af5012d98
<4>[21100.431961] R13: ffffffffa57dfb43 R14: ffff8d6af3dc9860 R15: 0000000000000000
<4>[21100.432873] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000) knlGS:0000000000000000
<4>[21100.433808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21100.434745] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4: 00000000003606e0
<4>[21100.435697] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[21100.436663] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[21100.437618] Call Trace:
<4>[21100.438570]  device_shutdown+0x17d/0x1a3
<4>[21100.439530]  kernel_restart+0xe/0x4d
<4>[21100.440495]  __do_sys_reboot+0x168/0x1c5
<4>[21100.441467]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.442495]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.443466]  ? cycles_2_ns+0x55/0x75
<4>[21100.444436]  ? task_work_run+0x63/0x8a
<4>[21100.445400]  ? _raw_spin_unlock_irq+0x2f/0x41
<4>[21100.446357]  ? task_work_run+0x63/0x8a
<4>[21100.447305]  do_syscall_64+0x5e/0x6c
<4>[21100.448268]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[21100.449219] RIP: 0033:0x7efef99840f7
<4>[21100.450171] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
<4>[21100.451143] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX: 00007efef99840f7
<4>[21100.452126] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
<4>[21100.453115] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09: 00007fff75e04439
<4>[21100.454098] R10: 00000000000002f4 R11: 0000000000000246 R12: 0000000000000002
<4>[21100.455076] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>[21100.456056] Code: 41 5d c3 48 85 ff 0f 84 85 00 00 00 41 54 55 53 f6 47 3c 01 48 89 fb 75 14 48 8b 37 48 89 fa 48 c7 c7 f4 3f 85 a5 e8 f2 6f 4e ff <0f> 0b f0 ff 4b 38 0f 88 56 f5 00 00 75 53 8a 43 3c 4c 8b 63 28 
<4>[21100.457169] ---[ end trace 734263fa4c996034 ]---
<4>[21100.458244] ------------[ cut here ]------------
<4>[21100.459316] kobject: '(null)' (00000000359a1c66): is not initialized, yet kobject_get() is being called.
<4>[21100.460424] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:593 kobject_get+0x21/0x32
<4>[21100.461532] Modules linked in:
<4>[21100.462684] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W         4.17.0-gentoo #2
<4>[21100.463826] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
<4>[21100.464992] RIP: 0010:kobject_get+0x21/0x32
<4>[21100.466188] RSP: 0018:ffffa6c6cd9d3db0 EFLAGS: 00010296
<4>[21100.467349] RAX: 0000000000000000 RBX: ffff8d6af59e6da8 RCX: 0000000000000002
<4>[21100.468531] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI: 00000000ffffffff
<4>[21100.469724] RBP: ffff8d6af3dcc800 R08: 0000000000000001 R09: 0000000000000000
<4>[21100.470921] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12: ffff8d6af500c9a0
<4>[21100.472116] R13: ffffffffa57dfb43 R14: ffff8d6af3dce860 R15: 0000000000000000
<4>[21100.473316] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000) knlGS:0000000000000000
<4>[21100.474519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21100.475723] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4: 00000000003606e0
<4>[21100.476927] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[21100.478130] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[21100.479331] Call Trace:
<4>[21100.480514]  get_device+0x16/0x1b
<4>[21100.481684]  device_shutdown+0x48/0x1a3
<4>[21100.482844]  kernel_restart+0xe/0x4d
<4>[21100.484003]  __do_sys_reboot+0x168/0x1c5
<4>[21100.485173]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.486330]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.487480]  ? cycles_2_ns+0x55/0x75
<4>[21100.488629]  ? task_work_run+0x63/0x8a
<4>[21100.489779]  ? _raw_spin_unlock_irq+0x2f/0x41
<4>[21100.490929]  ? task_work_run+0x63/0x8a
<4>[21100.492078]  do_syscall_64+0x5e/0x6c
<4>[21100.493228]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[21100.494389] RIP: 0033:0x7efef99840f7
<4>[21100.495554] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
<4>[21100.496741] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX: 00007efef99840f7
<4>[21100.497935] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
<4>[21100.499136] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09: 00007fff75e04439
<4>[21100.500340] R10: 00000000000002f4 R11: 0000000000000246 R12: 0000000000000002
<4>[21100.501553] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>[21100.502767] Code: f0 0f b1 0a 75 d5 48 89 f8 c3 48 85 ff 53 48 89 fb 74 24 f6 47 3c 01 75 14 48 8b 37 48 89 fa 48 c7 c7 98 3f 85 a5 e8 a5 70 4e ff <0f> 0b f0 ff 43 38 0f 88 03 f6 00 00 48 89 d8 5b c3 e9 88 1f 64 
<4>[21100.504102] ---[ end trace 734263fa4c996035 ]---
<4>[21100.505453] ------------[ cut here ]------------
<4>[21100.506738] kobject: '(null)' (00000000359a1c66): is not initialized, yet kobject_put() is being called.
<4>[21100.508059] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:688 kobject_put+0x28/0x8f
<4>[21100.509373] Modules linked in:
<4>[21100.510680] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W         4.17.0-gentoo #2
<4>[21100.512002] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
<4>[21100.513331] RIP: 0010:kobject_put+0x28/0x8f
<4>[21100.514651] RSP: 0018:ffffa6c6cd9d3da8 EFLAGS: 00010286
<4>[21100.515977] RAX: 0000000000000000 RBX: ffff8d6af59e6da8 RCX: 0000000000000001
<4>[21100.517310] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI: 00000000ffffffff
<4>[21100.518650] RBP: ffff8d6af3dcc800 R08: 0000000000000001 R09: 0000000000000000
<4>[21100.519981] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12: ffff8d6af59e6d98
<4>[21100.521302] R13: ffffffffa57dfb43 R14: ffff8d6af3dcc860 R15: 0000000000000000
<4>[21100.522612] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000) knlGS:0000000000000000
<4>[21100.524028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21100.525337] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4: 00000000003606e0
<4>[21100.526642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[21100.527938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[21100.529229] Call Trace:
<4>[21100.530507]  device_shutdown+0x17d/0x1a3
<4>[21100.531787]  kernel_restart+0xe/0x4d
<4>[21100.533066]  __do_sys_reboot+0x168/0x1c5
<4>[21100.534341]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.535612]  ? sched_clock_cpu+0x10/0xb4
<4>[21100.536866]  ? cycles_2_ns+0x55/0x75
<4>[21100.538111]  ? task_work_run+0x63/0x8a
<4>[21100.539350]  ? _raw_spin_unlock_irq+0x2f/0x41
<4>[21100.540590]  ? task_work_run+0x63/0x8a
<4>[21100.541833]  do_syscall_64+0x5e/0x6c
<4>[21100.543066]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[21100.544359] RIP: 0033:0x7efef99840f7
<4>[21100.545597] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
<4>[21100.546853] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX: 00007efef99840f7
<4>[21100.548163] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
<4>[21100.549435] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09: 00007fff75e04439
<4>[21100.550711] R10: 00000000000002f4 R11: 0000000000000246 R12: 0000000000000002
<4>[21100.552044] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>[21100.553324] Code: 41 5d c3 48 85 ff 0f 84 85 00 00 00 41 54 55 53 f6 47 3c 01 48 89 fb 75 14 48 8b 37 48 89 fa 48 c7 c7 f4 3f 85 a5 e8 f2 6f 4e ff <0f> 0b f0 ff 4b 38 0f 88 56 f5 00 00 75 53 8a 43 3c 4c 8b 63 28 
<4>[21100.554724] ---[ end trace 734263fa4c996036 ]---
<5>[21100.556701] sd 5:0:0:0: [sdc] Synchronizing SCSI cache
<5>[21100.558348] sd 4:0:0:0: [sdb] Synchronizing SCSI cache
<5>[21100.560204] sd 2:0:0:0: [sda] Synchronizing SCSI cache
<6>[21100.563419] mlx4_core 0000:81:00.0: mlx4_shutdown was called
<7>[21100.566483] mlx4_en: enp129s0: Close port called
<4>[21102.258970] ------------[ cut here ]------------
<4>[21102.260698] kobject: '(null)' (0000000047d32b91): is not initialized, yet kobject_get() is being called.
<4>[21102.262150] WARNING: CPU: 2 PID: 25848 at lib/kobject.c:593 kobject_get+0x21/0x32
<4>[21102.263588] Modules linked in:
<4>[21102.264977] CPU: 2 PID: 25848 Comm: reboot Tainted: G        W         4.17.0-gentoo #2
<4>[21102.266344] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
<4>[21102.267785] RIP: 0010:kobject_get+0x21/0x32
<4>[21102.269148] RSP: 0018:ffffa6c6cd9d3db0 EFLAGS: 00010296
<4>[21102.270503] RAX: 0000000000000000 RBX: ffff8d6af5012da8 RCX: 0000000000000002
<4>[21102.271868] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI: 00000000ffffffff
<4>[21102.273211] RBP: ffff8d6af5012d98 R08: 0000000000000001 R09: 0000000000000000
<4>[21102.274541] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12: 0000000000000000
<4>[21102.275862] R13: ffffffffa57dfb43 R14: ffff8d6af5009a00 R15: 0000000000000000
<4>[21102.277179] FS:  00007efef9e42500(0000) GS:ffff8d6afda00000(0000) knlGS:0000000000000000
<4>[21102.278502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21102.279821] CR2: 00007fbd98235f68 CR3: 00000010277fc001 CR4: 00000000003606e0
<4>[21102.281171] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[21102.282492] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[21102.283808] Call Trace:
<4>[21102.285111]  get_device+0x16/0x1b
<4>[21102.286403]  device_shutdown+0x53/0x1a3
<4>[21102.287688]  kernel_restart+0xe/0x4d
<4>[21102.288970]  __do_sys_reboot+0x168/0x1c5
<4>[21102.290255]  ? sched_clock_cpu+0x10/0xb4
<4>[21102.291543]  ? sched_clock_cpu+0x10/0xb4
<4>[21102.292806]  ? cycles_2_ns+0x55/0x75
<4>[21102.294069]  ? task_work_run+0x63/0x8a
<4>[21102.295334]  ? _raw_spin_unlock_irq+0x2f/0x41
<4>[21102.296598]  ? task_work_run+0x63/0x8a
<4>[21102.297862]  do_syscall_64+0x5e/0x6c
<4>[21102.299132]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[21102.300408] RIP: 0033:0x7efef99840f7
<4>[21102.301698] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
<4>[21102.302988] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX: 00007efef99840f7
<4>[21102.304285] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
<4>[21102.305587] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09: 00007fff75e04439
<4>[21102.306890] R10: 00000000000002f4 R11: 0000000000000246 R12: 0000000000000002
<4>[21102.308199] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>[21102.309497] Code: f0 0f b1 0a 75 d5 48 89 f8 c3 48 85 ff 53 48 89 fb 74 24 f6 47 3c 01 75 14 48 8b 37 48 89 fa 48 c7 c7 98 3f 85 a5 e8 a5 70 4e ff <0f> 0b f0 ff 43 38 0f 88 03 f6 00 00 48 89 d8 5b c3 e9 88 1f 64 
<4>[21102.310932] ---[ end trace 734263fa4c996037 ]---
<1>[21102.312296] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
<6>[21102.313668] PGD 0 P4D 0 
<4>[21102.315026] Oops: 0002 [#1] PREEMPT SMP PTI
<4>[21102.316392] Modules linked in:
<4>[21102.317714] CPU: 2 PID: 25848 Comm: reboot Tainted: G        W         4.17.0-gentoo #2
<4>[21102.319022] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
<4>[21102.320341] RIP: 0010:device_shutdown+0x5e/0x1a3
<4>[21102.321658] RSP: 0018:ffffa6c6cd9d3dc8 EFLAGS: 00010282
<4>[21102.322950] RAX: 0000000000000000 RBX: ffff8d6af5012db0 RCX: 0000000000000002
<4>[21102.324242] RDX: ffff8d6af95b8d80 RSI: 0000000000000001 RDI: 00000000ffffffff
<4>[21102.325531] RBP: ffff8d6af5012d98 R08: 0000000000000001 R09: 0000000000000000
<4>[21102.326807] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12: 0000000000000000
<4>[21102.328071] R13: ffffffffa57dfb43 R14: ffff8d6af5009a00 R15: 0000000000000000
<4>[21102.329335] FS:  00007efef9e42500(0000) GS:ffff8d6afda00000(0000) knlGS:0000000000000000
<4>[21102.330631] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21102.331909] CR2: 0000000000000000 CR3: 00000010277fc001 CR4: 00000000003606e0
<4>[21102.333233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[21102.334503] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[21102.335765] Call Trace:
<4>[21102.337012]  kernel_restart+0xe/0x4d
<4>[21102.338253]  __do_sys_reboot+0x168/0x1c5
<4>[21102.339489]  ? sched_clock_cpu+0x10/0xb4
<4>[21102.340719]  ? sched_clock_cpu+0x10/0xb4
<4>[21102.341961]  ? cycles_2_ns+0x55/0x75
<4>[21102.343176]  ? task_work_run+0x63/0x8a
<4>[21102.344392]  ? _raw_spin_unlock_irq+0x2f/0x41
<4>[21102.345617]  ? task_work_run+0x63/0x8a
<4>[21102.346835]  do_syscall_64+0x5e/0x6c
<4>[21102.348061]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[21102.349290] RIP: 0033:0x7efef99840f7
<4>[21102.350515] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
<4>[21102.351779] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX: 00007efef99840f7
<4>[21102.353031] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
<4>[21102.354287] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09: 00007fff75e04439
<4>[21102.355547] R10: 00000000000002f4 R11: 0000000000000246 R12: 0000000000000002
<4>[21102.356808] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<4>[21102.358066] Code: 5b 01 00 00 48 8b 5f 08 48 8b 7b e8 48 8d 6b e8 e8 01 d7 ff ff 48 89 ef 49 89 c4 e8 f6 d6 ff ff 48 8b 43 08 48 8b 13 48 89 42 08 <48> 89 10 48 89 1b 48 8b 05 0e 0d bd 02 48 89 5b 08 48 8d 78 10 
<1>[21102.359437] RIP: device_shutdown+0x5e/0x1a3 RSP: ffffa6c6cd9d3dc8
<4>[21102.360749] CR2: 0000000000000000
<4>[21102.362070] ---[ end trace 734263fa4c996038 ]---
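
The same kobject warnings repeat several times in the log above. As a rough aid for reading such logs, the sketch below counts the "is not initialized" warnings per kobject and per function. The helper name `kobject_warning_summary` is made up for illustration, and the match pattern assumes the exact message wording seen in this log. The value in parentheses is printed with `%p`, so on this kernel it is most likely a hashed pointer: it is useful for telling objects apart within one boot, not as a real address.

```sh
# Summarize "not initialized" kobject warnings from dmesg-style text on stdin.
# Prints one line per (kobject id, function) pair, followed by its count.
# Hypothetical helper; the pattern assumes the message format shown in this report.
kobject_warning_summary() {
  sed -n 's/.*kobject: .*(\([0-9a-f]\{16\}\)): is not initialized, yet \(kobject_[a-z]*\)() is being called.*/\1 \2/p' |
    sort | uniq -c | awk '{ print $2, $3, $1 }'
}
```

Typical use would be `dmesg | kobject_warning_summary`, or feeding it a saved log file on stdin.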
Comment 1 Mateusz Lenik 2018-06-07 20:48:30 UTC
Turns out it is not specific to reboot -- I was able to trigger it by attempting to change display brightness via ddcutil:

[10268.259294] ------------[ cut here ]------------
[10268.259300] kobject: '(null)' (000000003e6a2b0c): is not initialized, yet kobject_get() is being called.
[10268.259336] WARNING: CPU: 16 PID: 18955 at lib/kobject.c:593 kobject_get+0x21/0x32
[10268.259338] Modules linked in:
[10268.259344] CPU: 16 PID: 18955 Comm: ddcutil Not tainted 4.17.0-gentoo #4
[10268.259345] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[10268.259348] RIP: 0010:kobject_get+0x21/0x32
[10268.259350] RSP: 0018:ffffb4350eda3c88 EFLAGS: 00010282
[10268.259353] RAX: 0000000000000000 RBX: ffffa311354e0da8 RCX: 0000000000000001
[10268.259354] RDX: 0000000000000000 RSI: ffffffff8fa0ed9e RDI: 00000000ffffffff
[10268.259356] RBP: ffffa31134eb5220 R08: 000014a37cdcdb24 R09: ffffa320b42332d8
[10268.259358] R10: ffffa3113cc14998 R11: ffffffff91464f7d R12: ffffa320c5b8f840
[10268.259359] R13: ffffa3113351c6a8 R14: ffffa320c5b8f840 R15: ffffa320c5b8f850
[10268.259361] FS:  00007ff0a5b91280(0000) GS:ffffa3213da00000(0000) knlGS:0000000000000000
[10268.259363] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10268.259365] CR2: 00007ff0a4bb141d CR3: 0000001f31bc8004 CR4: 00000000003606e0
[10268.259366] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10268.259368] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10268.259369] Call Trace:
[10268.259381]  get_device+0x16/0x1b
[10268.259387]  i2c_get_adapter+0x49/0x5a
[10268.259391]  i2cdev_open+0x1a/0x82
[10268.259398]  chrdev_open+0x155/0x18b
[10268.259403]  ? cdev_put+0x1e/0x1e
[10268.259406]  do_dentry_open+0x1a8/0x279
[10268.259414]  path_openat+0x54c/0x6bd
[10268.259421]  do_filp_open+0x5c/0xc6
[10268.259437]  ? __alloc_fd+0x135/0x147
[10268.259442]  ? do_sys_open+0x79/0x113
[10268.259444]  do_sys_open+0x79/0x113
[10268.259454]  do_syscall_64+0x5e/0x6c
[10268.259459]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10268.259463] RIP: 0033:0x7ff0a4c526de
[10268.259464] RSP: 002b:00007ffe032bdd20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[10268.259467] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007ff0a4c526de
[10268.259469] RDX: 0000000000000002 RSI: 00007ffe032bdd90 RDI: 00000000ffffff9c
[10268.259471] RBP: 0000000000000080 R08: 001f7107eb7b76bb R09: 00007ffe033db080
[10268.259472] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe032bdd90
[10268.259474] R13: 1535fb7c757887eb R14: 000000000000000a R15: 000055700ef14788
[10268.259483] Code: f0 0f b1 0a 75 d5 48 89 f8 c3 48 85 ff 53 48 89 fb 74 24 f6 47 3c 01 75 14 48 8b 37 48 89 fa 48 c7 c7 d8 ef af 8f e8 a5 70 4e ff <0f> 0b f0 ff 43 38 0f 88 03 f6 00 00 48 89 d8 5b c3 e9 88 1f 64 
[10268.259581] ---[ end trace 764a561e76f5c6d4 ]---
[10268.259640] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[10268.259665] PGD 0 P4D 0 
[10268.259685] Oops: 0000 [#1] PREEMPT SMP PTI
[10268.259693] Modules linked in:
[10268.259705] CPU: 16 PID: 18955 Comm: ddcutil Tainted: G        W         4.17.0-gentoo #4
[10268.259711] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[10268.259719] RIP: 0010:i2c_transfer+0x14/0x98
[10268.259726] RSP: 0018:ffffb4350eda3dc8 EFLAGS: 00010246
[10268.259732] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[10268.259734] RDX: 0000000000000001 RSI: ffffb4350eda3df0 RDI: ffffa311354e0bf8
[10268.259736] RBP: 00000000ffffffa1 R08: 0000000000000000 R09: 0000000000000008
[10268.259738] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3213687a800
[10268.259740] R13: 00007ffe032bdda7 R14: ffffb4350eda3f00 R15: ffffa320b4232a00
[10268.259743] FS:  00007ff0a5b91280(0000) GS:ffffa3213da00000(0000) knlGS:0000000000000000
[10268.259745] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10268.259748] CR2: 0000000000000000 CR3: 0000001f31bc8004 CR4: 00000000003606e0
[10268.259750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10268.259752] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10268.259754] Call Trace:
[10268.259758]  i2c_transfer_buffer_flags+0x4b/0x6e
[10268.259763]  i2cdev_write+0x48/0x5a
[10268.259767]  __vfs_write+0x33/0xd7
[10268.259771]  ? i2cdev_ioctl+0xfe/0x24d
[10268.259775]  ? vfs_ioctl+0x1e/0x2b
[10268.259778]  ? do_vfs_ioctl+0x537/0x55f
[10268.259784]  ? kmem_cache_free+0x16e/0x192
[10268.259789]  ? do_sys_open+0xec/0x113
[10268.259795]  vfs_write+0xa5/0xe2
[10268.259801]  ksys_write+0x5f/0xa3
[10268.259807]  do_syscall_64+0x5e/0x6c
[10268.259810]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10268.259814] RIP: 0033:0x7ff0a4c52c98
[10268.259818] RSP: 002b:00007ffe032bdd10 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[10268.259823] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff0a4c52c98
[10268.259825] RDX: 0000000000000001 RSI: 00007ffe032bdda7 RDI: 0000000000000003
[10268.259828] RBP: 00007ffe032bdda7 R08: 001f7666d2edcd47 R09: 00007ffe033db080
[10268.259830] R10: 00000000000008c5 R11: 0000000000000246 R12: 1535fb7c757de6d2
[10268.259837] R13: 0000000000000003 R14: 000000000000000a R15: 000055700ef14788
[10268.259846] Code: c9 48 c7 c2 a0 55 34 8e 48 c7 c6 40 4d c8 8f e8 6b ee 85 ff 31 c0 c3 0f 1f 44 00 00 41 55 41 54 55 bd a1 ff ff ff 53 48 8b 47 10 <48> 83 38 00 74 75 65 8b 05 d4 df 52 71 a9 ff ff ff 7f 41 89 d5 
[10268.259942] RIP: i2c_transfer+0x14/0x98 RSP: ffffb4350eda3dc8
[10268.259946] CR2: 0000000000000000
[10268.259962] ---[ end trace 764a561e76f5c6d5 ]---
[10268.732971] ------------[ cut here ]------------
[10268.732981] kobject: '(null)' (000000003e6a2b0c): is not initialized, yet kobject_put() is being called.
[10268.733003] WARNING: CPU: 16 PID: 18955 at lib/kobject.c:688 kobject_put+0x28/0x8f
[10268.733006] Modules linked in:
[10268.733014] CPU: 16 PID: 18955 Comm: ddcutil Tainted: G      D W         4.17.0-gentoo #4
[10268.733017] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[10268.733022] RIP: 0010:kobject_put+0x28/0x8f
[10268.733026] RSP: 0018:ffffb4350eda3e30 EFLAGS: 00010292
[10268.733031] RAX: 0000000000000000 RBX: ffffa311354e0da8 RCX: 0000000000000001
[10268.733033] RDX: 0000000000000000 RSI: ffffffff8fa0ed9e RDI: 00000000ffffffff
[10268.733036] RBP: ffffa3213687a800 R08: 0000000000000001 R09: 0000000000000000
[10268.733058] R10: ffffa320c5b8f850 R11: ffffffff91464b47 R12: ffffa3113351c6a8
[10268.733062] R13: ffffa31134af1f40 R14: ffffa3112e927ba0 R15: ffffa3113351c6a8
[10268.733065] FS:  0000000000000000(0000) GS:ffffa3213da00000(0000) knlGS:0000000000000000
[10268.733068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10268.733070] CR2: 0000000000000000 CR3: 0000001e07624005 CR4: 00000000003606e0
[10268.733072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10268.733075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10268.733078] Call Trace:
[10268.733086]  i2c_put_adapter+0x1a/0x24
[10268.733093]  i2cdev_release+0x1a/0x32
[10268.733098]  __fput+0xe4/0x187
[10268.733107]  task_work_run+0x77/0x8a
[10268.733114]  do_exit+0x497/0x9a7
[10268.733120]  ? ksys_write+0x5f/0xa3
[10268.733127]  rewind_stack_do_exit+0x17/0x20
[10268.733135] Code: 41 5d c3 48 85 ff 0f 84 85 00 00 00 41 54 55 53 f6 47 3c 01 48 89 fb 75 14 48 8b 37 48 89 fa 48 c7 c7 34 f0 af 8f e8 f2 6f 4e ff <0f> 0b f0 ff 4b 38 0f 88 56 f5 00 00 75 53 8a 43 3c 4c 8b 63 28 
[10268.733259] ---[ end trace 764a561e76f5c6d6 ]---
Comment 2 Mateusz Lenik 2018-06-07 21:15:10 UTC
This looks related to the unbind script I run to pass one of my GPUs to the vfio driver. I cannot do this via the PCI IDs parameter, because I have two identical cards and only want to pass through one of them. I unbind the driver by running the following script:

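  # PCI addresses of the device functions to detach from their current driver.
  pci_ids=("0000:03:00.0" "0000:03:00.1")

  for id in "${pci_ids[@]}"; do
    # Vendor and device IDs are read here but not used in this excerpt.
    vendor="$(cat "/sys/bus/pci/devices/$id/vendor")"
    device="$(cat "/sys/bus/pci/devices/$id/device")"

    # Only unbind if a driver is currently bound to the device.
    if [ -e "/sys/bus/pci/devices/$id/driver/unbind" ]; then
      echo "Unbinding $id"
      echo "$id" >"/sys/bus/pci/devices/$id/driver/unbind"
    fi
  done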
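
For context on the "two identical cards" constraint, here is a minimal sketch of selecting the target driver per device instead of per vendor/device ID. It uses the PCI sysfs attributes `driver_override` and `drivers_probe`. The helper name `bind_to_vfio` and its optional sysfs-root argument are hypothetical, and the sketch assumes the `vfio-pci` driver is already available. It is not the reporter's script, and it does not avoid the problem reported here, since the amdgpu unbind step is unchanged.

```sh
# Hypothetical helper: bind_to_vfio <pci-address> [sysfs_root]
# sysfs_root defaults to /sys; it is a parameter only so the logic can be
# exercised against a scratch directory.
bind_to_vfio() (
  id="$1"
  sysfs="${2:-/sys}"
  dev="$sysfs/bus/pci/devices/$id"

  [ -d "$dev" ] || { echo "no such device: $id" >&2; exit 1; }

  # Release the device from whatever driver currently owns it.
  if [ -e "$dev/driver/unbind" ]; then
    echo "$id" >"$dev/driver/unbind"
  fi

  # Restrict future probing of this one device to vfio-pci.
  echo "vfio-pci" >"$dev/driver_override"

  # Ask the PCI core to probe the device again.
  echo "$id" >"$sysfs/bus/pci/drivers_probe"
)
```

On a real system this would be run as root, once per function, for example `bind_to_vfio 0000:03:00.0` and `bind_to_vfio 0000:03:00.1`.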

After unbinding, when I ran `ddcutil detect`, it got killed with the following log in dmesg:

[  127.594909] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  127.594916] PGD 0 P4D 0 
[  127.594921] Oops: 0000 [#1] PREEMPT SMP PTI
[  127.594924] Modules linked in:
[  127.594928] CPU: 22 PID: 4412 Comm: ddcutil Not tainted 4.17.0-gentoo #4
[  127.594930] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[  127.594936] RIP: 0010:dal_i2caux_submit_i2c_command+0x47/0x16f
[  127.594938] RSP: 0018:ffffa9a54a9ebcf8 EFLAGS: 00010297
[  127.594941] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[  127.594943] RDX: ffffa9a54a9ebd50 RSI: cd91c8f83cf3d822 RDI: 0000000000000000
[  127.594945] RBP: 0000000000000000 R08: 00000000014080c0 R09: ffff9bf6b71c5cd8
[  127.594947] R10: ffffa9a54a9ebc88 R11: 0000000000000000 R12: cd91c8f83cf3d822
[  127.594949] R13: ffffa9a54a9ebd50 R14: 0000000000000000 R15: 00000000fffd5e8d
[  127.594952] FS:  00007fd65f8f1280(0000) GS:ffff9bf6bee00000(0000) knlGS:0000000000000000
[  127.594954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  127.594956] CR2: 0000000000000008 CR3: 000000102a9e4004 CR4: 00000000003606e0
[  127.594958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.594960] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  127.594962] Call Trace:
[  127.594969]  ? __kmalloc+0x12a/0x13c
[  127.594974]  ? amdgpu_dm_i2c_xfer+0x4e/0x10a
[  127.594978]  amdgpu_dm_i2c_xfer+0xd7/0x10a
[  127.594983]  __i2c_transfer+0x289/0x3d5
[  127.594987]  i2c_transfer+0x78/0x98
[  127.594991]  i2c_transfer_buffer_flags+0x4b/0x6e
[  127.594994]  i2cdev_write+0x48/0x5a
[  127.595000]  __vfs_write+0x33/0xd7
[  127.595002]  ? i2cdev_ioctl+0xfe/0x24d
[  127.595007]  ? vfs_ioctl+0x1e/0x2b
[  127.595011]  ? do_vfs_ioctl+0x537/0x55f
[  127.595016]  ? kmem_cache_free+0x16e/0x192
[  127.595018]  ? do_sys_open+0xec/0x113
[  127.595023]  vfs_write+0xa5/0xe2
[  127.595027]  ksys_write+0x5f/0xa3
[  127.595032]  do_syscall_64+0x5e/0x6c
[  127.595036]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  127.595039] RIP: 0033:0x7fd65e9b2c98
[  127.595041] RSP: 002b:00007fff80f701c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  127.595044] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd65e9b2c98
[  127.595046] RDX: 0000000000000001 RSI: 00007fff80f70257 RDI: 0000000000000003
[  127.595048] RBP: 00007fff80f70257 R08: 000636d4b5ddea6e R09: 00007fff80f82080
[  127.595050] R10: 000000000001b8e8 R11: 0000000000000246 R12: 1535fca7c7d110b5
[  127.595052] R13: 0000000000000003 R14: 0000000000000000 R15: 000055a3bac42788
[  127.595057] Code: 89 44 24 28 31 c0 48 85 f6 75 07 0f 0b e9 1e 01 00 00 48 85 d2 49 89 d5 75 0a 0f 0b 45 31 e4 e9 0c 01 00 00 83 7a 0c 01 48 89 fd <48> 8b 47 08 77 1b 48 8b 40 08 e8 d2 2b 8e 00 48 85 c0 48 89 c3 
[  127.595107] RIP: dal_i2caux_submit_i2c_command+0x47/0x16f RSP: ffffa9a54a9ebcf8
[  127.595109] CR2: 0000000000000008
[  127.595112] ---[ end trace d6e7fee8a04eafda ]---


Looks like there are some stale references left after unbinding the driver.
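
One way to look for leftovers of this kind from userspace is to list i2c adapters that are still registered while their parent PCI device has no driver bound. The sketch below assumes the adapter directory sits somewhere below its PCI device in sysfs and that a bound device has a `driver` symlink. The helper name `stale_i2c_adapters` and its optional sysfs-root argument are made up for illustration.

```sh
# Hypothetical helper: stale_i2c_adapters [sysfs_root]
# Prints i2c adapters whose nearest PCI ancestor has no driver bound.
stale_i2c_adapters() (
  sysfs="${1:-/sys}"
  for adapter in "$sysfs"/bus/i2c/devices/i2c-*; do
    [ -e "$adapter" ] || continue
    # Resolve the bus symlink to the real device directory.
    path="$(cd -P "$adapter" 2>/dev/null && pwd -P)" || continue
    dir="$(dirname "$path")"
    # Walk up until a directory named like a PCI address (dddd:bb:dd.f).
    while [ "$dir" != "/" ]; do
      case "$(basename "$dir")" in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-7])
          # A bound PCI device has a "driver" entry; an unbound one does not.
          if [ ! -e "$dir/driver" ]; then
            echo "$(basename "$adapter") parent=$(basename "$dir") (no driver bound)"
          fi
          break
          ;;
      esac
      dir="$(dirname "$dir")"
    done
  done
)
```

Running `stale_i2c_adapters` after the unbind script would print nothing if every adapter had been torn down together with its driver.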
Comment 3 Andrew Morton 2018-06-08 22:15:11 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 07 Jun 2018 18:21:24 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=199965
> 
>             Bug ID: 199965
>            Summary: Memory management: BUG in kernel_restart
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.17.0
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: mlen@mlen.pl
>         Regression: No
> 
> Reboot randomly fails on 4.17.0 due to memory management issues. Worked fine
> on
> 4.16.13

Oh gee, there isn't much to go on here.  Unknown kobject on
devices_kset() is in a crappy state during kernel restart.  Greg, is
there something we can do to make that kobject_get() warning more
informative?  Probably not.


> <4>[21100.397182] ------------[ cut here ]------------
> <4>[21100.397185] kobject: '(null)' (0000000047d32b91): is not initialized,
> yet
> kobject_get() is being called.
> <4>[21100.397209] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:593
> kobject_get+0x21/0x32
> <4>[21100.397211] Modules linked in:
> <4>[21100.397215] CPU: 1 PID: 25848 Comm: reboot Not tainted 4.17.0-gentoo #2
> <4>[21100.397217] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> WS, BIOS 3407 03/10/2017
> <4>[21100.397219] RIP: 0010:kobject_get+0x21/0x32
> <4>[21100.397220] RSP: 0018:ffffa6c6cd9d3db0 EFLAGS: 00010296
> <4>[21100.397223] RAX: 0000000000000000 RBX: ffff8d6af5012da8 RCX:
> 0000000000000002
> <4>[21100.397225] RDX: 0000000000000003 RSI: 0000000000000003 RDI:
> 00000000ffffffff
> <4>[21100.397227] RBP: ffff8d6af3dc9800 R08: 0000baada7db872a R09:
> ffff8d69a1bc5cd8
> <4>[21100.397228] R10: ffffa6c6cd9d3ce8 R11: ffffffffa7264f7d R12:
> ffff8d6af50099a0
> <4>[21100.397230] R13: ffffffffa57dfb43 R14: ffff8d6af3dc8060 R15:
> 0000000000000000
> <4>[21100.397232] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000)
> knlGS:0000000000000000
> <4>[21100.397233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[21100.397235] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4:
> 00000000003606e0
> <4>[21100.397237] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> <4>[21100.397238] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> <4>[21100.397240] Call Trace:
> <4>[21100.397246]  get_device+0x16/0x1b
> <4>[21100.397249]  device_shutdown+0x48/0x1a3
> <4>[21100.397256]  kernel_restart+0xe/0x4d
> <4>[21100.397259]  __do_sys_reboot+0x168/0x1c5
> <4>[21100.397264]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.397266]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.397270]  ? cycles_2_ns+0x55/0x75
> <4>[21100.397276]  ? task_work_run+0x63/0x8a
> <4>[21100.397284]  ? _raw_spin_unlock_irq+0x2f/0x41
> <4>[21100.397287]  ? task_work_run+0x63/0x8a
> <4>[21100.397292]  do_syscall_64+0x5e/0x6c
> <4>[21100.397295]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> <4>[21100.397298] RIP: 0033:0x7efef99840f7
> <4>[21100.397299] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a9
> <4>[21100.397303] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX:
> 00007efef99840f7
> <4>[21100.397305] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> 00000000fee1dead
> <4>[21100.397306] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09:
> 00007fff75e04439
> <4>[21100.397308] R10: 00000000000002f4 R11: 0000000000000246 R12:
> 0000000000000002
> <4>[21100.397310] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> <4>[21100.397319] Code: f0 0f b1 0a 75 d5 48 89 f8 c3 48 85 ff 53 48 89 fb 74
> 24 f6 47 3c 01 75 14 48 8b 37 48 89 fa 48 c7 c7 98 3f 85 a5 e8 a5 70 4e ff
> <0f>
> 0b f0 ff 43 38 0f 88 03 f6 00 00 48 89 d8 5b c3 e9 88 1f 64 
> <4>[21100.397399] ---[ end trace 734263fa4c996033 ]---
> <3>[21100.397402] INFO: trying to register non-static key.
> <3>[21100.398065] the code is fine but needs lockdep annotation.
> <3>[21100.399007] turning off the locking correctness validator.
> <4>[21100.399628] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W        
> 4.17.0-gentoo #2
> <4>[21100.400270] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> WS, BIOS 3407 03/10/2017
> <4>[21100.400923] Call Trace:
> <4>[21100.401573]  dump_stack+0x46/0x59
> <4>[21100.402227]  register_lock_class+0x192/0x361
> <4>[21100.402886]  __lock_acquire.isra.32+0x97/0x595
> <4>[21100.403548]  lock_acquire+0x105/0x12e
> <4>[21100.404212]  ? device_shutdown+0x89/0x1a3
> <4>[21100.404879]  ? wake_up_klogd+0x4f/0x61
> <4>[21100.405546]  ? device_shutdown+0x89/0x1a3
> <4>[21100.406217]  __mutex_lock+0x78/0x3a9
> <4>[21100.406892]  ? device_shutdown+0x89/0x1a3
> <4>[21100.407571]  ? device_shutdown+0x78/0x1a3
> <4>[21100.408251]  ? device_shutdown+0x89/0x1a3
> <4>[21100.408928]  device_shutdown+0x89/0x1a3
> <4>[21100.409612]  kernel_restart+0xe/0x4d
> <4>[21100.410303]  __do_sys_reboot+0x168/0x1c5
> <4>[21100.410996]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.411707]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.412398]  ? cycles_2_ns+0x55/0x75
> <4>[21100.413127]  ? task_work_run+0x63/0x8a
> <4>[21100.413818]  ? _raw_spin_unlock_irq+0x2f/0x41
> <4>[21100.414510]  ? task_work_run+0x63/0x8a
> <4>[21100.415205]  do_syscall_64+0x5e/0x6c
> <4>[21100.415900]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> <4>[21100.416603] RIP: 0033:0x7efef99840f7
> <4>[21100.417307] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a9
> <4>[21100.418033] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX:
> 00007efef99840f7
> <4>[21100.418769] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> 00000000fee1dead
> <4>[21100.419510] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09:
> 00007fff75e04439
> <4>[21100.420257] R10: 00000000000002f4 R11: 0000000000000246 R12:
> 0000000000000002
> <4>[21100.421015] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> <4>[21100.421789] ------------[ cut here ]------------
> <4>[21100.422564] kobject: '(null)' (0000000047d32b91): is not initialized,
> yet
> kobject_put() is being called.
> <4>[21100.423378] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:688
> kobject_put+0x28/0x8f
> <4>[21100.424186] Modules linked in:
> <4>[21100.424995] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W        
> 4.17.0-gentoo #2
> <4>[21100.425829] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> WS, BIOS 3407 03/10/2017
> <4>[21100.426683] RIP: 0010:kobject_put+0x28/0x8f
> <4>[21100.427536] RSP: 0018:ffffa6c6cd9d3da8 EFLAGS: 00010286
> <4>[21100.428399] RAX: 0000000000000000 RBX: ffff8d6af5012da8 RCX:
> 0000000000000001
> <4>[21100.429278] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI:
> 00000000ffffffff
> <4>[21100.430165] RBP: ffff8d6af3dc9800 R08: 0000000000000001 R09:
> 0000000000000000
> <4>[21100.431057] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12:
> ffff8d6af5012d98
> <4>[21100.431961] R13: ffffffffa57dfb43 R14: ffff8d6af3dc9860 R15:
> 0000000000000000
> <4>[21100.432873] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000)
> knlGS:0000000000000000
> <4>[21100.433808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[21100.434745] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4:
> 00000000003606e0
> <4>[21100.435697] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> <4>[21100.436663] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> <4>[21100.437618] Call Trace:
> <4>[21100.438570]  device_shutdown+0x17d/0x1a3
> <4>[21100.439530]  kernel_restart+0xe/0x4d
> <4>[21100.440495]  __do_sys_reboot+0x168/0x1c5
> <4>[21100.441467]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.442495]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.443466]  ? cycles_2_ns+0x55/0x75
> <4>[21100.444436]  ? task_work_run+0x63/0x8a
> <4>[21100.445400]  ? _raw_spin_unlock_irq+0x2f/0x41
> <4>[21100.446357]  ? task_work_run+0x63/0x8a
> <4>[21100.447305]  do_syscall_64+0x5e/0x6c
> <4>[21100.448268]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> <4>[21100.449219] RIP: 0033:0x7efef99840f7
> <4>[21100.450171] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a9
> <4>[21100.451143] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX:
> 00007efef99840f7
> <4>[21100.452126] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> 00000000fee1dead
> <4>[21100.453115] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09:
> 00007fff75e04439
> <4>[21100.454098] R10: 00000000000002f4 R11: 0000000000000246 R12:
> 0000000000000002
> <4>[21100.455076] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> <4>[21100.456056] Code: 41 5d c3 48 85 ff 0f 84 85 00 00 00 41 54 55 53 f6 47
> 3c 01 48 89 fb 75 14 48 8b 37 48 89 fa 48 c7 c7 f4 3f 85 a5 e8 f2 6f 4e ff
> <0f>
> 0b f0 ff 4b 38 0f 88 56 f5 00 00 75 53 8a 43 3c 4c 8b 63 28 
> <4>[21100.457169] ---[ end trace 734263fa4c996034 ]---
> <4>[21100.458244] ------------[ cut here ]------------
> <4>[21100.459316] kobject: '(null)' (00000000359a1c66): is not initialized,
> yet
> kobject_get() is being called.
> <4>[21100.460424] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:593
> kobject_get+0x21/0x32
> <4>[21100.461532] Modules linked in:
> <4>[21100.462684] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W        
> 4.17.0-gentoo #2
> <4>[21100.463826] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> WS, BIOS 3407 03/10/2017
> <4>[21100.464992] RIP: 0010:kobject_get+0x21/0x32
> <4>[21100.466188] RSP: 0018:ffffa6c6cd9d3db0 EFLAGS: 00010296
> <4>[21100.467349] RAX: 0000000000000000 RBX: ffff8d6af59e6da8 RCX:
> 0000000000000002
> <4>[21100.468531] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI:
> 00000000ffffffff
> <4>[21100.469724] RBP: ffff8d6af3dcc800 R08: 0000000000000001 R09:
> 0000000000000000
> <4>[21100.470921] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12:
> ffff8d6af500c9a0
> <4>[21100.472116] R13: ffffffffa57dfb43 R14: ffff8d6af3dce860 R15:
> 0000000000000000
> <4>[21100.473316] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000)
> knlGS:0000000000000000
> <4>[21100.474519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[21100.475723] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4:
> 00000000003606e0
> <4>[21100.476927] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> <4>[21100.478130] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> <4>[21100.479331] Call Trace:
> <4>[21100.480514]  get_device+0x16/0x1b
> <4>[21100.481684]  device_shutdown+0x48/0x1a3
> <4>[21100.482844]  kernel_restart+0xe/0x4d
> <4>[21100.484003]  __do_sys_reboot+0x168/0x1c5
> <4>[21100.485173]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.486330]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.487480]  ? cycles_2_ns+0x55/0x75
> <4>[21100.488629]  ? task_work_run+0x63/0x8a
> <4>[21100.489779]  ? _raw_spin_unlock_irq+0x2f/0x41
> <4>[21100.490929]  ? task_work_run+0x63/0x8a
> <4>[21100.492078]  do_syscall_64+0x5e/0x6c
> <4>[21100.493228]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> <4>[21100.494389] RIP: 0033:0x7efef99840f7
> <4>[21100.495554] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a9
> <4>[21100.496741] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX:
> 00007efef99840f7
> <4>[21100.497935] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> 00000000fee1dead
> <4>[21100.499136] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09:
> 00007fff75e04439
> <4>[21100.500340] R10: 00000000000002f4 R11: 0000000000000246 R12:
> 0000000000000002
> <4>[21100.501553] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> <4>[21100.502767] Code: f0 0f b1 0a 75 d5 48 89 f8 c3 48 85 ff 53 48 89 fb 74
> 24 f6 47 3c 01 75 14 48 8b 37 48 89 fa 48 c7 c7 98 3f 85 a5 e8 a5 70 4e ff
> <0f>
> 0b f0 ff 43 38 0f 88 03 f6 00 00 48 89 d8 5b c3 e9 88 1f 64 
> <4>[21100.504102] ---[ end trace 734263fa4c996035 ]---
> <4>[21100.505453] ------------[ cut here ]------------
> <4>[21100.506738] kobject: '(null)' (00000000359a1c66): is not initialized,
> yet
> kobject_put() is being called.
> <4>[21100.508059] WARNING: CPU: 1 PID: 25848 at lib/kobject.c:688
> kobject_put+0x28/0x8f
> <4>[21100.509373] Modules linked in:
> <4>[21100.510680] CPU: 1 PID: 25848 Comm: reboot Tainted: G        W        
> 4.17.0-gentoo #2
> <4>[21100.512002] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> WS, BIOS 3407 03/10/2017
> <4>[21100.513331] RIP: 0010:kobject_put+0x28/0x8f
> <4>[21100.514651] RSP: 0018:ffffa6c6cd9d3da8 EFLAGS: 00010286
> <4>[21100.515977] RAX: 0000000000000000 RBX: ffff8d6af59e6da8 RCX:
> 0000000000000001
> <4>[21100.517310] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI:
> 00000000ffffffff
> <4>[21100.518650] RBP: ffff8d6af3dcc800 R08: 0000000000000001 R09:
> 0000000000000000
> <4>[21100.519981] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12:
> ffff8d6af59e6d98
> <4>[21100.521302] R13: ffffffffa57dfb43 R14: ffff8d6af3dcc860 R15:
> 0000000000000000
> <4>[21100.522612] FS:  00007efef9e42500(0000) GS:ffff8d6afd800000(0000)
> knlGS:0000000000000000
> <4>[21100.524028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[21100.525337] CR2: 0000561f1e29c4d8 CR3: 00000010277fc005 CR4:
> 00000000003606e0
> <4>[21100.526642] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> <4>[21100.527938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> <4>[21100.529229] Call Trace:
> <4>[21100.530507]  device_shutdown+0x17d/0x1a3
> <4>[21100.531787]  kernel_restart+0xe/0x4d
> <4>[21100.533066]  __do_sys_reboot+0x168/0x1c5
> <4>[21100.534341]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.535612]  ? sched_clock_cpu+0x10/0xb4
> <4>[21100.536866]  ? cycles_2_ns+0x55/0x75
> <4>[21100.538111]  ? task_work_run+0x63/0x8a
> <4>[21100.539350]  ? _raw_spin_unlock_irq+0x2f/0x41
> <4>[21100.540590]  ? task_work_run+0x63/0x8a
> <4>[21100.541833]  do_syscall_64+0x5e/0x6c
> <4>[21100.543066]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> <4>[21100.544359] RIP: 0033:0x7efef99840f7
> <4>[21100.545597] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a9
> <4>[21100.546853] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX:
> 00007efef99840f7
> <4>[21100.548163] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> 00000000fee1dead
> <4>[21100.549435] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09:
> 00007fff75e04439
> <4>[21100.550711] R10: 00000000000002f4 R11: 0000000000000246 R12:
> 0000000000000002
> <4>[21100.552044] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> <4>[21100.553324] Code: 41 5d c3 48 85 ff 0f 84 85 00 00 00 41 54 55 53 f6 47
> 3c 01 48 89 fb 75 14 48 8b 37 48 89 fa 48 c7 c7 f4 3f 85 a5 e8 f2 6f 4e ff
> <0f>
> 0b f0 ff 4b 38 0f 88 56 f5 00 00 75 53 8a 43 3c 4c 8b 63 28 
> <4>[21100.554724] ---[ end trace 734263fa4c996036 ]---
> <5>[21100.556701] sd 5:0:0:0: [sdc] Synchronizing SCSI cache
> <5>[21100.558348] sd 4:0:0:0: [sdb] Synchronizing SCSI cache
> <5>[21100.560204] sd 2:0:0:0: [sda] Synchronizing SCSI cache
> <6>[21100.563419] mlx4_core 0000:81:00.0: mlx4_shutdown was called
> <7>[21100.566483] mlx4_en: enp129s0: Close port called
> <4>[21102.258970] ------------[ cut here ]------------
> <4>[21102.260698] kobject: '(null)' (0000000047d32b91): is not initialized,
> yet
> kobject_get() is being called.
> <4>[21102.262150] WARNING: CPU: 2 PID: 25848 at lib/kobject.c:593
> kobject_get+0x21/0x32
> <4>[21102.263588] Modules linked in:
> <4>[21102.264977] CPU: 2 PID: 25848 Comm: reboot Tainted: G        W        
> 4.17.0-gentoo #2
> <4>[21102.266344] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> WS, BIOS 3407 03/10/2017
> <4>[21102.267785] RIP: 0010:kobject_get+0x21/0x32
> <4>[21102.269148] RSP: 0018:ffffa6c6cd9d3db0 EFLAGS: 00010296
> <4>[21102.270503] RAX: 0000000000000000 RBX: ffff8d6af5012da8 RCX:
> 0000000000000002
> <4>[21102.271868] RDX: ffff8d69a1bc5400 RSI: 0000000000000001 RDI:
> 00000000ffffffff
> <4>[21102.273211] RBP: ffff8d6af5012d98 R08: 0000000000000001 R09:
> 0000000000000000
> <4>[21102.274541] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12:
> 0000000000000000
> <4>[21102.275862] R13: ffffffffa57dfb43 R14: ffff8d6af5009a00 R15:
> 0000000000000000
> <4>[21102.277179] FS:  00007efef9e42500(0000) GS:ffff8d6afda00000(0000)
> knlGS:0000000000000000
> <4>[21102.278502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[21102.279821] CR2: 00007fbd98235f68 CR3: 00000010277fc001 CR4:
> 00000000003606e0
> <4>[21102.281171] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> <4>[21102.282492] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> <4>[21102.283808] Call Trace:
> <4>[21102.285111]  get_device+0x16/0x1b
> <4>[21102.286403]  device_shutdown+0x53/0x1a3
> <4>[21102.287688]  kernel_restart+0xe/0x4d
> <4>[21102.288970]  __do_sys_reboot+0x168/0x1c5
> <4>[21102.290255]  ? sched_clock_cpu+0x10/0xb4
> <4>[21102.291543]  ? sched_clock_cpu+0x10/0xb4
> <4>[21102.292806]  ? cycles_2_ns+0x55/0x75
> <4>[21102.294069]  ? task_work_run+0x63/0x8a
> <4>[21102.295334]  ? _raw_spin_unlock_irq+0x2f/0x41
> <4>[21102.296598]  ? task_work_run+0x63/0x8a
> <4>[21102.297862]  do_syscall_64+0x5e/0x6c
> <4>[21102.299132]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> <4>[21102.300408] RIP: 0033:0x7efef99840f7
> <4>[21102.301698] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a9
> <4>[21102.302988] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX:
> 00007efef99840f7
> <4>[21102.304285] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> 00000000fee1dead
> <4>[21102.305587] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09:
> 00007fff75e04439
> <4>[21102.306890] R10: 00000000000002f4 R11: 0000000000000246 R12:
> 0000000000000002
> <4>[21102.308199] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> <4>[21102.309497] Code: f0 0f b1 0a 75 d5 48 89 f8 c3 48 85 ff 53 48 89 fb 74
> 24 f6 47 3c 01 75 14 48 8b 37 48 89 fa 48 c7 c7 98 3f 85 a5 e8 a5 70 4e ff
> <0f>
> 0b f0 ff 43 38 0f 88 03 f6 00 00 48 89 d8 5b c3 e9 88 1f 64 
> <4>[21102.310932] ---[ end trace 734263fa4c996037 ]---
> <1>[21102.312296] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000000
> <6>[21102.313668] PGD 0 P4D 0 
> <4>[21102.315026] Oops: 0002 [#1] PREEMPT SMP PTI
> <4>[21102.316392] Modules linked in:
> <4>[21102.317714] CPU: 2 PID: 25848 Comm: reboot Tainted: G        W        
> 4.17.0-gentoo #2
> <4>[21102.319022] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16
> WS, BIOS 3407 03/10/2017
> <4>[21102.320341] RIP: 0010:device_shutdown+0x5e/0x1a3
> <4>[21102.321658] RSP: 0018:ffffa6c6cd9d3dc8 EFLAGS: 00010282
> <4>[21102.322950] RAX: 0000000000000000 RBX: ffff8d6af5012db0 RCX:
> 0000000000000002
> <4>[21102.324242] RDX: ffff8d6af95b8d80 RSI: 0000000000000001 RDI:
> 00000000ffffffff
> <4>[21102.325531] RBP: ffff8d6af5012d98 R08: 0000000000000001 R09:
> 0000000000000000
> <4>[21102.326807] R10: ffff8d69a1bc5400 R11: ffffffffa52a2f40 R12:
> 0000000000000000
> <4>[21102.328071] R13: ffffffffa57dfb43 R14: ffff8d6af5009a00 R15:
> 0000000000000000
> <4>[21102.329335] FS:  00007efef9e42500(0000) GS:ffff8d6afda00000(0000)
> knlGS:0000000000000000
> <4>[21102.330631] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[21102.331909] CR2: 0000000000000000 CR3: 00000010277fc001 CR4:
> 00000000003606e0
> <4>[21102.333233] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> <4>[21102.334503] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> <4>[21102.335765] Call Trace:
> <4>[21102.337012]  kernel_restart+0xe/0x4d
> <4>[21102.338253]  __do_sys_reboot+0x168/0x1c5
> <4>[21102.339489]  ? sched_clock_cpu+0x10/0xb4
> <4>[21102.340719]  ? sched_clock_cpu+0x10/0xb4
> <4>[21102.341961]  ? cycles_2_ns+0x55/0x75
> <4>[21102.343176]  ? task_work_run+0x63/0x8a
> <4>[21102.344392]  ? _raw_spin_unlock_irq+0x2f/0x41
> <4>[21102.345617]  ? task_work_run+0x63/0x8a
> <4>[21102.346835]  do_syscall_64+0x5e/0x6c
> <4>[21102.348061]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> <4>[21102.349290] RIP: 0033:0x7efef99840f7
> <4>[21102.350515] RSP: 002b:00007fff75e04590 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a9
> <4>[21102.351779] RAX: ffffffffffffffda RBX: 0000561f1d0c8bf0 RCX:
> 00007efef99840f7
> <4>[21102.353031] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> 00000000fee1dead
> <4>[21102.354287] RBP: 00007fff75e046f8 R08: 00007efef9e42500 R09:
> 00007fff75e04439
> <4>[21102.355547] R10: 00000000000002f4 R11: 0000000000000246 R12:
> 0000000000000002
> <4>[21102.356808] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> <4>[21102.358066] Code: 5b 01 00 00 48 8b 5f 08 48 8b 7b e8 48 8d 6b e8 e8 01
> d7 ff ff 48 89 ef 49 89 c4 e8 f6 d6 ff ff 48 8b 43 08 48 8b 13 48 89 42 08
> <48>
> 89 10 48 89 1b 48 8b 05 0e 0d bd 02 48 89 5b 08 48 8d 78 10 
> <1>[21102.359437] RIP: device_shutdown+0x5e/0x1a3 RSP: ffffa6c6cd9d3dc8
> <4>[21102.360749] CR2: 0000000000000000
> <4>[21102.362070] ---[ end trace 734263fa4c996038 ]---
> 
Comment 4 Mateusz Lenik 2018-06-10 06:53:47 UTC
Created attachment 276441 [details]
signature.asc

On Sat, Jun 09, 2018 at 04:07:26PM +0200, Greg Kroah-Hartman wrote:
> Here's the full callstack, but yeah, it's not very obvious as to what
> device is having the problem, which isn't good.  I don't know what to
> suggest here.

I think it is related to amdgpu -- when the script that unbinds the 
second GPU from host is not executed, there are no issues during reboot.

I found out that amdgpu leaves i2c-dev and hwmon device nodes behind after 
unbind, which seems to be a bug in that driver.  In 4.16, it only left 
a stale hwmon node.

Best,
Mateusz
Comment 5 Andrey Grodzovsky 2018-06-12 21:03:58 UTC
Does any of this reproduce consistently? Are you able to bisect to find the offending patch?

P.S. You showed multiple stack traces here under different scenarios. Why are you sure those are all the same bug and not different issues?

Andrey
Comment 6 Mateusz Lenik 2018-06-13 09:03:05 UTC
(In reply to Andrey Grodzovsky from comment #5)
> Does any of this reproduce consistently? Are you able to bisect to find
> the offending patch?
It reproduces consistently, but I won't be able to work on this until the beginning of July -- for the next two weeks I'm on vacation without access to my machines.

> P.S. You showed multiple stack traces here under different scenarios. Why are
> you sure those are all the same bug and not different issues?
These may be different bugs indeed, but I'm convinced that the root cause is broken amdgpu unbind. These issues only occur when I unbind the gpu. There is a visible change in unbind behaviour between 4.16 and 4.17 -- 4.16 cleaned up i2c device nodes properly.
Comment 7 Andrey Grodzovsky 2018-06-13 13:59:26 UTC
I will take a look. Can you point me to where you see those stale nodes?

Andrey
Comment 8 Mateusz Lenik 2018-06-14 13:27:24 UTC
After executing unbind like this:

echo 0000:03:00.0 >/sys/bus/pci/devices/0000:03:00.0/driver/unbind

I can still see i2c device nodes 7-13 pointing to that device:

lrwxrwxrwx 1 root root 0 cze 14 15:16 /sys/class/i2c-dev/i2c-10/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 cze 14 15:16 /sys/class/i2c-dev/i2c-11/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 cze 14 15:16 /sys/class/i2c-dev/i2c-12/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 cze 14 15:16 /sys/class/i2c-dev/i2c-13/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 cze 14 15:16 /sys/class/i2c-dev/i2c-7/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 cze 14 15:16 /sys/class/i2c-dev/i2c-8/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 cze 14 15:16 /sys/class/i2c-dev/i2c-9/device/device -> ../../0000:03:00.0

This is a polaris10 device and I'm running 4.17.1 kernel.
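
The check above can also be scripted. This is only a minimal sketch and not part of the original report: `stale_i2c_nodes` and its `sysfs_root` parameter are hypothetical names, and it assumes every entry under `/sys/class/i2c-dev` exposes a `device/device` symlink like the ones listed above.

```python
import os


def stale_i2c_nodes(pci_addr, sysfs_root="/sys"):
    """Return i2c-dev node names whose device/device link points at pci_addr.

    Hypothetical helper: sysfs_root is a parameter only so the function
    can be tried against a fake directory tree.
    """
    class_dir = os.path.join(sysfs_root, "class", "i2c-dev")
    if not os.path.isdir(class_dir):
        return []
    matches = []
    for name in os.listdir(class_dir):
        link = os.path.join(class_dir, name, "device", "device")
        if not os.path.islink(link):
            continue
        # The link target ends in the PCI address, e.g. ../../0000:03:00.0
        if os.path.basename(os.readlink(link)) == pci_addr:
            matches.append(name)
    # Sort by the bus number that follows "i2c-"
    return sorted(matches, key=lambda n: int(n.split("-")[1]))
```

Running `stale_i2c_nodes("0000:03:00.0")` after the unbind should return an empty list if the driver cleaned up properly; any names it returns are nodes that still reference the unbound device.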
Comment 9 Andrey Grodzovsky 2018-06-14 16:44:30 UTC
I discussed this with our Display team, since the issue seems to be related to them. They will take a look next week.
Comment 10 Mateusz Lenik 2018-07-14 21:39:47 UTC
I had some time to test with the latest amd-staging-drm-next (d0987b4ee380e9d814052071c939b38a74a34ab1) today, and unfortunately the issue is still present.
Comment 11 Mateusz Lenik 2018-07-24 07:08:12 UTC
Created attachment 277475 [details]
dmesg output with kobject debug during amdgpu unbind and rebind
Comment 12 Mateusz Lenik 2018-07-24 07:29:16 UTC
Tested tip of staging today with kobject debug enabled.
I ran the following test:

- unbind one gpu from amdgpu
- bind to vfio
- unbind from vfio
- rebind to amdgpu
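
For anyone trying to reproduce this, the four steps could be driven roughly as follows. This is a sketch under assumptions: the report does not say how the device was bound to vfio, so the use of `driver_override` plus `drivers_probe` is my guess at one common way to do it, the function names are hypothetical, and the `vfio-pci` module is assumed to be loaded already.

```python
def rebind_plan(pci_addr, host_driver="amdgpu", guest_driver="vfio-pci"):
    """Return the ordered (sysfs path, value) writes for one test cycle."""
    dev = "/sys/bus/pci/devices/" + pci_addr
    probe = "/sys/bus/pci/drivers_probe"
    return [
        (dev + "/driver/unbind", pci_addr),        # unbind from amdgpu
        (dev + "/driver_override", guest_driver),  # prefer vfio-pci on next probe
        (probe, pci_addr),                         # bind to vfio-pci
        (dev + "/driver/unbind", pci_addr),        # unbind from vfio-pci
        (dev + "/driver_override", host_driver),   # prefer amdgpu on next probe
        (probe, pci_addr),                         # rebind to amdgpu
    ]


def apply_plan(plan):
    """Perform the writes. Needs root and really rebinds the device."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)
```

`rebind_plan("0000:03:00.0")` only builds the list, so it can be inspected before anything is written; `apply_plan` is what actually touches sysfs.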


After the unbind there are 15 i2c device nodes; i2c nodes 7-13 were not removed:

lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-14/device/device -> ../../0000:00:1f.3
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-0/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-1/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-2/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-3/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-4/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-5/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-6/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-7/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-8/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-9/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-10/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-11/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-12/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-13/device/device -> ../../0000:03:00.0

After binding the amdgpu driver again there are 22 i2c device nodes; a new set of i2c nodes (15-21) was created:

lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-14/device/device -> ../../0000:00:1f.3
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-0/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-1/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-2/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-3/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-4/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-5/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-6/device/device -> ../../0000:02:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-7/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-8/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-9/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-10/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-11/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-12/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-13/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-15/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-16/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-17/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-18/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-19/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-20/device/device -> ../../0000:03:00.0
lrwxrwxrwx 1 root root 0 lip 24 08:54 /sys/class/i2c-dev/i2c-21/device/device -> ../../0000:03:00.0

The log I attached above shows kobjects being allocated for these i2c nodes, but they are never freed during unbind.
This not only breaks some functionality after unbind, such as DDC, but also occasionally causes crashes during shutdown.
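
To make the leak easier to spot across runs, the `ls -l` listings can be reduced to a node count per PCI device. A rough sketch, assuming the listing format shown above; `nodes_per_device` and `LINK_RE` are names invented for this example.

```python
import re
from collections import Counter

# Matches e.g. ".../i2c-dev/i2c-7/device/device -> ../../0000:03:00.0"
LINK_RE = re.compile(
    r"/sys/class/i2c-dev/(i2c-\d+)/device/device -> "
    r"\S*?([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s*$"
)


def nodes_per_device(ls_output):
    """Count i2c-dev nodes per PCI address in `ls -l` style output."""
    counts = Counter()
    for line in ls_output.splitlines():
        m = LINK_RE.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts
```

In the listings above, 0000:03:00.0 has 7 nodes after the unbind, where 0 would be expected, and 14 after the rebind, so the count for that device grows by 7 on every cycle.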