Bug 209341 - BUG: kernel NULL pointer dereference (acpi_os_read_port/pci_conf1_read)
Status: RESOLVED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 high
Assignee: acpi_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-21 11:59 UTC by Frantisek Sumsal
Modified: 2020-11-02 14:24 UTC

See Also:
Kernel Version: 5.8.13
Subsystem:
Regression: Yes
Bisected commit-id:


Description Frantisek Sumsal 2020-09-21 11:59:00 UTC
Hello,

While trying to update our systemd CI Arch Linux machines to kernel 5.8.x, I've encountered several issues that make the kernel unusable in the test VMs (see [0] for a rough timeline). Since kernel 5.8.6, two NULL pointer dereference issues kill the QEMU/KVM VMs almost immediately after startup:

```
SeaBIOS (version ArchLinux 1.14.0-1)
Booting from ROM...
Probing EDD (edd=off to disable)... ok
[    4.646738] BUG: kernel NULL pointer dereference, address: 00000000000000fb
[    4.649314] #PF: supervisor write access in kernel mode
[    4.649314] #PF: error_code(0x0002) - not-present page
[    4.649314] PGD 0 P4D 0 
[    4.649314] Oops: 0002 [#1] PREEMPT SMP NOPTI
[    4.649314] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.8.10-arch1-1 #1
[    4.649314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[    4.649314] RIP: 0010:acpi_os_read_port+0x61/0x70
[    4.649314] Code: 44 24 08 65 48 2b 04 25 28 00 00 00 75 21 31 c0 48 83 c4 10 c3 83 fa 10 76 0c 83 fa 20 77 15 89 fa ed 89 06 eb d8 89 fa 66 ed <66> 89 06 eb cf e8 75 23 45 00 0f 0b 0f 1f 00 0f 1f 44 00 00 83 fa
[    4.649314] RSP: 0018:ffffa760c0013d38 EFLAGS: 00010056
[    4.649314] RAX: 00000000000000fb RBX: ffff8a7e3cdd1e80 RCX: 0000000000000830
[    4.649314] RDX: 0000000000000000 RSI: 00000000000000fb RDI: 0000000000000830
[    4.649314] RBP: 0000000000000000 R08: 0000000114f6770e R09: 0000000000000000
[    4.649314] R10: 0000000000000007 R11: 0000000000000200 R12: 0000000000000000
[    4.649314] R13: 0000000000000000 R14: ffff8a7e3cdd262c R15: 000000000002c340
[    4.649314] FS:  0000000000000000(0000) GS:ffff8a7e3d380000(0000) knlGS:0000000000000000
[    4.649314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.649314] CR2: 00000000000000fb CR3: 000000003b40a000 CR4: 00000000000406e0
[    4.649314] Call Trace:
[    4.649314]  ? x2apic_send_IPI+0x46/0x50
[    4.649314]  ? ttwu_queue_wakelist+0xb6/0xd0
[    4.649314]  ? try_to_wake_up+0x1b2/0x620
[    4.649314]  ? pcpu_alloc+0x345/0x6f0
[    4.649314]  ? pwq_adjust_max_active+0x95/0xe0
[    4.649314]  ? alloc_workqueue+0x289/0x478
[    4.649314]  ? acpi_container_init+0x11/0x11
[    4.649314]  ? acpi_thermal_init+0x46/0x82
[    4.649314]  ? do_one_initcall+0x59/0x240
[    4.649314]  ? kernel_init_freeable+0x1b0/0x214
[    4.649314]  ? rest_init+0xbf/0xbf
[    4.649314]  ? kernel_init+0xa/0x101
[    4.649314]  ? ret_from_fork+0x22/0x30
[    4.649314] Modules linked in:
[    4.649314] CR2: 00000000000000fb
[    4.649314] ---[ end trace 5b92b8567582453a ]---
[    4.649314] RIP: 0010:acpi_os_read_port+0x61/0x70
[    4.649314] Code: 44 24 08 65 48 2b 04 25 28 00 00 00 75 21 31 c0 48 83 c4 10 c3 83 fa 10 76 0c 83 fa 20 77 15 89 fa ed 89 06 eb d8 89 fa 66 ed <66> 89 06 eb cf e8 75 23 45 00 0f 0b 0f 1f 00 0f 1f 44 00 00 83 fa
[    4.649314] RSP: 0018:ffffa760c0013d38 EFLAGS: 00010056
[    4.649314] RAX: 00000000000000fb RBX: ffff8a7e3cdd1e80 RCX: 0000000000000830
[    4.649314] RDX: 0000000000000000 RSI: 00000000000000fb RDI: 0000000000000830
[    4.649314] RBP: 0000000000000000 R08: 0000000114f6770e R09: 0000000000000000
[    4.649314] R10: 0000000000000007 R11: 0000000000000200 R12: 0000000000000000
[    4.649314] R13: 0000000000000000 R14: ffff8a7e3cdd262c R15: 000000000002c340
[    4.649314] FS:  0000000000000000(0000) GS:ffff8a7e3d380000(0000) knlGS:0000000000000000
[    4.649314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.649314] CR2: 00000000000000fb CR3: 000000003b40a000 CR4: 00000000000406e0
[    4.649314] note: swapper/0[1] exited with preempt_count 3
[    4.649314] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    4.649314] Shutting down cpus with NMI
[    4.649314] Kernel Offset: 0x17800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    4.649314] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
```
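For anyone triaging this, the `#PF: error_code(0x0002)` line in the trace above can be decoded by hand. Below is a minimal Python sketch, assuming the standard x86 page-fault error code layout (bit 0: protection violation rather than not-present page, bit 1: write, bit 2: user mode, bit 3: reserved bit set, bit 4: instruction fetch). The function name `decode_pf_error_code` is made up for illustration and is not a kernel API.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode the low five bits of an x86 page-fault error code."""
    return {
        # Bit 0 clear means the page was not present at all.
        "page": "protection violation" if code & 0x01 else "not-present page",
        # Bit 1 set means the faulting access was a write.
        "access": "write" if code & 0x02 else "read",
        # Bit 2 set means the access came from user mode.
        "mode": "user" if code & 0x04 else "supervisor",
        # Bit 3 set means a reserved bit was set in a paging structure.
        "reserved_bit": bool(code & 0x08),
        # Bit 4 set means the fault was caused by an instruction fetch.
        "instruction_fetch": bool(code & 0x10),
    }


if __name__ == "__main__":
    # 0x0002 is the value reported in the oops.
    print(decode_pf_error_code(0x0002))
```

For `0x0002` this gives a supervisor-mode write to a not-present page, which agrees with the two `#PF:` lines the kernel printed.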

```
SeaBIOS (version ArchLinux 1.14.0-1)
Booting from ROM...
Probing EDD (edd=off to disable)... ok
[    3.505002] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    3.507748] #PF: supervisor write access in kernel mode
[    3.507748] #PF: error_code(0x0002) - not-present page
[    3.507748] PGD 0 P4D 0 
[    3.507748] Oops: 0002 [#1] PREEMPT SMP NOPTI
[    3.507748] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.8.10-arch1-1 #1
[    3.507748] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[    3.507748] RIP: 0010:pci_conf1_read+0xd5/0x100
[    3.507748] Code: 5d 41 5e c3 41 83 e4 02 41 8d 94 24 fc 0c 00 00 66 ed 0f b7 c0 89 45 00 eb d3 41 83 e4 03 41 8d 94 24 fc 0c 00 00 ec 0f b6 c0 <89> 45 00 eb be ba fc 0c 00 00 ed 89 45 00 eb b3 c7 45 00 ff ff ff
[    3.507748] RSP: 0018:ffffb1aa40013c50 EFLAGS: 00010012
[    3.507748] RAX: 00000000000000fb RBX: ffff9acf3cef0000 RCX: 0000000000000830
[    3.507748] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000830
[    3.507748] RBP: 0000000000000000 R08: 00000000d0e8f583 R09: ffff9acf3cd1a0d0
[    3.507748] R10: ffff9acf3bc16b80 R11: 0000000000000000 R12: 0000000000000001
[    3.507748] R13: 0000000000000001 R14: ffff9acf3cef07ac R15: 000000000002c340
[    3.507748] FS:  0000000000000000(0000) GS:ffff9acf3d380000(0000) knlGS:0000000000000000
[    3.507748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.507748] CR2: 0000000000000000 CR3: 000000005ec0a000 CR4: 00000000000406e0
[    3.507748] Call Trace:
[    3.507748]  ? x2apic_send_IPI+0x46/0x50
[    3.507748]  ? ttwu_queue_wakelist+0xb6/0xd0
[    3.507748]  ? try_to_wake_up+0x1b2/0x620
[    3.507748]  ? devtmpfs_submit_req+0x66/0x80
[    3.507748]  ? devtmpfs_create_node+0x9c/0xd0
[    3.507748]  ? device_add+0x6e5/0x7f0
[    3.507748]  ? device_create_groups_vargs+0xd3/0xf0
[    3.507748]  ? device_create+0x51/0x70
[    3.507748]  ? chr_dev_init+0x127/0x146
[    3.507748]  ? serdev_init+0x1d/0x1d
[    3.507748]  ? do_one_initcall+0x59/0x240
[    3.507748]  ? kernel_init_freeable+0x1b0/0x214
[    3.507748]  ? rest_init+0xbf/0xbf
[    3.507748]  ? kernel_init+0xa/0x101
[    3.507748]  ? ret_from_fork+0x22/0x30
[    3.507748] Modules linked in:
[    3.507748] CR2: 0000000000000000
[    3.507748] ---[ end trace 80141c373e8a535f ]---
[    3.507748] RIP: 0010:pci_conf1_read+0xd5/0x100
[    3.507748] Code: 5d 41 5e c3 41 83 e4 02 41 8d 94 24 fc 0c 00 00 66 ed 0f b7 c0 89 45 00 eb d3 41 83 e4 03 41 8d 94 24 fc 0c 00 00 ec 0f b6 c0 <89> 45 00 eb be ba fc 0c 00 00 ed 89 45 00 eb b3 c7 45 00 ff ff ff
[    3.507748] RSP: 0018:ffffb1aa40013c50 EFLAGS: 00010012
[    3.507748] RAX: 00000000000000fb RBX: ffff9acf3cef0000 RCX: 0000000000000830
[    3.507748] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000830
[    3.507748] RBP: 0000000000000000 R08: 00000000d0e8f583 R09: ffff9acf3cd1a0d0
[    3.507748] R10: ffff9acf3bc16b80 R11: 0000000000000000 R12: 0000000000000001
[    3.507748] R13: 0000000000000001 R14: ffff9acf3cef07ac R15: 000000000002c340
[    3.507748] FS:  0000000000000000(0000) GS:ffff9acf3d380000(0000) knlGS:0000000000000000
[    3.507748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.507748] CR2: 0000000000000000 CR3: 000000005ec0a000 CR4: 00000000000406e0
[    3.507748] note: swapper/0[1] exited with preempt_count 2
[    3.507748] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    3.507748] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
```
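One quick triage step for traces like these is to check which general-purpose registers hold the faulting address (CR2), since those are candidates for the bad pointer. The following is a rough Python sketch, not an established tool: the helper name `registers_matching_fault` and the regular expression are assumptions of mine, and the sample text is copied from the register dump of the `pci_conf1_read` trace above.

```python
import re

# Matches "RAX: 00000000000000fb" style pairs and CR2; RIP/RSP lines carry a
# segment prefix ("0010:...") and are deliberately not matched.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}|CR2):\s*([0-9a-f]{16})\b")

SAMPLE = """\
[    3.507748] RAX: 00000000000000fb RBX: ffff9acf3cef0000 RCX: 0000000000000830
[    3.507748] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000830
[    3.507748] RBP: 0000000000000000 R08: 00000000d0e8f583 R09: ffff9acf3cd1a0d0
[    3.507748] R10: ffff9acf3bc16b80 R11: 0000000000000000 R12: 0000000000000001
[    3.507748] R13: 0000000000000001 R14: ffff9acf3cef07ac R15: 000000000002c340
[    3.507748] CR2: 0000000000000000 CR3: 000000005ec0a000 CR4: 00000000000406e0
"""


def registers_matching_fault(oops_text: str) -> list:
    """Return the names of registers whose value equals CR2, sorted by name."""
    regs = {}
    for name, value in REG_RE.findall(oops_text):
        # Keep the first occurrence; the oops prints the dump twice.
        regs.setdefault(name, int(value, 16))
    fault = regs.pop("CR2")
    return sorted(name for name, value in regs.items() if value == fault)


if __name__ == "__main__":
    print(registers_matching_fault(SAMPLE))
```

On the sample, RBP is among the matches, which fits the marked instruction bytes `<89> 45 00` (a 32-bit store through RBP). A match is only a hint: a register may hold the same value by coincidence, as zero-valued registers often do.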

I'm not sure how to debug this further or whether it's a misconfiguration on our side, but the issue reproduces quite reliably when running a specific part of the test suite. If there's anything I can do to provide more information, please let me know.

Also, my apologies if I filed this in the wrong category; I'm not entirely sure which subsystem/category this falls into.


[0] https://github.com/systemd/systemd-centos-ci/pull/295#issuecomment-682519585
Comment 1 Frantisek Sumsal 2020-09-21 13:33:57 UTC
A few additional notes:

 * this seems to happen only with nested KVM
 * all affected machines are booted with -smp 4 or -smp 8
 * this used to work (and still works) as expected with kernel 5.7.12
 * both the VM and the host use the same package set
 * an example QEMU command line from one of the failing runs:

/sbin/qemu-system-x86_64 -smp 4 -net none -m 2048M -nographic -kernel /boot/vmlinuz-linux -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test-TEST-01-BASIC_sanitizers-qemu/basic.img -initrd /var/tmp/initrd-testsuite-kBq.img -machine accel=kvm -enable-kvm -cpu host -append ' root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/usr/lib/systemd/systemd-under-asan console=ttyS0 selinux=0  SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-01.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-01.service systemd.wants=end.service  '
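Since the crashes were only seen with nested KVM, anyone trying to reproduce this may want to confirm that nesting is enabled on the outer host first. Here is a minimal Python sketch, assuming the usual module parameter files `/sys/module/kvm_intel/parameters/nested` and `/sys/module/kvm_amd/parameters/nested`. The helper name `nested_kvm_status` and its `sysfs_root` parameter are illustrative only.

```python
from pathlib import Path


def nested_kvm_status(sysfs_root: str = "/sys") -> dict:
    """Map each loaded KVM vendor module to whether nesting is enabled."""
    status = {}
    for module in ("kvm_intel", "kvm_amd"):
        param = Path(sysfs_root) / "module" / module / "parameters" / "nested"
        try:
            raw = param.read_text().strip()
        except OSError:
            # Module not loaded, or the parameter is not exposed.
            continue
        # Depending on module and kernel version the value is Y/N or 1/0.
        status[module] = raw in ("Y", "y", "1")
    return status


if __name__ == "__main__":
    print(nested_kvm_status() or "no KVM vendor module parameters found")
```

An empty result only means that neither parameter file could be read; it does not by itself show that KVM is unavailable.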
Comment 2 Frantisek Sumsal 2020-11-02 14:24:51 UTC
Appears to be fixed in kernel 5.9.
