Bug 215647 - aoe: removing aoe devices with flush (implicit in rmmod aoe) leads to page fault
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: io_other
Reported: 2022-03-01 10:17 UTC by Valentin Kleibel
Modified: 2022-03-01 10:18 UTC

See Also:
Kernel Version: 4.20-rc1 - 5.13-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
patch for aoe flush (485 bytes, patch)
2022-03-01 10:17 UTC, Valentin Kleibel

Description Valentin Kleibel 2022-03-01 10:17:41 UTC
Created attachment 300511
patch for aoe flush

Hello,

There is a bug in the aoe driver module between v4.20-rc1 and v5.14-rc1, introduced in 3582dd2 (aoe: convert aoeblk to blk-mq) and fixed in 6560ec9 (aoe: use blk_mq_alloc_disk and blk_cleanup_disk).
Every forcible removal of an aoe device (e.g. "rmmod aoe" with aoe devices available, or "aoe-flush ex.x") leads to a page fault.
This bug was successfully reproduced with kernel 5.10.92 from the Debian repository; there were no changes to the affected code between v4.20-rc1 and v5.14-rc1. Versions 4.19.208 (from Debian buster) and 5.17-rc4 (from Debian experimental) are confirmed not to be affected.
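
As a rough aid for checking a machine against the range above, here is a minimal sketch of a version check. The function name aoe_flush_affected is hypothetical and not part of the aoe tools. It assumes that only the upstream major.minor matters (bug present from 4.20 up to and including 5.13) and it cannot tell whether a distribution kernel carries the attached patch as a backport.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given kernel version falls into the
# range named in this report (v4.20-rc1 up to, but not including, v5.14-rc1).
# Assumption: only major.minor is compared; backported fixes are not detected.
aoe_flush_affected() {
    ver=${1%%-*}        # drop suffixes such as "-rc1" or "-11-amd64"
    major=${ver%%.*}    # text before the first dot
    rest=${ver#*.}      # text after the first dot
    minor=${rest%%.*}   # second version component
    if [ "$major" -eq 4 ] && [ "$minor" -ge 20 ]; then
        return 0
    fi
    if [ "$major" -eq 5 ] && [ "$minor" -le 13 ]; then
        return 0
    fi
    return 1
}

# Usage: check the running kernel.
if aoe_flush_affected "$(uname -r)"; then
    echo "kernel is in the affected range, avoid rmmod aoe / aoe-flush"
else
    echo "kernel is outside the affected range"
fi
```

Comparing only major.minor is sufficient here because the bug entered with the first release candidate of 4.20 and the fix landed in the first release candidate of 5.14.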

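The code in freedev() calls blk_mq_free_tag_set() before running blk_cleanup_queue(), which leads to this issue (drivers/block/aoe/aoedev.c, line 281 ff.). As the call trace below shows, blk_cleanup_queue() then ends up in sbitmap_queue_wake_all(), which presumably touches tag data that has already been freed.
The attached patch for affected kernel versions only changes the order of the function calls to match the one introduced with blk_cleanup_disk(), which mitigates this issue.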

*Reproducing the issue:*

Prepare an aoe storage backend:
```
apt-get install vblade
dd if=/dev/zero of=/root/1g.img bs=1M count=1024
```
Trigger the actual bug:
```
rmmod aoe
vbladed 1 1 lo /root/1g.img
modprobe aoe aoe_iflist=lo
sleep 1
rmmod aoe
```
This immediately leads to:
```
[ 1789.579832] BUG: unable to handle page fault for address: ffff9ca947342bd0
[ 1789.580993] #PF: supervisor read access in kernel mode
[ 1789.581984] #PF: error_code(0x0000) - not-present page
[ 1789.582912] PGD 1baa01067 P4D 1baa01067 PUD 0 
[ 1789.583830] Oops: 0000 [#1] SMP PTI
[ 1789.584732] CPU: 0 PID: 1758 Comm: rmmod Tainted: G          I       5.10.0-11-amd64 #1 Debian 5.10.92-1
[ 1789.585662] Hardware name: IBM System x3650 M2 -[79474LG]-/49Y6512, BIOS -[D6E162AUS-1.20]- 05/07/2014
[ 1789.586575] RIP: 0010:sbitmap_queue_wake_all+0x21/0x60
[ 1789.587481] Code: 84 00 00 00 00 00 0f 1f 00 41 54 49 89 fc 55 53 f0 83 44 24 fc 00 8b 5f 24 bd 08 00 00 00 48 63 c3 48 c1 e0 06 49 03 44 24 28 <48> 8b 50 10 48 8d 78 08 48 83 c0 10 48 39 c2 74 11 31 c9 ba 01 00
[ 1789.589452] RSP: 0018:ffffb1cc06c73e18 EFLAGS: 00010287
[ 1789.590477] RAX: ffff9ca947342bc0 RBX: 00000000e9ffa0df RCX: ffff9caec74bcbc0
[ 1789.591537] RDX: ffff9caec74be7d0 RSI: 0000000000000001 RDI: ffff9caec5194060
[ 1789.592603] RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000000
[ 1789.593689] R10: 0000000000000001 R11: 0000000000000200 R12: ffff9caec5194060
[ 1789.594785] R13: 0000000000000000 R14: 0000000000000000 R15: dead000000000100
[ 1789.595890] FS:  00007f80a072a4c0(0000) GS:ffff9cb61fc00000(0000) knlGS:0000000000000000
[ 1789.597004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1789.598109] CR2: ffff9ca947342bd0 CR3: 00000001057c0000 CR4: 00000000000006f0
[ 1789.599243] Call Trace:
[ 1789.600384]  blk_mq_wake_waiters+0x3d/0x50
[ 1789.601547]  blk_set_queue_dying+0x22/0x40
[ 1789.602716]  blk_cleanup_queue+0x26/0xd0
[ 1789.603897]  flush+0x34e/0x510 [aoe]
[ 1789.605089]  aoe_exit+0x31/0x795 [aoe]
[ 1789.606288]  __do_sys_delete_module+0x190/0x300
[ 1789.607473]  ? exit_to_user_mode_prepare+0x32/0x120
[ 1789.608644]  do_syscall_64+0x33/0x80
[ 1789.609805]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1789.610973] RIP: 0033:0x7f80a083f8b7
[ 1789.612149] Code: 73 01 c3 48 8b 0d 79 95 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 95 0e 00 f7 d8 64 89 01 48
[ 1789.614677] RSP: 002b:00007ffe6d6f5748 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1789.615988] RAX: ffffffffffffffda RBX: 0000555a083a4760 RCX: 00007f80a083f8b7
[ 1789.617320] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555a083a47c8
[ 1789.618655] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1789.619984] R10: 00007f80a08ceac0 R11: 0000000000000206 R12: 00007ffe6d6f59b0
[ 1789.621313] R13: 00007ffe6d6f5f01 R14: 0000555a083a42a0 R15: 0000555a083a4760
[ 1789.622646] Modules linked in: aoe(-) btrfs blake2b_generic xor raid6_pq ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs libcrc32c intel_powerclamp ipmi_ssif coretemp kvm_intel kvm mgag200 irqbypass iTCO_wdt drm_kms_helper tpm_tis cdc_ether intel_pmc_bxt intel_cstate cec tpm_tis_core iTCO_vendor_support sg usbnet watchdog mii intel_uncore i2c_algo_bit pcspkr evdev ipmi_si i7core_edac tpm ioatdma dca button ipmi_devintf rng_core ipmi_msghandler i5500_temp acpi_cpufreq drm fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_common hid_generic usbhid hid sr_mod cdrom ata_generic mptsas ata_piix mptscsih libata mptbase uhci_hcd ehci_pci scsi_transport_sas ehci_hcd crc32c_intel usbcore scsi_mod i2c_i801 i2c_smbus lpc_ich bnx2 usb_common [last unloaded: aoe]
[ 1789.633118] CR2: ffff9ca947342bd0
[ 1789.634681] ---[ end trace 1a490b55bf7d5ea0 ]---
[ 1789.636233] RIP: 0010:sbitmap_queue_wake_all+0x21/0x60
[ 1789.637766] Code: 84 00 00 00 00 00 0f 1f 00 41 54 49 89 fc 55 53 f0 83 44 24 fc 00 8b 5f 24 bd 08 00 00 00 48 63 c3 48 c1 e0 06 49 03 44 24 28 <48> 8b 50 10 48 8d 78 08 48 83 c0 10 48 39 c2 74 11 31 c9 ba 01 00
[ 1789.640940] RSP: 0018:ffffb1cc06c73e18 EFLAGS: 00010287
[ 1789.642540] RAX: ffff9ca947342bc0 RBX: 00000000e9ffa0df RCX: ffff9caec74bcbc0
[ 1789.644163] RDX: ffff9caec74be7d0 RSI: 0000000000000001 RDI: ffff9caec5194060
[ 1789.645763] RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000000
[ 1789.647341] R10: 0000000000000001 R11: 0000000000000200 R12: ffff9caec5194060
[ 1789.648896] R13: 0000000000000000 R14: 0000000000000000 R15: dead000000000100
[ 1789.650411] FS:  00007f80a072a4c0(0000) GS:ffff9cb61fc00000(0000) knlGS:0000000000000000
[ 1789.651919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1789.653405] CR2: ffff9ca947342bd0 CR3: 00000001057c0000 CR4: 00000000000006f0
[ 1789.878033] systemd-journald[355]: Compressed data object 848 -> 740 using LZ4
```

*Fix for the issue:*
Call blk_cleanup_queue() before blk_mq_free_tag_set():
```
diff -Nru a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
--- a/drivers/block/aoe/aoedev.c        2022-02-22 08:10:49.182885294 +0100
+++ b/drivers/block/aoe/aoedev.c        2022-02-22 08:21:36.983298488 +0100
@@ -277,9 +282,9 @@
        if (d->gd) {
                aoedisk_rm_debugfs(d);
                del_gendisk(d->gd);
+               blk_cleanup_queue(d->blkq);
                put_disk(d->gd);
                blk_mq_free_tag_set(&d->tag_set);
-               blk_cleanup_queue(d->blkq);
        }
        t = d->targets;
        e = t + d->ntargets;
```
