Created attachment 165591 [details] procfs_mmap.c When a module creates a file in /proc, the .owner field of file_operations structure is ignored. When this file is used with mmap, the refcount of the module can be kept as 0, and running rmmod can cause a "BUG: unable to handle kernel paging request" later when the mmap'ed area is acceded. Existing workarounds: * When using a file in debugfs instead of procfs, the module refcount is incremented in open and decremented in close so there is no issue. * When using try_module_get(THIS_MODULE) and module_put(THIS_MODULE) in open and close file_operations functions, the module refcount is never 0 when the mmap exists, preventing a nasty rmmod. System information: * Tested with both on Fedora 3.17.3-200.fc20.x86_64 and Arch Linux 3.18.5-1 kernels. Example code: A short module creating a file in the procfs which can be used with mmap is attached to this report. It can be used with the following Python script: #!/usr/bin/env python import mmap import os f = open("/proc/procfs_mmap","r+b") m = mmap.mmap(f.fileno(), 0) # "First line: This is a test." print("First line: %s" % m.readline().rstrip()) # "lsmod says: procfs_mmap 12587 0" # (Please note the 0 in the "Used by" column) print("lsmod says:") os.system("lsmod | grep procfs_mmap") # Unload the module while it is still used, possible because the refcount is zero os.system("rmmod procfs_mmap") # "Second line: On two lines" print("Second line: %s" % m.readline().rstrip()) Then, dmesg shows: [ 1234.337725] BUG: unable to handle kernel paging request at ffffffffa030a268 [ 1234.338007] IP: [<ffffffff811bf054>] remove_vma+0x24/0x70 [ 1234.338007] PGD 1c17067 PUD 1c18063 PMD 3d713067 PTE 0 [ 1234.338007] Oops: 0000 [#1] SMP [ 1234.338007] Modules linked in: fuse rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill ppdev bochs_drm virtio_net serio_raw parport_pc ttm pvpanic parport drm_kms_helper drm i2c_piix4 nfsd auth_rpcgss nfs_acl lockd sunrpc virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi [last unloaded: procfs_mmap] [ 1234.338007] CPU: 1 PID: 706 Comm: python Tainted: G OE 3.17.3-200.fc20.x86_64 #1 [ 1234.338007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140617_173321-var-lib-archbuild-testing-x86_64-tobias 04/01/2014 [ 1234.338007] task: ffff88003c3f6220 ti: ffff8800370a0000 task.ti: ffff8800370a0000 [ 1234.338007] RIP: 0010:[<ffffffff811bf054>] [<ffffffff811bf054>] remove_vma+0x24/0x70 [ 1234.338007] RSP: 0018:ffff8800370a3ec8 EFLAGS: 00010286 [ 1234.338007] RAX: 0000000000000000 RBX: ffff88003bd61678 RCX: ffffffffffffffff [ 1234.338007] RDX: ffffffffa030a260 RSI: 0000000000000002 RDI: ffff88003bd61678 [ 1234.338007] RBP: ffff8800370a3ed8 R08: ffff88003c14b348 R09: 0000000000000001 [ 1234.338007] R10: ffff88003bd616d0 R11: 0000000000000286 R12: 0000000000000000 [ 1234.338007] R13: 00007f56df207000 R14: ffff88003bd61678 R15: ffff88003bd61ac8 [ 1234.338007] FS: 00007f56df1f9740(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000 [ 1234.338007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1234.338007] CR2: ffffffffa030a268 CR3: 00000000372cf000 CR4: 00000000000006e0 [ 1234.338007] Stack: [ 1234.338007] ffff88003c14b000 0000000000000000 ffff8800370a3f28 ffffffff811c155f [ 1234.338007] ffff88003bd61ad8 00007f56df206000 ffff88003cee6ba0 ffff88003c14b060 [ 1234.338007] ffff88003c14b000 00007f56df206000 0000000000001000 000000000000000d [ 1234.338007] Call Trace: [ 1234.338007] [<ffffffff811c155f>] do_munmap+0x27f/0x3b0 [ 1234.338007] [<ffffffff811c16d1>] vm_munmap+0x41/0x60 [ 1234.338007] [<ffffffff811c2652>] SyS_munmap+0x22/0x30 [ 1234.338007] [<ffffffff817301a9>] system_call_fastpath+0x16/0x1b [ 1234.338007] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 4c 8b 67 10 e8 f8 d1 56 00 48 8b 93 90 00 00 00 48 85 d2 74 0e <48> 8b 42 08 48 85 c0 74 05 48 89 df ff d0 48 8b bb a0 00 00 00 [ 1234.338007] RIP [<ffffffff811bf054>] remove_vma+0x24/0x70 [ 1234.338007] RSP <ffff8800370a3ec8> [ 1234.338007] CR2: ffffffffa030a268 [ 1234.338007] ---[ end trace 7635f2cc5254e578 ]--- This dump comes from a qemu-kvm guest where "uname -a" is "Linux fedora 3.17.3-200.fc20.x86_64 #1 SMP Fri Nov 14 19:45:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux".
procfs knows it is never built modular in the base kernel, and all the stuff which is modular ends up in sysfs or debugfs. It's as such not really a bug, although if there was a sensible use case for modules in /proc I'm sure patches would be happily accepted.
Whether procfs is built modular or not is not the subject here. A question more on topic would more likely be whether code using proc_create are in modules which can be unloaded. A quick search of "proc_create" [1] shows that such modules exists, like almost every network protocol module (for statistics in /proc/*/net/). Nevertheless it seems no module in the current kernel tree creates a mmap-able file in /proc. Moreover, the documentation suggests ".owner = THIS_MODULE" is needed for a file_operations structure given to proc_create [2], but in reality this field is currently ignored, unless I missed something in the code (I know this field is used by fops_get/fops_put for usual filesystems). Anyway, I've written a patch, tested it on a VM and on my computer, sent it to Al Viro and fs-devel (as the MAINTAINERS file specifies [3]), and waiting for a reply. The patch can be downloaded from fs-devel archives: http://marc.info/?l=linux-fsdevel&m=142362465426897&w=2 . Comments and feedback on it would be greatly appreciated. Also, for my project, I'm basically using mmap as a kind of communication channel between a kernel module and a userspace program. At first the virtual file was created in debugfs, but as this filesystem is not always mounted on production systems, I needed another place. As I couldn't find a suitable directory in sysfs, I ended up using procfs. Could you please suggest what would be a logical location for such a file in sysfs, if procfs does not suit well? Thanks [1] http://lxr.free-electrons.com/ident?v=3.18;i=proc_create [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/seq_file.txt?id=v3.19#n279 [3] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/MAINTAINERS?id=v3.19#n3904 [Reopening the bug report because of there is a pending patch]
The only in-tree code which creates mmap'able files in /proc is not modular, so this went unnoticed.
This needs bits and pieces from revoke(2) I guess.
I got a NAK from Al Viro with a great answer about why procfs doesn't grab a reference to the module owning the file_operations structure like other filesystems: https://lkml.org/lkml/2015/2/17/539 . I therefore close this bug as "WILL_NOT_FIX".