Bug 92511 - procfs does not honor .owner field of struct file_operations
Summary: procfs does not honor .owner field of struct file_operations
Status: RESOLVED WILL_NOT_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-02 14:39 UTC by Nicolas Iooss
Modified: 2015-02-21 13:11 UTC (History)
2 users (show)

See Also:
Kernel Version: 3.17.3 and 3.18.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
procfs_mmap.c (1.39 KB, text/x-csrc)
2015-02-02 14:39 UTC, Nicolas Iooss
Details

Description Nicolas Iooss 2015-02-02 14:39:44 UTC
Created attachment 165591 [details]
procfs_mmap.c

When a module creates a file in /proc, the .owner field of file_operations structure is ignored. When this file is used with mmap, the refcount of the module can be kept as 0, and running rmmod can cause a "BUG: unable to handle kernel paging request" later when the mmap'ed area is acceded.

Existing workarounds:

* When using a file in debugfs instead of procfs, the module refcount is incremented in open and decremented in close so there is no issue.
* When using try_module_get(THIS_MODULE) and module_put(THIS_MODULE) in open and close file_operations functions, the module refcount is never 0 when the mmap exists, preventing a nasty rmmod.

System information:

* Tested with both on Fedora 3.17.3-200.fc20.x86_64 and Arch Linux 3.18.5-1 kernels.

Example code:

A short module creating a file in the procfs which can be used with mmap is attached to this report. It can be used with the following Python script:

#!/usr/bin/env python
import mmap
import os

f = open("/proc/procfs_mmap","r+b")
m = mmap.mmap(f.fileno(), 0)

# "First line: This is a test."
print("First line: %s" % m.readline().rstrip())

# "lsmod says: procfs_mmap            12587  0"
# (Please note the 0 in the "Used by" column)
print("lsmod says:")
os.system("lsmod | grep procfs_mmap")

# Unload the module while it is still used, possible because the refcount is zero
os.system("rmmod procfs_mmap")

# "Second line: On two lines"
print("Second line: %s" % m.readline().rstrip())

Then, dmesg shows:

[ 1234.337725] BUG: unable to handle kernel paging request at ffffffffa030a268
[ 1234.338007] IP: [<ffffffff811bf054>] remove_vma+0x24/0x70
[ 1234.338007] PGD 1c17067 PUD 1c18063 PMD 3d713067 PTE 0
[ 1234.338007] Oops: 0000 [#1] SMP 
[ 1234.338007] Modules linked in: fuse rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill ppdev bochs_drm virtio_net serio_raw parport_pc ttm pvpanic parport drm_kms_helper drm i2c_piix4 nfsd auth_rpcgss nfs_acl lockd sunrpc virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi [last unloaded: procfs_mmap]
[ 1234.338007] CPU: 1 PID: 706 Comm: python Tainted: G           OE  3.17.3-200.fc20.x86_64 #1
[ 1234.338007] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140617_173321-var-lib-archbuild-testing-x86_64-tobias 04/01/2014
[ 1234.338007] task: ffff88003c3f6220 ti: ffff8800370a0000 task.ti: ffff8800370a0000
[ 1234.338007] RIP: 0010:[<ffffffff811bf054>]  [<ffffffff811bf054>] remove_vma+0x24/0x70
[ 1234.338007] RSP: 0018:ffff8800370a3ec8  EFLAGS: 00010286
[ 1234.338007] RAX: 0000000000000000 RBX: ffff88003bd61678 RCX: ffffffffffffffff
[ 1234.338007] RDX: ffffffffa030a260 RSI: 0000000000000002 RDI: ffff88003bd61678
[ 1234.338007] RBP: ffff8800370a3ed8 R08: ffff88003c14b348 R09: 0000000000000001
[ 1234.338007] R10: ffff88003bd616d0 R11: 0000000000000286 R12: 0000000000000000
[ 1234.338007] R13: 00007f56df207000 R14: ffff88003bd61678 R15: ffff88003bd61ac8
[ 1234.338007] FS:  00007f56df1f9740(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 1234.338007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1234.338007] CR2: ffffffffa030a268 CR3: 00000000372cf000 CR4: 00000000000006e0
[ 1234.338007] Stack:
[ 1234.338007]  ffff88003c14b000 0000000000000000 ffff8800370a3f28 ffffffff811c155f
[ 1234.338007]  ffff88003bd61ad8 00007f56df206000 ffff88003cee6ba0 ffff88003c14b060
[ 1234.338007]  ffff88003c14b000 00007f56df206000 0000000000001000 000000000000000d
[ 1234.338007] Call Trace:
[ 1234.338007]  [<ffffffff811c155f>] do_munmap+0x27f/0x3b0
[ 1234.338007]  [<ffffffff811c16d1>] vm_munmap+0x41/0x60
[ 1234.338007]  [<ffffffff811c2652>] SyS_munmap+0x22/0x30
[ 1234.338007]  [<ffffffff817301a9>] system_call_fastpath+0x16/0x1b
[ 1234.338007] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 4c 8b 67 10 e8 f8 d1 56 00 48 8b 93 90 00 00 00 48 85 d2 74 0e <48> 8b 42 08 48 85 c0 74 05 48 89 df ff d0 48 8b bb a0 00 00 00 
[ 1234.338007] RIP  [<ffffffff811bf054>] remove_vma+0x24/0x70
[ 1234.338007]  RSP <ffff8800370a3ec8>
[ 1234.338007] CR2: ffffffffa030a268
[ 1234.338007] ---[ end trace 7635f2cc5254e578 ]---

This dump comes from a qemu-kvm guest where "uname -a" is "Linux fedora 3.17.3-200.fc20.x86_64 #1 SMP Fri Nov 14 19:45:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux".
Comment 1 Alan 2015-02-10 15:26:58 UTC
procfs knows it is never built modular in the base kernel, and all the stuff which is modular ends up in sysfs or debugfs.

It's  as such not really a bug, although if there was a sensible use case for modules in /proc I'm sure patches would be happily accepted.
Comment 2 Nicolas Iooss 2015-02-13 08:54:17 UTC
Whether procfs is built modular or not is not the subject here.  A question more on topic would more likely be whether code using proc_create are in modules which can be unloaded.  A quick search of "proc_create" [1] shows that such modules exists, like almost every network protocol module (for statistics in /proc/*/net/). Nevertheless it seems no module in the current kernel tree creates a mmap-able file in /proc.

Moreover, the documentation suggests ".owner = THIS_MODULE" is needed for a file_operations structure given to proc_create [2], but in reality this field is currently ignored, unless I missed something in the code (I know this field is used by fops_get/fops_put for usual filesystems).

Anyway, I've written a patch, tested it on a VM and on my computer, sent it to Al Viro and fs-devel (as the MAINTAINERS file specifies [3]), and waiting for a reply.  The patch can be downloaded from fs-devel archives: http://marc.info/?l=linux-fsdevel&m=142362465426897&w=2 .  Comments and feedback on it would be greatly appreciated.

Also, for my project, I'm basically using mmap as a kind of communication channel between a kernel module and a userspace program.  At first the virtual file was created in debugfs, but as this filesystem is not always mounted on production systems, I needed another place.  As I couldn't find a suitable directory in sysfs, I ended up using procfs.  Could you please suggest what would be a logical location for such a file in sysfs, if procfs does not suit well?

Thanks

[1] http://lxr.free-electrons.com/ident?v=3.18;i=proc_create
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/seq_file.txt?id=v3.19#n279
[3] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/MAINTAINERS?id=v3.19#n3904

[Reopening the bug report because of there is a pending patch]
Comment 3 Alexey Dobriyan 2015-02-20 19:45:45 UTC
The only in-tree code which creates mmap'able files in /proc is not modular,
so this went unnoticed.
Comment 4 Alexey Dobriyan 2015-02-20 19:46:31 UTC
This needs bits and pieces from revoke(2) I guess.
Comment 5 Nicolas Iooss 2015-02-21 13:11:33 UTC
I got a NAK from Al Viro with a great answer about why procfs doesn't grab a reference to the module owning the file_operations structure like other filesystems: https://lkml.org/lkml/2015/2/17/539 .  I therefore close this bug as "WILL_NOT_FIX".

Note You need to log in before you can comment on or make changes to this bug.