Created attachment 277493
ko demo

Calling 'get_unused_fd_flags' from a kthread crashes the kernel. It works fine on 4.1, but on the affected kernels it crashes after 64 fds have been obtained. It also crashes on Ubuntu 14.04/16.04/18.04 and CentOS 7.5, and the crash messages are almost the same. The crash message on CentOS 7.5 is shown below:

[ 93.216084] start fd 61
[ 93.316064] start fd 62
[ 93.416084] start fd 63
[ 93.521024] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 93.521111] IP: [<ffffffff9c4c4dbe>] __wake_up_common+0x2e/0x90
[ 93.521172] PGD 0
[ 93.521197] Oops: 0000 [#1] SMP
[ 93.521233] Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
[ 93.522026] virtio floppy dm_mirror dm_region_hash dm_log dm_mod
[ 93.522089] CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.3.3.el7.x86_64 #1
[ 93.522172] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 93.522258] task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
[ 93.522314] RIP: 0010:[<ffffffff9c4c4dbe>] [<ffffffff9c4c4dbe>] __wake_up_common+0x2e/0x90
[ 93.522382] RSP: 0018:ffff8e94247a2d18 EFLAGS: 00010086
[ 93.522424] RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
[ 93.522477] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
[ 93.522530] RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
[ 93.522584] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
[ 93.522637] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
[ 93.522691] FS: 0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
[ 93.522751] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 93.522796] CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
[ 93.522854] Call Trace:
[ 93.522884] [<ffffffff9c4c7d39>] __wake_up+0x39/0x50
[ 93.522931] [<ffffffff9c63a771>] expand_files+0x131/0x250
[ 93.522979] [<ffffffff9cb1182c>] ? schedule_timeout+0x17c/0x2c0
[ 93.523027] [<ffffffff9c63b017>] __alloc_fd+0x47/0x170
[ 93.523071] [<ffffffff9c63b170>] get_unused_fd_flags+0x30/0x40
[ 93.523120] [<ffffffffc07ea12a>] test_fd+0x12a/0x1c0 [test]
[ 93.523178] [<ffffffffc07ea000>] ? 0xffffffffc07e9fff
[ 93.523220] [<ffffffff9c4bb161>] kthread+0xd1/0xe0
[ 93.523262] [<ffffffff9c4bb090>] ? insert_kthread_work+0x40/0x40
[ 93.523312] [<ffffffff9cb20677>] ret_from_fork_nospec_begin+0x21/0x21
[ 93.523363] [<ffffffff9c4bb090>] ? insert_kthread_work+0x40/0x40
[ 93.523409] Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 <48> 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
[ 93.523756] RIP [<ffffffff9c4c4dbe>] __wake_up_common+0x2e/0x90
[ 93.523806] RSP <ffff8e94247a2d18>
[ 93.523835] CR2: 0000000000000000
This issue has existed since CentOS 7.5 (3.10.0-862); CentOS 7.4 (3.10.0-693.21.1) is OK.

Root cause: the 'resize_wait' member of init_files is not initialized before it is used. The call trace shows expand_files() calling __wake_up() on it, and __wake_up_common() then dereferences a NULL list pointer.

Patch:

--- linux-4.14.91/fs/file.c	2018-12-29 20:39:11.000000000 +0800
+++ linux-4.14.91/fs/file.c.new	2019-01-08 11:55:44.297048053 +0800
@@ -462,6 +462,7 @@
 		.full_fds_bits	= init_files.full_fds_bits_init,
 	},
 	.file_lock	= __SPIN_LOCK_UNLOCKED(init_files.file_lock),
+	.resize_wait	= __WAIT_QUEUE_HEAD_INITIALIZER(init_files.resize_wait),
 };
 
 static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
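
To illustrate why a missing initializer leads to a NULL dereference, here is a minimal userspace sketch. It is not kernel code: `list_node`, `wait_head_model`, `WAIT_HEAD_MODEL_INIT` and `count_waiters` are hypothetical names made up for this example, loosely modelled on a wait queue head being a circular list. A statically allocated head that is never initialized has NULL links, whereas an initialized empty head points at itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a circular doubly linked list node. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Simplified stand-in for a wait queue head: only the waiter list. */
struct wait_head_model {
    struct list_node waiters;
};

/* An empty circular list points at itself (hypothetical initializer). */
#define WAIT_HEAD_MODEL_INIT(name) \
    { .waiters = { &(name).waiters, &(name).waiters } }

/* Returns 1 if the list links were set up, 0 if they are still NULL. */
static int wait_head_is_initialized(const struct wait_head_model *h)
{
    return h->waiters.next != NULL && h->waiters.prev != NULL;
}

/*
 * Counts waiters. Returns -1 for an uninitialized head; a walker that
 * skipped this check would dereference a NULL 'next' pointer instead.
 */
static int count_waiters(const struct wait_head_model *h)
{
    const struct list_node *pos;
    int n = 0;

    if (!wait_head_is_initialized(h))
        return -1;
    for (pos = h->waiters.next; pos != &h->waiters; pos = pos->next)
        n++;
    return n;
}

/* Static storage is zero-filled, so both links are NULL (the buggy case). */
static struct wait_head_model zeroed_head;

/* Explicitly initialized, analogous to what the patch adds. */
static struct wait_head_model ready_head = WAIT_HEAD_MODEL_INIT(ready_head);

/* Self-check covering both cases. */
static void check_model(void)
{
    assert(count_waiters(&zeroed_head) == -1);
    assert(count_waiters(&ready_head) == 0);
}
```

Calling `check_model()` from a test program exercises both heads. The guard in `count_waiters` exists only so the sketch can report the bad state instead of crashing; the actual fix in the patch is to initialize the member.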
Thanks. I queued the patch. Please send me a Signed-off-by: as per Documentation/process/submitting-patches.rst, section 11.