Bug 73711

Summary: epmutex corrupts and hangs user space process at epoll_release_file
Product: File System Reporter: yangzhe1990
Component: VFSAssignee: fs_vfs
Status: NEW ---    
Severity: high CC: alan, szg00000
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.13.5 Subsystem:
Regression: No Bisected commit-id:

Description yangzhe1990 2014-04-09 07:33:55 UTC
Hello,

We are running a lot of busy IO extensive nginx processes. There's about 30K connections to nginx. So there's frequent call to close(). It seems that nginx isn't removing some or all fds from epoll before close()ing. We are now experiencing frequently hanging up on Linux 3.13.5.

The wchan of all the nginx process is epoll_release_file. The kernel warning message is like:

Apr  8 18:08:15 localhost kernel: [72183.278929] ------------[ cut here ]------------
Apr  8 18:08:15 localhost kernel: [72183.278972] WARNING: CPU: 2 PID: 19494 at kernel/locking/mutex.c:516 __mutex_lock_slowpath+0x33d/0x390()
Apr  8 18:08:15 localhost kernel: [72183.279023] DEBUG_LOCKS_WARN_ON(l->magic != l)
Apr  8 18:08:15 localhost kernel: [72183.279049] Modules linked in: xfs libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mgag200 ttm drm_kms
_helper drm ixgbe igb kvm reiserfs crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_he
lper ablk_helper cryptd ipmi_devintf syscopyarea sysfillrect gpio_ich dcdbas iTCO_wdt iTCO_vendor_support sysimgblt ptp pps_core cdc_ether usbnet mii joydev m
dio mousedev i2c_algo_bit microcode dca i2c_core evdev pcspkr ipmi_si ipmi_msghandler wmi acpi_power_meter sb_edac edac_core lpc_ich button mei_me mei shpchp 
processor tcp_hybla ext4 crc16 mbcache jbd2 hid_generic usbhid hid ses enclosure sd_mod ahci libahci ehci_pci libata ehci_hcd usbcore usb_common megaraid_sas 
scsi_mod bcache
Apr  8 18:08:15 localhost kernel: [72183.291142] CPU: 2 PID: 19494 Comm: nginx Not tainted 3.13.5-1-ARCH #1
Apr  8 18:08:15 localhost kernel: [72183.292807] Hardware name: Dell Inc. PowerEdge R720/0DCWD1, BIOS 2.1.3 11/20/2013
Apr  8 18:08:15 localhost kernel: [72183.294493]  0000000000000009 ffff88010cefdd60 ffffffff815176c4 ffff88010cefdda8
Apr  8 18:08:15 localhost kernel: [72183.296201]  ffff88010cefdd98 ffffffff81061a3d ffff88000e177808 ffff88000e177808
Apr  8 18:08:15 localhost kernel: [72183.297926]  ffff88041668c800 ffff88000e17780c 0000000000000246 ffff88010cefddf8
Apr  8 18:08:15 localhost kernel: [72183.299666] Call Trace:
Apr  8 18:08:15 localhost kernel: [72183.301396]  [<ffffffff815176c4>] dump_stack+0x4d/0x6f
Apr  8 18:08:15 localhost kernel: [72183.303150]  [<ffffffff81061a3d>] warn_slowpath_common+0x7d/0xa0
Apr  8 18:08:15 localhost kernel: [72183.304878]  [<ffffffff81061aac>] warn_slowpath_fmt+0x4c/0x50
Apr  8 18:08:15 localhost kernel: [72183.306575]  [<ffffffff811e8e20>] ? clear_tfile_check_list+0x60/0x60
Apr  8 18:08:15 localhost kernel: [72183.308514]  [<ffffffff8151cc6d>] __mutex_lock_slowpath+0x33d/0x390
Apr  8 18:08:15 localhost kernel: [72183.311018]  [<ffffffff8151ccd2>] mutex_lock+0x12/0x22
Apr  8 18:08:15 localhost kernel: [72183.313499]  [<ffffffff811ea090>] eventpoll_release_file+0x50/0xa0
Apr  8 18:08:15 localhost kernel: [72183.315964]  [<ffffffff811a55f3>] __fput+0x1f3/0x220
Apr  8 18:08:15 localhost kernel: [72183.318395]  [<ffffffff811a566e>] ____fput+0xe/0x10
Apr  8 18:08:15 localhost kernel: [72183.320798]  [<ffffffff810805cf>] task_work_run+0x9f/0xe0
Apr  8 18:08:15 localhost kernel: [72183.323184]  [<ffffffff81014e2c>] do_notify_resume+0x8c/0xa0
Apr  8 18:08:15 localhost kernel: [72183.325536]  [<ffffffff81525dea>] int_signal+0x12/0x17
Apr  8 18:08:15 localhost kernel: [72183.327866] ---[ end trace cb0d5d3c41bb7a8e ]---

From the trace I can identify that the warning message was printed when it acquired the global epmutex. But since the kernel code continued to run. I've no idea how it get blocked.

The CPU% is very low when nginx hung. And after nginx hung, I tried to run some test epoll code, and the test program also close() without removing first from epoll, the test program also hung at epoll_release_file.
Comment 1 Alan 2014-04-22 17:29:10 UTC
It actually looks like some kind of memory coprruption - the magic number used to check the lock was corrupted.

It's possible building a kernel with a lot more debug on may give better answers (notably slab poisoning and the various lock validators) but they will have a performance impact so I don't know how practical that is ?
Comment 2 Alan 2014-04-22 17:29:49 UTC
(tentatively reassigning to fs/vfs for now until we can get more idea)