Bug 108631 - Stuck on mb_cache_spinlock
Summary: Stuck on mb_cache_spinlock
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-30 16:36 UTC by Alexander Polakov
Modified: 2016-03-20 21:54 UTC
CC List: 4 users

See Also:
Kernel Version: 4.3.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Patches fixing lockups with mbcache (32.93 KB, application/x-gzip)
2016-01-07 14:10 UTC, Jan Kara

Description Alexander Polakov 2015-11-30 16:36:08 UTC
We use POSIX ACLs extensively on our systems, and we have recently been experiencing mbcache-related hangs.
My amateur analysis of the problem follows: when memory gets low, the mb_cache_shrink_* functions run and acquire mb_cache_spinlock, which in turn causes mb_cache_entry_alloc() to block and CPUs to get stuck for tens of seconds (2015-11-25T22:16:25.326266+03:00 iohost [1069489.325447] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [sh:17142])

Here are some excerpts from netconsole logs:

CPU0
2015-11-25T22:16:33.056273+03:00 somehost [1069497.048211]  [<ffffffffa024e739>] ? ext4_xattr_cache_insert+0x19/0x60
2015-11-25T22:16:33.056537+03:00 somehost [1069497.048481]  [<ffffffffa024ff58>] ? ext4_xattr_get+0x168/0x200
2015-11-25T22:16:33.056828+03:00 somehost [1069497.048745]  [<ffffffffa01b4ade>] ? generic_getxattr+0x3e/0x50
2015-11-25T22:16:33.057076+03:00 somehost [1069497.049006]  [<ffffffffa0291714>] ? cap_inode_need_killpriv+0x24/0x30
2015-11-25T22:16:33.057455+03:00 somehost [1069497.049266]  [<ffffffffa0294769>] ? security_inode_need_killpriv+0x29/0x40
2015-11-25T22:16:33.057940+03:00 somehost [1069497.049622]  [<ffffffffa01ac6c4>] ? dentry_needs_remove_privs.part.6+0x14/0x30
2015-11-25T22:16:33.058176+03:00 somehost [1069497.050123]  [<ffffffffa01ac70c>] ? file_remove_privs+0x1c/0x80
2015-11-25T22:16:33.058438+03:00 somehost [1069497.050384]  [<ffffffffa0254d6d>] ? __ext4_set_acl+0x2ad/0x330
2015-11-25T22:16:33.058704+03:00 somehost [1069497.050645]  [<ffffffffa012e36e>] ? __generic_file_write_iter+0x5e/0x1d0
2015-11-25T22:16:33.058975+03:00 somehost [1069497.050912]  [<ffffffffa02f7f36>] ? string.isra.2+0x36/0xc0
2015-11-25T22:16:33.059236+03:00 somehost [1069497.051177]  [<ffffffffa020ab31>] ? ext4_file_write_iter+0xe1/0x420
2015-11-25T22:16:33.059507+03:00 somehost [1069497.051440]  [<ffffffffa02f9d49>] ? snprintf+0x39/0x40
2015-11-25T22:16:33.059792+03:00 somehost [1069497.051701]  [<ffffffffa01d301a>] ? subbuf_start_callback+0xa/0xe0
2015-11-25T22:16:33.060038+03:00 somehost [1069497.051967]  [<ffffffffa00fbde6>] ? relay_switch_subbuf+0xd6/0x160
2015-11-25T22:16:33.060304+03:00 somehost [1069497.052230]  [<ffffffffa0195044>] ? __vfs_write+0xa4/0xf0
2015-11-25T22:16:33.060564+03:00 somehost [1069497.052489]  [<ffffffffa01956a9>] ? vfs_write+0x99/0x1d0
2015-11-25T22:16:33.060815+03:00 somehost [1069497.052749]  [<ffffffffa0195fad>] ? SyS_write+0x3d/0xa0
2015-11-25T22:16:33.061073+03:00 somehost [1069497.053013]  [<ffffffffa064dd57>] ? entry_SYSCALL_64_fastpath+0x12/0x6a

CPU8
2015-11-25T22:16:37.166151+03:00 somehost [1069501.154112]  [<ffffffffa01e124c>] ? mb_cache_shrink_count+0xc/0xa0
2015-11-25T22:16:37.166419+03:00 somehost [1069501.154375]  [<ffffffffa013e73a>] ? shrink_slab.part.15+0x11a/0x350
2015-11-25T22:16:37.166685+03:00 somehost [1069501.154637]  [<ffffffffa00e5f38>] ? css_next_descendant_pre+0x48/0x60
2015-11-25T22:16:37.166945+03:00 somehost [1069501.154899]  [<ffffffffa01413d6>] ? shrink_zone+0x296/0x2a0
2015-11-25T22:16:37.167224+03:00 somehost [1069501.155159]  [<ffffffffa014223b>] ? kswapd+0x4fb/0x8e0
2015-11-25T22:16:37.167464+03:00 somehost [1069501.155419]  [<ffffffffa0141d40>] ? mem_cgroup_shrink_node_zone+0x150/0x150
2015-11-25T22:16:37.167760+03:00 somehost [1069501.155684]  [<ffffffffa0094be8>] ? kthread+0xb8/0xd0
2015-11-25T22:16:37.168001+03:00 somehost [1069501.155950]  [<ffffffffa0094b30>] ? kthread_worker_fn+0x140/0x140
2015-11-25T22:16:37.168252+03:00 somehost [1069501.156212]  [<ffffffffa064e09f>] ? ret_from_fork+0x3f/0x70
2015-11-25T22:16:37.168512+03:00 somehost [1069501.156471]  [<ffffffffa0094b30>] ? kthread_worker_fn+0x140/0x140

CPU13
2015-11-25T22:16:37.310698+03:00 somehost [1069501.298509]  [<ffffffffa01e124c>] ? mb_cache_shrink_count+0xc/0xa0
2015-11-25T22:16:37.310969+03:00 somehost [1069501.298776]  [<ffffffffa013e73a>] ? shrink_slab.part.15+0x11a/0x350
2015-11-25T22:16:37.311246+03:00 somehost [1069501.299045]  [<ffffffffa00e5f38>] ? css_next_descendant_pre+0x48/0x60
2015-11-25T22:16:37.311502+03:00 somehost [1069501.299311]  [<ffffffffa01413d6>] ? shrink_zone+0x296/0x2a0
2015-11-25T22:16:37.311755+03:00 somehost [1069501.299574]  [<ffffffffa014223b>] ? kswapd+0x4fb/0x8e0
2015-11-25T22:16:37.312029+03:00 somehost [1069501.299836]  [<ffffffffa0141d40>] ? mem_cgroup_shrink_node_zone+0x150/0x150
2015-11-25T22:16:37.312292+03:00 somehost [1069501.300107]  [<ffffffffa0094be8>] ? kthread+0xb8/0xd0
2015-11-25T22:16:37.312561+03:00 somehost [1069501.300370]  [<ffffffffa0094b30>] ? kthread_worker_fn+0x140/0x140
2015-11-25T22:16:37.312857+03:00 somehost [1069501.300640]  [<ffffffffa064e09f>] ? ret_from_fork+0x3f/0x70
2015-11-25T22:16:37.313096+03:00 somehost [1069501.300905]  [<ffffffffa0094b30>] ? kthread_worker_fn+0x140/0x140

CPU15
2015-11-25T22:16:53.337301+03:00 somehost [1069517.309523]  [<ffffffffa01e167f>] ? mb_cache_entry_alloc+0x1f/0x1d0
2015-11-25T22:16:53.337568+03:00 somehost [1069517.309793]  [<ffffffffa024e739>] ? ext4_xattr_cache_insert+0x19/0x60
2015-11-25T22:16:53.337829+03:00 somehost [1069517.310060]  [<ffffffffa024ff58>] ? ext4_xattr_get+0x168/0x200
2015-11-25T22:16:53.338103+03:00 somehost [1069517.310327]  [<ffffffffa01b4ade>] ? generic_getxattr+0x3e/0x50
2015-11-25T22:16:53.338367+03:00 somehost [1069517.310593]  [<ffffffffa0292052>] ? get_vfs_caps_from_disk+0x52/0xf0
2015-11-25T22:16:53.338630+03:00 somehost [1069517.310859]  [<ffffffffa02922e5>] ? cap_bprm_set_creds+0x1f5/0x590
2015-11-25T22:16:53.338905+03:00 somehost [1069517.311125]  [<ffffffffa0293af9>] ? security_bprm_set_creds+0x29/0x40
2015-11-25T22:16:53.339167+03:00 somehost [1069517.311393]  [<ffffffffa019acbd>] ? prepare_binprm+0x6d/0x160
2015-11-25T22:16:53.339426+03:00 somehost [1069517.311657]  [<ffffffffa019b6b3>] ? do_execveat_common+0x443/0x690
2015-11-25T22:16:53.339691+03:00 somehost [1069517.311921]  [<ffffffffa019bb33>] ? SyS_execve+0x23/0x30
2015-11-25T22:16:53.339954+03:00 somehost [1069517.312187]  [<ffffffffa064dfe5>] ? stub_execve+0x5/0x5
2015-11-25T22:16:53.340222+03:00 somehost [1069517.312450]  [<ffffffffa064dd57>] ? entry_SYSCALL_64_fastpath+0x12/0x6a


I've not yet found a way to reproduce the problem reliably (this is particularly hard as it only happens under load), but it's definitely connected to the shrinker running simultaneously with heavy filesystem activity (e.g. running rsync / setfacl -R).
Comment 1 Norman Meilick 2016-01-07 09:02:42 UTC
Hi,

we have the same problem on several of our servers (mostly HP ProLiant ML350p Gen8 but also other brands) and are able to reproduce it reliably on some of them. Usually the soft lockups are triggered during our nightly rsync backups (backing up lots of local files with POSIX ACLs to a local disk), but they also happened during normal use. After a few hours of soft lockups and an unusable server, the hardware watchdog finally kicks in and a kernel panic results.

All file systems involved are EXT3.
We've confirmed the problem with kernels 3.16, 3.18 and 4.3.
Kernel 3.14 seems not to be affected - no problems on any server after the downgrade.

Soft lockup:
[26331.750524] BUG: soft lockup - CPU#2 stuck for 22s! [rsync:17146]
[26331.750552] Modules linked in: snd_mtpav netconsole snd_rawmidi snd_seq_device snd soundcore mptctl mptbase xt_REDIRECT ip_set_bitmap_port ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ipmi_devintf configfs hpwdt quota_v2 quota_tree autofs4 veth act_police cls_basic cls_flow cls_fw cls_u32 sch_fq_codel sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq stp llc ppdev lp parport xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_connlimit xt_realm xt_addrtype xt_comment xt_recent xt_nat ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_defrag_ipv6 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables openvswitch gre vxlan libcrc32c intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel hpilo sb_edac edac_core aes_x86_64 lrw shpchp gf128mul glue_helper ipmi_si ablk_helper ipmi_msghandler serio_raw cryptd ioatdma pcspkr lpc_ich wmi joydev acpi_power_meter mac_hid hid_generic psmouse igb usbhid tg3 i2c_algo_bit pata_acpi hid dca hpsa ptp pps_core [last unloaded: snd_mtpav]
[26331.750591] CPU: 2 PID: 17146 Comm: rsync Tainted: G        W     3.16.0-57-generic #77~14.04.1-Ubuntu
[26331.750592] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 07/01/2015
[26331.750593] task: ffff8800857ea8c0 ti: ffff88015c01c000 task.ti: ffff88015c01c000
[26331.750599] RIP: 0010:[<ffffffff81770b12>]  [<ffffffff81770b12>] _raw_spin_lock+0x32/0x50
[26331.750600] RSP: 0018:ffff88015c01fb40  EFLAGS: 00000202
[26331.750600] RAX: 0000000000004ade RBX: ffffffff81120324 RCX: 00000000000054e8
[26331.750601] RDX: 00000000000054ea RSI: 00000000000054ea RDI: ffffffff81f8d600
[26331.750602] RBP: ffff88015c01fb40 R08: 20043814dd400000 R09: dffb4fec30c53750
[26331.750602] R10: 3998000000000000 R11: 000000609ccc0000 R12: ffffffff81206960
[26331.750603] R13: ffff8800b9481310 R14: ffffffff81206fd8 R15: ffff88015c01fb00
[26331.750604] FS:  00007fc052b8f740(0000) GS:ffff88043f440000(0000) knlGS:0000000000000000
[26331.750605] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26331.750606] CR2: 00000000edd93000 CR3: 000000015c2d0000 CR4: 00000000000407e0
[26331.750606] Stack:
[26331.750608]  ffff88015c01fb98 ffffffff8122d027 ffff8801b4364000 ffff8800b9481040
[26331.750609]  0000000008290bcc 0000000000001000 ffff88015c01fba8 ffff88020f0218f0
[26331.750611]  ffff88020f0218f0 ffff88015c01fc70 ffffff9f634c2733 ffff88015c01fbc0
[26331.750611] Call Trace:
[26331.750616]  [<ffffffff8122d027>] __mb_cache_entry_find+0x187/0x1e0
[26331.750617]  [<ffffffff8122d1a2>] mb_cache_entry_find_next+0x82/0xb0
[26331.750621]  [<ffffffff81296e7b>] ext4_xattr_block_set+0x26b/0xa30
[26331.750623]  [<ffffffff81298235>] ext4_xattr_set_handle+0x315/0x450
[26331.750625]  [<ffffffff8129c64b>] __ext4_set_acl+0x29b/0x350
[26331.750628]  [<ffffffff8129cb53>] ext4_init_acl+0x43/0xd0
[26331.750629]  [<ffffffff81253b19>] __ext4_new_inode+0xe59/0x12f0
[26331.750632]  [<ffffffff812641ac>] ext4_mkdir+0x1ac/0x400
[26331.750635]  [<ffffffff811e0747>] vfs_mkdir+0xb7/0x160
[26331.750637]  [<ffffffff811e686b>] SyS_mkdir+0xab/0xe0
[26331.750639]  [<ffffffff8177108d>] system_call_fastpath+0x1a/0x1f
[26331.750652] Code: 89 e5 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 eb 0c 0f 1f 44 00 00 f3 90 <83> e8 01 74 0a 0f b7 0f 66 39 ca 75 f1 5d c3 66 66 66 90 66 66 
[26332.682647] Modules linked in: snd_mtpav netconsole snd_rawmidi snd_seq_device snd soundcore mptctl mptbase xt_REDIRECT ip_set_bitmap_port ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ipmi_devintf configfs hpwdt quota_v2 quota_tree autofs4 veth act_police cls_basic cls_flow cls_fw cls_u32 sch_fq_codel sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq stp llc ppdev lp parport xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_connlimit xt_realm xt_addrtype xt_comment xt_recent xt_nat ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_defrag_ipv6 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables openvswitch gre vxlan libcrc32c intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel hpilo sb_edac edac_core aes_x86_64 lrw shpchp gf128mul glue_helper ipmi_si ablk_helper ipmi_msghandler serio_raw cryptd ioatdma pcspkr lpc_ich wmi joydev acpi_power_meter mac_hid hid_generic psmouse igb usbhid tg3 i2c_algo_bit pata_acpi hid dca hpsa ptp pps_core [last unloaded: snd_mtpav]
[26332.890168] CPU: 0 PID: 51806 Comm: rsync Tainted: G        W     3.16.0-57-generic #77~14.04.1-Ubuntu
[26332.911627] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 07/01/2015
[26332.933421] task: ffff88034d5a7010 ti: ffff88009089c000 task.ti: ffff88009089c000
[26332.954043] RIP: 0010:[<ffffffff8122d869>]  [<ffffffff8122d869>] mb_cache_entry_alloc+0x69/0x230
[26332.974573] RSP: 0018:ffff88009089fbd0  EFLAGS: 00000246
[26332.994921] RAX: ffff88010e053750 RBX: ffff88009089fbc0 RCX: ffff88010e053750
[26333.015346] RDX: ffff88010e053750 RSI: 0000000000000050 RDI: 0000000000000001
[26333.035819] RBP: ffff88009089fbe8 R08: ffff880329aaa750 R09: ffffffff81aaa85e
[26333.056281] R10: ffff880329aaa750 R11: 0000000000000246 R12: ffff8801986e49a0
[26333.075995] R13: ffff8801986e49a0 R14: ffffffff810b5dc5 R15: ffffffff8129edba
[26333.095567] FS:  00007fc052b8f740(0000) GS:ffff88043f400000(0000) knlGS:0000000000000000
[26333.115318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[26333.134546] CR2: 00000000f77b7000 CR3: 000000016c415000 CR4: 00000000000407f0
[26333.153632] Stack:
[26333.172429]  ffff880003603fc0 ffff880003603fc0 ffff880329aaa750 ffff88009089fc10
[26333.191585]  ffffffff81296073 ffff880003603fc0 ffffffff81aaa85e ffff880003603f58
[26333.210546]  ffff88009089fc80 ffffffff81297cc0 ffff880003603fc0 ffff8800b9481040
[26333.229103] Call Trace:
[26333.247342]  [<ffffffff81296073>] ext4_xattr_cache_insert+0x23/0x70
[26333.265668]  [<ffffffff81297cc0>] ext4_xattr_get+0x200/0x290
[26333.283669]  [<ffffffff812960fb>] ? ext4_xattr_find_entry+0x3b/0x100
[26333.301506]  [<ffffffff8129ccaa>] ext4_xattr_security_get+0x2a/0x30
[26333.319294]  [<ffffffff811f9b2c>] generic_getxattr+0x4c/0x70
[26333.338108]  [<ffffffff812ec0ad>] cap_inode_need_killpriv+0x2d/0x40
[26333.357066]  [<ffffffff812edeb6>] security_inode_need_killpriv+0x16/0x20
[26333.375541]  [<ffffffff811efe63>] file_remove_suid+0x43/0xc0
[26333.393366]  [<ffffffff8129def0>] ? jbd2_journal_stop+0x1c0/0x3b0
[26333.411155]  [<ffffffff81164f86>] __generic_file_write_iter+0x126/0x350
[26333.428916]  [<ffffffff81250c00>] ext4_file_write_iter+0xf0/0x3a0
[26333.445980]  [<ffffffff811d51a1>] new_sync_write+0x81/0xb0
[26333.461304]  [<ffffffff811d5957>] vfs_write+0xb7/0x1f0
[26333.476357]  [<ffffffff811d64f6>] SyS_write+0x46/0xb0
[26333.490799]  [<ffffffff8177108d>] system_call_fastpath+0x1a/0x1f


Kernel panic:
[35127.073300] Kernel panic - not syncing: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources:
[35127.073300] 1. Integrated Management Log (IML)
[35127.073300] 2. OA Syslog
[35127.073300] 3. OA Forward Progress Log
[35127.073300] 4. iLO Event Log
[35127.148146] CPU: 0 PID: 51806 Comm: rsync Tainted: G        W     3.16.0-57-generic #77~14.04.1-Ubuntu
[35127.163193] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 07/01/2015
[35127.178164]  00001ff2a8b9cc34 ffff88043f405d88 ffffffff8176891e ffffffffc05532d8
[35127.193308]  ffff88043f405e00 ffffffff81760d39 0000000000000008 ffff88043f405e10
[35127.208326]  ffff88043f405db0 0000000000000000 ffffc90012068072 0000000000000001
[35127.223370] Call Trace:
[35127.238143]  <NMI>  [<ffffffff8176891e>] dump_stack+0x45/0x56
[35127.252982]  [<ffffffff81760d39>] panic+0xc8/0x1fc
[35127.268159]  [<ffffffffc05528fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt]
[35127.283291]  [<ffffffff8101d2d9>] ? sched_clock+0x9/0x10
[35127.298505]  [<ffffffff81017f68>] nmi_handle+0x88/0x120
[35127.313634]  [<ffffffff81017f72>] ? nmi_handle+0x92/0x120
[35127.329260]  [<ffffffff8101840e>] io_check_error+0x1e/0xa0
[35127.345045]  [<ffffffff81018529>] default_do_nmi+0x99/0x140
[35127.360063]  [<ffffffff810186c0>] do_nmi+0xf0/0x160
[35127.374762]  [<ffffffff8177348f>] end_repeat_nmi+0x1a/0x1e
[35127.389518]  [<ffffffff8122d85b>] ? mb_cache_entry_alloc+0x5b/0x230
[35127.404386]  [<ffffffff8122d85b>] ? mb_cache_entry_alloc+0x5b/0x230
[35127.419001]  [<ffffffff8122d85b>] ? mb_cache_entry_alloc+0x5b/0x230
[35127.433400]  <<EOE>>  [<ffffffff81296073>] ext4_xattr_cache_insert+0x23/0x70
[35127.448042]  [<ffffffff81297cc0>] ext4_xattr_get+0x200/0x290
[35127.462789]  [<ffffffff812960fb>] ? ext4_xattr_find_entry+0x3b/0x100
[35127.478030]  [<ffffffff8129ccaa>] ext4_xattr_security_get+0x2a/0x30
[35127.491919]  [<ffffffff811f9b2c>] generic_getxattr+0x4c/0x70
[35127.505158]  [<ffffffff812ec0ad>] cap_inode_need_killpriv+0x2d/0x40
[35127.517814]  [<ffffffff812edeb6>] security_inode_need_killpriv+0x16/0x20
[35127.530709]  [<ffffffff811efe63>] file_remove_suid+0x43/0xc0
[35127.543130]  [<ffffffff8129def0>] ? jbd2_journal_stop+0x1c0/0x3b0
[35127.555322]  [<ffffffff81164f86>] __generic_file_write_iter+0x126/0x350
[35127.567068]  [<ffffffff81250c00>] ext4_file_write_iter+0xf0/0x3a0
[35127.578833]  [<ffffffff811d51a1>] new_sync_write+0x81/0xb0
[35127.589820]  [<ffffffff811d5957>] vfs_write+0xb7/0x1f0
[35127.600817]  [<ffffffff811d64f6>] SyS_write+0x46/0xb0
[35127.611291]  [<ffffffff8177108d>] system_call_fastpath+0x1a/0x1f
Comment 2 Jan Kara 2016-01-07 14:09:39 UTC
Yeah, although your workload seems to be different from what's described in bug 107301, the principal problem with mbcache is the same. Can either of you test the patches I have posted, which should fix it? I'll attach a tarball with them in a moment.
Comment 3 Jan Kara 2016-01-07 14:10:14 UTC
Created attachment 198911 [details]
Patches fixing lockups with mbcache
Comment 4 Theodore Tso 2016-01-07 15:06:26 UTC
I would definitely ask you to try Jan's patches to see if they make a difference, although given that your reports involve the mbcache shrinker, it's not quite the same as the bug in 107301.

Quick question, are you using containers at all in these systems?
Comment 5 Norman Meilick 2016-01-07 15:33:08 UTC
Thank you very much, I'll try the patches tomorrow.

And yes, we're indeed running several LXC containers on the affected servers.

One quick correction to my original comment: When I stated that kernel 3.14 does not seem to be affected, I mixed up the version number. It's 3.13 that doesn't exhibit any problems for us (we didn't test with 3.14). I'm sorry for the mistake.
Comment 6 Theodore Tso 2016-01-07 15:49:40 UTC
You said that you had replicated this problem on 4.2, so it's probably not the primary driver of the problem in your case, but be aware that in some of the older kernels (and 3.14 is included in that list), the shrinker is not cgroup aware. That means that if a single container starts aggressively using all of its memory, we will start kicking off *all* of the slab cache shrinkers. I've seen in a very container-heavy environment, with many tasks that use right up to their memory cgroup limit (and sometimes beyond), that this can cause the ext4 extent status cache shrinker to end up getting run a lot, which leads to symptoms similar to yours due to spinlock contention. (Of course, for a different spinlock, since a different slab cache shrinker is involved.)

I've been playing with some patches to use trylock and to abandon the ext4 es shrinker if it's under too much lock contention, but I haven't come up with something I was sufficiently happy with to push upstream yet --- because if the slab cache does need shrinking, simply giving up isn't going to do the right thing entirely, either.

One hacky solution would be to add knobs so we can cap the size of various caches like the mbcache and extents status cache so we can afford to be more aggressive at giving up if the trylock fails.  But assuming that system administrators will *set* these knobs correctly is probably a bad choice.  So this is a more general problem than just the mbcache, and I haven't found a good general solution yet.  And there may not be; we can try to make systems that are less likely to lead to spinlock contention, but under sufficiently heavy memory pressure, this may only be forestalling the inevitable.
Comment 7 Norman Meilick 2016-01-22 09:08:18 UTC
I'm sorry for the long delay, testing took longer than expected. While we could trigger the problem every time with kernels 3.14 to 3.18, it took 2-3 attempts (usually about 8 hours of backing up around 2.5M files) until the spinlock contention appeared with kernels 4.2.0 and 4.4.0-rc8.

Since applying Jan's patches to 4.4.0-rc8 we have not experienced any problems, so it seems to us that this fixed the issue. Thank you very much to both of you.

I'm also grateful for the advice regarding containers. On our test server we only have six containers with a modest memory use (we made sure to monitor it during the tests), so this likely wasn't an issue there.
Comment 8 Theodore Tso 2016-03-20 21:54:18 UTC
These patches have been pushed to Linus, and will be in Linux 4.6.
