Bug 113041 - mbcache NULL pointer dereference
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-24 10:05 UTC by Johnny
Modified: 2016-04-27 21:35 UTC

See Also:
Kernel Version: 4.2.2
Subsystem:
Regression: No
Bisected commit-id:


Description Johnny 2016-02-24 10:05:11 UTC
The machine restarted unexpectedly. There was no noticeable load at the time apart from high memory usage.
The crash message that was logged is below:

[1500169.920760] BUG: unable to handle kernel NULL pointer dereference at           (null)
[1500169.921056] IP: [<ffffffffa00f4fb9>] mb_cache_shrink+0x2c9/0x3a0 [mbcache]
[1500169.921056] PGD 78938f067 PUD 30aa81067 PMD 0 
[1500169.921056] Oops: 0000 [#1] SMP 
[1500169.921056] Modules linked in: xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 vxlan ip6_udp_tunnel udp_tunnel iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat nf_conntrack bridge stp llc xfs libcrc32c nls_ascii nls_cp437 vfat fat xenfs xen_privcmd ext4 crc16 mbcache jbd2 crc32c_intel hmac ata_piix drbg libata aesni_intel aes_x86_64 glue_helper lrw mousedev gf128mul ablk_helper cryptd i2c_piix4 xen_blkfront microcode scsi_mod firmware_class ixgbevf i2c_core psmouse evdev acpi_cpufreq button sch_fq_codel ip_tables autofs4
[1500169.921056] CPU: 0 PID: 23022 Comm: java Not tainted 4.2.2-coreos-r2 #2
[1500169.921056] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[1500169.921056] task: ffff8800eba50000 ti: ffff8800270fc000 task.ti: ffff8800270fc000
[1500169.921056] RIP: 0010:[<ffffffffa00f4fb9>]  [<ffffffffa00f4fb9>] mb_cache_shrink+0x2c9/0x3a0 [mbcache]
[1500169.921056] RSP: 0018:ffff8800270ff358  EFLAGS: 00010213
[1500169.921056] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000180270025
[1500169.921056] RDX: 0000000180270026 RSI: ffffea001ce5fcc0 RDI: 0000000000000000
[1500169.921056] RBP: ffff8800270ff388 R08: 00000000397f3e01 R09: 0000000180270025
[1500169.921056] R10: ffff8807b0e18f80 R11: ffff8807397f3e38 R12: ffff8800270ff358
[1500169.921056] R13: 0000000000000036 R14: 0000000000000080 R15: ffffffffa00f7000
[1500169.921056] FS:  00007f43d702d700(0000) GS:ffff8807b0e00000(0000) knlGS:0000000000000000
[1500169.921056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1500169.921056] CR2: 0000000000000000 CR3: 00000002f1e43000 CR4: 00000000001406f0
[1500169.921056] Stack:
[1500169.921056]  ffff880584aebf70 ffff8803d200b208 0000000000000000 0000000000000098
[1500169.921056]  0000000000000080 000000000000004c ffff8800270ff468 ffffffff8115ef3d
[1500169.921056]  ffff880000000003 ffffffff8109cd00 ffff880000000001 ffff880788906340
[1500169.921056] Call Trace:
[1500169.921056]  [<ffffffff8115ef3d>] shrink_slab+0x1ed/0x370
[1500169.921056]  [<ffffffff8109cd00>] ? enqueue_entity+0x3e0/0xdc0
[1500169.921056]  [<ffffffff81163283>] shrink_zone+0x283/0x290
[1500169.921056]  [<ffffffff811633ec>] do_try_to_free_pages+0x15c/0x430
[1500169.921056]  [<ffffffff8116377a>] try_to_free_pages+0xba/0x130
[1500169.921056]  [<ffffffff8115658a>] __alloc_pages_nodemask+0x56a/0x970
[1500169.921056]  [<ffffffff81199221>] alloc_pages_current+0x91/0x100
[1500169.921056]  [<ffffffff811a3d9c>] new_slab+0x34c/0x440
[1500169.921056]  [<ffffffff810afc01>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[1500169.921056]  [<ffffffff811a4239>] __slab_alloc+0x3a9/0x490
[1500169.921056]  [<ffffffffa01e5a6f>] ? ext4_orphan_del+0x47ff/0xda20 [ext4]
[1500169.921056]  [<ffffffff8126818c>] ? hashtab_search+0x5c/0x80
[1500169.921056]  [<ffffffff81274787>] ? mls_level_isvalid+0x57/0x60
[1500169.921056]  [<ffffffffa01e5a6f>] ? ext4_orphan_del+0x47ff/0xda20 [ext4]
[1500169.921056]  [<ffffffff811a44b1>] kmem_cache_alloc+0x191/0x1f0
[1500169.921056]  [<ffffffffa01e5a6f>] ext4_orphan_del+0x47ff/0xda20 [ext4]
[1500169.921056]  [<ffffffff811d7a9d>] alloc_inode+0x1d/0x90
[1500169.921056]  [<ffffffff811d98a1>] new_inode_pseudo+0x11/0x60
[1500169.921056]  [<ffffffff811d990b>] new_inode+0x1b/0x40
[1500169.921056]  [<ffffffffa01cec7f>] __ext4_new_inode+0x7f/0x1190 [ext4]
[1500169.921056]  [<ffffffffa01df63c>] ext4_insert_dentry+0x188c/0x1900 [ext4]
[1500169.921056]  [<ffffffff811c9e2a>] vfs_create+0xca/0x130
[1500169.921056]  [<ffffffff8123c748>] ovl_create_real+0xb8/0x230
[1500169.921056]  [<ffffffff8123d9ab>] ovl_create_or_link+0x10b/0x500
[1500169.921056]  [<ffffffff8123dddd>] ovl_create_object+0x3d/0x60
[1500169.921056]  [<ffffffff8125d533>] ? selinux_inode_create+0x13/0x20
[1500169.921056]  [<ffffffff8123deb1>] ovl_create+0x21/0x30
[1500169.921056]  [<ffffffff811c9e2a>] vfs_create+0xca/0x130
[1500169.921056]  [<ffffffff811cc3f1>] path_openat+0xab1/0x13e0
[1500169.921056]  [<ffffffff811cce9b>] ? putname+0x5b/0x60
[1500169.921056]  [<ffffffff81090f6f>] ? wake_up_q+0x2f/0x70
[1500169.921056]  [<ffffffff811a4499>] ? kmem_cache_alloc+0x179/0x1f0
[1500169.921056]  [<ffffffff811cdddb>] do_filp_open+0x7b/0xe0
[1500169.921056]  [<ffffffff811daeb9>] ? __alloc_fd+0x89/0x110
[1500169.921056]  [<ffffffff811bd27c>] do_sys_open+0x12c/0x210
[1500169.921056]  [<ffffffff81021b4f>] ? syscall_trace_enter_phase1+0xff/0x150
[1500169.921056]  [<ffffffff811bd37e>] SyS_open+0x1e/0x20
[1500169.921056]  [<ffffffff8152bbae>] entry_SYSCALL_64_fastpath+0x12/0x71
[1500169.921056] Code: 4c 89 ef ff 14 25 c8 b8 a2 81 48 8b 7d d0 45 31 ed 4c 39 e7 48 8b 1f 74 17 e8 04 f1 ff ff 48 89 d8 49 83 c5 01 48 89 df 4c 39 e0 <48> 8b 1b 75 e9 48 83 c4 18 4c 89 e8 5b 41 5c 41 5d 5d c3 f3 90 
[1500169.921056] RIP  [<ffffffffa00f4fb9>] mb_cache_shrink+0x2c9/0x3a0 [mbcache]
[1500169.921056]  RSP <ffff8800270ff358>
[1500169.921056] CR2: 0000000000000000
[1500170.273210] ---[ end trace 76bceb77fead570b ]---
[1500170.278279] Kernel panic - not syncing: Fatal exception
[1500170.282063] Kernel Offset: disabled


Additional information collected after reboot:

cat /proc/version
Linux version 4.2.2-coreos-r2 (buildbot@ip-10-204-3-57) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.2, pie-0.6.3) ) #2 SMP Tue Feb 2 13:27:19 UTC 2016

cat /proc/meminfo
MemTotal:       31419640 kB
MemFree:         1313584 kB
MemAvailable:   13120824 kB
Buffers:         1164008 kB
Cached:          9589260 kB
SwapCached:            0 kB
Active:         11727272 kB
Inactive:        7908784 kB
Active(anon):    8903468 kB
Inactive(anon):      344 kB
Active(file):    2823804 kB
Inactive(file):  7908440 kB
Unevictable:     8990544 kB
Mlocked:         8990544 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                96 kB
Writeback:             0 kB
AnonPages:      17873320 kB
Mapped:           373056 kB
Shmem:               696 kB
Slab:            1337196 kB
SReclaimable:    1158852 kB
SUnreclaim:       178344 kB
KernelStack:        8816 kB
PageTables:        41244 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    15709820 kB
Committed_AS:   24212340 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       61388 kB
VmallocChunk:   34359668736 kB
HardwareCorrupted:     0 kB
AnonHugePages:  17457152 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      180224 kB
DirectMap2M:    31950848 kB

cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping	: 4
microcode	: 0x428
cpu MHz		: 2494.012
cache size	: 25600 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs		:
bogomips	: 4988.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
....

Java application that triggered the crash: Cassandra 2.1.12
Comment 1 nickkrause 2016-03-15 02:11:55 UTC
Have you tried a newer release candidate kernel to see whether this bug has already been fixed?
Comment 2 Johnny 2016-03-15 13:07:54 UTC
Unfortunately not, as I don't know how to reproduce the issue and it happened in a production environment where we rely on the version that the CoreOS distribution provides.
Comment 3 nickkrause 2016-03-15 15:47:17 UTC
If you can tell me what Cassandra was doing, I may be able to find the issue by reading the code carefully, but I would still like to test it to make sure.
Comment 4 Johnny 2016-04-11 13:21:15 UTC
Another crash today with a similar trace output:

```
[511806.488629] general protection fault: 0000 [#1] SMP
[511806.489335] Modules linked in: xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 vxlan ip6_udp_tunnel udp_tunnel iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat nf_conntrack bridge stp llc xfs libcrc32c nls_ascii nls_cp437 vfat fat xenfs xen_privcmd ext4 crc16 mbcache jbd2 crc32c_intel hmac drbg aesni_intel ata_piix aes_x86_64 glue_helper libata lrw mousedev gf128mul ablk_helper cryptd xen_blkfront microcode i2c_piix4 firmware_class scsi_mod psmouse i2c_core ixgbevf evdev acpi_cpufreq button sch_fq_codel ip_tables autofs4
[511806.520082] CPU: 2 PID: 57829 Comm: java Not tainted 4.2.2-coreos-r2 #2
[511806.529094] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[511806.529094] task: ffff8801636e0000 ti: ffff88015aaf0000 task.ti: ffff88015aaf0000
[511806.529094] RIP: 0010:[<ffffffff812c3bf9>]  [<ffffffff812c3bf9>] strnlen+0x9/0x40
[511806.529094] RSP: 0018:ffff88015aaf3128  EFLAGS: 00010086
[511806.529094] RAX: ffffffff817c48ce RBX: ffffffff8356e003 RCX: 0000000000000000
[511806.529094] RDX: 017fff0000080078 RSI: ffffffffffffffff RDI: 017fff0000080078
[511806.529094] RBP: ffff88015aaf3128 R08: 000000000000ffff R09: 000000000000ffff
[511806.529094] R10: ffff880770658f80 R11: ffff88072d51e888 R12: 017fff0000080078
[511806.529094] R13: ffffffff8356e3a0 R14: 00000000ffffffff R15: 0000000000000000
[511806.529094] FS:  00007ff4a85f8700(0000) GS:ffff880770640000(0000) knlGS:0000000000000000
[511806.529094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[511806.529094] CR2: 00007ff65800e000 CR3: 00000006e0ff1000 CR4: 00000000001406e0
[511806.529094] Stack:
[511806.529094]  ffff88015aaf3168 ffffffff812c58ff 0000000000000296 ffffffff8356e003
[511806.529094]  ffffffff8356e3a0 ffff88015aaf32b0 ffffffff817c9288 ffffffff817c9288
[511806.529094]  ffff88015aaf31e8 ffffffff812c73b3 ffff88015aaf31b8 ffffffff81154868
[511806.529094] Call Trace:
[511806.529094]  [<ffffffff812c58ff>] string.isra.4+0x3f/0xd0
[511806.529094]  [<ffffffff812c73b3>] vsnprintf+0x163/0x510
[511806.529094]  [<ffffffff81154868>] ? free_hot_cold_page_list+0x48/0xa0
[511806.529094]  [<ffffffff812c7771>] vscnprintf+0x11/0x40
[511806.529094]  [<ffffffff810bd548>] vprintk_emit+0x128/0x530
[511806.529094]  [<ffffffff810bda9f>] vprintk_default+0x1f/0x30
[511806.529094]  [<ffffffff815250d3>] printk+0x46/0x48
[511806.529094]  [<ffffffff811a318a>] kmem_cache_free+0x13a/0x1f0
[511806.529094]  [<ffffffff810afc01>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[511806.529094]  [<ffffffffa003f0ce>] 0xffffffffa003f0ce
[511806.529094]  [<ffffffffa003ffac>] mb_cache_shrink+0x2bc/0x3a0 [mbcache]
[511806.529094]  [<ffffffff8115ef3d>] shrink_slab+0x1ed/0x370
[511806.529094]  [<ffffffff8109cd00>] ? enqueue_entity+0x3e0/0xdc0
[511806.529094]  [<ffffffff81163283>] shrink_zone+0x283/0x290
[511806.529094]  [<ffffffff811633ec>] do_try_to_free_pages+0x15c/0x430
[511806.529094]  [<ffffffff8116377a>] try_to_free_pages+0xba/0x130
[511806.529094]  [<ffffffff8115658a>] __alloc_pages_nodemask+0x56a/0x970
[511806.529094]  [<ffffffff81199221>] alloc_pages_current+0x91/0x100
[511806.529094]  [<ffffffff811a3d9c>] new_slab+0x34c/0x440
[511806.529094]  [<ffffffff810afc01>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[511806.529094]  [<ffffffff811a4239>] __slab_alloc+0x3a9/0x490
[511806.529094]  [<ffffffffa017aa6f>] ? ext4_orphan_del+0x47ff/0xda20 [ext4]
[511806.529094]  [<ffffffff8126818c>] ? hashtab_search+0x5c/0x80
[511806.529094]  [<ffffffff81274787>] ? mls_level_isvalid+0x57/0x60
[511806.529094]  [<ffffffffa017aa6f>] ? ext4_orphan_del+0x47ff/0xda20 [ext4]
[511806.529094]  [<ffffffff811a44b1>] kmem_cache_alloc+0x191/0x1f0
[511806.529094]  [<ffffffffa017aa6f>] ext4_orphan_del+0x47ff/0xda20 [ext4]
[511806.529094]  [<ffffffff811d7a9d>] alloc_inode+0x1d/0x90
[511806.529094]  [<ffffffff811d98a1>] new_inode_pseudo+0x11/0x60
[511806.529094]  [<ffffffff811d990b>] new_inode+0x1b/0x40
[511806.529094]  [<ffffffffa0163c7f>] __ext4_new_inode+0x7f/0x1190 [ext4]
[511806.529094]  [<ffffffffa017463c>] ext4_insert_dentry+0x188c/0x1900 [ext4]
[511806.529094]  [<ffffffff811c9e2a>] vfs_create+0xca/0x130
[511806.529094]  [<ffffffff8123c748>] ovl_create_real+0xb8/0x230
[511806.529094]  [<ffffffff8123d9ab>] ovl_create_or_link+0x10b/0x500
[511806.529094]  [<ffffffff8123dddd>] ovl_create_object+0x3d/0x60
[511806.529094]  [<ffffffff8125d533>] ? selinux_inode_create+0x13/0x20
[511806.529094]  [<ffffffff8123deb1>] ovl_create+0x21/0x30
[511806.529094]  [<ffffffff811c9e2a>] vfs_create+0xca/0x130
[511806.529094]  [<ffffffff811cc3f1>] path_openat+0xab1/0x13e0
[511806.529094]  [<ffffffff811cce9b>] ? putname+0x5b/0x60
[511806.529094]  [<ffffffff81090f6f>] ? wake_up_q+0x2f/0x70
[511806.529094]  [<ffffffff811a4499>] ? kmem_cache_alloc+0x179/0x1f0
[511806.529094]  [<ffffffff811cdddb>] do_filp_open+0x7b/0xe0
[511806.529094]  [<ffffffff811daeb9>] ? __alloc_fd+0x89/0x110
[511806.529094]  [<ffffffff811bd27c>] do_sys_open+0x12c/0x210
[511806.529094]  [<ffffffff81021b4f>] ? syscall_trace_enter_phase1+0xff/0x150
[511806.529094]  [<ffffffff811bd37e>] SyS_open+0x1e/0x20
[511806.529094]  [<ffffffff8152bbae>] entry_SYSCALL_64_fastpath+0x12/0x71
[511806.529094] Code: 00 00 80 3f 00 55 48 89 e5 74 11 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 5d c3 31 c0 5d c3 66 90 55 48 85 f6 48 89 e5 74 2d <80> 3f 00 74 28 48 8d 47 01 48 01 fe eb 0a 48 83 c0 01 80 78 ff
[511806.529094] RIP  [<ffffffff812c3bf9>] strnlen+0x9/0x40
[511806.529094]  RSP <ffff88015aaf3128>
[511806.529094] ---[ end trace 045dada6ce1782d4 ]---
[511806.529094] Kernel panic - not syncing: Fatal exception
[511806.529094] Kernel Offset: disabled
```

It may be related to backups of the Cassandra data files being made at the same time. As there are no Cassandra logs from the moment of the crash, it's hard to know exactly what it was trying to do.
A general observation is that both traces contain frames that mention deleting files on ext4 (ext4_orphan_del), while according to our mount table the Cassandra storage is supposed to be on xfs. Also, Cassandra runs file compactions that move data around almost all the time, but the disk statistics show no extraordinary readings at the time of the crash.
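
For anyone comparing these traces: frames printed with a `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are weaker evidence than the unmarked frames. Below is a minimal Python sketch that separates the two and tags each frame with its module. The regular expression and the `parse_frames` name are assumptions made for illustration. They only cover the frame format shown in this report, and unsymbolized frames (a bare address with no `symbol+offset/size`) are skipped.

```python
import re

# One call-trace line, in the format seen in this report:
#   [timestamp]  [<address>] [? ]symbol+0xoffset/0xsize [module]
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"                       # dmesg timestamp
    r"\[<[0-9a-f]+>\]\s+"                        # return address
    r"(?P<guess>\?\s+)?"                         # "?" marks an unverified frame
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"  # symbol+offset/size
    r"(?:\s+\[(?P<mod>\w+)\])?"                  # module tag, absent for core kernel
)


def parse_frames(trace_text):
    """Return (symbol, module, reliable) tuples for each symbolized frame."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue  # not a frame line (registers, RIP, Code, ...)
        module = m.group("mod") or "core"
        reliable = m.group("guess") is None
        frames.append((m.group("sym"), module, reliable))
    return frames


# A few lines copied from the first trace in this report.
SAMPLE = """\
[1500169.921056]  [<ffffffff811a4239>] __slab_alloc+0x3a9/0x490
[1500169.921056]  [<ffffffffa01e5a6f>] ? ext4_orphan_del+0x47ff/0xda20 [ext4]
[1500169.921056]  [<ffffffff8126818c>] ? hashtab_search+0x5c/0x80
[1500169.921056]  [<ffffffffa01cec7f>] __ext4_new_inode+0x7f/0x1190 [ext4]
[1500169.921056] RIP  [<ffffffffa00f4fb9>] mb_cache_shrink+0x2c9/0x3a0 [mbcache]
"""

for sym, module, reliable in parse_frames(SAMPLE):
    marker = "ok" if reliable else "? "
    print(f"{marker}  {module:8s} {sym}")
```

Filtering on the third field leaves only the frames the unwinder vouched for, which makes it easier to see which ext4 frames are actually on the call path.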

An additional note: the Cassandra version is 2.1.11-1, not 2.1.12 as previously mentioned.
Also, the Linux version is higher this time:
Linux version 4.3.6-coreos (buildbot@ip-10-204-3-57) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.3, pie-0.6.3) ) #2 SMP Tue Apr 5 10:32:16 UTC 2016
Comment 5 Johnny 2016-04-27 07:26:39 UTC
And another crash, with Cassandra 2.1.13 and again kernel 4.3.6:

```
[121437.908906] general protection fault: 0000 [#1] SMP
[121437.912476] Modules linked in: veth xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 vxlan ip6_udp_tunnel udp_tunnel iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter nf_nat nf_conntrack br_netfilter bridge stp llc overlay xfs libcrc32c crc32c_generic nls_ascii nls_cp437 vfat fat xenfs xen_privcmd ext4 crc16 mbcache jbd2 crc32c_intel hmac drbg aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper ata_piix cryptd mousedev libata xen_blkfront microcode firmware_class psmouse evdev scsi_mod i2c_piix4 ixgbevf i2c_core acpi_cpufreq tpm_tis tpm button sch_fq_codel ip_tables autofs4
[121437.936337] CPU: 2 PID: 66 Comm: kswapd0 Not tainted 4.3.6-coreos #2
[121437.936337] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[121437.936337] task: ffff8803bde79d00 ti: ffff8803b9ef0000 task.ti: ffff8803b9ef0000
[121437.936337] RIP: 0010:[<ffffffffad1ac714>]  [<ffffffffad1ac714>] kmem_cache_free+0x74/0x1e0
[121437.936337] RSP: 0018:ffff8803b9ef3bf8  EFLAGS: 00010246
[121437.936337] RAX: 017fff0000000080 RBX: ffff8803819614e0 RCX: 000000010027001b
[121437.936337] RDX: 000077ff80000000 RSI: ffff8803819614e0 RDI: ffffea0006c579a0
[121437.936337] RBP: ffff8803b9ef3c10 R08: 0000000081961401 R09: ffffffffc02a001a
[121437.936337] R10: ffffea000934e060 R11: ffffea000e065840 R12: ffffea0006c579a0
[121437.936337] R13: 0000000000000059 R14: 0000000000000080 R15: ffffffffc02a3000
[121437.936337] FS:  0000000000000000(0000) GS:ffff8803cfc40000(0000) knlGS:0000000000000000
[121437.936337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[121437.936337] CR2: 00007f3084000000 CR3: 000000002da0b000 CR4: 00000000001406e0
[121437.936337] Stack:
[121437.936337]  ffffea000934e040 ffff8803b9ef3c38 0000000000000059 ffff8803b9ef3c28
[121437.936337]  ffffffffc02a001a ffff880381961270 ffff8803b9ef3c68 ffffffffc02a103f
[121437.936337]  ffff880128c393a8 ffff8802663be680 00000000a184fd14 0000000000000088
[121437.936337] Call Trace:
[121437.936337]  [<ffffffffc02a001a>] 0xffffffffc02a001a
[121437.936337]  [<ffffffffc02a103f>] mb_cache_entry_find_next+0x17f/0x270 [mbcache]
[121437.936337]  [<ffffffffad1669ae>] shrink_slab.part.42+0x1de/0x370
[121437.936337]  [<ffffffffad16aa8d>] shrink_zone+0x28d/0x2d0
[121437.936337]  [<ffffffffad16ba91>] kswapd+0x551/0x9e0
[121437.936337]  [<ffffffffad16b540>] ? mem_cgroup_shrink_node_zone+0x190/0x190
[121437.936337]  [<ffffffffad08e178>] kthread+0xd8/0xf0
[121437.936337]  [<ffffffffad08e0a0>] ? kthread_park+0x60/0x60
[121437.936337]  [<ffffffffad54749f>] ret_from_fork+0x3f/0x70
[121437.936337]  [<ffffffffad08e0a0>] ? kthread_park+0x60/0x60
[121437.936337] Code: 01 d8 48 0f 42 15 1d 59 86 00 4c 8b 4d 08 48 01 d0 48 c1 e8 0c 48 c1 e0 06 49 01 c3 49 8b 03 f6 c4 80 0f 85 56 01 00 00 4c 8b 17 <65> 49 8b 52 08 65 4c 03 15 e7 d9 e5 52 4d 3b 5a 10 0f 85 29 01
[121437.936337] RIP  [<ffffffffad1ac714>] kmem_cache_free+0x74/0x1e0
[121437.936337]  RSP <ffff8803b9ef3bf8>
[121438.134600] ---[ end trace 199019257ae805c3 ]---
[121438.137849] Kernel panic - not syncing: Fatal exception
[121438.138841] Kernel Offset: 0x2c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
```
Comment 6 Johnny 2016-04-27 07:36:35 UTC
What's new in the latest trace is that the comm is kswapd0, not java.
Comment 7 Theodore Tso 2016-04-27 21:35:28 UTC
The comm field doesn't really matter all that much.  The crash is in the mbcache slab shrinker, which gets called from the VM when the system is under memory pressure.

It looks like the crash is in the extended attribute cache, which in turn is triggered by SELinux.  (As far as I know, Cassandra doesn't use extended attributes.)

Note that the 4.3.x kernel is not a long-term supported kernel, and it's no longer automatically getting bug fixes ported to it, at least not in the upstream.  If CoreOS is providing their own security updates, then you should really ask them for support, because this would be a distro kernel that has changes not seen or supported by upstream developers.
