Created attachment 218291
Diff of kernel configs between 4.5.4 and 4.6.0 reveals no relevant changes

Kdump does not start on kernel 4.6.0 (only) because /sys/kernel/security/securelevel is missing; 4.5.4 did not have this issue.

I initially thought this was a packaging error from the kernel-ml team, but on closer inspection of the kernel configs and RPM SPECs there are no relevant differences between kernel 4.5.4, where kdump works, and 4.6.0, where it does not.

Working on 4.5.4:

```
root@int-kube-01:~ # uname -a
Linux int-kube-01 4.5.4-1.el7.elrepo.x86_64 #1 SMP Thu May 12 12:17:54 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
root@int-kube-01:~ # getenforce
Enforcing
root@int-kube-01:~ # service kdump status
Redirecting to /bin/systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Sun 2016-05-29 16:01:49 AEST; 1 day 18h ago
 Main PID: 1133 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump.service

May 29 16:01:47 int-kube-01 systemd[1]: Starting Crash recovery kernel arming...
May 29 16:01:49 int-kube-01 kdumpctl[1133]: cat: /sys/kernel/security/securelevel: No such file or directory
May 29 16:01:49 int-kube-01 kdumpctl[1133]: kexec: loaded kdump kernel
May 29 16:01:49 int-kube-01 kdumpctl[1133]: Starting kdump: [OK]
May 29 16:01:49 int-kube-01 systemd[1]: Started Crash recovery kernel arming.
```

Kdump will not start on 4.6.0 as /sys/kernel/security/securelevel is missing:

```
root@int-kube-01:~ # uname -a
Linux int-kube-01 4.6.0-1.el7.elrepo.x86_64 #1 SMP Mon May 16 10:54:52 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
root@int-kube-01:~ # getenforce
Enforcing
root@int-kube-01:~ # service kdump status
Redirecting to /bin/systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2016-05-31 10:39:24 AEST; 14s ago
  Process: 1091 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 1091 (code=exited, status=1/FAILURE)

May 31 10:39:24 int-kube-01 dracut[3585]: lrwxrwxrwx 1 root root 6 May 31 10:38 var/run -> ../run
May 31 10:39:24 int-kube-01 dracut[3585]: ========================================================================
May 31 10:39:24 int-kube-01 kdumpctl[1091]: cat: /sys/kernel/security/securelevel: No such file or directory
May 31 10:39:24 int-kube-01 kdumpctl[1091]: Cannot load /boot/vmlinuz-4.6.0-1.el7.elrepo.x86_64
May 31 10:39:24 int-kube-01 kdumpctl[1091]: kexec: failed to load kdump kernel
May 31 10:39:24 int-kube-01 kdumpctl[1091]: Starting kdump: [FAILED]
May 31 10:39:24 int-kube-01 systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
May 31 10:39:24 int-kube-01 systemd[1]: Failed to start Crash recovery kernel arming.
May 31 10:39:24 int-kube-01 systemd[1]: Unit kdump.service entered failed state.
May 31 10:39:24 int-kube-01 systemd[1]: kdump.service failed.
```
Can you please post the complete dmesg of the 2 kernels for both 4.5.4 and 4.6?
Created attachment 218511
4.5.4 dmesg output
Created attachment 218521
4.6.0 dmesg output
(In reply to Navin from comment #1)
> Can you please post the complete dmesg of the 2 kernels for both 4.5.4 and 4.6?

Thanks Navin, I have attached the dmesg output from both kernels to this ticket.
What is the output of service kdumpctl status in both cases? What is the output of cat /sys/kernel/kexec_* in both cases?

Most probably your kexec failed because of the crashkernel size. Increasing it to 256M worked for me, so 256M or 384M may work for you too. Please try crashkernel=256M first: I was short on memory and 384M didn't work for me. Since you have 16GB (from dmesg), 384M may also work for you.
Thanks Navin, sorry for my slow response, I've been off sick and am way behind on emails.

There is no change in the output between the two kernels.

Kernel 4.6.2:
root@int-kube-01:~ # cat /sys/kernel/kexec_*
0
134217728
0

Kernel 4.5.4:
root@int-kube-01:~ # cat /sys/kernel/kexec_*
1
134217728
0

The only issue I have with making the crashkernel memory as large as, say, 384M is that a small, lightweight server (such as a host that just runs nginx) may only have 512MB of memory but still be important enough to want to run kdump.

I probably need to do some reading up on the behaviour of crashkernel=auto these days. I remember it causing issues a long time ago by allocating too much or too little memory to kdump, which is why I forced 128 or 256MB across all servers.
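To make the three numbers from cat /sys/kernel/kexec_* easier to compare, here is a minimal Python sketch that labels them. It assumes the glob matches exactly kexec_crash_loaded, kexec_crash_size and kexec_loaded, in that sorted order; that is an assumption about this setup, and the function name is hypothetical.

```python
# Assumed sysfs file names, in the sorted order a shell glob would expand them.
KEXEC_FILES = ("kexec_crash_loaded", "kexec_crash_size", "kexec_loaded")


def label_kexec_values(cat_output):
    """Pair each whitespace-separated value with its assumed sysfs file name."""
    values = [int(v) for v in cat_output.split()]
    if len(values) != len(KEXEC_FILES):
        raise ValueError("expected %d values, got %d" % (len(KEXEC_FILES), len(values)))
    return dict(zip(KEXEC_FILES, values))


if __name__ == "__main__":
    # Values copied from the two sessions in this comment.
    for kernel, output in (("4.6.2", "0 134217728 0"), ("4.5.4", "1 134217728 0")):
        state = label_kexec_values(output)
        print(kernel, state, "reserved MiB:", state["kexec_crash_size"] // (1024 * 1024))
```

Under that assumption, both kernels report the same reservation size and differ only in the first value.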
/sys/kernel/kexec_crash_loaded is 0 on 4.6.2 and 1 on 4.5.4. So the crash kernel loads on 4.5.4 but fails to load on 4.6.2, most likely due to memory size. You could still try 256M or 384M to be certain that the problem is the RAM size, and then decide whether that is the issue or whether we are overlooking something else. Since it is a VM, increasing memory is just a matter of changing a command line argument before you start it. Also, I don't think crashkernel=auto worked for me, whereas 256M/384M did.

Once you are sure it is a memory size parameter problem, you can run cat /proc/iomem | grep -i RAM as root (uid=0) and look at the memory mappings. For example, on an 8G machine my last 2 entries are:

100000000-1fbffffff : System RAM
200000000-23bffffff : System RAM

These have sizes of 0xfbffffff and 0x3bffffff respectively (i.e. approx. 4G and 1G, at offsets 4G and 8G), so you can use the Y@X format, e.g. crashkernel=384M@4096M. The last System RAM entry lies a little beyond what I have as physical RAM (8G), so in my case I used the second-to-last one, 100000000-1fbffffff.

Also, if you have KSM (kernel same-page merging) running on the host, you don't usually need to worry about the RAM lost in each of the VMs, unless all the pages in each VM differ from those in the others.

You should find an error along the lines of usable memory not found or hole not found somewhere in journalctl.
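The range arithmetic above can be sketched in Python. This is only an illustration: the function names are hypothetical, it looks only at top-level "System RAM" lines, and it simply proposes the MiB-aligned start of each range as the offset, which says nothing about whether the kernel can actually reserve memory there.

```python
import re

MIB = 1024 * 1024
# Top-level /proc/iomem lines such as "100000000-1fbffffff : System RAM".
_RAM_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : System RAM$")


def system_ram_ranges(iomem_text):
    """Return (start_mib, usable_mib) for each top-level System RAM range."""
    ranges = []
    for line in iomem_text.splitlines():
        match = _RAM_LINE.match(line.rstrip())
        if not match:
            continue  # indented child entries and other resources are skipped
        start, end = (int(group, 16) for group in match.groups())
        # Round the start up to a MiB boundary; the end address is inclusive.
        aligned_start = -(-start // MIB) * MIB
        usable_mib = max(0, (end + 1 - aligned_start) // MIB)
        ranges.append((aligned_start // MIB, usable_mib))
    return ranges


def crashkernel_candidates(iomem_text, want_mib):
    """List crashkernel=Y@X strings for ranges large enough to hold want_mib."""
    return ["crashkernel=%dM@%dM" % (want_mib, start)
            for start, size in system_ram_ranges(iomem_text) if size >= want_mib]


if __name__ == "__main__":
    sample = "100000000-1fbffffff : System RAM\n200000000-23bffffff : System RAM\n"
    print(system_ram_ranges(sample))
    print(crashkernel_candidates(sample, 384))
```

For the two entries quoted above, the first range starts at 4096M and the second at 8192M, which is where the 384M@4096M example comes from.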
Hello,

Is there any resolution to this issue? I am seeing the same issue on the 4.6.3 and 4.7.0 kernels. If I go back and boot my system into the 4.5.0 kernel, the kdump service is operational. Let me know if any relevant logs are needed.

Thanks
Here are some more details:

# service kdump status
Kdump is operational
# uname -a
Linux dut4062 4.5.0+ #35 SMP Fri Jul 22 10:27:17 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux

However, when I boot into the 4.6.3 kernel, I keep getting an error:

# uname -a
Linux dut4062 4.6.3+ #1 SMP Mon Jul 11 22:19:47 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
# service kdump status
Kdump is not operational
# service kdump start
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Starting kdump: [FAILED]
# cat /proc/cmdline
ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=384M rd_LVM_LV=VolGroup/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb console=ttyS0 console=ttyS0,115200n8
FYI - We notice that this is still an issue in Kernel 4.6.4 and 4.7.0, the security policies still seem to be missing. Changing the crashkernel size even right up to 512M makes no difference.
I've noticed the same failure to reserve memory for the crash kernel on 4.6.3 and 4.7.0. I tried changing the crashkernel size up to 1024M and am still not able to get the kdump service to start. Let me know if you need any specific information from my setup.
Hello,

I tried again today with the 4.8.0-rc2+ kernel and it seems like kexec is ignoring the "crashkernel" parameter. Here's the information from my setup:

# uname -r
4.8.0-rc2+
# cat /proc/cmdline
ro root=/dev/mapper/vg_dut4110-lv_root rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=512M rd_LVM_LV=vg_dut4110/lv_swap rd_LVM_LV=vg_dut4110/lv_root rd_NO_DM rhgb quiet
# service kdump status
Kdump is not operational
# service kdump start
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Starting kdump: [FAILED]

The messages file confirms that kexec was not able to start the service:

Aug 15 10:41:17 dut4110 kdump: kexec: failed to load kdump kernel
Aug 15 10:41:17 dut4110 kdump: failed to start up

It looks to me like kexec is not able to parse the crashkernel parameter. Note that the same option is able to load the kdump service on the 4.3.0 and 4.5.7 kernels.

Let me know if there are any other details that I can provide to make forward progress on this issue.

Thanks
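One way to check whether the parameter itself is well-formed is to parse it out of /proc/cmdline and compare it with the size the kernel reports as reserved. The Python sketch below is a rough illustration with hypothetical function names: it handles only the simple crashkernel=size[KMG][@offset[KMG]] form (not range syntax or auto), and it assumes /sys/kernel/kexec_crash_size reports the reserved size in bytes.

```python
import re

_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
# Simple form only: size[KMG][@offset[KMG]]
_SIMPLE = re.compile(r"^(\d+)([KMG]?)(?:@(\d+)([KMG]?))?$", re.IGNORECASE)


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes or None), or None if absent or unsupported."""
    for token in cmdline.split():
        if not token.startswith("crashkernel="):
            continue
        match = _SIMPLE.match(token[len("crashkernel="):])
        if not match:
            return None  # e.g. range syntax or "auto", not handled in this sketch
        size = int(match.group(1)) * _UNITS[match.group(2).upper()]
        offset = None
        if match.group(3) is not None:
            offset = int(match.group(3)) * _UNITS[match.group(4).upper()]
        return size, offset  # only the first crashkernel= token is considered
    return None


def reservation_matches(cmdline, kexec_crash_size):
    """True if the requested size equals the size the kernel says it reserved."""
    parsed = parse_crashkernel(cmdline)
    return parsed is not None and parsed[0] == kexec_crash_size


if __name__ == "__main__":
    print(parse_crashkernel("ro rhgb quiet crashkernel=512M rd_NO_DM"))
    print(parse_crashkernel("ro crashkernel=384M@4096M"))
```

If the parsed size is sensible but the reserved size is also non-zero, the parameter is probably not the part that is failing.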
I'm getting similar symptoms with 4.8-rc5 on s390x, but I don't think this is related to /sys/kernel/security/securelevel missing.

In my case this turned out to be related to:
  51d7b120418e "/proc/iomem: only expose physical resource addresses to privileged users"

When systemd runs the kdump service, the kexec process can't see any of the addresses in /proc/iomem; instead it reads only zeroes:

...
08:52:18.275793 open("/proc/iomem", O_RDONLY) = 3
08:52:18.275955 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
08:52:18.276031 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3ff8e576000
08:52:18.276116 read(3, "00000000-00000000 : System RAM\n  00000000-00000000 : Kernel code\n  00000000-00000000 : Kernel data\n  00000000-00000000 : Kernel bss\n00000000-00000000 : Crash kernel\n", 1024) = 165
08:52:18.276474 read(3, "", 1024) = 0
08:52:18.276549 close(3) = 0
08:52:18.276630 munmap(0x3ff8e576000, 4096) = 0
08:52:18.276725 write(2, "Memory for crashkernel is not reserved\nPlease reserve memory by passing\"crashkernel=X@Y\" parameter to kernel\nThen try to loading kdump kernel\n", 142) = 142
08:52:18.276827 exit_group(1) = ?
08:52:18.277038 +++ exited with 1 +++

As a result, kexec fails with "Memory for crashkernel is not reserved" and there's an AVC denial in audit.log:

type=AVC msg=audit(1473425733.404:102): avc: denied { sys_admin } for pid=5176 comm="kexec" capability=21 scontext=system_u:system_r:kdump_t:s0 tcontext=system_u:system_r:kdump_t:s0 tclass=capability permissive=0

You could confirm this is your case as well by running (as root):
1. kdumpctl restart
or
2. setenforce 0; systemctl restart kdump
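A quick way to spot this condition is to check whether every range read from /proc/iomem is zeroed, as in the strace output above. A minimal Python sketch follows; the function name is hypothetical, and it only inspects text that the caller has already read from /proc/iomem in the context it wants to test.

```python
import re

# Matches lines such as "  00000000-00000000 : Kernel code" at any indent level.
_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def iomem_is_masked(iomem_text):
    """True if at least one range was parsed and every range reads as 0-0."""
    matches = [m for m in map(_LINE.match, iomem_text.splitlines()) if m]
    if not matches:
        return False  # nothing recognisable to judge
    return all(int(m.group(1), 16) == 0 and int(m.group(2), 16) == 0 for m in matches)


if __name__ == "__main__":
    masked = (
        "00000000-00000000 : System RAM\n"
        "  00000000-00000000 : Kernel code\n"
        "  00000000-00000000 : Kernel data\n"
        "  00000000-00000000 : Kernel bss\n"
        "00000000-00000000 : Crash kernel\n"
    )
    print(iomem_is_masked(masked))
    print(iomem_is_masked("100000000-1fbffffff : System RAM\n"))
```

If this reports masked addresses for the context kexec runs in but not for a root shell, that points at the capability check rather than at the crashkernel parameter.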