Bug 119291 - Kdump fails to start on Kernel 4.6.0 as /sys/kernel/security/securelevel is missing
Status: NEW
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-31 01:01 UTC by Sam McLeod
Modified: 2017-07-11 11:37 UTC

See Also:
Kernel Version: 4.7.0
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Diff of kernel configs between 4.5.4 and 4.6.0 reveals no relevant changes (7.63 KB, application/octet-stream)
2016-05-31 01:01 UTC, Sam McLeod
4.5.4 dmesg output (44.65 KB, text/plain)
2016-05-31 23:19 UTC, Sam McLeod
4.6.0 dmesg output (45.11 KB, text/plain)
2016-05-31 23:19 UTC, Sam McLeod

Description Sam McLeod 2016-05-31 01:01:57 UTC
Created attachment 218291 [details]
Diff of kernel configs between 4.5.4 and 4.6.0 reveals no relevant changes

Kdump still does not start on kernel 4.6.0 because /sys/kernel/security/securelevel is missing. This affects 4.6.0 only; 4.5.4 did not have this issue.

I initially thought this was a packaging error from the kernel-ml team, but upon closer inspection of the kernel configs and RPM SPECs there are no relevant differences between kernel 4.5.4 where kdump works and 4.6.0 where it does not.

Working on 4.5.4:

```
root@int-kube-01:~ # uname -a
Linux int-kube-01 4.5.4-1.el7.elrepo.x86_64 #1 SMP Thu May 12 12:17:54 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

root@int-kube-01:~ # getenforce
Enforcing

root@int-kube-01:~ # service kdump status
Redirecting to /bin/systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Sun 2016-05-29 16:01:49 AEST; 1 day 18h ago
 Main PID: 1133 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump.service

May 29 16:01:47 int-kube-01 systemd[1]: Starting Crash recovery kernel arming...
May 29 16:01:49 int-kube-01 kdumpctl[1133]: cat: /sys/kernel/security/securelevel: No such file or directory
May 29 16:01:49 int-kube-01 kdumpctl[1133]: kexec: loaded kdump kernel
May 29 16:01:49 int-kube-01 kdumpctl[1133]: Starting kdump: [OK]
May 29 16:01:49 int-kube-01 systemd[1]: Started Crash recovery kernel arming.
```

Kdump will not start on 4.6.0 as /sys/kernel/security/securelevel is missing:

```
root@int-kube-01:~ # uname -a
Linux int-kube-01 4.6.0-1.el7.elrepo.x86_64 #1 SMP Mon May 16 10:54:52 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

root@int-kube-01:~ # getenforce
Enforcing

root@int-kube-01:~ # service kdump status
Redirecting to /bin/systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2016-05-31 10:39:24 AEST; 14s ago
  Process: 1091 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 1091 (code=exited, status=1/FAILURE)

May 31 10:39:24 int-kube-01 dracut[3585]: lrwxrwxrwx 1 root root 6 May 31 10:38 var/run -> ../run
May 31 10:39:24 int-kube-01 dracut[3585]: ========================================================================
May 31 10:39:24 int-kube-01 kdumpctl[1091]: cat: /sys/kernel/security/securelevel: No such file or directory
May 31 10:39:24 int-kube-01 kdumpctl[1091]: Cannot load /boot/vmlinuz-4.6.0-1.el7.elrepo.x86_64
May 31 10:39:24 int-kube-01 kdumpctl[1091]: kexec: failed to load kdump kernel
May 31 10:39:24 int-kube-01 kdumpctl[1091]: Starting kdump: [FAILED]
May 31 10:39:24 int-kube-01 systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
May 31 10:39:24 int-kube-01 systemd[1]: Failed to start Crash recovery kernel arming.
May 31 10:39:24 int-kube-01 systemd[1]: Unit kdump.service entered failed state.
May 31 10:39:24 int-kube-01 systemd[1]: kdump.service failed.
```
Comment 1 Navin 2016-05-31 20:11:53 UTC
Can you please post the complete dmesg of the 2 kernels for both 4.5.4 and 4.6 ?
Comment 2 Sam McLeod 2016-05-31 23:19:01 UTC
Created attachment 218511 [details]
4.5.4 dmesg output
Comment 3 Sam McLeod 2016-05-31 23:19:18 UTC
Created attachment 218521 [details]
4.6.0 dmesg output
Comment 4 Sam McLeod 2016-05-31 23:19:53 UTC
(In reply to Navin from comment #1)
> Can you please post the complete dmesg of the 2 kernels for both 4.5.4 and
> 4.6 ?

Thanks Navin, I have attached the output from the kernels to this ticket.
Comment 5 Navin 2016-06-01 07:26:40 UTC
What is the output of service kdumpctl status in both cases?
What is the output of cat /sys/kernel/kexec_* in both cases?

Most probably your kexec failed because of the crashkernel size. Increasing it to 256M worked for me, so 256M or 384M may work for you too. Please try crashkernel=256M first; I was short on memory, and 384M didn't work for me. Since you have 16GB (from dmesg), 384M may also work for you.
Comment 6 Sam McLeod 2016-06-20 23:48:59 UTC
Thanks Navin, sorry for my slow response; I've been off sick and am way behind on emails.

There is no change in the output between the two kernels.

Kernel 4.6.2:
root@int-kube-01:~  # cat /sys/kernel/kexec_*
0
134217728
0

Kernel 4.5.4:
root@int-kube-01:~  # cat /sys/kernel/kexec_*
1
134217728
0

The only issue I have with making the crashkernel memory as large as, say, 384M is that a small, lightweight server, such as a host that just runs nginx, may only have 512MB of memory but still be important enough to want to run kdump.

I probably need to do some reading up on the current behaviour of crashkernel=auto. I remember that some time ago it caused issues by allocating too much or too little memory to kdump, so it was forced to 128 or 256MB across all servers.
Comment 7 Navin 2016-06-21 05:14:24 UTC
/sys/kernel/kexec_loaded is 0 on 4.6.2 and 1 on 4.5.4. So the crash kernel is loaded on 4.5.4, whereas loading fails on 4.6.2, most likely due to memory size.
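
For reference, here is a minimal sketch that puts file names next to the three numbers printed by cat /sys/kernel/kexec_*. It assumes the shell expands the glob alphabetically, which would give kexec_crash_loaded, kexec_crash_size, kexec_loaded in that order; the helper name label_kexec_values is made up for illustration. If that ordering holds, the value that differs between the two kernels is kexec_crash_loaded, and the reserved size (134217728 bytes, i.e. 128 MiB) is identical on both.

```python
# Assumption: the shell expands /sys/kernel/kexec_* in alphabetical order,
# so the three output lines map onto these files in this order.
KEXEC_FILES = ("kexec_crash_loaded", "kexec_crash_size", "kexec_loaded")


def label_kexec_values(lines):
    """Pair each output line of `cat /sys/kernel/kexec_*` with its file name."""
    values = [int(line) for line in lines if line.strip()]
    if len(values) != len(KEXEC_FILES):
        raise ValueError(
            "expected %d values, got %d" % (len(KEXEC_FILES), len(values))
        )
    return dict(zip(KEXEC_FILES, values))


if __name__ == "__main__":
    # Outputs copied from comment 6.
    samples = (("4.6.2", "0\n134217728\n0"), ("4.5.4", "1\n134217728\n0"))
    for kernel, output in samples:
        labelled = label_kexec_values(output.splitlines())
        size_mib = labelled["kexec_crash_size"] // (1024 * 1024)
        print(kernel, labelled, "reserved: %d MiB" % size_mib)
```

Reading the files one by one (for example cat /sys/kernel/kexec_crash_loaded) avoids relying on the glob order at all.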

You could still try 256M or 384M to be certain that the problem is due to RAM size, and then decide whether that is the issue or whether we are overlooking something else. Since it is a VM, increasing memory is just a matter of changing a command-line argument before you start it. Also, I don't think crashkernel=auto worked for me, whereas 256M and 384M did.
Once you are sure it is a memory size parameter problem, you can run

cat /proc/iomem | grep -i RAM as root (uid=0) and look at the memory mappings. For example, on my 8G machine the last 2 entries are
100000000-1fbffffff : System RAM
200000000-23bffffff : System RAM
These have sizes 0xfc000000 and 0x3c000000 respectively (approx. 4G and 1G, at offsets 4G and 8G), so you can use the size@offset format, e.g. crashkernel=384M@4096M.
The last System RAM region lies a little beyond what I have as physical RAM (8G), so in my case I used the second-to-last one, 100000000-1fbffffff. Also, if you have KSM (kernel same-page merging) running on the host, you usually don't need to worry about the RAM lost in each of the VMs, unless every VM's pages all differ from each other.
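
The size and offset arithmetic above can be checked with a short sketch. It assumes the usual "start-end : name" layout of /proc/iomem with an inclusive end address; the function name parse_iomem is made up for illustration.

```python
def parse_iomem(text):
    """Return (name, start, size_in_bytes) for each 'start-end : name' line."""
    regions = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or unexpected lines
        addr, name = line.strip().split(" : ", 1)
        start_s, end_s = addr.split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        # The end address is inclusive, hence the +1.
        regions.append((name, start, end - start + 1))
    return regions


if __name__ == "__main__":
    sample = (
        "100000000-1fbffffff : System RAM\n"
        "200000000-23bffffff : System RAM\n"
    )
    mib = 1024 * 1024
    for name, start, size in parse_iomem(sample):
        print("%s: offset %d MiB, size %d MiB" % (name, start // mib, size // mib))
```

With the two sample lines this gives a 4032 MiB region at offset 4096 MiB and a 960 MiB region at offset 8192 MiB, so a 384M reservation at 4096M fits inside the first region.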

You should find a usable memory not found or hole not found error somewhere in the journalctl output.
Comment 8 himanshu.madhani@cavium.com 2016-07-26 23:02:21 UTC
Hello, 

Is there any resolution on this issue? 

I am noticing the same issue on the 4.6.3 and 4.7.0 kernels. If I go back and boot my system into the 4.5.0 kernel, the kdump service is operational.

Let me know if any relevant logs are needed. 

Thanks
Comment 9 himanshu.madhani@cavium.com 2016-07-26 23:04:46 UTC
Here are some more details

# service kdump status
Kdump is operational
# uname -a
Linux dut4062 4.5.0+ #35 SMP Fri Jul 22 10:27:17 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux


However, when I boot into the 4.6.3 kernel, I keep getting an error

# uname -a
Linux dut4062 4.6.3+ #1 SMP Mon Jul 11 22:19:47 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
# service kdump status
Kdump is not operational

# service kdump start
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Starting kdump:                                            [FAILED]

# cat /proc/cmdline
ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=384M rd_LVM_LV=VolGroup/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb console=ttyS0 console=ttyS0,115200n8
Comment 10 Sam McLeod 2016-07-26 23:10:07 UTC
FYI - this is still an issue in kernels 4.6.4 and 4.7.0; the security policies still seem to be missing.

Changing the crashkernel size even right up to 512M makes no difference.
Comment 11 himanshu.madhani@cavium.com 2016-07-27 16:59:39 UTC
I've noticed the same failure to reserve memory for the crash kernel on 4.6.3 and 4.7.0. I tried crashkernel sizes up to 1024M and still cannot get the kdump service to start.

Let me know if you need any specific information from my setup.
Comment 12 himanshu.madhani@cavium.com 2016-08-15 17:50:20 UTC
Hello, 

I tried again today with the 4.8.0-rc2+ kernel, and it seems like kexec is ignoring the "crashkernel" parameter.

Here's information from my setup.

# uname -r
4.8.0-rc2+

# cat /proc/cmdline 
ro root=/dev/mapper/vg_dut4110-lv_root rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=512M rd_LVM_LV=vg_dut4110/lv_swap rd_LVM_LV=vg_dut4110/lv_root rd_NO_DM rhgb quiet

# service kdump status
Kdump is not operational

# service kdump start
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Starting kdump:                                            [FAILED]

The messages file confirms that kexec was not able to start the service

Aug 15 10:41:17 dut4110 kdump: kexec: failed to load kdump kernel
Aug 15 10:41:17 dut4110 kdump: failed to start up

It looks to me like kexec is not able to parse the crashkernel parameter.
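
One way to rule out a typo or an unexpected value on the command line is to extract the crashkernel= option from /proc/cmdline and decode it. The sketch below is only an illustration: it handles just the simple size[@offset] form with K/M/G suffixes (not range-based or ,high/,low variants), it takes the last occurrence if the option appears more than once (an assumption of this sketch), and parse_crashkernel is a made-up helper name.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
# Simple form only: size[KMG][@offset[KMG]]
_SIMPLE = re.compile(r"^(\d+)([KMG]?)(?:@(\d+)([KMG]?))?$", re.IGNORECASE)


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes_or_None), or None if absent/unparsed."""
    value = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            # Assumption of this sketch: the last occurrence wins.
            value = token[len("crashkernel="):]
    if value is None:
        return None
    match = _SIMPLE.match(value)
    if match is None:
        return None  # a syntax this sketch does not handle
    size = int(match.group(1)) * _UNITS[match.group(2).upper()]
    offset = None
    if match.group(3) is not None:
        offset = int(match.group(3)) * _UNITS[match.group(4).upper()]
    return size, offset


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        print(parse_crashkernel(handle.read()))
```

A result of None on the affected kernel would point at the command line itself; a sensible size there suggests looking elsewhere, for example at what /proc/iomem reports for the Crash kernel region.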

Note that the same option is able to load the kdump service on the 4.3.0 and 4.5.7 kernels.

Let me know if there are any other details I can provide to make progress on this issue.

Thanks
Comment 13 Jan Stancek 2016-09-09 13:17:49 UTC
I'm getting similar symptoms with 4.8-rc5 on s390x, but I don't think this is related to /sys/kernel/security/securelevel missing.

In my case this turned out to be related to:
51d7b120418e "/proc/iomem: only expose physical resource addresses to privileged users"

When systemd runs the kdump service, the kexec process can't see any of the addresses in /proc/iomem; instead it reads only zeroes:
...
08:52:18.275793 open("/proc/iomem", O_RDONLY) = 3
08:52:18.275955 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
08:52:18.276031 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3ff8e576000
08:52:18.276116 read(3, "00000000-00000000 : System RAM\n  00000000-00000000 : Kernel code\n  00000000-00000000 : Kernel data\n  00000000-00000000 : Kernel bss\n00000000-00000000 : Crash kernel\n", 1024) = 165
08:52:18.276474 read(3, "", 1024)       = 0
08:52:18.276549 close(3)                = 0
08:52:18.276630 munmap(0x3ff8e576000, 4096) = 0
08:52:18.276725 write(2, "Memory for crashkernel is not reserved\nPlease reserve memory by passing\"crashkernel=X@Y\" parameter to kernel\nThen try to loading kdump kernel\n", 142) = 142
08:52:18.276827 exit_group(1)           = ?
08:52:18.277038 +++ exited with 1 +++

As a result, kexec fails with "Memory for crashkernel is not reserved" and there's an AVC denial in audit.log:
type=AVC msg=audit(1473425733.404:102): avc:  denied  { sys_admin } for  pid=5176 comm="kexec" capability=21  scontext=system_u:system_r:kdump_t:s0 tcontext=system_u:system_r:kdump_t:s0 tclass=capability permissive=0
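
To check for the same symptom from within a given context, a small sketch like the following could be run where kexec runs (for example from a test unit started by systemd) and again from a root shell. It only tests whether every range in /proc/iomem reads as 00000000-00000000, as in the strace above; iomem_is_masked is a made-up helper name and the "start-end : name" line layout is assumed.

```python
def iomem_is_masked(text):
    """True if the text has at least one range and every range is all zeroes."""
    ranges = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or unexpected lines
        addr = line.strip().split(" : ", 1)[0]
        start_s, _, end_s = addr.partition("-")
        ranges.append((int(start_s, 16), int(end_s, 16)))
    return bool(ranges) and all(start == 0 and end == 0 for start, end in ranges)


if __name__ == "__main__":
    with open("/proc/iomem") as handle:
        masked = iomem_is_masked(handle.read())
    print("addresses hidden from this process" if masked else "addresses visible")
```

If the addresses are hidden only in the systemd-started context and visible from a root shell, that matches the behaviour described here rather than a problem with the crashkernel size.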

You could confirm whether this is your case as well by running (as root):
1. kdumpctl restart
or
2. setenforce 0; systemctl restart kdump
