Bug 203837
Summary: | Booting kernel under KVM immediately freezes host | | |
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Shawn Anastasio (shawn) |
Component: | PPC-64 | Assignee: | platform_ppc-64 |
Status: | NEW | | |
Severity: | blocking | CC: | dan, paulus |
Priority: | P1 | | |
Hardware: | PPC-64 | | |
OS: | Linux | | |
Kernel Version: | v5.2-rc2 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Guest kernel config | | |
Created attachment 283133: Guest kernel config

When booting kernel v5.2-rc2 (and confirmed up to 156c05917) in a VM on a POWER9 host running kernel 5.1.7, the host immediately locks up and becomes unresponsive to the point of requiring a hard reset. The last guest kernel message printed to the screen before the host locks up is:

    [ 0.013940] smp: Bringing up secondary CPUs ...

Due to the nature of the bug, it is very difficult to bisect, since a manual host reset is required each time the bug is encountered. Also, my only POWER machine is my primary workstation.

The bug has also been confirmed on other host kernel versions (down to 5.0.x). With the guest kernel downgraded to 5.1.0, the issue is not present. The guest kernel .config is attached.

I have tried but not succeeded in replicating this problem. I have tried 5.2-rc3 in the host with the config I usually use, plus 5.2-rc3 in the guest with that same config. That boots just fine.

With 5.2-rc3 in the host and my usual config, and 5.2-rc3 in the guest compiled with the config attached to this bug, the guest gets a kernel panic due to being unable to mount root. It looks like it never manages to load virtio-blk for some reason. With the config attached to this bug, I did once see the guest stop outputting messages after the message about bringing up CPUs. The host was still running just fine, and top in the host showed the qemu-system-ppc64 process using 100% of a CPU, consistent with the guest being in an infinite loop.

I think we need more details about the machine where the crash is occurring: host kernel config, details of the VM config (qemu command line or libvirt XML), etc.

Just tried 5.1.7 in the host and got the guest locking up during boot. In xmon I see one cpu in pmdp_invalidate and another in handle_mm_fault. It seems very possible this is the bug that Nick Piggin's recent patch series fixes ("powerpc/64s: Fix THP PMD collapse serialisation"):

http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=112348

bugzilla-daemon@bugzilla.kernel.org's message of June 7, 2019 4:29 pm:
> https://bugzilla.kernel.org/show_bug.cgi?id=203837
>
> --- Comment #2 from Paul Mackerras (paulus@ozlabs.org) ---
> Just tried 5.1.7 in the host and got the guest locking up during boot. In
> xmon I see one cpu in pmdp_invalidate and another in handle_mm_fault. It
> seems very possible this is the bug that Nick Piggin's recent patch series
> fixes ("powerpc/64s: Fix THP PMD collapse serialisation"):
>
> http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=112348

It's worth a try, although the bug was introduced around 4.20 and I wasn't able to trigger it on radix, but other timing changes could cause it to trigger, I suppose.

pdbg (https://github.com/open-power/pdbg) is a useful tool for your BMC that can often get the CPU registers out even for bad crashes; this might help to narrow down the problem without bisecting.

Thanks,
Nick

I have applied Nick's patchset to 5.1.7, but the issue still occurs. As for using pdbg, I'm aware of the tool's existence, but I'm not sure how I would effectively use it to diagnose this issue. If anybody has some pointers, they'd be appreciated.