Bug 101971
| Summary: | Regression with 'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' on AMD i686 | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Philip Müller (philm) |
| Component: | i386 | Assignee: | platform_i386 |
| Status: | NEW | | |
| Severity: | high | CC: | linux, philm, pomidorabelisima |
| Priority: | P1 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| URL: | https://github.com/manjaro/packages-core/issues/14 | | |
| Kernel Version: | 4.1, 4.1.x, 4.2-rcX | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Attachments:
- Bisect (0d55ba4)
- Kernel panic (part1)
- Kernel panic (part2)
- Fix NULL pointer dereference in the error/cleanup path
- Fix cache_shared_cpu_map_remove() checking for check sib_cpu_ci->info_list
- Final upstream patch
Description
Philip Müller
2015-07-26 08:15:53 UTC
Created attachment 183671 [details]
Kernel panic (part1)
Created attachment 183681 [details]
Kernel panic (part2)
Created attachment 183741 [details]
Fix NULL pointer dereference in the error/cleanup path
That's a trivial NULL pointer dereference in the error/cleanup path.
Patch below should fix it.
Thanks,
tglx
https://lkml.org/lkml/2015/7/26/16
Created attachment 183751 [details]
Fix cache_shared_cpu_map_remove() checking for check sib_cpu_ci->info_list
Well, I've got a slightly different, and of course totally untested,
possible solution: cache_shared_cpu_map_setup() does check
sib_cpu_ci->info_list before setting cpumask bits while
cache_shared_cpu_map_remove() doesn't. Balancing this out would mean
(see attachment).
--
Regards/Gruss,
Boris.
https://lkml.org/lkml/2015/7/26/20
Created attachment 183881 [details]
Final upstream patch
Philip Müller reported a hang when booting a 32-bit 4.1 kernel on an AMD
box. A fragment of the splat was enough to pinpoint the issue:
task: f58e0000 ti: f58e8000 task.ti: f58e800
EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
EIP is at free_cache_attributes+0x83/0xd0
EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0
cache_shared_cpu_map_setup() did check the sibling CPU's cacheinfo descriptor
while the respective teardown path, cache_shared_cpu_map_remove(), didn't.
Fix that.
From tglx's version: to be on the safe side, move the cacheinfo
descriptor check to free_cache_attributes(), thus cleaning up the
hotplug path a little and making this even more robust.
--
Regards/Gruss,
Boris.
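The asymmetry described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code or the attached patch: the names `cpu_cacheinfo_model`, `cache_leaf`, `model_alloc()`, `model_shared_map_setup()`, `model_shared_map_remove()` and `model_free()` are hypothetical, and each CPU is assumed to have a single cache leaf shared by all CPUs. It combines the two ideas from the thread: teardown skips siblings whose `info_list` is NULL, and the free routine returns early when the CPU's own descriptor was never allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Hypothetical, simplified stand-ins for the kernel's cacheinfo structures. */
struct cache_leaf {
    unsigned int shared_cpu_map;    /* bit N set: CPU N shares this cache */
};

struct cpu_cacheinfo_model {
    struct cache_leaf *info_list;   /* NULL until the descriptor is allocated */
};

static struct cpu_cacheinfo_model cpus[NR_CPUS];

/* Allocate the (single-leaf) descriptor for one CPU. */
static int model_alloc(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    cpus[cpu].info_list = calloc(1, sizeof(struct cache_leaf));
    return cpus[cpu].info_list ? 0 : -1;
}

/* Setup: only touch siblings whose descriptor already exists. */
static void model_shared_map_setup(unsigned int cpu)
{
    if (!cpus[cpu].info_list)
        return;
    for (unsigned int sib = 0; sib < NR_CPUS; sib++) {
        if (!cpus[sib].info_list)
            continue;               /* sibling not populated yet */
        cpus[cpu].info_list->shared_cpu_map |= 1u << sib;
        cpus[sib].info_list->shared_cpu_map |= 1u << cpu;
    }
}

/* Teardown: mirror the setup check instead of dereferencing blindly. */
static void model_shared_map_remove(unsigned int cpu)
{
    for (unsigned int sib = 0; sib < NR_CPUS; sib++) {
        if (sib == cpu || !cpus[sib].info_list)
            continue;               /* the check the teardown path lacked */
        cpus[sib].info_list->shared_cpu_map &= ~(1u << cpu);
    }
}

/* Free: bail out first if this CPU never got a descriptor. */
static void model_free(unsigned int cpu)
{
    if (!cpus[cpu].info_list)
        return;
    model_shared_map_remove(cpu);
    free(cpus[cpu].info_list);
    cpus[cpu].info_list = NULL;
}
```

With both guards in place, calling `model_free()` on a CPU whose setup failed or never ran is a no-op rather than a NULL pointer dereference, which is the failure mode the splat points at.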
https://bugzilla.redhat.com/show_bug.cgi?id=1253566
Did the fix land in stable 4.1.6 and mainline 4.2-rc7?
The quick answer is no. The long answer you can find here:
https://lists.manjaro.org/pipermail/manjaro-dev/Week-of-Mon-20150803/000579.html
"Final upstream patch" https://bugzilla.kernel.org/attachment.cgi?id=183881
[PATCH] cpu/cacheinfo: Fix teardown path
Booting OK on:
- lscpu | egrep op-mode\|Vendor
  CPU op-mode(s): 32-bit, 64-bit
  Vendor ID: AuthenticAMD
- lscpu | egrep op-mode\|Vendor
  CPU op-mode(s): 32-bit
  Vendor ID: AuthenticAMD
Tested on installed system (Fedora release 22) with patched kernels:
- uname -r
  4.1.5-201.fc22.i686
- uname -r
  4.2.0-0.rc6.git0.4.fc22.i686