Created attachment 183661: Bisect (0d55ba4)

Hi all,

'x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure' (commit 0d55ba46bfbee64fd2b492b87bfe2ec172e7b056) introduces a regression on 32-bit AMD systems.

The facts are:
- You can't boot an i686 kernel with more than one CPU core on AMD hardware (x86_64, however, works).
- By reverting 0d55ba4[4] the kernel boots.

On failure it produces the following error message:

Failed to access perfctr msr (MSR c0010007 is 0)
task: f58e0000 ti: f58e8000 task.ti: f58e800
EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
EIP is at free_cache_attributes+0x83/0xd0
EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0

More detailed info can be found here: https://github.com/manjaro/packages-core/issues/14

Kind regards,
Philip Müller
--------------------------
Manjaro Project-Lead
Created attachment 183671: Kernel panic (part1)
Created attachment 183681: Kernel panic (part2)
Created attachment 183741: Fix NULL pointer dereference in the error/cleanup path

That's a trivial NULL pointer dereference in the error/cleanup path. Patch below should fix it.

Thanks,
tglx

https://lkml.org/lkml/2015/7/26/16
Created attachment 183751: Fix cache_shared_cpu_map_remove() by checking sib_cpu_ci->info_list

Well, I've got a slightly different, and of course totally untested, possible solution: cache_shared_cpu_map_setup() checks sib_cpu_ci->info_list before setting cpumask bits, while cache_shared_cpu_map_remove() doesn't. Balancing this out would mean adding the same check to the remove path (see attachment).

--
Regards/Gruss,
Boris.

https://lkml.org/lkml/2015/7/26/20
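The imbalance described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code or the attached patch: the `model_*` names, the two-CPU layout, and the plain bitmask standing in for a cpumask are all assumptions made for illustration. Only the idea of checking the sibling's `info_list` before touching it comes from the comment.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS   2
#define MODEL_NR_LEAVES 2

/* Hypothetical stand-ins: a plain bitmask replaces the kernel's cpumask. */
struct model_leaf {
	unsigned long shared_cpu_map;
};

struct model_cpu_ci {
	struct model_leaf *info_list;	/* NULL until this CPU is populated */
};

static struct model_cpu_ci model_ci[MODEL_NR_CPUS];

/* Teardown that mirrors the check the setup path already performs. */
static void model_shared_map_remove(unsigned int cpu)
{
	for (unsigned int index = 0; index < MODEL_NR_LEAVES; index++) {
		struct model_leaf *this_leaf = model_ci[cpu].info_list + index;

		for (unsigned int sibling = 0; sibling < MODEL_NR_CPUS; sibling++) {
			if (sibling == cpu)
				continue;	/* skip itself */
			if (!(this_leaf->shared_cpu_map & (1UL << sibling)))
				continue;	/* not sharing this leaf */
			/* The balancing check: skip siblings without descriptors. */
			if (!model_ci[sibling].info_list)
				continue;

			model_ci[sibling].info_list[index].shared_cpu_map &= ~(1UL << cpu);
			this_leaf->shared_cpu_map &= ~(1UL << sibling);
		}
	}
}

/* CPU 0 lists CPU 1 as a sibling, but CPU 1 was never populated. */
static int model_teardown_demo(void)
{
	model_ci[0].info_list = calloc(MODEL_NR_LEAVES, sizeof(struct model_leaf));
	assert(model_ci[0].info_list);
	model_ci[1].info_list = NULL;

	for (unsigned int index = 0; index < MODEL_NR_LEAVES; index++)
		model_ci[0].info_list[index].shared_cpu_map = 0x3;	/* CPUs 0 and 1 */

	/* Without the info_list check this would dereference a NULL base. */
	model_shared_map_remove(0);

	free(model_ci[0].info_list);
	model_ci[0].info_list = NULL;
	return 0;
}
```

Calling `model_teardown_demo()` runs the teardown for CPU 0 while its sibling has no descriptors; in this model, dropping the `info_list` check makes the inner loop index off a NULL pointer.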
Created attachment 183881: Final upstream patch

Philip Müller reported a hang when booting a 32-bit 4.1 kernel on an AMD box. A fragment of the splat was enough to pinpoint the issue:

task: f58e0000 ti: f58e8000 task.ti: f58e800
EIP: 0060:[<c135a903>] EFLAGS: 00010206 CPU: 0
EIP is at free_cache_attributes+0x83/0xd0
EAX: 00000001 EBX: f589d46c ECX: 00000090 EDX: 360c2000
ESI: 00000000 EDI: c1724a80 EBP: f58e9ec0 ESP: f58e9ea0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000ac CR3: 01731000 CR4: 000006d0

cache_shared_cpu_map_setup() did check the sibling CPU's cacheinfo descriptor, while the respective teardown path cache_shared_cpu_map_remove() didn't. Fix that.

From tglx's version: to be on the safe side, move the cacheinfo descriptor check to free_cache_attributes(), thus cleaning up the hotplug path a little and making this even more robust.

--
Regards/Gruss,
Boris.
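The second part of that description, guarding the free path itself, might look roughly like the following user-space sketch. It is a model under stated assumptions rather than the upstream patch: the `sketch_*` names, the fixed two-CPU array, and the bitmask in place of a cpumask are hypothetical. It only shows why an early return on a missing descriptor makes the teardown safe to call for a CPU that was never set up, or that was already torn down.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS   2
#define SKETCH_NR_LEAVES 2

struct sketch_leaf {
	unsigned long shared_cpu_map;	/* bitmask stand-in for a cpumask */
};

/* One descriptor array per CPU; NULL means never allocated or already freed. */
static struct sketch_leaf *sketch_info_list[SKETCH_NR_CPUS];

static void sketch_shared_map_remove(unsigned int cpu)
{
	for (unsigned int index = 0; index < SKETCH_NR_LEAVES; index++) {
		struct sketch_leaf *this_leaf = sketch_info_list[cpu] + index;

		for (unsigned int sibling = 0; sibling < SKETCH_NR_CPUS; sibling++) {
			if (sibling == cpu || !sketch_info_list[sibling])
				continue;
			if (!(this_leaf->shared_cpu_map & (1UL << sibling)))
				continue;
			sketch_info_list[sibling][index].shared_cpu_map &= ~(1UL << cpu);
			this_leaf->shared_cpu_map &= ~(1UL << sibling);
		}
	}
}

static void sketch_free_attributes(unsigned int cpu)
{
	/* Guard at the entry point: nothing to tear down for this CPU. */
	if (!sketch_info_list[cpu])
		return;

	sketch_shared_map_remove(cpu);
	free(sketch_info_list[cpu]);
	sketch_info_list[cpu] = NULL;
}

/* Returns 0 when every path, including the error-like ones, is survived. */
static int sketch_free_demo(void)
{
	sketch_info_list[0] = calloc(SKETCH_NR_LEAVES, sizeof(struct sketch_leaf));
	assert(sketch_info_list[0]);
	for (unsigned int index = 0; index < SKETCH_NR_LEAVES; index++)
		sketch_info_list[0][index].shared_cpu_map = 0x3;	/* CPUs 0 and 1 */

	sketch_free_attributes(1);	/* never populated: returns early */
	sketch_free_attributes(0);	/* normal teardown */
	sketch_free_attributes(0);	/* repeated call is a no-op */

	return (!sketch_info_list[0] && !sketch_info_list[1]) ? 0 : -1;
}
```

Putting the guard in the free function means callers on error and hotplug paths do not each need their own check, which matches the "cleaning up the hotplug path a little" remark in the description.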
https://bugzilla.redhat.com/show_bug.cgi?id=1253566

Will the fix land in stable 4.1.6 and mainline 4.2-rc7?
The quick answer is no. The long answer you can find here: https://lists.manjaro.org/pipermail/manjaro-dev/Week-of-Mon-20150803/000579.html
"Final upstream patch" https://bugzilla.kernel.org/attachment.cgi?id=183881 [PATCH] cpu/cacheinfo: Fix teardown path Booting OK on: - lscpu | egrep op-mode\|Vendor CPU op-mode(s): 32-bit, 64-bit Vendor ID: AuthenticAMD - lscpu | egrep op-mode\|Vendor CPU op-mode(s): 32-bit Vendor ID: AuthenticAMD Tested on installed system (Fedora release 22) with patched kernels: - uname -r 4.1.5-201.fc22.i686 - uname -r 4.2.0-0.rc6.git0.4.fc22.i686
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/base/cacheinfo.c?id=2110d70