Bug 23952
Summary: | Bisected regression: kernel won't boot (MCP55 bridge) | ||
---|---|---|---|
Product: | Drivers | Reporter: | Mathieu Bérard (mathieu) |
Component: | PCI | Assignee: | Neil Horman (nhorman) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | florian, nhorman, ozan |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.37-rc1+ | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 21782 | ||
Attachments: | lspci -vvv; dmesg with 66db60eaf15 reverted; patch to filter out mcp55 chips w/o hyperthread interface capabilities | ||
Description
Mathieu Bérard
2010-11-28 21:46:30 UTC
We can't just yank out that commit; without it kdump will not work on systems with the listed revisions of MCP55. Perhaps the availability of that register is more specific than what we can tell with just the PCI device and vendor ID. Can you please post the output of lspci -vvv so that I can compare your bridge to the one I worked with here? Thanks!
Created attachment 38822 [details]
lspci -vvv
Created attachment 38832 [details]
dmesg with 66db60eaf15 reverted
complete dmesg log from 2.6.37-rc with commit 66db60eaf15 reverted
Well, I agree that fixing kdump on some systems is a very good thing. But in terms of regressions, that patch introduces a very serious one: it breaks Linux on some other systems, which seems a rather unfavorable balance to me. As a side note, kexec has always worked very well on this particular system. Would a dump of the PCI config space of that PCI device on my system be useful to you?

You're correct, but it's important to remember that the MCP55 is:
1) A fairly widespread northbridge
2) Relatively old (nvidia no longer manufactures it, as I understand it)
Given those facts, along with the fact that you're the first to report this (this patch has been in RHEL since 5.3, over 1.5 years), I feel like this report may be a corner case, and as such something that we can fine-tune the existing patch for. I'm really hoping that when I compare your PCI output to what I have here we'll be able to do some filtering on PCI sub-device ID and get this working. If you could, please, yes, your PCI config space might be handy for comparison with my system here.
Thanks!
Neil

Okay, here is the conf space of my 10de:0360 device, on a kernel with the patch reverted:
10de 0360 000f 00a0 00a2 0601 0000 0080 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1043 8239 0000 0000 0000 0000 0000 0000 00ff 0000 1043 8239 f000 feff 3efa 00ff 3efa 00ff 3efa 00ff 5a00 0262 0000 0100 0000 ffff 0000 0000 0000 0000 0000 0000 0000 fff9 0010 ffff 80c5 0000 0000 1944 0000 0330 8009 1200 d201 d000 00f0 0100 00f0 0000 0800 0000 0000 0000 4721 8695 cdef 00ab 0001 c030 0000 0000 0000 0000 0000 0000 0290 02ef 0800 085f 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 2a50 fe00 e1fd b000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000

OK, so some good news. It appears that there are 2 major differences between your system and the ones I was testing against.
The first is that this is a rev a1, while everything I tested on is a rev a2, so it's possible that the register I need is nonexistent on your system, and we can key off that to know whether we should set it or not. Also, on the systems I test with, the MCP55 has a HyperTransport-capable interface to the CPU, whereas yours does not. Given that the problem I sought to fix only occurs on AMD systems in which a HyperTransport bus is used, I think the best solution is to key off that fact in the quirk, and not do anything if there is no HT bus. I'll have a patch for you later today.
Created attachment 39192 [details]
patch to filter out mcp55 chips w/o hyperthread interface capabilities
Hey, here you go, as promised. The problem that this quirk addresses was only noted on AMD systems with HyperTransport buses, so this patch should cause the quirk to not be applied on MCP55 chips without the HT capability (which I think makes sense, as I would imagine this register may be nonexistent on those versions of the chip). Anywho, I don't have a non-HT system here to test on, but with this patch, kdump still works on my MCP55-based server. If you could test on your non-HyperTransport MCP55-based system and see if a post-2.6.36 kernel boots on it, I'd appreciate it. If you give it a thumbs up, I'll post it ASAP.
Thanks!
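For readers following along, the check described above (skip the quirk when the bridge exposes no HyperTransport capability) can be illustrated in user space. This is only a rough sketch, not the attached patch: it assumes the config space posted earlier is hexdump-style output of little-endian 16-bit words, it uses only the first 64 bytes of that dump, and the helper names words_to_bytes and find_capability are made up for this illustration. The register offsets and the HyperTransport capability ID (0x08) come from the PCI specification. In the kernel, the equivalent lookup would normally go through a helper such as pci_find_capability() rather than a hand-written walk.

```python
# Sketch only: user-space illustration of "does this device advertise a
# HyperTransport capability?". Not the kernel patch from this report.

PCI_STATUS = 0x06            # status register offset
PCI_STATUS_CAP_LIST = 0x10   # status bit 4: capability list is implemented
PCI_CAPABILITY_LIST = 0x34   # offset of the pointer to the first capability
PCI_CAP_ID_HT = 0x08         # HyperTransport capability ID

# First 64 bytes of the 10de:0360 config space posted in this report,
# assumed to be little-endian 16-bit words as printed by hexdump.
REPORTED_HEADER = """
10de 0360 000f 00a0 00a2 0601 0000 0080
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 1043 8239
0000 0000 0000 0000 0000 0000 00ff 0000
"""


def words_to_bytes(dump: str) -> bytes:
    """Convert whitespace-separated 16-bit hex words to raw bytes."""
    return b"".join(int(w, 16).to_bytes(2, "little") for w in dump.split())


def find_capability(cfg: bytes, cap_id: int) -> int:
    """Return the config-space offset of capability cap_id, or 0 if absent."""
    if len(cfg) < 0x40 or not cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST:
        return 0  # the device has no capability list at all
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC  # low two bits are reserved
    seen = set()                           # guard against looping lists
    while pos >= 0x40 and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC          # follow the "next" pointer
    return 0


cfg = words_to_bytes(REPORTED_HEADER)
vendor = int.from_bytes(cfg[0:2], "little")
device = int.from_bytes(cfg[2:4], "little")
has_ht = bool(find_capability(cfg, PCI_CAP_ID_HT))
print(f"{vendor:04x}:{device:04x} HT capability present: {has_ht}")
```

Under those assumptions the status word of the reported device (00a0) does not have the capability-list bit set, so the walk stops immediately and the quirk would be skipped on this system, which matches the observation that this MCP55 has no HT interface.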
The patch allows the latest git pull of 2.6.37-rc to boot on my system. Thanks!

OK, good. I'll clean it up and post it in the AM. Thanks!

http://marc.info/?l=linux-kernel&m=129181976528659&w=2
Posted for review.

This is fixed in mainline.

Fixed in .37-rc7 by:
commit 49c2fa08a77a7eefa4cbc73601f64984aceacfa7
Author: Neil Horman <nhorman@tuxdriver.com>
Date: Wed Dec 8 09:47:48 2010 -0500

    PCI: Update MCP55 quirk to not affect non HyperTransport variants