Bug 14124
Summary: | Boot failure with ICH6R in AHCI mode | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Thomas Jarosch (thomas.jarosch) |
Component: | Serial ATA | Assignee: | Tejun Heo (tj) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alexander.huemer, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.30.5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Log files from serial console + kernel config
dmidecode and lspci output |
Description
Thomas Jarosch
2009-09-04 18:22:39 UTC
Looks like an IRQ delivery problem. Does irqpoll or pci=noacpi help?

Yes, it really feels like an IRQ problem: once the system hangs and I start to hit the return key, it shows one more line of output for every key pressed (= generated interrupt).

Unfortunately, irqpoll, pci=noacpi or pci=nomsi didn't help. I've created boot logs with a serial console for every variant. I'll also attach my kernel config and a working boot log from kernel 2.6.29. The "pci=noacpi" boot log contains a backtrace related to IRQs.

Maybe this helps a little: the box is packed with 6 network interfaces, so I guess it will have more IRQ sharing than the other boxes. It's a production system, so I can only trash it on the weekend or in the evening.

Created attachment 23014: Log files from serial console + kernel config
I've silently replaced the production box with a "spare" one and can now play around with it. What I've tested so far:

2.6.29 vanilla -> ok
2.6.30 vanilla -> fails
2.6.31-rc9 -> fails

I'll try to bisect 2.6.29 <-> 2.6.30.

Here's my current bisect log:

[root@intradev linux-2.6]# git bisect log
git bisect start
# bad: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git bisect bad 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
# good: [8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84] Linux 2.6.29
git bisect good 8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84
# bad: [3c6fae67d026d57f64eb3da9c0d0e76983e39ae3] Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
git bisect bad 3c6fae67d026d57f64eb3da9c0d0e76983e39ae3
# bad: [a8416961d32d8bb757bcbb86b72042b66d044510] Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad a8416961d32d8bb757bcbb86b72042b66d044510
# good: [fe85ff8299538c8488645e7d72539079dad5bae6] usbnet: convert dms9601 driver to net_device_ops
git bisect good fe85ff8299538c8488645e7d72539079dad5bae6
# good: [928a726b0e12184729900c076e13dbf1c511c96c] Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
git bisect good 928a726b0e12184729900c076e13dbf1c511c96c
# bad: [d3f12d36f148f101c568bdbce795e41cd9ceadf3] Merge branch 'kvm-updates/2.6.30' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect bad d3f12d36f148f101c568bdbce795e41cd9ceadf3
# good: [61a091827e273650b39eb87c799a6d260913fa0b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
git bisect good 61a091827e273650b39eb87c799a6d260913fa0b
# good: [71450f78853b82d55cda4e182c9db6e26b631485] KVM: Report IRQ injection status for MSI delivered interrupts
git bisect good 71450f78853b82d55cda4e182c9db6e26b631485
# bad: [39f15003c7b268e4199d5ddce60a6944a74a14b7] Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
git bisect bad 39f15003c7b268e4199d5ddce60a6944a74a14b7
# bad: [9223d01b2fdf638a73888ad73a1784fca3454c1e] pata-rb532-cf: platform_get_irq() fix ignored failure
git bisect bad 9223d01b2fdf638a73888ad73a1784fca3454c1e
# good: [6be976e79db3ba691b657476a8bf4a635e5586f9] pata-rb532-cf: drop custom freeze and thaw
git bisect good 6be976e79db3ba691b657476a8bf4a635e5586f9

I'm very close to the issue and have to abort for today :o) Maybe it's commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 "ahci: drop intx manipulation on msi enable"?

Reverting a5bfc4714b3f01365aef89a92673f2ceb1ccf246 "ahci: drop intx manipulation on msi enable" solved the issue; I can now boot 2.6.30 / 2.6.31-rc9. Would it hurt to revert this patch? It was already talked about: http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg31682.html

Any short comment on this one?

Can you please post the output of "lspci -nnvvvxxx" and "dmidecode"?

Created attachment 23098: dmidecode and lspci output
The logs are from kernel 2.6.30.5 + the reverted code in ahci.c.
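
For readers following the bisect log posted earlier in this report: every `git bisect good` / `git bisect bad` line records one test result, and the lines starting with `#` are only comments. Since each newly tested commit lies inside the range that is still open, the most recent `bad` mark is the tightest upper bound, and the first bad commit sits between the `good` marks and that last `bad` mark. The following is a minimal sketch for summarizing such a log; the function name `summarize_bisect_log` is made up for this illustration and is not part of git.

```python
import re

# Matches replayable commands such as "git bisect bad <sha>". Comment lines
# starting with "#" do not match and are therefore skipped.
MARK = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})$")


def summarize_bisect_log(text):
    """Return (good_shas, last_bad_sha) from `git bisect log` output.

    Hypothetical helper for illustration only.
    """
    good, last_bad = [], None
    for line in text.splitlines():
        m = MARK.match(line.strip())
        if not m:
            continue
        verdict, sha = m.groups()
        if verdict == "good":
            good.append(sha)
        else:
            last_bad = sha  # a later "bad" mark narrows the range further
    return good, last_bad


# The final two marks from the log in this report.
log = """\
git bisect start
# bad: [9223d01b2fdf638a73888ad73a1784fca3454c1e] pata-rb532-cf: platform_get_irq() fix ignored failure
git bisect bad 9223d01b2fdf638a73888ad73a1784fca3454c1e
# good: [6be976e79db3ba691b657476a8bf4a635e5586f9] pata-rb532-cf: drop custom freeze and thaw
git bisect good 6be976e79db3ba691b657476a8bf4a635e5586f9
"""

good, last_bad = summarize_bisect_log(log)
print("last bad mark: ", last_bad)
print("latest good mark:", good[-1])
```

With the full log, the remaining candidates are the commits reachable from the last `bad` mark but from none of the `good` marks, which is how the suspect a5bfc4714b3f01365aef89a92673f2ceb1ccf246 came into view.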
Hmmm... for now I think reverting the offending commit is the right thing to do. I'll post a patch to revert it. Longer term, I think this is something the PCI layer should take care of, not individual drivers. I'll see whether there's a good place I can hook into. Thanks.

> Thanks.
Thank -you- :)
Patch has been applied upstream.

Please see [1]. It seems this issue also appears under other circumstances; my environment does not match the one reported earlier in this bug. It seems commit [2] did not make it into linux-2.6.31.1. I assume the first kernel releases not affected by the issue will be 2.6.30.9 (if that is released) and 2.6.31.2, right?

[1] http://thread.gmane.org/gmane.linux.kernel/894187
[2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc