Bug 15822
Summary: | Host bridge reported broken by .34-rc5 kernel | ||
---|---|---|---|
Product: | Drivers | Reporter: | Sten Heinze (sten.heinze) |
Component: | PCI | Assignee: | Bjorn Helgaas (bjorn.helgaas) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | akpm, bjorn.helgaas |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34-rc5 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
dmesg output from 2.6.34-rc5
debug patch
patch to revert warning |
Mutter. Bjorn, that was a fairly unuseful message you added. Could we have made it a bit more helpful to the testers? Like, tell them to mail bjorn.helgaas@hp.com :)

I fear that I'm going to get showered with reports of this message coming out, I'll dutifully direct these emails to yourself and Jesse and nothing will happen and the reports will keep coming :( Am I wrong?

In a way I guess this should really go into the post-2.6.33 regressions bucket - those warnings didn't come out in 2.6.33. But if we really want these messages coming out of 2.6.34 then there's no point in treating it as a regression.

Thanks for the report, Sten. I added that message after spending a few days debugging a device that turned out to be physically broken, so that it might be easier to debug next time. Obviously in this case, the device is NOT broken, so I need to make that test smarter or remove the message altogether.

Thanks for the replies. Good to know the device is not broken. Let me know if I can help and test a new patch.

Created attachment 26083 [details]
debug patch
Sten, would you mind trying this patch? If you just respond with the output of "dmesg | grep 0000:00:00.0", that should be enough.
I think what probably happened is that we read 0x8 from the BAR when sizing it. The spec says we calculate the size by clearing the encoding bits (0xf in this case), inverting what's left, and incrementing by one. That would be (~(0x8 & ~0xf)) + 1 == 0, so that seems sensible (a 32-bit prefetchable memory BAR of size zero).
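The arithmetic above can be checked with a short sketch. This is an illustration in Python, not the kernel's code: the name `bar_size` and the constants are hypothetical, and it assumes a 32-bit memory BAR whose low four bits are encoding bits.

```python
U32 = 0xFFFFFFFF          # 32-bit register width
MEM_FLAG_MASK = 0xF       # assumption: low 4 bits of a memory BAR are encoding bits


def bar_size(readback: int, flag_mask: int = MEM_FLAG_MASK) -> int:
    """Return the size implied by the value read back while sizing a 32-bit BAR.

    Clears the encoding bits, inverts what is left, and adds one,
    truncating to 32 bits the way the hardware register would.
    """
    address_bits = readback & ~flag_mask
    return (~address_bits + 1) & U32


# The value suspected in this report: only the prefetchable bit is set,
# so the computed size wraps around to zero.
print(hex(bar_size(0x8)))

# A made-up readback for an 8 KiB prefetchable BAR, for comparison.
print(hex(bar_size(0xFFFFE008)))
```

With a readback of 0x8 the address bits are all clear, so the inverted value is all ones and the increment wraps to zero in 32 bits.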
Assuming your test results confirm this, I think I'll just revert that patch for now and revisit it after 2.6.34. The warning was useful for a device where we read 0x7fffe000 from the BAR. That's clearly invalid (because the MSB is not set), and we should be able to distinguish that from the case you're seeing, but it will require a little too much work for this stage of the 2.6.34 release.
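One possible way to tell the two readbacks apart is sketched here. It is not taken from any kernel patch: the name `classify_readback` is hypothetical, and it rests on the assumption that a valid sizing readback has its address bits set contiguously from the MSB downward, or all clear for a BAR of size zero.

```python
U32 = 0xFFFFFFFF          # 32-bit register width
MEM_FLAG_MASK = 0xF       # assumption: low 4 bits of a memory BAR are encoding bits


def classify_readback(readback: int) -> str:
    """Classify the value read back while sizing a 32-bit memory BAR."""
    address_bits = readback & ~MEM_FLAG_MASK & U32
    if address_bits == 0:
        # No address bits are writable: a BAR of size zero.
        return "zero-size"
    # A well-formed mask looks like 1...10...0. OR-ing it with
    # (address_bits - 1) fills in the low zeros, which yields all ones
    # only if the set bits run contiguously down from the MSB.
    if (address_bits | (address_bits - 1)) == U32:
        return "valid"
    return "invalid"


print(classify_readback(0x8))          # the host bridge in this report
print(classify_readback(0x7FFFE000))   # the broken device: MSB not set
```

Under that assumption, 0x8 falls into the zero-size case and 0x7fffe000 is rejected because its top bit is clear.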
dmesg | grep 0000:00:00.0
[ 0.275450] pci 0000:00:00.0: reg 10: l 0x8 (original)
[ 0.275456] pci 0000:00:00.0: reg 10: sz 0x8 (original)
[ 0.275464] pci 0000:00:00.0: reg 10: type 2 flags 0x42208 l 0x0 mask 0xfffffff0
[ 0.275471] pci 0000:00:00.0: reg 14: l 0x0 (original)
[ 0.275477] pci 0000:00:00.0: reg 14: sz 0x0 (original)
[ 0.275483] pci 0000:00:00.0: reg 18: l 0x0 (original)
[ 0.275489] pci 0000:00:00.0: reg 18: sz 0x0 (original)
[ 0.275495] pci 0000:00:00.0: reg 1c: l 0x0 (original)
[ 0.275502] pci 0000:00:00.0: reg 1c: sz 0x0 (original)
[ 0.275508] pci 0000:00:00.0: reg 20: l 0x0 (original)
[ 0.275514] pci 0000:00:00.0: reg 20: sz 0x0 (original)
[ 0.275520] pci 0000:00:00.0: reg 24: l 0x0 (original)
[ 0.275526] pci 0000:00:00.0: reg 24: sz 0x0 (original)
[ 0.275532] pci 0000:00:00.0: reg 30: l 0x0 (original)
[ 0.275539] pci 0000:00:00.0: reg 30: sz 0x0 (original)
[ 1.954969] agpgart-intel 0000:00:00.0: Intel 855GM Chipset
[ 1.955435] agpgart-intel 0000:00:00.0: detected 8060K stolen memory
[ 1.964026] agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0xe0000000

Hope that helps. Let me know if you need me to try more.

Created attachment 26095 [details]
patch to revert warning
Here's the patch I just posted to revert the warning for now.
This warning was removed by commit 45aa23b4cb, which was included in 2.6.34. |
Created attachment 26072 [details]
dmesg output from 2.6.34-rc5

dmesg reports on booting a 2.6.34-rc5 kernel:

pci 0000:00:00.0: reg 10: invalid size (l 0x0 sz 0x8 mask 0xfffffff0); broken device?

(Complete dmesg is attached.)

The device is:

00:00.0 Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
    Subsystem: IBM ThinkPad R50e
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
    Latency: 0
    Region 0: Memory at <unassigned> (32-bit, prefetchable)
    Capabilities: <access denied>
    Kernel driver in use: agpgart-intel

(The hardware is actually a ThinkPad X40.)

Since nothing seems to be unusable, is this a bug? What can I do to verify whether the device is broken or this is a bug in the kernel?

This message is not printed using a 2.6.33 kernel.