Bug 12168

Summary: PCI ROM access through sysfs does not always work - 0 bytes read
Product: Drivers Reporter: Alex Villacis Lasso (avillaci)
Component: PCIAssignee: drivers_pci (drivers_pci)
Status: RESOLVED CODE_FIX    
Severity: normal CC: alan, alexdeucher, ancoron, compnerd, greg, ingmar.stdin, romieu, wayland
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26.6-49.fc8 Subsystem:
Regression: No Bisected commit-id:
Attachments: perl script used to read ROM to file, as documented in kernel docs
Shell script with paths to corresponding PCI devices for tests
Add validation to ROM reading operation, fail with -EIO if validation fails
Add debugging information to PCI ROM mapping
dmesg output for 2.6.28-rc7 with patches applied - AGP/savage as primary, PCI/spitfire as secondary
Script output - AGP/savage as primary, PCI/spitfire as secondary
dmesg output for 2.6.28-rc7 with patches applied - AGP/savage as secondary, PCI/spitfire as primary
Script output - AGP/savage as secondary, PCI/spitfire as primary
Example of patch for error checking on rom size
Patch that helps debugging, but does not solve the problem in any way
Another possible patch for rom size?
Fixed version of readrom.pl
Makes pci-sysfs fail when it can't read rom, and emit warnings too
Patch that solves the problem; I know of no problems with it
pci: Return/dmesg errors when reading PCI ROMs
[PATCH] pci: Return/dmesg errors when reading PCI ROMs (take 2)

Description Alex Villacis Lasso 2008-12-05 14:03:20 UTC
Latest working kernel version:
Earliest failing kernel version:

2.6.26.6-49.fc8 for these examples, also seen with 2.6.28-rc6

Distribution:

Fedora 8, Fedora 10

Hardware Environment:

/proc/cpuinfo
-------------
[alex@srv64 ~]$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Celeron(R) CPU 2.80GHz
stepping        : 9
cpu MHz         : 2800.411
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up pebs bts pni monitor ds_cpl tm2 cid cx16 xtpr lahf_lm
bogomips        : 5606.05
clflush size    : 64
power management:


Output of /sbin/lspci -v
------------------------
00:00.0 Host bridge: ATI Technologies Inc Radeon Xpress 200 Host Bridge (rev 01)
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64
	Memory at <ignored> (64-bit, non-prefetchable)

00:01.0 PCI bridge: ATI Technologies Inc RS480 PCI Bridge (prog-if 00 [Normal decode])
	Flags: bus master, 66MHz, medium devsel, latency 99
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=68
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: fde00000-fdefffff
	Prefetchable memory behind bridge: d8000000-dfffffff
	Capabilities: [b0] Subsystem: Intel Corporation Unknown device d600

00:11.0 IDE interface: ATI Technologies Inc 437A Serial ATA Controller (rev 80) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 23
	I/O ports at ff00 [size=8]
	I/O ports at fe00 [size=4]
	I/O ports at fd00 [size=8]
	I/O ports at fc00 [size=4]
	I/O ports at fb00 [size=16]
	Memory at fe02f000 (32-bit, non-prefetchable) [size=512]
	Expansion ROM at 40000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 2
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Kernel driver in use: sata_sil
	Kernel modules: sata_sil

00:12.0 IDE interface: ATI Technologies Inc 4379 Serial ATA Controller (rev 80) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 22
	I/O ports at fa00 [size=8]
	I/O ports at f900 [size=4]
	I/O ports at f800 [size=8]
	I/O ports at f700 [size=4]
	I/O ports at f600 [size=16]
	Memory at fe02e000 (32-bit, non-prefetchable) [size=512]
	Expansion ROM at 40080000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 2
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Kernel driver in use: sata_sil
	Kernel modules: sata_sil

00:13.0 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller (rev 80) (prog-if 10 [OHCI])
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
	Memory at fe02d000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Kernel driver in use: ohci_hcd
	Kernel modules: ohci-hcd

00:13.1 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller (rev 80) (prog-if 10 [OHCI])
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
	Memory at fe02c000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Kernel driver in use: ohci_hcd
	Kernel modules: ohci-hcd

00:13.2 USB Controller: ATI Technologies Inc IXP SB400 USB2 Host Controller (rev 80) (prog-if 20 [EHCI])
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
	Memory at fe02b000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [dc] Power Management version 2
	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

00:14.0 SMBus: ATI Technologies Inc IXP SB400 SMBus Controller (rev 82)
	Subsystem: Intel Corporation Unknown device d600
	Flags: 66MHz, medium devsel
	I/O ports at 0b00 [size=16]
	Memory at fe02a000 (32-bit, non-prefetchable) [size=1K]
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c-piix4

00:14.1 IDE interface: ATI Technologies Inc Standard Dual Channel PCI IDE Controller (rev 80) (prog-if 8a [Master SecP PriP])
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
	I/O ports at 01f0 [size=8]
	I/O ports at 03f4 [size=1]
	I/O ports at 0170 [size=8]
	I/O ports at 0374 [size=1]
	I/O ports at f400 [size=16]
	Capabilities: [70] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Kernel driver in use: pata_atiixp
	Kernel modules: pata_atiixp

00:14.2 Audio device: ATI Technologies Inc SB450 HDA Audio (rev 01)
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, slow devsel, latency 64, IRQ 16
	Memory at fe024000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Kernel driver in use: HDA Intel
	Kernel modules: snd-hda-intel

00:14.3 ISA bridge: ATI Technologies Inc IXP SB400 PCI-ISA Bridge (rev 80)
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 0

00:14.4 PCI bridge: ATI Technologies Inc IXP SB400 PCI-PCI Bridge (rev 80) (prog-if 01 [Subtractive decode])
	Flags: bus master, VGA palette snoop, 66MHz, medium devsel, latency 64
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=64
	I/O behind bridge: 0000d000-0000dfff
	Memory behind bridge: fdd00000-fddfffff
	Prefetchable memory behind bridge: fdc00000-fdcfffff

01:05.0 VGA compatible controller: ATI Technologies Inc RC410 [Radeon Xpress 200] (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 17
	Memory at d8000000 (32-bit, prefetchable) [size=128M]
	I/O ports at ee00 [size=256]
	Memory at fdef0000 (32-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at fde00000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Kernel driver in use: radeon

02:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
	Subsystem: Intel Corporation Unknown device d600
	Flags: bus master, medium devsel, latency 64, IRQ 21
	I/O ports at de00 [size=256]
	Memory at fddff000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [50] Power Management version 2
	Kernel driver in use: 8139too
	Kernel modules: 8139cp, 8139too


Software Environment:
Problem Description:

The file Documentation/filesystems/sysfs-pci.txt describes the method to be used in order to read the ROM resource of a PCI chipset installed in the system. However, this method does not always work. When it doesn't work, the read from the 'rom' file returns 0 bytes. This problem causes recent xserver builds to hang when trying to boot a secondary video card, because they fail to read the VGA ROM (https://bugs.freedesktop.org/show_bug.cgi?id=18160).

Steps to reproduce (with perl installed):

[root@srv64 /]# find /sys/devices/ -name rom
/sys/devices/pci0000:00/0000:00:01.0/0000:01:05.0/rom
/sys/devices/pci0000:00/0000:00:11.0/rom
/sys/devices/pci0000:00/0000:00:12.0/rom
[root@srv64 /]# cd /sys/devices/pci0000:00/0000:00:01.0/0000:01:05.0/
[root@srv64 0000:01:05.0]# ls -l
total 0
-rw-r--r-- 1 root root      4096 dic  5 16:52 broken_parity_status
lrwxrwxrwx 1 root root         0 dic  5 16:52 bus -> ../../../../bus/pci
-r--r--r-- 1 root root      4096 dic  5 16:28 class
-rw-r--r-- 1 root root       256 dic  4 14:34 config
-r--r--r-- 1 root root      4096 dic  5 16:28 device
lrwxrwxrwx 1 root root         0 dic  5 16:28 driver -> ../../../../bus/pci/drivers/radeon
lrwxrwxrwx 1 root root         0 dic  5 16:52 drm:card0 -> ../../../../class/drm/card0
-rw------- 1 root root      4096 dic  5 16:52 enable
-r--r--r-- 1 root root      4096 dic  5 16:28 irq
-r--r--r-- 1 root root      4096 dic  5 16:52 local_cpulist
-r--r--r-- 1 root root      4096 dic  5 16:52 local_cpus
-r--r--r-- 1 root root      4096 dic  5 16:52 modalias
-rw-r--r-- 1 root root      4096 dic  5 16:52 msi_bus
drwxr-xr-x 2 root root         0 dic  5 16:27 power
-r--r--r-- 1 root root      4096 dic  5 16:28 resource
-rw------- 1 root root 134217728 dic  5 16:52 resource0
-rw------- 1 root root 134217728 dic  4 14:34 resource0_wc
-rw------- 1 root root       256 dic  5 16:52 resource1
-rw------- 1 root root     65536 dic  4 14:34 resource2
-r-------- 1 root root    131072 dic  5 16:33 rom
lrwxrwxrwx 1 root root         0 dic  5 16:52 subsystem -> ../../../../bus/pci
-r--r--r-- 1 root root      4096 dic  5 16:52 subsystem_device
-r--r--r-- 1 root root      4096 dic  5 16:52 subsystem_vendor
-rw-r--r-- 1 root root      4096 dic  5 16:52 uevent
-r--r--r-- 1 root root      4096 dic  5 16:28 vendor
[root@srv64 0000:01:05.0]# perl -w -e '
> use IO::Handle;
> sysopen(ROM, "rom", 2) or die($!); 
> binmode(ROM); 
> syswrite(ROM, "1", 1) or die($!);
> sysseek(ROM, 0, 0); 
> $buffer = ""; 
> while (sysread(ROM, $buffer, 1024) > 0) {
>     print $buffer;
> };
> sysseek(ROM, 0, 0);
> syswrite(ROM, "0", 1) or die($!);
> close(ROM);' > /tmp/rom.bin
[root@srv64 0000:01:05.0]# ls -l /tmp/rom.bin 
-rw-r--r-- 1 root root 49152 dic  5 16:53 /tmp/rom.bin
[root@srv64 0000:01:05.0]# rm /tmp/rom.bin 
rm: ¿borrar el fichero regular «/tmp/rom.bin»? (s/n) s
[root@srv64 0000:01:05.0]# cd /
[root@srv64 /]# cd /sys/devices/pci0000:00/0000:00:11.0/
[root@srv64 0000:00:11.0]# ls -l
total 0
-rw-r--r-- 1 root root   4096 dic  5 16:54 broken_parity_status
lrwxrwxrwx 1 root root      0 dic  5 16:54 bus -> ../../../bus/pci
-r--r--r-- 1 root root   4096 dic  5 16:28 class
-rw-r--r-- 1 root root    256 dic  5 16:28 config
-r--r--r-- 1 root root   4096 dic  5 16:28 device
lrwxrwxrwx 1 root root      0 dic  5 16:28 driver -> ../../../bus/pci/drivers/sata_sil
-rw------- 1 root root   4096 dic  5 16:54 enable
drwxr-xr-x 4 root root      0 dic  5 16:27 host0
drwxr-xr-x 3 root root      0 dic  5 16:27 host1
-r--r--r-- 1 root root   4096 dic  5 16:28 irq
-r--r--r-- 1 root root   4096 dic  5 16:54 local_cpulist
-r--r--r-- 1 root root   4096 dic  5 16:54 local_cpus
-r--r--r-- 1 root root   4096 dic  5 16:54 modalias
-rw-r--r-- 1 root root   4096 dic  5 16:54 msi_bus
drwxr-xr-x 2 root root      0 dic  5 16:27 power
-r--r--r-- 1 root root   4096 dic  5 16:28 resource
-rw------- 1 root root      8 dic  5 16:54 resource0
-rw------- 1 root root      4 dic  5 16:54 resource1
-rw------- 1 root root      8 dic  5 16:54 resource2
-rw------- 1 root root      4 dic  5 16:54 resource3
-rw------- 1 root root     16 dic  5 16:54 resource4
-rw------- 1 root root    512 dic  5 16:54 resource5
-r-------- 1 root root 524288 dic  5 16:54 rom
lrwxrwxrwx 1 root root      0 dic  5 16:54 subsystem -> ../../../bus/pci
-r--r--r-- 1 root root   4096 dic  5 16:54 subsystem_device
-r--r--r-- 1 root root   4096 dic  5 16:54 subsystem_vendor
-rw-r--r-- 1 root root   4096 dic  5 16:54 uevent
-r--r--r-- 1 root root   4096 dic  5 16:28 vendor
[root@srv64 0000:00:11.0]# perl -w -e '
> use IO::Handle;
> sysopen(ROM, "rom", 2) or die($!); 
> binmode(ROM); 
> syswrite(ROM, "1", 1) or die($!);
> sysseek(ROM, 0, 0); 
> $buffer = ""; 
> while (sysread(ROM, $buffer, 1024) > 0) {
>     print $buffer;
> };
> sysseek(ROM, 0, 0);
> syswrite(ROM, "0", 1) or die($!);
> close(ROM);' > /tmp/rom.bin
[root@srv64 0000:00:11.0]# ls -l /tmp/rom.bin 
-rw-r--r-- 1 root root 0 dic  5 16:54 /tmp/rom.bin
[root@srv64 0000:00:11.0]# 

Here the Radeon ROM can be read, but the SATA ROM cannot. I notice that the radeon video chipset is shown as behind a bridge (may or may not be relevant).

Will compile a 2.6.28-rc7 kernel on Fedora 10 for another test case.
Comment 1 Greg Kroah-Hartman 2008-12-05 14:27:54 UTC
On Fri, Dec 05, 2008 at 02:03:23PM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> Latest working kernel version:
> Earliest failing kernel version:
> 
> 2.6.26.6-49.fc8 for these examples, also seen with 2.6.28-rc6

So this has never worked?

Please bring this up on the linux-pci mailing list, that is the proper
place for it.
Comment 2 Grant Grundler 2008-12-07 22:03:34 UTC
Any chance this is a BIOS option that can be configured (or not)?
e.g. hit F2 or F12 to enable the SATA device as a Boot device.

I had to do this recently so linux would find some SATA drives I added to my desktop machine.

I don't believe the problem with the SATA controller is the same thing as having a problem reading the Expansion ROM from a second Video card.

The difference is VGA gets special MMIO routing by Bridges. We can only enable VGA routing for one device (enabled in one PCI host bridge) and I expect that one gets the Expansion ROM enabled. I'm not 100% sure about it but it would be worth checking.
Comment 3 Tim Nelson 2008-12-08 02:40:11 UTC
Is that one set of VGA routing per bridge, or per machine?  Well, what I'm really getting at is, my video cards are:

(--) PCI: (0@1:0:0) nVidia Corporation NV17 [GeForce4 MX 440] rev 163, Mem @
0xf0000000/0, 0xe0000000/0, 0xe8000000/0, BIOS @ 0x??
??????/131072
(--) PCI:*(0@2:2:0) Silicon Integrated Systems [SiS] 86C326 5598/6326 rev 11,
Mem @ 0xf4000000/0, 0xf3000000/0, I/O @ 0x00009800/0
, BIOS @ 0x????????/65536

...and they both used to work in this exact hardware configuration; all I did to stop it working was upgrade the software.  

It seems to me that they're on different PCI busses.  Does that make a difference?  Also (in case it helps), if I understand correctly, the ROM-reading code only gets called on the secondary video card, not the primary.  

HTH
Comment 4 Alex Deucher 2008-12-08 07:05:21 UTC
(In reply to comment #3)
> Is that one set of VGA routing per bridge, or per machine?  Well, what I'm
> really getting at is, my video cards are:
> 
> (--) PCI: (0@1:0:0) nVidia Corporation NV17 [GeForce4 MX 440] rev 163, Mem @
> 0xf0000000/0, 0xe0000000/0, 0xe8000000/0, BIOS @ 0x??
> ??????/131072
> (--) PCI:*(0@2:2:0) Silicon Integrated Systems [SiS] 86C326 5598/6326 rev 11,
> Mem @ 0xf4000000/0, 0xf3000000/0, I/O @ 0x00009800/0
> , BIOS @ 0x????????/65536
> 
> ...and they both used to work in this exact hardware configuration; all I did
> to stop it working was upgrade the software.  

If you are using the Xorg xserver, we switched from using X's PCI implementation to the kernel's when we switched to libpciaccess.
Comment 5 Alex Villacis Lasso 2008-12-08 08:27:04 UTC
For additional examples, I compiled 2.6.28-rc7 at home. This machine has no SATA interfaces, but it has an integrated AGP chipset (ProSavage-DDR-K), and also a PCI VGA card plugged in (Oak Spitfire 64111). For these tests, I prepared a pair of patches. The first patch for drivers/pci/pci-sysfs.c makes the kernel fail the read with EIO, instead of returning a "success" of 0 bytes, if either the ROM mapping is NULL, or its detected size is zero. The second patch adds debugging to drivers/pci/rom.c in order to locate the source of failure.

What I conclude is that the sysfs implementation is only able to read the ROM from the VGA card that is selected as the primary boot-time card. In addition, the failure mode is that the mapping of the ROM apparently succeeds, but the contents are invalid, and fail the test to determine the size of the ROM. I performed the tests by selecting the BIOS option "Initialize video card first" with options of AGP (Savage) or PCI (Oak Spitfire). In either case, the selected card showed the boot process, as expected, and remained the only card on which the ROM could be read.
Comment 6 Alex Villacis Lasso 2008-12-08 08:28:55 UTC
Created attachment 19210 [details]
perl script used to read ROM to file, as documented in kernel docs

This program will report the reading status in addition to saving the ROM file to a specified destination.
Comment 7 Alex Villacis Lasso 2008-12-08 08:29:46 UTC
Created attachment 19211 [details]
Shell script with paths to corresponding PCI devices for tests
Comment 8 Alex Villacis Lasso 2008-12-08 08:31:08 UTC
Created attachment 19212 [details]
Add validation to ROM reading operation, fail with -EIO if validation fails

This patch lets userspace know that something wrong happened with the ROM reading.
Comment 9 Alex Villacis Lasso 2008-12-08 08:31:59 UTC
Created attachment 19213 [details]
Add debugging information to PCI ROM mapping

This patch adds debugging to know where in the ROM mapping process the failure occurs.
Comment 10 Alex Villacis Lasso 2008-12-08 08:33:31 UTC
Created attachment 19214 [details]
dmesg output for 2.6.28-rc7 with patches applied - AGP/savage as primary, PCI/spitfire as secondary

These are the results on dmesg when trying to use the supplied scripts to read both ROMs, with savage as primary and spitfire as secondary.
Comment 11 Alex Villacis Lasso 2008-12-08 08:34:22 UTC
Created attachment 19215 [details]
Script output - AGP/savage as primary, PCI/spitfire as secondary
Comment 12 Alex Villacis Lasso 2008-12-08 08:35:01 UTC
Created attachment 19216 [details]
dmesg output for 2.6.28-rc7 with patches applied - AGP/savage as secondary, PCI/spitfire as primary
Comment 13 Alex Villacis Lasso 2008-12-08 08:35:25 UTC
Created attachment 19217 [details]
Script output - AGP/savage as secondary, PCI/spitfire as primary
Comment 14 Tim Nelson 2008-12-08 14:31:39 UTC
Alex: I'm aware of the PCI switch; my point (which I didn't make well, sorry), was that it's hardware-possible, assuming appropriate software :).  

Bill Crawford made a point which I think is relevant in response to Alex on the xorg list; I'll paste the entire comment below.  

Bill Crawford wrote:

> On Friday 05 December 2008 22:08:13 Alex Villac�s Lasso wrote:
> 
> > The failure to read the PCI ROM as documented seems like a kernel bug. I
> > opened http://bugzilla.kernel.org/show_bug.cgi?id=12168 for it.
> 
> This affects kernel modesetting as well, since the cards aren't POSTed 
> properly in these situations.

HTH
Comment 15 Tim Nelson 2008-12-08 14:38:33 UTC
(The Alex addressed in the message above was supposed to be Alex Deucher, but the rest of the message is for everyone; sorry).  
Comment 16 Jesse Barnes 2008-12-09 13:56:49 UTC
So I guess you're not hitting the assign_resource path on the ROM that can't be read?  Maybe the rom_size code is getting things wrong somehow (image - rom == 0 possibly)?  Assuming it has I/O space allocated and it's being enabled properly, I don't see why else it would fail.
Comment 17 Tim Nelson 2008-12-28 20:02:10 UTC
Am I right in assuming that error-checking patches such as attachment 19212 [details] above are welcome?  And would it be useful to have most of the "break" statements in pci_get_rom_size instead be a "return -EIO"?  Or is that going to cause problems somewhere else?  Or should some more specific error be returned?  
Comment 18 Tim Nelson 2008-12-28 20:19:34 UTC
Created attachment 19513 [details]
Example of patch for error checking on rom size

This is a patch that is an example of what I was talking about above.
Comment 19 Jesse Barnes 2008-12-29 10:04:02 UTC
Yeah both are reasonable.  Can you send them out to the linux-pci@vger.kernel.org list for review?
Comment 20 Tim Nelson 2009-01-05 15:32:48 UTC
Created attachment 19667 [details]
Patch that helps debugging, but does not solve the problem in any way

Patch that helps debugging, but does not solve the problem in any way

...will be discussed in future comment
Comment 21 Tim Nelson 2009-01-05 15:39:14 UTC
Here's the output from the patch in my previous comment.  

 pci_map_rom: Trace A
 pci_map_rom: Trace C
 pci_map_rom: Trace E
 pci_map_rom: Trace F
 pci_get_rom_size: Trace pci_get_rom_size; breaking because 0x55 failed
 pci_get_rom_size: Trace pci_get_rom_size; image: -121896960, rom: -121896960; size: 131072
 pci_map_rom: pci_get_rom_size() returned 0 at 0xf8bc0000
 pci_map_rom: Trace G


This is naturally on the secondary video card, which is an AGP card.  The first video card works fine.  As you can see, there's a bug in my patch that causes the image and rom sizes to be reported negatively, but the real problem is that the byte that is supposed to be 0x55 isn't.  Unwisely, I didn't include code to print out what it actually is.  Maybe next time.  I'll keep fiddling with this, and, while I may push us a bit further along the road to understanding, I don't expect to come up with the final solution to the problem (as I have no real understanding of PCI, or the code outside the specific function I'm debugging); hopefully others can chime in usefully.  
Comment 22 Tim Nelson 2009-01-05 16:15:27 UTC
Created attachment 19668 [details]
Another possible patch for rom size?

This patch would mean that pci_get_rom_size would return something "usable" even when the content of the ROM appears to be wrong in some way.  However, I suspect the error is caused by some underlying problem, and that we should solve the problem instead of using this patch, but I attach it as an idea for discussion.
Comment 23 Tim Nelson 2009-01-06 20:28:20 UTC
More results:
 pci_map_rom: Trace A
 pci_map_rom: Trace C
 pci_map_rom: Trace E
 pci_map_rom: Trace F
 pci_get_rom_size: Trace pci_get_rom_size; breaking because 0x55 failed (got 255)
 pci_get_rom_size: Trace pci_get_rom_size; image: 4173332480, rom: 4173332480; size: 131072
 pci_map_rom: pci_get_rom_size() returned 0 at 0xf8c00000
 pci_map_rom: Trace G

I apologies that the 0x55 result was reported as 255 instead of 0xFF, but you get that.  Anyway, my point is, we now know what we're getting instead of 0x55, we just don't know why.  

I thought something that might also be useful is a trace from a working video card.  

 pci_map_rom: Trace A
 pci_map_rom: Trace B
 pci_map_rom: Trace F
 pci_get_rom_size: Trace pci_get_rom_size; last_image is 128
 pci_get_rom_size: Trace pci_get_rom_size; image: 3222044672, rom: 3222011904; size: 131072
 pci_map_rom: pci_get_rom_size() returned 32768 at 0xc00c0000
 pci_map_rom: Trace G

(and many more repetitions of similar things; the thing I find interesting is the ABF pattern instead of ACEF).  
Comment 24 Tim Nelson 2009-01-07 20:37:25 UTC
Btw, I'm stuck at the moment on this, because I don't really understand the code.  If someone could compare the patch in comment #20 and the output in comment #23 (which admittedly has added a little to one of the patch messages), and suggest further avenues for exploration (ie. as to why we're getting 0xFF instead of 0x55, or anything else you may deem useful), that would be helpful.  I'm happy to dig further into things and find out what values they're returning and that sort of thing, but I have no idea what could possibly be going wrong.  
Comment 25 Alex Villacis Lasso 2009-01-08 08:39:14 UTC
On the particular case of my home machine, in which a ProSavage DDR-K AGP chipset is the primary video device, and an old OAK OTI64111 PCI video card is the secondary device, the command "setpci -d VENDORID:DEVICEID COMMAND" shows the value 0007 for the ProSavage and 0000 for the OAK. After manually setting the value of COMMAND to 0007 for the OAK, the ROM reading from the OAK PCI card succeeds.

Unfortunately this is not a general solution for all PCI cards, since the SATA chipsets in my work machine already have COMMAND=0007 and still cannot be read.

Both work and home machines now run custom-built vanilla 2.6.28 from kernel.org.
Comment 26 Tim Nelson 2009-01-08 16:01:53 UTC
Works for me also with 2 video cards.  I agree it's disappointing that it's not a general solution, but it's better than nothing.  

As per Bjorn Helgaas' message to linux-pci on 15 December (which I now actually understand how to apply, thanks to Alex VL), I found that it can be set to 2 instead of 7 to enable this.  It seems to me that this bit (bit 1) should be permanently turned on in the case of all ROMs.  My questions are:
1. Should this be done in pci_enable_rom(), or somewhere else (maybe pci_bus_assign_resources() in setup-bus.c)?  
2. Does anyone know whether bits 2 and 0 should also be turned on?  
Comment 27 Tim Nelson 2009-01-08 16:15:45 UTC
Hmm.  After further examination, I presume pci_enable_device (in pci.c) is not getting run.  This kind of answers question 2 above, but I still need to know the answer to question 1.  
Comment 28 Tim Nelson 2009-01-09 19:33:18 UTC
It also seems like is_enabled_store() in pci-sysfs.c is trying to call pci_enable_device.  

*(&^#%&(

Ok, all, I've just discovered that, in addition to echoing 1 into the ROM before reading it, we also have to echo 1 into the /sys/bus/pci/devices/<deviceID>/enable as well.  
Comment 29 Tim Nelson 2009-01-09 21:05:24 UTC
Created attachment 19737 [details]
Fixed version of readrom.pl

I've attached a fixed version of readrom.pl
Comment 30 Alex Villacis Lasso 2009-01-10 09:29:13 UTC
(In reply to comment #29)
> Created an attachment (id=19737) [details]
> Fixed version of readrom.pl
> 
> I've attached a fixed version of readrom.pl
> 

Nice one, only I would not test it on the primary chipset (the one used for the current display) - it unconditionally disables the PCI chipset after reading the ROM.
Comment 31 Tim Nelson 2009-01-11 01:36:24 UTC
It disables the rom, but not the device.  And even if I disabled the device, the file called "enable" works like a counter, unlike the file "rom".  

HTH.  

Anyway, in my opinion, what needs to happen to close the display portion of this bug is:
- The patch from comment 8 needs to be applied
- The pci_enable_rom() function needs to printk an error if the card is not enabled

The other problems related to reading the VGA ROM can be dealt with in the Xorg code.  The problems with the SATA ROMs aren't something I'll be in a position to spend time on.  

Hopefully I'll get some time tomorrow or so to make up a combined patch to solve the VGA-related problems.  

Anything else?  
Comment 32 Tim Nelson 2009-01-11 22:10:24 UTC
Created attachment 19757 [details]
Makes pci-sysfs fail when it can't read rom, and emit warnings too

This patch should solve the problem, and give warnings to others to help them not waste the time that Alex and I did on this.
Comment 33 Jesse Barnes 2009-01-12 09:42:40 UTC
Can you send this patch out to linux-pci (if you haven't already, haven't checked that mailbox yet).  We may even go one step further and check whether the device is enabled directly.
Comment 34 Tim Nelson 2009-01-27 15:38:18 UTC
Created attachment 20020 [details]
Patch that solves the problem; I know of no problems with it

This patch appears to do exactly what I want, and I think also what Jesse requested as well.
Comment 35 Tim Nelson 2009-01-27 16:14:13 UTC
Oops, doesn't comply with coding standards.  I'll try again.  
Comment 36 Tim Nelson 2009-01-28 00:08:23 UTC
Created attachment 20023 [details]
pci: Return/dmesg errors when reading PCI ROMs

Above patch, corrected for coding style, etc.
Comment 37 Tim Nelson 2009-01-29 11:15:24 UTC
Created attachment 20039 [details]
[PATCH] pci: Return/dmesg errors when reading PCI ROMs (take 2)

Corrected according to Greg-KH's comments on the mailing list.