Bug 11196
Summary: | [sky2] System freeze upon PCI Express error | ||
---|---|---|---|
Product: | Drivers | Reporter: | Pascal BERNARD (pascal.bernard1) |
Component: | Network | Assignee: | Jeff Garzik (jgarzik) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | akpm, stephen |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.26.1 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Picture of the console
Kernel message before freeze
before resetting BIOS parameters
after the reset |
Description
Pascal BERNARD
2008-07-31 11:53:14 UTC
2.6.18 is dreadfully old for the kernel.org developers. Perhaps your distro is still supporting it. Please try 2.6.26 if possible, thanks.

I have a 2.6.22 under Mint and a 2.6.25 under Debian sid though! I will work with Debian sid and confirm. I do not remember which kernel version the problem occurred with. It was obviously not an old one in any case.

Created attachment 17063 [details]
Picture of the console
Luckily enough, I could reproduce the problem while I was on the console.
I use a Debian kernel:
2.6.25-2-686 #1 SMP Fri Jul 18 17:46:56 UTC
To be more precise, I lost the network connection while I was under X, and I had time to switch to the console before the crash. Thus, you may consider corrupted memory. The fact that it happened with several different kernels argues in favour of a corruption inside the sky2 driver itself.

As directed by Teodor <mteodor@gmail.com> from Debian, I switched to the 2.6.26-1 version of the kernel. I could work for a few hours without problems, but I do not see what could explain the better behaviour:

    pascal@moraes:~/linux-2.6/linux-2.6.26.y/drivers/net$ git log 'v2.6.25'.. sky2.c
    commit a3b4fcedee5cf1d1342b85f1318c0fe1ff1727a9
    Author: Stephen Hemminger <shemminger@vyatta.com>
    Date:   Sat Jun 14 10:32:15 2008 -0700

        sky2: 88E8040T pci device id

        Missed one pci id for 88E8040T.

        Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
        Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

    commit 68c2889834602f6efed195f44439ef5d526683a8
    Author: Ben Hutchings <bhutchings@solarflare.com>
    Date:   Sat May 31 16:52:52 2008 +0100

        sky2: Hold RTNL while calling dev_close()

        dev_close() must be called holding the RTNL.

        Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
        Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

    commit d494eacde8858f9b53f5c640692caf14eb3c8239
    Author: Stephen Hemminger <shemminger@vyatta.com>
    Date:   Wed May 14 17:04:13 2008 -0700

        sky2: restore vlan acceleration on reset

        If device has to be reset by sky2_restart, then need to restore
        the VLAN acceleration settings.

        Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
        Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

    pascal@moraes:~/linux-2.6/linux-2.6.26.y/drivers/net$ git log 'v2.6.25'.. sky2.h
    commit a300344ab9b77130310fc225fdc7677e129b1163
    Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Date:   Tue May 6 14:34:35 2008 -0700

        sky2: fix simple define thinko

        noticed while browsing code, apparent thinko. compile tested only.

        Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
        CC: Stephen Hemminger <shemminger@linux-foundation.org>
        Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

I got a kernel freeze with 2.6.26-1-686. I did not have the chance to catch a message on the console this time.

Just in case it helps:
- I got a kernel freeze with the sky2 module loaded but no link through it
- I have upgraded the BIOS of my motherboard (1212, which is not marked beta)

Created attachment 18970 [details]
Kernel message before freeze
Here is a new instance of the issue:
- BIOS upgraded to the latest version, 1236
- Linux Mint (i.e. kernel vmlinuz-2.6.24-21-generic, which probably does not patch 2.6.24)
See attachment "Kernel message before freeze"
Do you have 4G or more of memory? I know of driver problems with 4G or more of memory, but some motherboards do not correctly wire the upper address bits, so the memory above 4G is not accessible and causes PCI errors. You might be able to work around this with the kernel option iommu=soft.

Unfortunately, the sky2 driver is/was a strictly volunteer effort, and yours seems to be the only currently outstanding report of driver failures, so until others see the problem, I only really have the resources to give you hints to solve the problem yourself.

I "only" have 2G of memory, so I suppose it is not worth trying iommu=soft.

The problem probably appeared after upgrading the BIOS. This can explain why others do not have the problem.

I do not know what I could do to give more hints.

On Sun, 23 Nov 2008 14:39:21 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11196
>
> ------- Comment #10 from pascal.bernard1@free.fr 2008-11-23 14:39 -------
> I "only" have 2G of memory, so I suppose it is not worth trying iommu=soft.
>
> The problem probably appeared after upgrading the BIOS. This can explain why
> others do not have the problem.
>
> I do not know what I could do to give more hints.

You could capture a register dump with ethtool -d for both the old and the new BIOS. It could be that the BIOS set up some part of the chip differently, and that could be fixed. The problem is that the driver generally tries not to fiddle with bits it doesn't need to, because the BIOS often initializes stuff based on hardware and timing parameters that depend on the system bus speed, etc.

Created attachment 19057 [details]
before resetting BIOS parameters
Created attachment 19058 [details]
after the reset
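For anyone repeating the register-dump comparison suggested above, here is a minimal sketch of one way to list only the lines that differ between two saved `ethtool -d` outputs. It assumes each dump was saved as plain text (for example with `ethtool -d eth0 > some_file`); the function name `changed_lines` is hypothetical, and the script does not interpret the register contents in any way.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return the lines that differ between two saved `ethtool -d` dumps.

    Removed lines are prefixed with '-', added lines with '+'.
    This is a plain text comparison; it knows nothing about sky2 registers.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    body = list(diff)[2:]  # drop the '---' / '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


# Example usage (file names are placeholders for the two saved dumps):
#
#   with open("dump_before.txt") as f_old, open("dump_after.txt") as f_new:
#       for line in changed_lines(f_old.read(), f_new.read()):
#           print(line)
```

Only the differing lines are printed, which makes it easier to spot registers that a BIOS update or a settings reset has changed.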
I have run ethtool with the latest BIOS (1236); the result is in eth0.1236. I tried to load the latest stable release, 1101, but the BIOS did not let me load an older version! I then asked the BIOS to revert to default settings. I was quite surprised to see that the output of ethtool changed! The result is in eth0.afterdefault.

The BIOS offers the possibility of overclocking. I tried a +10% overclock. Is that a possible cause of driver failure?

I will let you know if I observe a kernel freeze or panic under nominal conditions.

Thank you for your support.

It does not happen so often, but the system froze twice. Even though I did not have the chance to trap a message, it is likely to be sky2: disabling sky2 resulted in no problem. I will try to get more info. This was just to let you know not to close this ticket.

Here is a more precise message before the kernel panic:

    Jan 2 23:00:59 moraes kernel: [12337.606996] sky2 0000:02:00.0: error interrupt status=0x80000000
    Jan 2 23:00:59 moraes kernel: [12337.606996] sky2 0000:02:00.0: PCI Express error (0x40000)
    Jan 2 23:00:59 moraes kernel: [12337.606996] sky2 0000:02:00.0: error interrupt status=0x80000000
    Jan 2 23:00:59 moraes kernel: [12337.606996] sky2 0000:02:00.0: PCI Express error (0x40000)
    Jan 2 23:00:59 moraes kernel: [12337.606996] sky2 0000:02:00.0: error interrupt status=0x80000000
    Jan 2 23:00:59 moraes kernel: [12337.606996] sky2 0000:02:00.0: PCI Express error (0x40000)

This happened while:
- the driver was not used! I even blacklisted the module in modprobe.d but it was loaded anyway (does anyone know why?)
- the BIOS settings were as standard as possible (no overclocking)

It looks like a hardware issue with power management on the motherboard. The freeze occurs even if the driver is not loaded, so this cannot be the driver. I am changing the status to resolved (maybe a status of REJECTED would be more appropriate). Sorry to have bothered you with this problem.
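For readers who hit similar messages, the following is a rough sketch of how the sky2 error lines quoted in this report could be counted in a saved kernel log, and how to see which bit positions are set in the reported values. The regular expression is an assumption based only on the two message shapes shown here, and the helper names `set_bits` and `summarize` are hypothetical. Mapping a bit position to a named error would require the driver's sky2.h and the PCI Express specification; the sketch does not attempt that.

```python
import re
from collections import Counter

# Assumption: matches only the two message shapes quoted in this report.
SKY2_RE = re.compile(
    r"sky2 (?P<dev>[0-9a-f:.]+): "
    r"(?:error interrupt status=(?P<status>0x[0-9a-fA-F]+)"
    r"|PCI Express error \((?P<pex>0x[0-9a-fA-F]+)\))"
)


def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def summarize(lines):
    """Count (device, kind, value) triples for the sky2 error lines found."""
    counts = Counter()
    for line in lines:
        match = SKY2_RE.search(line)
        if match is None:
            continue  # not one of the two sky2 messages of interest
        if match.group("status"):
            key = (match.group("dev"), "interrupt", int(match.group("status"), 16))
        else:
            key = (match.group("dev"), "pex", int(match.group("pex"), 16))
        counts[key] += 1
    return counts


# Example usage (the log path is a placeholder):
#
#   with open("kern.log") as log:
#       for (dev, kind, value), n in summarize(log).items():
#           print(dev, kind, hex(value), "bits", set_bits(value), "count", n)
```

Counting repeats per device and value shows quickly whether the same error is reported in a loop, as in the excerpt above.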