Bug 4010 - Timer panics after 2 days running; IDE/SCSI?
Status: REJECTED INVALID
Alias: None
Product: Timers
Classification: Unclassified
Component: Interval Timers
Hardware: i386 Linux
Importance: P2 high
Assignee: Diego Calleja
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-09 05:30 UTC by ajpearce
Modified: 2006-08-05 07:51 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.10
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
screenshot of the crash (166 bytes, text/html)
2005-01-09 05:34 UTC, ajpearce
kernel config causing the problem (172 bytes, text/html)
2005-01-09 05:34 UTC, ajpearce
another config that causes problem (202 bytes, text/html)
2005-01-09 14:02 UTC, Carlos Rendon

Description ajpearce 2005-01-09 05:30:27 UTC
Distribution: Gentoo
Hardware Environment: see bottom of description
Software Environment: vanilla 2.6.10; see the rest of the description for glibc and more.
Problem Description: Regular kernel panic

Steps to reproduce: Run with the referenced config for 3 days. It usually
crashes after about 2 days. MAY need similar hardware to test with: an ASRock
motherboard and a SCSI card or three.

I set the severity to high because for me it is, though I'm unsure whether I
should take `number of users affected` into account - let me know.


Getting a regular kernel panic on my server.

* This bug will probably need reclassifying under the right section.
* Need to confirm it's not a dupe; not sure what I should search for.

Photos of the crash:
http://www.ajpearce.co.uk/files/kernelpanic4.JPG
http://www.ajpearce.co.uk/files/kernelpanic3.JPG
http://www.ajpearce.co.uk/files/kernelpanic2.JPG


My actions:
I have enabled kernel debugging and am waiting for it to crash again. I'll post
again with photo number 5 (didn't get a photo of 1st crash).

Any pattern to the kernel panic?:
Well, it isn't triggered by any particular action, but it regularly happens
after about 2 days of operation as a fileserver. The system has 3 SCSI cards
but seems to crash when accessing my 80GB IDE UATA hard drive.


Abbreviated error message of the kernel panic:

EIP is at cascade+0x48/0x60
Process swapper (pid:0, threadinfo=c04b8000 task=c042db00)
Call Trace:
[addresses excluded; see photo for details]
run_timer_softirq+0x11f/0x170
__do_softirq+0x7d/0x90
do_softirq+0x26/0x30
do_IRQ+0x1e/0x30
common_interrupt+0x1a/0x20
default_idle+0x23/0x30
cpu_idle+0x2c/0x40
start_kernel+0x138/0x160
unknown_bootoption+0x0/0x1b0

<0>Kernel panic - not syncing: Fatal exception in interrupt
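
For readers unfamiliar with the trace format: each frame is written as
symbol+0xOFFSET/0xSIZE, i.e. the offset of the instruction inside the function
and the total length of that function, both in hex. The following is only a
small illustrative sketch (the helper name parse_frame is made up for this
example, not part of any kernel tool) showing how such a frame can be split up:

```python
import re

# Matches frames such as "cascade+0x48/0x60" anywhere in a line.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(text):
    """Return (symbol, offset, size) for the first frame in text, or None.

    parse_frame is a hypothetical helper written for this illustration.
    """
    match = FRAME_RE.search(text)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


if __name__ == "__main__":
    for line in ("EIP is at cascade+0x48/0x60", "run_timer_softirq+0x11f/0x170"):
        symbol, offset, size = parse_frame(line)
        print(f"{symbol}: byte {offset} of {size}")
```

So the faulting instruction sits at byte 0x48 of the 0x60-byte function
cascade, which is the offset passed to gdb further down.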


GDB output of timer.o:

root@betty.net[/usr/src/linux] # gdb kernel/timer.o
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library
"/lib/libthread_db.so.1".

(gdb) list *cascade+0x48
0x328 is in cascade (kernel/timer.c:423).
418                     internal_add_timer(base, tmp);
419             }
420             INIT_LIST_HEAD(head);
421
422             return index;
423     }
424
425     /***
426      * __run_timers - run all expired timers (if any) on this CPU.
427      * @base: the timer vector to be processed.
(gdb)
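
As background on where the fault lands: cascade() is the part of the timer
wheel that takes a list of far-future timers and re-files each one via
internal_add_timer(), as the listing above shows. Because it walks a list of
pending timers, a corrupted list pointer (whether from a software bug or from
faulty hardware) could plausibly fault there. The sketch below is a heavily
simplified two-level model of that idea, written from a general understanding
of the 2.6-era design; the class and method names are invented for
illustration and it is not the kernel's code.

```python
TVR_BITS, TVN_BITS = 8, 6                      # slot-count exponents (assumed)
TVR_SIZE, TVN_SIZE = 1 << TVR_BITS, 1 << TVN_BITS
TVR_MASK, TVN_MASK = TVR_SIZE - 1, TVN_SIZE - 1


class Wheel:
    """Toy two-level timer wheel; hypothetical model, not kernel code."""

    def __init__(self):
        self.now = 0                                  # current tick counter
        self.tv1 = [[] for _ in range(TVR_SIZE)]      # near timers, 1 tick/slot
        self.tv2 = [[] for _ in range(TVN_SIZE)]      # far timers, 256 ticks/slot

    def add(self, expires, name):
        delta = expires - self.now
        if delta < 0:
            # Already expired: queue it so the next tick runs it.
            self.tv1[self.now & TVR_MASK].append((expires, name))
        elif delta < TVR_SIZE:
            self.tv1[expires & TVR_MASK].append((expires, name))
        elif delta < TVR_SIZE * TVN_SIZE:
            self.tv2[(expires >> TVR_BITS) & TVN_MASK].append((expires, name))
        else:
            raise ValueError("beyond the range of this two-level model")

    def cascade(self, index):
        # Empty one far slot and re-file every timer that was in it.
        pending, self.tv2[index] = self.tv2[index], []
        for expires, name in pending:
            self.add(expires, name)

    def tick(self):
        index = self.now & TVR_MASK
        if index == 0:
            # The near wheel wrapped around: pull in the next far slot.
            self.cascade((self.now >> TVR_BITS) & TVN_MASK)
        self.now += 1
        fired, self.tv1[index] = self.tv1[index], []
        return [name for _, name in fired]
```

In this model a timer due at tick 300 first sits in the second level, is moved
into the first level when the counter reaches 256, and fires at tick 300.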


Hardware info:

root@betty.net[/usr/src/linux] # lspci
00:00.0 Host bridge: Silicon Integrated Systems [SiS]: Unknown device 0746 (rev 02)
00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SG86C202
00:02.0 ISA bridge: Silicon Integrated Systems [SiS]: Unknown device 0963 (rev 25)
00:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016
00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
00:02.7 Multimedia audio controller: Silicon Integrated Systems [SiS] Sound
Controller (rev a0)
00:03.0 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.1 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.2 USB Controller: Silicon Integrated Systems [SiS] USB 2.0 Controller
00:04.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 10/100
Ethernet (rev 90)
00:09.0 SCSI storage controller: Adaptec AHA-2940/2940W / AIC-7871 (rev 03)
00:0a.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
00:0b.0 Unknown mass storage controller: American Megatrends Inc. MegaRAID 428
Ultra RAID Controller (rev 03)
00:0c.0 USB Controller: NEC Corporation USB (rev 41)
00:0c.1 USB Controller: NEC Corporation USB (rev 41)
00:0c.2 USB Controller: NEC Corporation USB 2.0 (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX 440]
(rev a3)
root@betty.net[/usr/src/linux] #

root@betty.net[/usr/src/linux] # lsscsi
[0:0:2:0]    cd/dvd  PIONEER  CD-ROM DR-124X   1.05  /dev/sr0
[0:0:4:0]    cd/dvd  NEC      CD-ROM DRIVE:502 2.0y  /dev/sr1
[0:0:5:0]    cd/dvd  COMPAQ   CD-ROM CR-503BCQ 1.1i  /dev/sr2
[1:0:6:0]    disk    IBM      IC35L018UWD210-0 S5CS  /dev/sda
root@betty.net[/usr/src/linux] #

root@betty.net[/usr/src/linux] # lsusb
Bus 006 Device 001: ID 0000:0000
Bus 005 Device 001: ID 0000:0000
Bus 004 Device 001: ID 0000:0000
Bus 003 Device 002: ID 0a12:0001 Cambridge Silicon Radio, Ltd
Bus 003 Device 001: ID 0000:0000
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000
root@betty.net[/usr/src/linux] #

Kernel config is here:
http://www.ajpearce.co.uk/files/crashing_config.txt


More software info:

*  sys-libs/glibc
      Latest version available: 2.3.4.20040808-r1
      Latest version installed: 2.3.4.20040808-r1
      Size of downloaded files: 15,381 kB
      Homepage:    http://sources.redhat.com/glibc/
      Description: GNU libc6 (also called glibc2) C library

*  net-fs/samba
      Latest version available: 3.0.9-r1
      Latest version installed: 3.0.2a-r2

root@betty.net[/usr/src/linux] # uname -a
Linux betty 2.6.10 #1 Sat Jan 9 00:39:02 GMT 2021 i686 AMD Athlon(tm) XP 3000+
AuthenticAMD GNU/Linux
root@betty.net[/usr/src/linux] #

root@betty.net[/usr/src/linux] # df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda6             9.7G  3.7G  5.5G  41% / 		(ext3)
/dev/hdb2              74G   44G   26G  63% /s 		(ext3)
/dev/sda3              17G  5.2G   11G  32% /usr 	(reiserfs)



Only cron job is updatedb and that doesn't seem to crash it.

Filesystems are ReiserFS and ext3.
Comment 1 ajpearce 2005-01-09 05:34:04 UTC
Created attachment 4358
screenshot of the crash
Comment 2 ajpearce 2005-01-09 05:34:41 UTC
Created attachment 4359
kernel config causing the problem
Comment 3 Carlos Rendon 2005-01-09 13:57:28 UTC
I believe that I am having this same error

Distribution: Gentoo

Steps to reproduce: do anything and it will crash in about 10 minutes. It
appears to be related to disk activity, i.e. if I just let it sit, it doesn't crash.

abbrev txt of kernel panic:
---------------------------
Oops: 0000 [#1]
PREEMPT
Modules linked in: nvidia
CPU: 0
EIP is at __wake_up_bit+0x11/0x40

Call Trace:
end_buffer_async_write
__delay
end_bio_bh_io_sync
bio_endio
__ide_do_rw_disk
__end_that_request_first
__ide_end_request
ide_end_request
ide_dma_intr
ide_intr
ide_dma_intr
handle_IRQ_event
__do_IRQ
do_IRQ
====================
common_interrupt
default_idle
default_idle
cpu_idle
start_kernel
unknown_bootoption

<0>Kernel panic - not syncing: Fatal exception in interrupt


Software:
---------
sys-libs/glibc-2.3.4.20040808-r1
Linux icarus 2.6.9 #1 Sat Jan 8 01:00:19 PST 2005 i686 Intel(R) Pentium(R) 4 CPU
 2.66GHz GenuineIntel GNU/Linux

Hardware:
---------
from lspci:
0000:00:00.0 Host bridge: VIA Technologies, Inc. P4M266 Host Bridge
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP]
0000:00:09.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
0000:00:09.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
0000:00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 80)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 80)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 80)
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX
5200] (rev a1)
Comment 4 Carlos Rendon 2005-01-09 14:02:06 UTC
Created attachment 4361
another config that causes problem
Comment 5 ajpearce 2005-01-09 14:11:01 UTC
Interesting.
Save that config somewhere and try removing:

- preempt
- nvidia module

I disabled these too, initially expecting that to fix it... but it didn't, which
shocked me - embarrassing!

If it still crashes for you then quite likely it's a similar bug. It affected me
in 2.6.9 too.
Comment 6 ajpearce 2005-01-09 14:13:45 UTC
oh and modules loaded:

sis900                 16516  0
snd_pcm_oss            48228  0
snd_mixer_oss          17600  1 snd_pcm_oss
snd_seq_oss            32256  0
snd_seq_midi_event      6080  1 snd_seq_oss
snd_seq                47632  4 snd_seq_oss,snd_seq_midi_event
uhci_hcd               29644  0
snd_emu10k1            91012  0
snd_intel8x0           27296  0
snd_rawmidi            19552  1 snd_emu10k1
snd_seq_device          7244  4 snd_seq_oss,snd_seq,snd_emu10k1,snd_rawmidi
snd_ac97_codec         70368  2 snd_emu10k1,snd_intel8x0
snd_pcm                79688  4 snd_pcm_oss,snd_emu10k1,snd_intel8x0,snd_ac97_codec
snd_timer              20228  2 snd_seq,snd_pcm
snd_page_alloc          7620  3 snd_emu10k1,snd_intel8x0,snd_pcm
snd_util_mem            3456  1 snd_emu10k1
snd_hwdep               7236  1 snd_emu10k1
ohci_hcd               27528  0
snd                    44388  12
snd_pcm_oss,snd_mixer_oss,snd_seq_oss,snd_seq,snd_emu10k1,snd_intel8x0,snd_rawmidi,snd_seq_device,snd_ac97_codec,snd_pcm,snd_timer,snd_hwdep
ehci_hcd               39364  0
usblp                  10944  0
dm_crypt               10248  0
w83781d                33192  0
i2c_isa                 1856  0
8250                   19812  0
serial_core            18112  1 8250
dm_mod                 51324  1 dm_crypt
i2c_sensor              3072  1 w83781d
i2c_core               17872  3 w83781d,i2c_isa,i2c_sensor
usb_storage            95952  0
hci_usb                11520  3
loop                   12488  0
bluetooth              41668  7 rfcomm,l2cap,hci_usb
ide_scsi               13380  0
8139too                20672  0
root@betty.net[~] #
Comment 7 Carlos Rendon 2005-01-09 15:07:25 UTC
I recompiled it with no preempt and forced the nvidia module not to load, and it
still crashed (although I didn't get the error because it just rebooted).
Comment 8 Carlos Rendon 2005-01-09 16:29:53 UTC
After thinking about this some more, I'm fairly sure that this problem
also affects me in 2.6.9; however, it just reboots and I've never had
it stop to display the panic.

What may be helpful is that I seem to have started having this problem after
I removed my IDE floppy drive.
Comment 9 ajpearce 2005-01-10 00:28:16 UTC
I didn't realise floppy drives were IDE, but I've done nothing like that
recently.

It's crashed again:
http://www.ajpearce.co.uk/files/kernelPanic5.JPG

There's normally someone knowledgeable on the case by now, so maybe not many
people are aware of this bug because I haven't been able to classify it under
the right section.
Comment 10 ajpearce 2005-01-11 15:18:03 UTC
Carlos could you post `uname -r` `cat /proc/cpuinfo` and `lspci` (or `cat
/proc/pci`) for me?

I'm going to start hardware testing tomorrow. The thing is testing is very slow
because to remove a variable from the equation I have to wait 2 days to see if
it crashes. And as far as I know I have to test:
- all modules
- all kernel options
- all hardware
- more probably

Guess I'll start by removing all non-core hardware apart from one SCSI card as
/usr is on a SCSI drive.


More information on my system:


betty.net[~] $ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 3000+
stepping        : 0
cpu MHz         : 2160.018
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 4276.22

betty.net[~] $

betty.net[~] $ less /proc/interrupts
           CPU0
  0:    1903054          XT-PIC  timer
  1:          8          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  3:          3          XT-PIC  ehci_hcd
  5:        125          XT-PIC  aic7xxx, ohci_hcd, ohci_hcd
  8:          2          XT-PIC  rtc
 10:      82900          XT-PIC  aic7xxx, ehci_hcd, ohci_hcd, SiS SI7012, eth0
 11:          0          XT-PIC  ohci_hcd
 12:         66          XT-PIC  i8042
 14:       7021          XT-PIC  ide0
 15:         29          XT-PIC  ide1
NMI:          0
ERR:          0
betty.net[~] $
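
The listing above shows several devices sharing IRQ 5 and IRQ 10. Shared lines
are not necessarily a problem, but they are one more variable to keep in mind
when removing hardware for testing. As a rough aid, a sketch like the one
below (the function name shared_irqs is made up here, and it assumes the
single-CPU column layout shown in this report) lists which IRQs have more
than one device attached:

```python
def shared_irqs(text):
    """Map IRQ number -> device names, for IRQs with more than one device.

    Hypothetical helper; assumes one CPU column, as in the output above.
    """
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skips the CPU0 header and the NMI/ERR rows
        fields = rest.split()
        if len(fields) < 3:
            continue
        # fields[0] is the count, fields[1] the controller, the rest devices.
        devices = [d.strip() for d in " ".join(fields[2:]).split(",")]
        if len(devices) > 1:
            shared[int(head)] = devices
    return shared


if __name__ == "__main__":
    with open("/proc/interrupts") as handle:
        for irq, devices in sorted(shared_irqs(handle.read()).items()):
            print(irq, ", ".join(devices))
```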

betty.net[~] $ less /proc/dma
 4: cascade

betty.net[~] $ less /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cebff : Video ROM
000d5000-000d57ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
  00100000-003cd87f : Kernel code
  003cd880-004b6eff : Kernel data
1fff0000-1fff7fff : ACPI Tables
1fff8000-1fffffff : ACPI Non-volatile Storage
bd900000-cdbfffff : PCI Bus #01
  c0000000-c7ffffff : 0000:01:00.0
  cdb80000-cdbfffff : 0000:01:00.0
cdd00000-cfefffff : PCI Bus #01
  ce000000-ceffffff : 0000:01:00.0
cfff7000-cfff7fff : 0000:00:0c.0
  cfff7000-cfff7fff : ohci_hcd
cfff8000-cfff8fff : 0000:00:0c.1
  cfff8000-cfff8fff : ohci_hcd
cfff9f00-cfff9fff : 0000:00:0c.2
  cfff9f00-cfff9fff : ehci_hcd
cfffa000-cfffafff : 0000:00:0a.0
  cfffa000-cfffafff : aic7xxx
cfffb000-cfffbfff : 0000:00:04.0
  cfffb000-cfffbfff : sis900
cfffc000-cfffcfff : 0000:00:03.0
  cfffc000-cfffcfff : ohci_hcd
cfffd000-cfffdfff : 0000:00:03.1
  cfffd000-cfffdfff : ohci_hcd
cfffe000-cfffefff : 0000:00:03.2
  cfffe000-cfffefff : ehci_hcd
cffff000-cfffffff : 0000:00:09.0
  cffff000-cfffffff : aic7xxx
d0000000-d0ffffff : 0000:00:00.0
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffee0000-ffefffff : reserved
fffc0000-ffffffff : reserved


betty.net[/proc] $ cat mtrr
reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg07: base=0xd0000000 (3328MB), size=  16MB: write-combining, count=1
betty.net[/proc] $ cat mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,noatime 0 0
none /proc proc rw,nodiratime 0 0
none /sys sysfs rw 0 0
none /dev ramfs rw 0 0
none /dev/pts devpts rw 0 0
none /mnt/ramfs ramfs rw 0 0
none /proc/bus/usb usbfs rw 0 0
/dev/hdb2 /s ext3 rw,noatime 0 0
/dev/sda3 /usr reiserfs rw,noatime 0 0


Comment 11 Carlos Rendon 2005-01-12 09:40:39 UTC
 cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.66GHz
stepping        : 7
cpu MHz         : 2680.516
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
bogomips        : 5275.64


my lspci is already posted

uname -r just prints out my kernel version

hope this helps
Comment 12 ajpearce 2005-01-13 03:15:07 UTC
Crashed again but now with debugging enabled:

http://www.ajpearce.co.uk/files/kernelpanic6.JPG

Abridged text output:

 [<c0102662>] common_interrupt+0x1a/0x20
 [<c028169e>] vgacon_scroll+0x10e/0x200
 [<c02a21b1>] scrup+0xe1/0x100
 [<02a3x81>] lf+0x71/0x80
 ... 
 [<c0145528>] vfs_write+0xb8/0x130
 [<c0145671>] sys_write+0x51/0x80
 [<c01024ed>] sysenter_past_esp+0x52/0x75

Code: f3 74 19 39 7b 18 89 d8 75 21 8b 1b 89 3c 24 89 44 24 04 e8 5b fd ff ff 39
...
00 00 00 00 8d bc 27 00
 <0>Kernel panic - not syncing: fatal exception in interrupt

(copied out by hand)
Comment 13 Diego Calleja 2006-08-04 13:53:15 UTC
Does it still happen in recent kernels?
Comment 14 Carlos Rendon 2006-08-04 14:02:05 UTC
I believe my motherboard was malfunctioning (Windows XP stopped working as
well); a replacement motherboard fixed it for me.
Comment 15 Diego Calleja 2006-08-05 07:49:00 UTC
OK
