Distribution: Gentoo
Hardware Environment: see bottom of description
Software Environment: see rest of description for glibc and more. Kernel: 2.6.10 vanilla
Problem Description: Regular kernel panic.
Steps to reproduce: Run with the referenced config for 3 days; it usually crashes after about 2 days. You MAY need similar hardware to test with: an ASRock motherboard and a SCSI card or three.

I set the severity to high because for me it is, though I'm unsure whether I should take the number of users affected into account - let me know.

I'm getting a regular kernel panic on my server.
* This bug will probably need reclassifying under the right section.
* I still need to confirm it's not a dupe; I'm not sure what I should search for.

Photos of the crash:
http://www.ajpearce.co.uk/files/kernelpanic4.JPG
http://www.ajpearce.co.uk/files/kernelpanic3.JPG
http://www.ajpearce.co.uk/files/kernelpanic2.JPG

My actions: I have enabled kernel debugging and am waiting for it to crash again. I'll post again with photo number 5 (I didn't get a photo of the 1st crash).

Any pattern to the kernel panic? It doesn't happen when I perform any particular action, but it regularly happens after about 2 days of operation as a fileserver. The system has 3 SCSI cards, but it seems to crash when accessing my 80GB IDE Ultra ATA hard drive.

Abbreviated error message of the kernel panic:

EIP is at cascade+0x48/0x60
Process swapper (pid:0, threadinfo=c04b8000 task=c042db00)
Call Trace: [addresses excluded; see photo for details]
 run_timer_softirq+0x11f/0x170
 __do_softirq+0x7d/0x90
 do_softirq+0x26/0x30
 do_IRQ+0x1e/0x30
 common_interrupt+0x1a/0x20
 default_idle+0x23/0x30
 cpu_idle+0x2c/0x40
 start_kernel+0x138/0x160
 unknown_bootoption+0x0/0x1b0
<0>Kernel panic - not syncing: Fatal exception in interrupt

GDB output of timer.o:

root@betty.net[/usr/src/linux] # gdb kernel/timer.o
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) list *cascade+0x48
0x328 is in cascade (kernel/timer.c:423).
418                     internal_add_timer(base, tmp);
419             }
420             INIT_LIST_HEAD(head);
421
422             return index;
423     }
424
425     /***
426      * __run_timers - run all expired timers (if any) on this CPU.
427      * @base: the timer vector to be processed.
(gdb)

Hardware info:

root@betty.net[/usr/src/linux] # lspci
00:00.0 Host bridge: Silicon Integrated Systems [SiS]: Unknown device 0746 (rev 02)
00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SG86C202
00:02.0 ISA bridge: Silicon Integrated Systems [SiS]: Unknown device 0963 (rev 25)
00:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016
00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
00:02.7 Multimedia audio controller: Silicon Integrated Systems [SiS] Sound Controller (rev a0)
00:03.0 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.1 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
00:03.2 USB Controller: Silicon Integrated Systems [SiS] USB 2.0 Controller
00:04.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 10/100 Ethernet (rev 90)
00:09.0 SCSI storage controller: Adaptec AHA-2940/2940W / AIC-7871 (rev 03)
00:0a.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
00:0b.0 Unknown mass storage controller: American Megatrends Inc. MegaRAID 428 Ultra RAID Controller (rev 03)
00:0c.0 USB Controller: NEC Corporation USB (rev 41)
00:0c.1 USB Controller: NEC Corporation USB (rev 41)
00:0c.2 USB Controller: NEC Corporation USB 2.0 (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX 440] (rev a3)

root@betty.net[/usr/src/linux] # lsscsi
[0:0:2:0]    cd/dvd  PIONEER  CD-ROM DR-124X    1.05  /dev/sr0
[0:0:4:0]    cd/dvd  NEC      CD-ROM DRIVE:502  2.0y  /dev/sr1
[0:0:5:0]    cd/dvd  COMPAQ   CD-ROM CR-503BCQ  1.1i  /dev/sr2
[1:0:6:0]    disk    IBM      IC35L018UWD210-0  S5CS  /dev/sda

root@betty.net[/usr/src/linux] # lsusb
Bus 006 Device 001: ID 0000:0000
Bus 005 Device 001: ID 0000:0000
Bus 004 Device 001: ID 0000:0000
Bus 003 Device 002: ID 0a12:0001 Cambridge Silicon Radio, Ltd
Bus 003 Device 001: ID 0000:0000
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000

Kernel config is here: http://www.ajpearce.co.uk/files/crashing_config.txt

More software info:

*  sys-libs/glibc
      Latest version available: 2.3.4.20040808-r1
      Latest version installed: 2.3.4.20040808-r1
      Size of downloaded files: 15,381 kB
      Homepage:    http://sources.redhat.com/glibc/
      Description: GNU libc6 (also called glibc2) C library

*  net-fs/samba
      Latest version available: 3.0.9-r1
      Latest version installed: 3.0.2a-r2

root@betty.net[/usr/src/linux] # uname -a
Linux betty 2.6.10 #1 Sat Jan 9 00:39:02 GMT 2021 i686 AMD Athlon(tm) XP 3000+ AuthenticAMD GNU/Linux

root@betty.net[/usr/src/linux] # df -h
Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/hda6    9.7G  3.7G  5.5G   41%   /     (ext3)
/dev/hdb2    74G   44G   26G    63%   /s    (ext3)
/dev/sda3    17G   5.2G  11G    32%   /usr  (reiserfs)

The only cron job is updatedb, and that doesn't seem to trigger the crash. Filesystems are ReiserFS and ext3.
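The report resolves `EIP is at cascade+0x48/0x60` to a source line with gdb's `list *cascade+0x48`. Each trace entry has the form symbol+offset/size: the offset is how far (in hex bytes) into the function execution was, and the size is the function's total length. The Python sketch that follows splits such entries and rebuilds that same gdb command. It is a hypothetical helper (`parse_frame`, `gdb_list_command` and `Frame` are illustrative names, not an existing tool) and assumes entries look like the ones copied above, optionally prefixed by a `[<address>]`.

```python
"""Split kernel oops trace entries such as "cascade+0x48/0x60" (a sketch)."""
import re
from typing import NamedTuple, Optional

# Optional "[<address>] " prefix, then symbol+0xOFFSET/0xSIZE (both hex).
FRAME_RE = re.compile(
    r"^(?:\[<[0-9a-fA-F]+>\]\s*)?"
    r"(?P<sym>[A-Za-z_][\w.]*)"
    r"\+0x(?P<off>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)$"
)


class Frame(NamedTuple):
    symbol: str
    offset: int  # bytes from the start of the function
    size: int    # length of the function in bytes


def parse_frame(entry: str) -> Optional[Frame]:
    """Return the parsed frame, or None if the entry does not match the pattern."""
    m = FRAME_RE.match(entry.strip())
    if m is None:
        return None
    return Frame(m["sym"], int(m["off"], 16), int(m["size"], 16))


def gdb_list_command(frame: Frame) -> str:
    """Rebuild the gdb command used in the report, e.g. 'list *cascade+0x48'."""
    return f"list *{frame.symbol}+{frame.offset:#x}"


if __name__ == "__main__":
    for entry in ["cascade+0x48/0x60", "run_timer_softirq+0x11f/0x170"]:
        frame = parse_frame(entry)
        if frame is not None:
            print(frame, "->", gdb_list_command(frame))
```

The command is then run inside `gdb` on the object file that contains the function (here `kernel/timer.o`), as in the session above.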
Created attachment 4358: screenshot of the crash
Created attachment 4359: kernel config causing the problem
I believe that I am having this same error.

Distribution: Gentoo
Steps to reproduce: Do anything and it will crash within about 10 minutes. It appears to be related to disk activity: if I just let it sit, it doesn't crash.

Abbreviated text of the kernel panic:
---------------------------
Oops: 0000 [#1]
PREEMPT
Modules linked in: nvidia
CPU: 0
EIP is at __wake_up_bit+0x11/0x40
Call Trace:
 end_buffer_async_write
 __delay
 end_bio_bh_io_sync
 bio_endio
 __ide_do_rw_disk
 __end_that_request_first
 __ide_end_request
 ide_end_request
 ide_dma_intr
 ide_intr
 ide_dma_intr
 handle_IRQ_event
 __do_IRQ
 do_IRQ
 ====================
 common_interrupt
 default_idle
 default_idle
 cpu_idle
 start_kernel
 unknown_bootoption
<0>Kernel panic - not syncing: Fatal exception in interrupt

Software:
---------
sys-libs/glibc-2.3.4.20040808-r1
Linux icarus 2.6.9 #1 Sat Jan 8 01:00:19 PST 2005 i686 Intel(R) Pentium(R) 4 CPU 2.66GHz GenuineIntel GNU/Linux

Hardware:
---------
from lspci:
0000:00:00.0 Host bridge: VIA Technologies, Inc. P4M266 Host Bridge
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP]
0000:00:09.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
0000:00:09.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
0000:00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
Created attachment 4361: another config that causes the problem
Interesting. Save that config somewhere and try removing:
- preempt
- the nvidia module

I also disabled these at first, expecting that to fix it... but it didn't, which shocked me - embarrassing! If it still crashes for you, then it's quite likely a similar bug. It affected me in 2.6.9 too.
Oh, and the modules loaded:

sis900 16516 0
snd_pcm_oss 48228 0
snd_mixer_oss 17600 1 snd_pcm_oss
snd_seq_oss 32256 0
snd_seq_midi_event 6080 1 snd_seq_oss
snd_seq 47632 4 snd_seq_oss,snd_seq_midi_event
uhci_hcd 29644 0
snd_emu10k1 91012 0
snd_intel8x0 27296 0
snd_rawmidi 19552 1 snd_emu10k1
snd_seq_device 7244 4 snd_seq_oss,snd_seq,snd_emu10k1,snd_rawmidi
snd_ac97_codec 70368 2 snd_emu10k1,snd_intel8x0
snd_pcm 79688 4 snd_pcm_oss,snd_emu10k1,snd_intel8x0,snd_ac97_codec
snd_timer 20228 2 snd_seq,snd_pcm
snd_page_alloc 7620 3 snd_emu10k1,snd_intel8x0,snd_pcm
snd_util_mem 3456 1 snd_emu10k1
snd_hwdep 7236 1 snd_emu10k1
ohci_hcd 27528 0
snd 44388 12 snd_pcm_oss,snd_mixer_oss,snd_seq_oss,snd_seq,snd_emu10k1,snd_intel8x0,snd_rawmidi,snd_seq_device,snd_ac97_codec,snd_pcm,snd_timer,snd_hwdep
ehci_hcd 39364 0
usblp 10944 0
dm_crypt 10248 0
w83781d 33192 0
i2c_isa 1856 0
8250 19812 0
serial_core 18112 1 8250
dm_mod 51324 1 dm_crypt
i2c_sensor 3072 1 w83781d
i2c_core 17872 3 w83781d,i2c_isa,i2c_sensor
usb_storage 95952 0
hci_usb 11520 3
loop 12488 0
bluetooth 41668 7 rfcomm,l2cap,hci_usb
ide_scsi 13380 0
8139too 20672 0
I recompiled it with no preempt and forced the nvidia module not to load, and it still crashed (although I didn't get the error message, because it just rebooted).
After thinking about this some more, I'm fairly sure this problem also affects me in 2.6.9; however, there it just reboots, and I've never had it stop to display the panic. What may be helpful is that I seem to have started having this problem after I removed my IDE floppy drive. Hopefully that helps.
I didn't realise floppy drives could be IDE, but I've done nothing like that recently. It's crashed again:
http://www.ajpearce.co.uk/files/kernelPanic5.JPG

There's normally someone knowledgeable on the case by now, so maybe not many people are aware of this bug because I haven't been able to classify it under the right section.
Carlos, could you post `uname -r`, `cat /proc/cpuinfo` and `lspci` (or `cat /proc/pci`) for me? I'm going to start hardware testing tomorrow. The thing is, testing is very slow: to remove one variable from the equation I have to wait 2 days to see whether it crashes. As far as I know I have to test:
- all modules
- all kernel options
- all hardware
- probably more

I guess I'll start by removing all non-core hardware apart from one SCSI card, as /usr is on a SCSI drive.

More information on my system:

betty.net[~] $ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 3000+
stepping        : 0
cpu MHz         : 2160.018
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 4276.22

betty.net[~] $ less /proc/interrupts
           CPU0
  0:    1903054          XT-PIC  timer
  1:          8          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  3:          3          XT-PIC  ehci_hcd
  5:        125          XT-PIC  aic7xxx, ohci_hcd, ohci_hcd
  8:          2          XT-PIC  rtc
 10:      82900          XT-PIC  aic7xxx, ehci_hcd, ohci_hcd, SiS SI7012, eth0
 11:          0          XT-PIC  ohci_hcd
 12:         66          XT-PIC  i8042
 14:       7021          XT-PIC  ide0
 15:         29          XT-PIC  ide1
NMI:          0
ERR:          0

betty.net[~] $ less /proc/dma
 4: cascade

betty.net[~] $ less /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cebff : Video ROM
000d5000-000d57ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
00100000-003cd87f : Kernel code
003cd880-004b6eff : Kernel data
1fff0000-1fff7fff : ACPI Tables
1fff8000-1fffffff : ACPI Non-volatile Storage
bd900000-cdbfffff : PCI Bus #01
c0000000-c7ffffff : 0000:01:00.0
cdb80000-cdbfffff : 0000:01:00.0
cdd00000-cfefffff : PCI Bus #01
ce000000-ceffffff : 0000:01:00.0
cfff7000-cfff7fff : 0000:00:0c.0
cfff7000-cfff7fff : ohci_hcd
cfff8000-cfff8fff : 0000:00:0c.1
cfff8000-cfff8fff : ohci_hcd
cfff9f00-cfff9fff : 0000:00:0c.2
cfff9f00-cfff9fff : ehci_hcd
cfffa000-cfffafff : 0000:00:0a.0
cfffa000-cfffafff : aic7xxx
cfffb000-cfffbfff : 0000:00:04.0
cfffb000-cfffbfff : sis900
cfffc000-cfffcfff : 0000:00:03.0
cfffc000-cfffcfff : ohci_hcd
cfffd000-cfffdfff : 0000:00:03.1
cfffd000-cfffdfff : ohci_hcd
cfffe000-cfffefff : 0000:00:03.2
cfffe000-cfffefff : ehci_hcd
cffff000-cfffffff : 0000:00:09.0
cffff000-cfffffff : aic7xxx
d0000000-d0ffffff : 0000:00:00.0
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffee0000-ffefffff : reserved
fffc0000-ffffffff : reserved

betty.net[/proc] $ cat mtrr
reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg07: base=0xd0000000 (3328MB), size=  16MB: write-combining, count=1

betty.net[/proc] $ cat mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,noatime 0 0
none /proc proc rw,nodiratime 0 0
none /sys sysfs rw 0 0
none /dev ramfs rw 0 0
none /dev/pts devpts rw 0 0
none /mnt/ramfs ramfs rw 0 0
none /proc/bus/usb usbfs rw 0 0
/dev/hdb2 /s ext3 rw,noatime 0 0
/dev/sda3 /usr reiserfs rw,noatime 0 0
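Since every panic here is a "Fatal exception in interrupt" and the plan is to pull hardware out one piece at a time, it may help to see at a glance which IRQ lines are shared between drivers (in the /proc/interrupts listing above, IRQ 5 and IRQ 10 both are). The Python sketch that follows is a hypothetical helper, not an existing tool; it assumes the 2.6-era layout shown here (a header row of CPU columns, then rows of the form "IRQ: count(s) CHIP handler, handler, ...") and skips summary rows such as NMI and ERR. It only lists sharing; it says nothing about whether sharing is the cause.

```python
"""List shared IRQ lines from /proc/interrupts text (a sketch)."""
from typing import Dict, List


def parse_interrupts(text: str) -> Dict[str, List[str]]:
    """Map each numbered IRQ to the handler names registered on it.

    Assumes a header row of CPU columns, then rows shaped like
    "IRQ: count [count ...] CHIP dev1, dev2, ...". Non-numeric rows
    (NMI, ERR, ...) and rows without a handler list are skipped.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row, e.g. "CPU0" or "CPU0 CPU1"
    result: Dict[str, List[str]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # NMI, ERR and similar summary rows
        # per-CPU counts, then the chip name, then the comma-separated handlers
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue  # no handler listed on this row
        result[label] = [d.strip() for d in fields[ncpus + 1].split(",")]
    return result


def shared_irqs(text: str) -> Dict[str, List[str]]:
    """Keep only the IRQ lines that have more than one handler attached."""
    return {irq: devs for irq, devs in parse_interrupts(text).items() if len(devs) > 1}


if __name__ == "__main__":
    sample = """\
           CPU0
  0:    1903054          XT-PIC  timer
  5:        125          XT-PIC  aic7xxx, ohci_hcd, ohci_hcd
 10:      82900          XT-PIC  aic7xxx, ehci_hcd, ohci_hcd, SiS SI7012, eth0
 14:       7021          XT-PIC  ide0
NMI:          0
ERR:          0
"""
    for irq, devs in shared_irqs(sample).items():
        print(f"IRQ {irq}: {', '.join(devs)}")
```

On the affected machine the same function could be fed `open("/proc/interrupts").read()` before and after each piece of hardware is removed, to confirm which drivers still share a line with the IDE or SCSI controllers.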
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.66GHz
stepping        : 7
cpu MHz         : 2680.516
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
bogomips        : 5275.64

My lspci output is already posted above, and `uname -r` just prints my kernel version (2.6.9). Hope this helps.
Crashed again, but now with debugging enabled:
http://www.ajpearce.co.uk/files/kernelpanic6.JPG

Abridged text output (copied out by hand):

[<c0102662>] common_interrupt+0x1a/0x20
[<c028169e>] vgacon_scroll+0x10e/0x200
[<c02a21b1>] scrup+0xe1/0x100
[<02a3x81>] lf+0x71/0x80
...
[<c0145528>] vfs_write+0xb8/0x130
[<x0145671>] sys_write+0x51/0x80
[<c01024ed>] sysenter_past_esp+0x52/0x75
Code: f3 74 19 39 7b 18 89 d8 75 21 8b 1b 89 3c 24 89 44 24 04 e8 5b fd ff ff 39 ... 00 00 00 00 8d bc 27 00
<0>Kernel panic - not syncing: fatal exception in interrupt
Does it still happen in recent kernels?
I believe I found that my motherboard was malfunctioning (Windows XP stopped working as well); a replacement motherboard fixed it for me.
OK