Distribution: Debian sarge

Hardware Environment:
- Celeron 2.4GHz, 2GB
- 2x Promise Technology, Inc. PDC20268
- VIA VT6420 SATA RAID Controller

Software Environment:
- Software RAID and LVM, ext3

Problem Description: The following kernel Oops occurred when committing changes to a mailbox in mutt. The mailbox is 815MB. /tmp may have been full when the oops occurred (I don't know).

Steps to reproduce: Did not try.
Created attachment 12391 [details] Kernel Oops in syslog
The same Oops occurred this morning, with the same stack trace, while unzipping a ~200MB zip file on another filesystem. None of the filesystems are full.
Can we see the first oops trace please? The trace you've attached here is for the second oops:

Aug 15 12:37:40 barberine kernel: [135275.066924] EFLAGS: 00010216 (2.6.22.2.skc2 #2)

See that "#2" there? It's important that we see the first oops which occurs after the machine boots.

Also, something seems to have fed your oops trace through ksymoops, which is no longer needed. Maybe syslog did it, maybe you did it manually?
Sorry, I have nothing logged on disk this time, but I think you have both oopses in my attached file:

[135275.066789] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000050
[135275.066898] Oops: 0000 [#1]

and

[135275.071823] kernel BUG at fs/jbd/transaction.c:272!
[135275.071905] invalid opcode: 0000 [#2]

I did not run ksymoops; syslog probably did it. Do you want me to try to stop it?

The system stops responding quickly (in less than a minute) after this error (network NAT also stops working). Yesterday I had time to do SysRq Sync, tErm, kIll and Unmount on the console. But today I was connected remotely over ssh and could not get to the console. Syslog sent the Oops to my ssh terminal; the first lines were lost (buffer too small), but I checked against the attachment and the last part was the same:

For #1: EIP: [ext3_get_inode_block+36/254] ext3_get_inode_block+0x24/0xfe SS:ESP 0068:ce437bec
For #2: Assertion failure in journal_start() at fs/jbd/transaction.c:272

What would help you? The .config? dmesg at startup?
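For anyone sorting through a similar log: the "[#1]" and "[#2]" markers quoted above are the kernel's oops counter, so the lowest number in a given boot is the oops to look at first. A minimal sketch of pulling them out of saved log lines follows. The function names are made up for this illustration, and it assumes the log covers a single boot (the counter starts again after a reboot).

```python
import re

# Matches lines such as "Oops: 0000 [#1]" or "invalid opcode: 0000 [#2]".
# The number after '#' is the kernel's oops counter for the current boot.
OOPS_COUNTER = re.compile(
    r"(?P<kind>[A-Za-z][A-Za-z ]*): (?P<code>[0-9a-f]{4}) \[#(?P<n>\d+)\]"
)


def find_oopses(log_lines):
    """Return (counter, kind, line) tuples sorted by oops counter."""
    found = []
    for line in log_lines:
        match = OOPS_COUNTER.search(line)
        if match:
            found.append((int(match.group("n")), match.group("kind"), line.strip()))
    return sorted(found, key=lambda item: item[0])


def first_oops(log_lines):
    """Return the lowest-numbered oops in the log, or None if there is none."""
    oopses = find_oopses(log_lines)
    return oopses[0] if oopses else None


if __name__ == "__main__":
    sample = [
        "[135275.071905] invalid opcode: 0000 [#2]",
        "[135275.066898] Oops: 0000 [#1]",
    ]
    print(first_oops(sample))
```

Later oopses are often fallout from the first one, which is why the lowest counter is usually the most useful trace to attach.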
New oops last night; I don't know whether it's related or not. Bad hardware? Wrong gcc version? (Debian sarge 3.3.5-3)
Created attachment 12609 [details] New oops unable to handle kernel NULL pointer dereference at virtual address 00000040
Created attachment 12610 [details] New oops unable to handle kernel NULL pointer dereference at virtual address 00000040

Here is the full dmesg from startup, with the oops. I do not have any binary named ksymoops on the computer, just klogd and syslogd, but klogd complains at startup:

Aug 25 16:09:09 barberine kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
Aug 25 16:09:09 barberine kernel: Cannot find map file.
Aug 25 16:09:09 barberine kernel: No module symbols loaded - kernel modules not enabled.

Do you want me to put the right System.map in /boot?
Created attachment 12611 [details] .config

Linux barberine 2.6.22.3.skc4 #4 Thu Aug 16 10:00:51 CEST 2007 i686 GNU/Linux

Gnu C 3.3.5
Gnu make 3.80
binutils 2.15
util-linux 2.12p
mount 2.12p
module-init-tools 3.2-pre1
e2fsprogs 1.37
nfs-utils 1.0.6
Linux C Library 2.3.2
Dynamic linker (ldd) 2.3.2
Procps 3.2.1
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 5.2.1
udev 056

seb barberine:/usr/src/linux [1093]% zgrep CONFIG_KALLSYMS /proc/config.gz
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
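The zgrep check above can also be scripted, which may be handy when comparing several kernels. This is only a sketch: the function names are invented for the example, and it assumes the kernel exposes its configuration at /proc/config.gz, as the command above shows it does on this machine.

```python
import gzip


def parse_kernel_config(lines):
    """Map option names to their values; options that are 'not set' map to None."""
    options = {}
    suffix = " is not set"
    for raw in lines:
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO: None
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def read_kernel_config(path="/proc/config.gz"):
    """Read a kernel config file, gzip-compressed or plain text."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_kernel_config(handle)


if __name__ == "__main__":
    config = read_kernel_config()
    for name in ("CONFIG_KALLSYMS", "CONFIG_KALLSYMS_ALL"):
        print(name, "=", config.get(name))
```

With CONFIG_KALLSYMS=y the kernel resolves symbol names in the oops itself, so the trace is readable without ksymoops.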
Created attachment 12723 [details] New Oops: invalid opcode: 0000

Upgraded kernel to 2.6.22.5
Upgraded GCC to 4.1.1
These oopses are all different and point at memory corruption in various places (page allocator, slab). I'd be suspecting faulty hardware, or some bug in some piece of code (probably a driver) which few other people use.
Created attachment 12725 [details] New Oops: kernel BUG at mm/slab.c:2980! invalid opcode: 0000

(The last file was generated from the messages on the console; here is the oops trace from kern.log.)

Upgraded kernel to 2.6.22.5
Upgraded GCC to 4.1

Linux barberine 2.6.22.5.skc6 #6 Thu Aug 30 19:57:36 CEST 2007 i686 GNU/Linux

Gnu C 4.1.2 (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21))
Gnu make 3.81
binutils 2.17
util-linux 2.12r
mount 2.12r
module-init-tools 3.3-pre2
e2fsprogs 1.40-WIP
Linux C Library 2.3.6
Dynamic linker (ldd) 2.3.6
Procps 3.2.7
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 5.97
udev 105
There is something strange: each time, I have to do a hard reboot with the reboot switch, so the partitions are not unmounted. At boot, the ext3 journal is used to recover, but the RAID arrays are all detected as synchronized:

[ 156.411425] md: considering sdb2 ...
[ 156.411505] md: adding sdb2 ...
[ 156.411574] md: sda2 has different UUID to sdb2
[ 156.411646] md: hdi3 has different UUID to sdb2
[ 156.411721] md: adding hde2 ...
[ 156.411788] md: created md0
[ 156.411852] md: bind<hde2>
[ 156.411941] md: bind<sdb2>
[ 156.412013] md: running: <sdb2><hde2>
[ 156.412308] raid1: raid set md0 active with 2 out of 2 mirrors
(same for md1)
[ 156.413494] md: ... autorun DONE.
[ 156.464550] EXT3-fs: INFO: recovery required on readonly filesystem.
[ 156.464625] EXT3-fs: write access will be enabled during recovery.
[ 157.525557] kjournald starting. Commit interval 5 seconds
[ 157.525652] EXT3-fs: md0: orphan cleanup on readonly fs
[ 157.574160] EXT3-fs: md0: 9 orphan inodes deleted
[ 157.575451] EXT3-fs: recovery complete.
[ 157.600352] EXT3-fs: mounted filesystem with ordered data mode.
[ 157.600449] VFS: Mounted root (ext3 filesystem) readonly.

They really are:

[176180.076542] md: data-check of RAID array md0
[176180.076560] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[176180.076572] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
...
[176309.260073] md: md0: data-check done.
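As a rough illustration of reading the "active with 2 out of 2 mirrors" lines above, the sketch below collects the mirror counts per array from saved dmesg output and lists any array that came up with fewer mirrors than expected. The function names are invented for this example, and the "1 out of 2" line in the demo is hypothetical, not taken from this machine. It only reports what the kernel said at assembly time; it does not compare mirror contents the way the data-check does.

```python
import re

# Matches e.g. "raid1: raid set md0 active with 2 out of 2 mirrors".
RAID_ACTIVE = re.compile(
    r"raid set (?P<array>md\d+) active with "
    r"(?P<active>\d+) out of (?P<total>\d+) mirrors"
)


def raid_mirror_counts(log_lines):
    """Return {array: (active_mirrors, total_mirrors)} from dmesg lines."""
    counts = {}
    for line in log_lines:
        match = RAID_ACTIVE.search(line)
        if match:
            counts[match.group("array")] = (
                int(match.group("active")),
                int(match.group("total")),
            )
    return counts


def degraded_arrays(log_lines):
    """Return the sorted names of arrays running with fewer mirrors than configured."""
    counts = raid_mirror_counts(log_lines)
    return sorted(name for name, (active, total) in counts.items() if active < total)


if __name__ == "__main__":
    lines = [
        "[ 156.412308] raid1: raid set md0 active with 2 out of 2 mirrors",
        # Hypothetical degraded array, for illustration only.
        "[ 156.412999] raid1: raid set md1 active with 1 out of 2 mirrors",
    ]
    print(raid_mirror_counts(lines))
    print(degraded_arrays(lines))
```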
You are right; last night memtest found an error. Sorry for the disturbance.