[Please reassign to the correct component if my guess was not right. It might be RAM, disk or XFS related, among other things.]

Problem Description:

Oops: 0000 [#1]
Modules linked in: ipt_LOG iptable_filter ip_tables bttv video_buf firmware_class i2c_algo_bit v4l2_common btcx_risc tveeprom videodev ipip ide_cd cdrom uhci_hcd sd_mod sata_sil libata scsi_mod ehci_hcd usbcore snd_intel8x0 snd_ac97_codec sis_agp agpgart isofs it87 i2c_sensor i2c_isa snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore tun hisax isdn slhc psmouse i2c_viapro i2c_core ipv6 8139too sundance mii crc32
CPU:    0
EIP:    0060:[<c0115b0b>]    Not tainted VLI
EFLAGS: 00010002   (2.6.13)
EIP is at do_page_fault+0xcb/0x6dd
eax: c8b4f000   ebx: 0000000b   ecx: 0000000d   edx: 07070707
esi: 0000000e   edi: c040e2d8   ebp: c8b4f1dc   esp: c8b4f10c
ds: 007b   es: 007b   ss: 0068
Unable to handle kernel paging request at virtual address 08080880
[....]

Lots of oopses (attached to the report) when the disk is stressed (usually bonnie++ freezes at the "writing intelligently" phase). memtest86+ ran for hours without error. The machine freezes with or without swap. The disks are believed to be good (SMART, badblocks). The CPU is not overclocked and its temperature is reasonable. I cannot point to any hardware fault (not being familiar with kernel internals this deep), but one is not impossible. Bonnie++ can crash the machine almost anytime. It did under 2.6.8.1 and 2.6.12.*, and does under 2.6.13 too.
Distribution: kernel.org

Hardware Environment: Intel(R) Celeron(R) CPU 2.00GHz, 512M RAM

# lspci
0000:00:00.0 Host bridge: Silicon Integrated Systems [SiS] SiS645DX Host & Memory & AGP Controller
0000:00:01.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI bridge (AGP)
0000:00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS962 [MuTIOL Media IO] (rev 04)
0000:00:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016
0000:00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
0000:00:02.7 Multimedia audio controller: Silicon Integrated Systems [SiS] Sound Controller (rev a0)
0000:00:03.0 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
0000:00:03.1 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
0000:00:03.2 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
0000:00:03.3 USB Controller: Silicon Integrated Systems [SiS] USB 2.0 Controller
0000:00:08.0 Network controller: Eicon Networks Corporation Diva 2.01 S/T PCI (rev 01)
0000:00:09.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
0000:00:0a.0 VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 01)
0000:00:0c.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet

# mount
/dev/hda7 on / type xfs (rw,usrquota)
proc on /proc type proc (rw,gid=104)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hdc2 on /var/lib/backuppc type ext3 (rw)
none on /dev type tmpfs (rw,size=5M,mode=0755)
none on /proc/bus/usb type usbfs (rw)
/dev/mapper/vg0-backup on /mnt/lvm/backup type xfs (rw)
/dev/mapper/vg0-db on /mnt/lvm/db type xfs (rw)

Software Environment: Debian GNU/Linux unstable (SID)

Steps to reproduce: start bonnie++ on / [EIDE] or ..vg0-db/ [SATA], wait, crash.
Created attachment 5947 [details] Serial console capture
Hmpf, are you using CONFIG_4KSTACKS? Could you pass over your .config please
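For anyone wanting to answer the same question on their own box, here is a minimal sketch of checking a kernel .config for a given option. It assumes the usual .config syntax ("CONFIG_FOO=y" when set, "# CONFIG_FOO is not set" otherwise). The helper name config_option and the sample text are made up for illustration and are not the reporter's attached config.

```python
import re

def config_option(config_text, name):
    """Return the value of a kernel config option ('y', 'm', ...), or None if it is not set."""
    # Only lines that start with NAME= count; "# NAME is not set" lines do not match.
    pattern = re.compile(r"^%s=(.*)$" % re.escape(name), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None

# Hypothetical sample, not taken from the attachment in this report.
sample = """\
CONFIG_X86=y
CONFIG_XFS_FS=y
CONFIG_4KSTACKS=y
# CONFIG_PREEMPT is not set
"""

print(config_option(sample, "CONFIG_4KSTACKS"))
print(config_option(sample, "CONFIG_PREEMPT"))
```

In practice the text would come from reading the .config in the kernel build tree.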
Created attachment 5974 [details] .config

Yes, this machine happens to use 4Kstacks. Config attached. (I'll try to crash it without 4kstacks and report back a bit later.)
I switched off 4kstacks and cannot freeze the machine anymore. :-/ Okay, nice hint, thank you very much; it seems to be fixed (I'll go and switch off 4kstacks everywhere right now). I still wonder why, but that's just out of curiosity...
My guess is the combination of xfs and device-mapper plus sata (I don't know how stack-hungry sata is, but it does use the scsi layer): each layer adds its own frames to the same call chain, and together they can overflow a 4K stack. This is unfortunate, but I'm sure there are people who are interested in this. What would be very nice is if you could turn on CONFIG_4KSTACKS and turn on "Stack utilization instrumentation" under "Kernel hacking". That should give us a trace of what makes the stack overflow (the trace will be on the console). Either that, or describe your environment so that someone else can reproduce the problem. Thanks
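As a rough illustration of why a stack overflow is a plausible suspect, the esp value from the oops in the original report can be checked against the stack size. This is only a sketch under assumptions: i386 with CONFIG_4KSTACKS, where each kernel stack is one 4096-byte aligned block, thread_info sits at its lowest address, and the stack grows down toward it. The helper names are mine, not kernel APIs.

```python
THREAD_SIZE = 4096  # assumption: i386 with CONFIG_4KSTACKS

def stack_base(esp, thread_size=THREAD_SIZE):
    """Lowest address of the aligned block that contains esp."""
    return esp & ~(thread_size - 1)

def stack_offset(esp, thread_size=THREAD_SIZE):
    """Bytes between the bottom of the block and esp (the stack grows down)."""
    return esp & (thread_size - 1)

esp = 0xc8b4f10c  # "esp:" value from the oops in the original report

print(hex(stack_base(esp)))   # 0xc8b4f000
print(stack_offset(esp))      # 268
```

For the reported esp this gives a base of 0xc8b4f000 (which happens to match eax in the oops) and only 268 bytes between esp and the bottom of the block, part of which would be occupied by thread_info. That fits a nearly exhausted stack, though on its own it does not prove an overflow.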
I'll try to get a stack util dump while I'm in the crashing mood. (It is fun to do on a remote server *wink* *wink* [makes operators finally work for their money].) BTW crash happened without lvm and sata, too. Maybe it's just harder to reproduce, I don't know, since I didn't try bonnie++ until I got the new sata drive, and freezes were intermittent.
Created attachment 5979 [details] Oops with requested flag

Well, I hope it's helpful; the stack trace seems to shoot itself in the foot... I can switch on any debugging flag (as long as I don't need further magic than the serial console) if it'd help...
As far as I'm concerned, by now XFS has solved their stack issues and should be safe to use with CONFIG_4KSTACKS, so I'm closing this bug. If you can reproduce this problem with the latest stable kernel version, please reopen it.