Bug 5140
Summary: | none of the 2.6.13-rc? kernels survives longer than 4 days. | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Danny ter Haar (osdl) |
Component: | SCSI | Assignee: | Diego Calleja (diegocg) |
Status: | REJECTED DUPLICATE | ||
Severity: | high | ||
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.13-rc7-git1 | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
Danny ter Haar
2005-08-27 11:08:31 UTC
DevQ(0:1:0):
NMI Watchdog detected LOCKUP on CPU0
CPU 0
Modules linked in: rawfs rtc evdev hw_random i2c_amd8111 tg3 e100 mii w83627hf eeprom lm85 i2c_sensor i2c_isa i2c_amd756 i2c_core psmouse
Pid: 168, comm: scsi_eh_0 Not tainted 2.6.13-rc7-git1
RIP: 0010:[<ffffffff802644f9>] <ffffffff802644f9>{serial_in+105}
RSP: 0018:ffff81007fc17b80 EFLAGS: 00000002
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff80473a40
RBP: 0000000000002705 R08: 0000000000000020 R09: 0000000000007930
R10: 0000000000000034 R11: 000000000000000a R12: ffffffff80473a40
R13: ffffffff8045f6fe R14: 000000000000000d R15: 000000000000000d
FS: 00002aaaab3cbe90(0000) GS:ffffffff80485800(0000) knlGS:00000000556ada40
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000515970 CR3: 000000007dc27000 CR4: 00000000000006e0
Process scsi_eh_0 (pid: 168, threadinfo ffff81007fc16000, task ffff8100033607c0)
Stack: ffffffff8026682d 0000000500000002 ffffffff803ebc60 0000000000007931
       000000000000000d 0000000000000096 0000000000000010 0000000000000046
       ffffffff8012ed9c 000000000000793e
Call Trace:<ffffffff8026682d>{serial8250_console_write+413} <ffffffff8012ed9c>{__call_console_drivers+76}
       <ffffffff8012f053>{release_console_sem+339} <ffffffff8012fbc9>{vprintk+601}
       <ffffffff8012fbc9>{vprintk+601} <ffffffff8012fc3e>{printk+78}
       <ffffffff80325a40>{thread_return+0} <ffffffff8012fc3e>{printk+78}
       <ffffffff8028c235>{ahd_print_register+261} <ffffffff802abc34>{ahd_platform_dump_card_state+100}
       <ffffffff80296b0d>{ahd_dump_card_state+8973} <ffffffff802ad320>{ahd_linux_abort+624}
       <ffffffff802aa590>{ahd_linux_sem_timeout+0} <ffffffff80284f5c>{scsi_error_handler+1324}
       <ffffffff8010e396>{child_rip+8} <ffffffff80284a30>{scsi_error_handler+0}
       <ffffffff8010e38e>{child_rip+0}

Hello to all, I was a victim of the same problem on my newly acquired AMD64 dual-core machine. I'm using SuSE 10. The kernel was stable when I installed it.
Suddenly, after a kernel update from SuSE via YaST, my system crashed every day, sometimes several times a day. Nothing was found in the logs. It usually happened like this: after some minutes or hours, a hard disk switched off, then the hard disk LED came on and stayed on. Judging by the fans running at full speed, I guess this triggered an infinite loop. Everything was frozen and only a hardware reset worked.

My first suspect was a bug in the nVidia driver, so I started the system without X Window, but it crashed too. I also removed ACPI, APM, microcode, MCE, anything I could think of. It continued to crash daily.

I then suspected an unfinished/beta/experimental driver, so I recompiled the kernel to remove unnecessary code. I recompiled it 20 times, each time removing unnecessary drivers or features, in the hope of identifying which part fails and reporting it. The kernel became faster and smaller, but this didn't fix anything: it continued to crash daily or even several times a day.

Since the xfs filesystem doesn't compile cleanly, and the freezes usually involved a hard disk switching off and remaining blocked, I thought xfs could be corrupting kernel memory. So I converted all my partitions to reiserfs and finally removed xfs from the kernel. It continued to crash. (By the way, it's not easy to make reliable backups when the kernel crashes randomly... I had to use find and cmp to make sure all files were backed up and that the copies were really identical.)

Finally, I decided to update my kernel to the latest stable version, although it's apparently not supported by SuSE. I compiled and installed version 2.6.15.6 using my old configuration, and it works perfectly now.

To sum up: a bug was introduced in the kernel provided with the latest SuSE 10 security update, and it has been fixed in 2.6.15.6.
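The find/cmp backup check mentioned in the comment can be sketched in Python. This is a hypothetical illustration, not the commenter's actual commands: the function name `find_mismatches` and the two-directory layout (original tree vs. backup tree with the same relative paths) are assumptions. It walks the original tree and compares each regular file byte-for-byte against its counterpart in the backup, roughly what feeding `find` output into `cmp` achieves.

```python
import filecmp
import os


def find_mismatches(src_root: str, backup_root: str) -> list[str]:
    """Hypothetical helper: return the relative paths of regular files under
    src_root that are missing from backup_root or whose contents differ."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Only compare regular files; skip symlinks, device nodes, sockets.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(backup_root, rel)
            # shallow=False makes filecmp read and compare the file contents
            # (like cmp) instead of trusting matching size/mtime alone.
            if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
                problems.append(rel)
    return sorted(problems)
```

For example, `find_mismatches("/home", "/mnt/backup/home")` (paths assumed) returns an empty list only when every regular file under `/home` has an identical copy in the backup. Files present only in the backup are not reported, since the goal is to confirm that nothing from the original was lost.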