Pre Scriptum: Sorry if the category is totally wrong, but I can't easily categorize this problem.

Most recent kernel where this bug did not occur: Cannot say exactly, because I have been plagued by various ACPI- and SATA-related bugs since the switch to generic PCI routines on i386 (it was in 2.6.14, IIRC). The 2.6.16-rc1 version was the first one that allowed me to boot without irqpoll, but the problem remained.

Distribution: Gentoo Linux

Hardware Environment:
ASUS P5WD2-Premium motherboard (ICH7 for IDE and SATA storage, other controllers unused);
/* My previous MB got struck by 220V current, so I couldn't compare :) */
Pentium D 3GHz;
1GB RAM;
Marvell 88E8001 [sk98lin] and Intel 82573V [e1000] NICs, both onboard;
nVidia GeForce FX5900 on PCIe16x slot;
Creative SB Audigy2 ZS;
3 SATA hard drives connected to ICH7;
1 ATAPI combo-drive connected to ICH7 as primary master;

Software Environment:
ACPI 2.0 tables and APIC enabled in M/B BIOS;
GRUB 0.97;
Kernel 2.6.16-rc1-mm3 (also tried mm2, mm1, and various 2.6.15 mm's);
No distro or external patches except SquashFS and CDFS;
GCC 4.0.2 with ordinary Gentoo patchset;
Using udev 084;
On ext2, ext3, reiserfs and reiser4 partitions;

Problem Description:
During ACPI init, I can see this (maybe irrelevant) message:

-- -- CUT HERE -- --
ACPI: OEMB (v001 A M I AMI_OEM 0x10000530 MSFT 0x00000097) @ 0x3ff8e040
>>> ERROR: Invalid checksum
ACPI: MCFG (v001 A M I OEMMCFG 0x10000530 MSFT 0x00000097) @ 0x3ff887c0
-- -- CUT HERE -- --

Later on, after the kernel loads and the initscripts start to load modules, the next (more relevant, IMHO) thing appears:

-- -- CUT HERE -- --
drivers/usb/serial/usb-serial.c: USB Serial Driver core
irq 209: nobody cared (try booting with the "irqpoll" option)
<c103fbf4> __report_bad_irq+0x24/0x7f
<c103fcd0> note_interrupt+0x81/0x231
<c100ecda> mark_offset_pmtmr+0x9e/0x10d
<c103f6cd> handle_IRQ_event+0x2e/0x5a
<c103f7d6> __do_IRQ+0xdd/0xe7
<c100515c> do_IRQ+0x3a/0x52
=======================
<c1003652> common_interrupt+0x1a/0x20
handlers:
[<c11c6585>] (ide_intr+0x0/0x1f5)
Disabling IRQ #209
drivers/usb/serial/usb-serial.c: USB Serial support registered for Handspring
-- -- CUT HERE -- --

IRQ209 is almost always set up (by the APIC, I suppose?) to be shared between ide0 (ICH7 IDE) and EMU10K1 (Audigy using ALSA). Once I saw this backtrace coming from within some sk98lin function, and that time IRQ209 was on the sk98lin NIC (maybe; I don't remember, and I could not reproduce that).

After that backtrace was printed, the PC finished loading modules and proceeded to the runlevel scripts, started some programs, and after 10 seconds or so I saw this:

-- -- CUT HERE -- --
input: Bluetooth HID Boot Protocol Device as /class/input/input7
hda: lost interrupt
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
[numerous repeating lines cut out]
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
hda: packet command error: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
irq 209: nobody cared (try booting with the "irqpoll" option)
<c103fbf4> __report_bad_irq+0x24/0x7f
<c103fcd0> note_interrupt+0x81/0x231
<c100ecda> mark_offset_pmtmr+0x9e/0x10d
<c103f6cd> handle_IRQ_event+0x2e/0x5a
<c103f7d6> __do_IRQ+0xdd/0xe7
<c100515c> do_IRQ+0x3a/0x52
=======================
<c1003652> common_interrupt+0x1a/0x20
<c1001a97> mwait_idle+0x2a/0x34
<c1001a50> cpu_idle+0x61/0x7e
<c13ca4e1> start_kernel+0x31f/0x3f2
<c13ca5b4> unknown_bootoption+0x0/0x24c
handlers:
[<c11c6585>] (ide_intr+0x0/0x1f5)
[<f8b8af00>] (snd_emu10k1_interrupt+0x0/0x450 [snd_emu10k1])
Disabling IRQ #209
hda: lost interrupt
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01)
irq 209: nobody cared (try booting with the "irqpoll" option)
<c103fbf4> __report_bad_irq+0x24/0x7f
<c103fcd0> note_interrupt+0x81/0x231
<c103f6cd> handle_IRQ_event+0x2e/0x5a
<c103f7d6> __do_IRQ+0xdd/0xe7
<c100515c> do_IRQ+0x3a/0x52
=======================
<c1003652> common_interrupt+0x1a/0x20
<c1001a97> mwait_idle+0x2a/0x34
<c1001a50> cpu_idle+0x61/0x7e
<c13ca4e1> start_kernel+0x31f/0x3f2
<c13ca5b4> unknown_bootoption+0x0/0x24c
handlers:
[<c11c6585>] (ide_intr+0x0/0x1f5)
[<f8b8af00>] (snd_emu10k1_interrupt+0x0/0x450 [snd_emu10k1])
Disabling IRQ #209
-- -- CUT HERE -- --

The funny thing is that although my CD-ROM stops working, I can still use the sound card without any problems or glitches. I haven't tried irqpoll yet, but previous experience tells me that it will only hide stuff from dmesg and not fix the issue.

As for the occasional lockups, I cannot say anything right now. They may be related, or they may be my own fault :)

Full dmesg and /proc/interrupts will follow shortly.

Steps to reproduce:
Compile and install any m#2\.6\.1[56]-mm\d+# kernel.
Boot it.
See oopses passing by.
Grep dmesg.
Think a bit.
File a bug :)

Thanks in advance,
Unik.
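For anyone comparing several boots, the "nobody cared" reports can be pulled out of a saved dmesg automatically. What follows is only a rough Python sketch: the function name spurious_irq_reports and the regular expressions are my own illustration, and they assume the 2.6-era message layout quoted in the report above. The sample text is a shortened excerpt of that log.

```python
import re

# Patterns assume the message layout seen in this report (an assumption,
# not a stable kernel interface).
REPORT_RE = re.compile(r"irq (\d+): nobody cared")
HANDLER_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def spurious_irq_reports(log):
    """Return a list of (irq_number, [handler names]) for each report."""
    reports = []
    matches = list(REPORT_RE.finditer(log))
    for idx, m in enumerate(matches):
        # A report runs until the next report starts, or to the end of the log.
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(log)
        chunk = log[m.end():end]
        _, found, tail = chunk.partition("handlers:")
        # Handler entries sit between "handlers:" and "Disabling IRQ".
        tail = tail.split("Disabling IRQ")[0]
        handlers = HANDLER_RE.findall(tail) if found else []
        reports.append((int(m.group(1)), handlers))
    return reports


# Shortened excerpt of the log quoted in the report.
SAMPLE = """irq 209: nobody cared (try booting with the "irqpoll" option)
<c103fbf4> __report_bad_irq+0x24/0x7f
<c1003652> common_interrupt+0x1a/0x20
handlers:
[<c11c6585>] (ide_intr+0x0/0x1f5)
[<f8b8af00>] (snd_emu10k1_interrupt+0x0/0x450 [snd_emu10k1])
Disabling IRQ #209
"""

if __name__ == "__main__":
    for irq, handlers in spurious_irq_reports(SAMPLE):
        print(irq, handlers)
```

Each tuple names the disabled IRQ and the handlers that were registered on it at the time, which makes it easy to see whether the same pair of drivers is involved on every boot.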
Created attachment 7193 [details] Full dmesg log from boot-up to running state. Produced by combining on-boot-saved dmesg with current one.
Created attachment 7194 [details] cat /proc/interrupts See IRQ209.
Created attachment 7195 [details] Output of dmidecode 2.7
Created attachment 7196 [details] Output of acpidump I had a warning "Wrong checksum for generic table!" just before OEMB table. That's probably the same as the checksum message in dmesg.
Oh, and one thing I forgot: the M/B BIOS version is 0606, the latest one released by ASUS. From their support site I can tell that this is not a beta, so the checksum error is strange, considering that the BIOS image passed the CRC check at the time of flashing.
Begin forwarded message:

Date: Tue, 31 Jan 2006 23:29:20 -0500
From: "Brown, Len" <len.brown@intel.com>
To: "Andrew Morton" <akpm@osdl.org>, "Jeff Garzik" <jgarzik@pobox.com>, "Bartlomiej Zolnierkiewicz" <B.Zolnierkiewicz@elka.pw.edu.pl>
Subject: RE: [Bugme-new] [Bug 5987] New: Oopses at boot, inability to use IDE CD-ROM drive and a lockup once-a-day.

> ICH7: 100% native mode on irq 209

Has 100% native mode *ever* worked on *any* system?

This BIOS clearly has some ACPI related issues, but it isn't immediately clear that they're the cause of the failure.

-Len
Please boot with "acpi=off", attach the dmesg -s64000 output, and paste a copy of /proc/interrupts. Please also attach the output from lspci -vv.
Begin forwarded message:

Date: Tue, 31 Jan 2006 23:56:58 -0500
From: Jeff Garzik <jgarzik@pobox.com>
To: "Brown, Len" <len.brown@intel.com>
Cc: Andrew Morton <akpm@osdl.org>, Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Subject: Re: [Bugme-new] [Bug 5987] New: Oopses at boot, inability to use IDE CD-ROM drive and a lockup once-a-day.

Brown, Len wrote:
>> ICH7: 100% native mode on irq 209
>
> Has 100% native mode *ever* worked on *any* system?

For libata, definitely.

I could have sworn it worked in IDE driver too...

Jeff
> ACPI: OEMB (v001 A M I AMI_OEM 0x10000530 MSFT 0x00000097) @ 0x3ff8e040
> >>> ERROR: Invalid checksum

Evidence of a shoddy BIOS, but not related to the failure at hand.

The acpidump output shows that this table claims a length of 102 bytes. But the checksum across 102 bytes is non-zero. It is possible that the BIOS writer got the checksum right but the length wrong, as the checksum after 46 bytes is zero. Perhaps some buggy proprietary OS recognizes the OEM-specific "OEMB" as a fixed length structure of 46 bytes and errantly lets this BIOS through its test suite...

Linux ignores any table with a bad checksum. But as Linux doesn't recognize an OEMB table, it would ignore it even if the checksum were valid.

Whatever this table is, it is non-volatile:

BIOS-e820: 000000003ff8e000 - 000000003ffe0000 (ACPI NVS)
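The length-versus-checksum argument can be checked mechanically: an ACPI table is valid when all bytes covered by its declared length sum to zero modulo 256. Below is a minimal Python sketch of that check. The function names and the demo table are illustrative only: the demo table is synthetic, built to show the same pattern (zero at 46 bytes, non-zero at 102), and is not the actual OEMB table from the acpidump attachment.

```python
from typing import List


def acpi_checksum(table: bytes, length: int) -> int:
    """Byte sum modulo 256 over the first `length` bytes; 0 means valid."""
    return sum(table[:length]) & 0xFF


def valid_lengths(table: bytes) -> List[int]:
    """Every prefix length at which the checksum comes out as zero."""
    return [n for n in range(1, len(table) + 1)
            if acpi_checksum(table, n) == 0]


def make_demo_table() -> bytes:
    """Synthetic 102-byte table (NOT real firmware data)."""
    body = bytes(range(1, 46))             # 45 arbitrary bytes
    fixup = (-sum(body)) & 0xFF            # byte that zeroes the running sum
    good = body + bytes([fixup])           # 46 bytes, checksum 0
    trailing = bytes([1]) * 56             # 56 extra bytes break the sum
    return good + trailing


if __name__ == "__main__":
    table = make_demo_table()
    print("checksum over 102 bytes:", acpi_checksum(table, 102))
    print("checksum over  46 bytes:", acpi_checksum(table, 46))
    print("lengths with zero checksum:", valid_lengths(table))
```

Running valid_lengths over the real table bytes would show whether 46 is the only plausible length, or just one of several prefixes that happen to sum to zero.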
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)

Another sign of a shoddy BIOS -- duplicate entries in the MADT. I believe that Linux should survive this -- programming these IRQs twice.

Try this:

# /etc/init.d/acpid stop

Then press the power button a few times and see if the acpi entry on IRQ9 in /proc/interrupts increments appropriately.
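The power-button check boils down to reading the IRQ 9 row of /proc/interrupts before and after the button presses and comparing the counts. A small Python sketch of that comparison follows. The helper names irq_counts and count_delta are hypothetical, and the BEFORE/AFTER snapshots are made-up sample text in the /proc/interrupts layout, not output from the reporter's machine.

```python
def irq_counts(interrupts_text, irq):
    """Per-CPU counts for the row labelled `irq` in /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != irq:
            continue
        counts = []
        for field in rest.split():
            if not field.isdigit():
                break                      # first non-number ends the counts
            counts.append(int(field))
        return counts
    raise KeyError(irq)


def count_delta(before_text, after_text, irq):
    """How much each CPU's count for `irq` grew between two snapshots."""
    before = irq_counts(before_text, irq)
    after = irq_counts(after_text, irq)
    return [a - b for a, b in zip(after, before)]


# Made-up snapshots for illustration only.
BEFORE = """\
           CPU0       CPU1
  9:          3          0   IO-APIC-level  acpi
"""
AFTER = """\
           CPU0       CPU1
  9:          5          0   IO-APIC-level  acpi
"""

if __name__ == "__main__":
    print(count_delta(BEFORE, AFTER, "9"))
```

On a live system the two snapshots would come from reading /proc/interrupts before and after pressing the button; a delta equal to the number of presses means the ACPI interrupt is being delivered properly.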
To simplify matters, please reproduce this failure with CONFIG_PCI_MSI=n, and with "nvidia" excluded from the kernel.
possibly a duplicate of bug 5084
Created attachment 7215 [details] Full dmesg with CONFIG_PCI_MSI=n and acpi=off on boot.
Created attachment 7216 [details] /proc/interrupts with CONFIG_PCI_MSI=n and acpi=off on boot.
Created attachment 7217 [details] lspci -vv with CONFIG_PCI_MSI=n and acpi=off on boot.
As requested, I uploaded some logs from a kernel compiled without CONFIG_PCI_MSI, booted with the acpi=off parameter and without the nvidia module. I also have the same set of logs/files without acpi=off; I'll upload them if you need them. The problem can be reproduced in all 3 cases, with IRQ numbers and backtraces changing a bit. With acpi=off, I noticed that there are fewer "The drive appears confused" messages. mount /mnt/cdrom always freezes and cannot be killed even with killall -9.

One other thing: IDE in the BIOS is set to "enhanced mode". Is that related to "100% native mode" as written in dmesg? The bad thing is that I cannot switch it, because I need 3xSATA + 1xIDE devices, and "compat. mode" only allows a 2-2 split.
Pressing the power button with acpid stopped increments the IRQ counter by 1 each time on CPU0 in /proc/interrupts.
Created attachment 7246 [details]
Combined dmesg of 2.6.16-rc1-mm5

Kernel version: 2.6.16-rc1-mm5
Without proprietary modules (nvidia).
With CONFIG_PCI_MSI.
With ACPI debug.
Any updates on this problem? Thanks.
I have what appears to be a very similar problem, verified on an Asus P5W DH Deluxe (ICH7 SATA/PATA) with kernel version 2.6.22.1.

Experienced problem: The DVD drive and sound start acting up at a random point after the system has been working for a while
Kernel version: Linux 2.6.22.1
Hardware: The DVD drive is on an ICH7-based PATA controller; the sound card is an Audigy 2 ZS

From dmesg:

hdb: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
hdb: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
irq 23: nobody cared (try booting with the "irqpoll" option)
[<c014be9a>] __report_bad_irq+0x36/0x75
[<c014c091>] note_interrupt+0x1b8/0x1f3
[<c014b5d6>] handle_IRQ_event+0x1a/0x3f
[<c014c613>] handle_fasteoi_irq+0x8a/0xab
[<c01063ff>] do_IRQ+0x57/0x70
[<c0104773>] common_interrupt+0x23/0x28
[<c01021a6>] mwait_idle_with_hints+0x3b/0x3f
[<c01021aa>] mwait_idle+0x0/0xa
[<c0102389>] cpu_idle+0x96/0xcb
[<c035b93c>] start_kernel+0x318/0x320
[<c035b17b>] unknown_bootoption+0x0/0x202
=======================
handlers:
[<f88a7602>] (ide_intr+0x0/0x1c1 [ide_core])
[<f8a922b4>] (snd_emu10k1_interrupt+0x0/0x3cc [snd_emu10k1])
Disabling IRQ #23
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
ide-cd: cmd 0x1e timed out
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
ide-cd: cmd 0x1e timed out
hdb: lost interrupt

From /proc/interrupts:

 23:     500046          0   IO-APIC-fasteoi   ide0, EMU10K1
I think something is wrong with ide0 using a level-triggered IRQ line and sharing it with another device; IDE is usually edge-triggered and therefore not shareable. Can you please post your dmesg?
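The sharing in question can be read straight off the /proc/interrupts row quoted earlier in this thread. As a rough illustration, here is a Python sketch that splits such a row into its device names. The function name shared_devices is made up, and the parsing assumes the layout of that one row (per-CPU counts, a single-token controller type, then a comma-separated device list); other kernel versions print extra columns that this sketch does not handle.

```python
def shared_devices(row):
    """Device names registered on one /proc/interrupts row.

    Assumes: "<irq>: <count per CPU>... <controller type> <dev>[, <dev>...]"
    """
    _, _, rest = row.partition(":")
    fields = rest.split()
    i = 0
    while i < len(fields) and fields[i].isdigit():
        i += 1                             # skip the per-CPU counters
    # fields[i] is the controller type (e.g. IO-APIC-fasteoi); skip it too.
    devices = " ".join(fields[i + 1:])
    return [name.strip() for name in devices.split(",") if name.strip()]


# Row as posted in this thread.
ROW = "23: 500046 0 IO-APIC-fasteoi ide0, EMU10K1"

if __name__ == "__main__":
    devs = shared_devices(ROW)
    print(devs, "(shared)" if len(devs) > 1 else "(exclusive)")
```

More than one name in the result means the line is shared, which is the situation described above for ide0 and EMU10K1.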
Created attachment 12291 [details] Bootup dmesg output
Created attachment 12292 [details]
Bootup dmesg output

Sorry, I thought attaching a URL would produce a result that made sense, but that turned out to be a bad idea.
No activity since 2007: Closing