Bug 5987
| Summary: | hda: cdrom_pc_intr: The drive appears confused - ICH7: 100% native mode - irq 209: nobody cared - ASUS P5WD2-Premium | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Alex Unigovsky (unik) |
| Component: | i386 | Assignee: | platform_i386 |
| Status: | CLOSED INSUFFICIENT_DATA | | |
| Severity: | normal | CC: | acpi-bugzilla, alan, lindqvist, protasnb |
| Priority: | P2 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.16-rc1-mm3 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Attachments:
Full dmesg log from boot-up to running state.
cat /proc/interrupts
Output of dmidecode 2.7
Output of acpidump
Full dmesg with CONFIG_PCI_MSI=n and acpi=off on boot.
/proc/interrupts with CONFIG_PCI_MSI=n and acpi=off on boot.
lspci -vv with CONFIG_PCI_MSI=n and acpi=off on boot.
Combined dmesg of 2.6.16-rc1-mm5
Bootup dmesg output
Bootup dmesg output
Description
Alex Unigovsky
2006-01-31 19:35:35 UTC
Created attachment 7193 [details]
Full dmesg log from boot-up to running state.
Produced by combining on-boot-saved dmesg with current one.
Created attachment 7194 [details]
cat /proc/interrupts
See IRQ209.
Created attachment 7195 [details]
Output of dmidecode 2.7
Created attachment 7196 [details]
Output of acpidump
I had a warning "Wrong checksum for generic table!" just before OEMB table.
That's probably the same as the checksum message in dmesg.
Oh, and one thing I forgot: the M/B BIOS version is 0606, the latest one released by ASUS. By reading their support site I can tell that this is no beta, so the checksum error is strange, considering the BIOS image passed the CRC check at the time of flashing.

Begin forwarded message:
Date: Tue, 31 Jan 2006 23:29:20 -0500
From: "Brown, Len" <len.brown@intel.com>
To: "Andrew Morton" <akpm@osdl.org>, "Jeff Garzik" <jgarzik@pobox.com>, "Bartlomiej Zolnierkiewicz" <B.Zolnierkiewicz@elka.pw.edu.pl>
Subject: RE: [Bugme-new] [Bug 5987] New: Oopses at boot, inability to use IDE CD-ROM drive and a lockup once-a-day.

> ICH7: 100% native mode on irq 209

Has 100% native mode *ever* worked on *any* system?

This BIOS clearly has some ACPI related issues, but it isn't immediately clear that they're the cause of the failure.

-Len

Please boot with "acpi=off", attach the dmesg -s64000 output and paste a copy of /proc/interrupts. Please also attach the output from lspci -vv

Begin forwarded message:
Date: Tue, 31 Jan 2006 23:56:58 -0500
From: Jeff Garzik <jgarzik@pobox.com>
To: "Brown, Len" <len.brown@intel.com>
Cc: Andrew Morton <akpm@osdl.org>, Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Subject: Re: [Bugme-new] [Bug 5987] New: Oopses at boot, inability to use IDE CD-ROM drive and a lockup once-a-day.

Brown, Len wrote:
>>ICH7: 100% native mode on irq 209
>
> Has 100% native mode *ever* worked on *any* system?

For libata, definitely. I could have sworn it worked in IDE driver too...

Jeff

> ACPI: OEMB (v001 A M I AMI_OEM 0x10000530 MSFT 0x00000097) @ 0x3ff8e040
> >>> ERROR: Invalid checksum
Evidence of a shoddy BIOS, but not related to the failure at hand.
The acpidump output shows that this table claims a length of 102 bytes.
But the checksum across 102 bytes is non-zero. It is possible that the
BIOS writer got the checksum right but the length wrong, as the
checksum after 46 bytes is zero. Perhaps some buggy proprietary OS
recognizes the OEM-specific "OEMB" as a fixed length structure
of 46 bytes and errantly lets this BIOS through its test suite...
Linux ignores any table with a bad checksum. But as Linux doesn't
recognize an OEMB table, it would ignore it even if the checksum were valid.
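The length-versus-checksum reasoning above can be sketched in Python. This is only an illustration: it assumes the standard ACPI table header layout (4-byte signature, little-endian 32-bit length at offset 4, checksum byte at offset 9) and the rule that all bytes of a table must sum to zero modulo 256. The table built by `make_demo_table` is synthetic, not the real OEMB contents from the acpidump attachment, and every function name here is hypothetical.

```python
import struct

HEADER_LEN = 36  # size of the standard ACPI table header


def byte_sum(data):
    """Sum of all bytes modulo 256; a valid ACPI table sums to 0."""
    return sum(data) & 0xFF


def declared_length(table):
    """Length field from the table header (little-endian u32 at offset 4)."""
    return struct.unpack_from("<I", table, 4)[0]


def zero_sum_prefixes(table, min_len=HEADER_LEN):
    """Prefix lengths (>= min_len) whose bytes sum to 0 modulo 256."""
    hits = []
    running = 0
    for length, byte in enumerate(table, start=1):
        running = (running + byte) & 0xFF
        if length >= min_len and running == 0:
            hits.append(length)
    return hits


def make_demo_table():
    """Synthetic stand-in: checksum valid over 46 bytes, header claims 102."""
    body = bytearray(46)
    body[0:4] = b"OEMB"
    struct.pack_into("<I", body, 4, 102)   # claimed (wrong) length
    body[8] = 1                            # revision
    body[36:46] = bytes(range(1, 11))      # made-up payload
    body[9] = (-sum(body)) & 0xFF          # checksum byte zeroes the 46-byte sum
    tail = bytes([0x5A]) * 56              # made-up trailing bytes up to 102
    return bytes(body) + tail


if __name__ == "__main__":
    table = make_demo_table()
    claimed = declared_length(table)
    print("claimed length:", claimed)
    print("checksum over claimed length:", byte_sum(table[:claimed]))
    print("prefix lengths with zero checksum:", zero_sum_prefixes(table))
```

Run against a real table dump, a zero-sum prefix shorter than the declared length would point to a wrong length field rather than a wrong checksum byte, although a zero sum can also occur by chance roughly once in 256 prefixes.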
Whatever this table is, it is non-volatile:
BIOS-e820: 000000003ff8e000 - 000000003ffe0000 (ACPI NVS)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Another sign of a shoddy BIOS -- duplicate entries in the MADT.
I believe that Linux should survive this -- programming
these IRQs twice.
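For anyone checking their own boot log for the same symptom, here is a rough Python sketch that counts repeated INT_SRC_OVR lines. The sample text is the four lines quoted above; the regular expression assumes the dmesg line format stays exactly as shown there, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
OVERRIDE_RE = re.compile(
    r"ACPI: INT_SRC_OVR \(bus (\d+) bus_irq (\d+) global_irq (\d+) (\w+) (\w+)\)"
)

SAMPLE_DMESG = """\
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
"""


def duplicate_overrides(dmesg_text):
    """Return {(bus, bus_irq, global_irq, polarity, trigger): count} for repeats."""
    seen = Counter()
    for match in OVERRIDE_RE.finditer(dmesg_text):
        bus, bus_irq, gsi, polarity, trigger = match.groups()
        seen[(int(bus), int(bus_irq), int(gsi), polarity, trigger)] += 1
    return {entry: count for entry, count in seen.items() if count > 1}


if __name__ == "__main__":
    for entry, count in duplicate_overrides(SAMPLE_DMESG).items():
        print(entry, "appears", count, "times")
```

Identical repeats like these are harmless in the sense described above; two overrides for the same bus_irq with different targets or trigger modes would be a different and more worrying case, which this sketch does not flag.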
Try this:
# /etc/init.d/acpid stop
press the power button a few times and see if the
acpi entry on IRQ9 in /proc/interrupts increments appropriately.
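The before/after comparison can also be done with a small script instead of by eye. The following Python sketch parses two snapshots of /proc/interrupts and reports how much one IRQ's count grew. The function names are hypothetical, and the sample snapshots are made up for illustration (only the ide0/EMU10K1 row is copied from a later comment in this report); on a real system you would read the file once before and once after pressing the button.

```python
def parse_interrupts(text):
    """Map IRQ label -> (per-CPU counts, remaining description text)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for token in fields[:ncpu]:
            if not token.isdigit():       # rows like ERR/MIS may have fewer counts
                break
            counts.append(int(token))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def irq_delta(before, after, irq):
    """Increase in the total count (all CPUs) for one IRQ between snapshots."""
    return sum(after[irq][0]) - sum(before[irq][0])


# Made-up snapshots; the counts on the "9:" row are illustrative only.
BEFORE = """\
           CPU0       CPU1
  9:         12          0   IO-APIC-level  acpi
 23:     500046          0   IO-APIC-fasteoi   ide0, EMU10K1
"""
AFTER = BEFORE.replace("12", "15", 1)

if __name__ == "__main__":
    delta = irq_delta(parse_interrupts(BEFORE), parse_interrupts(AFTER), "9")
    print("IRQ 9 count grew by", delta)
```

On a live machine the two snapshots would come from `open("/proc/interrupts").read()`, taken with acpid stopped as described above.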
To simplify matters, please reproduce this failure with CONFIG_PCI_MSI=n, and with "nvidia" excluded from the kernel.

possibly a duplicate of bug 5084

Created attachment 7215 [details]
Full dmesg with CONFIG_PCI_MSI=n and acpi=off on boot.
Created attachment 7216 [details]
/proc/interrupts with CONFIG_PCI_MSI=n and acpi=off on boot.
Created attachment 7217 [details]
lspci -vv with CONFIG_PCI_MSI=n and acpi=off on boot.
As requested, I uploaded some logs from a kernel compiled without CONFIG_PCI_MSI, booted with the acpi=off parameter and without the nvidia module. I also have the same set of logs/files but without acpi=off. I'll upload them if you need them.

The problem can be reproduced in all 3 cases, with IRQ numbers and backtraces changing a bit. With acpi=off, I noticed that there are fewer "The drive appears confused" messages. mount /mnt/cdrom always freezes and cannot be killed even with killall -9.

One other thing: IDE in BIOS is set to "enhanced mode". Is it related to "100% native mode" as written in dmesg? The bad thing is that I cannot switch it, because I need 3xSATA + 1xIDE devices, and "compat. mode" only allows a 2-2 split.

Pressing the power button without acpid increments the IRQ counter by 1 each time on CPU0 in /proc/interrupts.

Created attachment 7246 [details]
Combined dmesg of 2.6.16-rc1-mm5
Kernel version: 2.6.16-rc1-mm5
Without proprietary modules (nvidia).
With CONFIG_PCI_MSI.
With ACPI debug.
Any updates on this problem? Thanks.

I have what appear to be very similar problems, verified on Asus P5W DH Deluxe (ICH7 SATA/PATA) and kernel version 2.6.22.1.

Experienced problem: DVD drive and sound start acting up at a random point after the system has been working for a while
Kernel version: Linux 2.6.22.1
Hardware: DVD drive is on an ICH7 based PATA controller, sound card is an Audigy 2 ZS

From dmesg:
hdb: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
hdb: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
irq 23: nobody cared (try booting with the "irqpoll" option)
[<c014be9a>] __report_bad_irq+0x36/0x75
[<c014c091>] note_interrupt+0x1b8/0x1f3
[<c014b5d6>] handle_IRQ_event+0x1a/0x3f
[<c014c613>] handle_fasteoi_irq+0x8a/0xab
[<c01063ff>] do_IRQ+0x57/0x70
[<c0104773>] common_interrupt+0x23/0x28
[<c01021a6>] mwait_idle_with_hints+0x3b/0x3f
[<c01021aa>] mwait_idle+0x0/0xa
[<c0102389>] cpu_idle+0x96/0xcb
[<c035b93c>] start_kernel+0x318/0x320
[<c035b17b>] unknown_bootoption+0x0/0x202
=======================
handlers:
[<f88a7602>] (ide_intr+0x0/0x1c1 [ide_core])
[<f8a922b4>] (snd_emu10k1_interrupt+0x0/0x3cc [snd_emu10k1])
Disabling IRQ #23
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
ide-cd: cmd 0x1e timed out
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
hdb: lost interrupt
ide-cd: cmd 0x1e timed out
hdb: lost interrupt

From /proc/interrupts:
23: 500046 0 IO-APIC-fasteoi ide0, EMU10K1

I think something is wrong with ide0 using a level-triggered IRQ line and sharing it with another device; IDE is usually edge-triggered and therefore not shareable. Can you please post your dmesg?

Created attachment 12291 [details]
Bootup dmesg output
Created attachment 12292 [details]
Bootup dmesg output
Sorry, I thought attaching a URL would create a result that made sense, but that turned out to be a bad idea.
No activity since 2007: Closing