Bug 11086
Summary: | 2.6.26 (sata_nv = irq 21 problem after halting) | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Dan (dan76) |
Component: | Serial ATA | Assignee: | Tejun Heo (htejun) |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | rjw, trenn |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.26 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 10492 | ||
Attachments: | nv-hardreset.patch |
Description
Dan
2008-07-14 12:31:53 UTC
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Mon, 14 Jul 2008 12:31:54 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11086
>
> Summary: 2.6.26 (sata_nv = irq 21 problem after halting)
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.26
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Serial ATA
> AssignedTo: jgarzik@pobox.com
> ReportedBy: fragabr@gmail.com
>
>
> Latest working kernel version: 2.6.25
> Earliest failing kernel version: 2.6.26
> Distribution: Linux from scratch
> Hardware Environment: athlon64 x2 5600+, Asus M2N-E
> Software Environment:
> Problem Description:
>
> Something happened with ncq's sata_nv support. When I halt my
> system (Athlon64, Asus M2N-E), I get the following (I had to copy by
> hand, since the system is halted, so it's just an excerpt):
>
> IRQ 21: nobody cared (try booting with the "irqpoll" option)
> Pid 0, comm: swapper not tainted 2.6.26 #2
>
> Call trace:
> ........... nv_swncq_interrupt
> ........... __report_bad_irq
> ........... note_interrupt
> ........... handle_fasteoi_irq
> ........... do_irq
> ........... ret_from_intr
>
> Handles:
>
> [<f88888888803a5b90>] (nv_swncq_interrupt+0x0/0xd0)
>
> Disabling IRQ #21
>
> This just happens after a halt.
>
> If I use irqpoll, this error doesn't happen.
>
> Thanks.
>
> Steps to reproduce:
>
> Just shutdown the system (halt).
>

It's a 2.6.26 regression. It happens at halt-time, so it won't be the usual ACPI stuff.

I think this is the second report of ata going splat in this manner at shutdown time. Did we get the order of something wrong?

Hmmm... libata doesn't do much on shutdown. sd spins down disks and that's about it. All the rest keeps running as usual until the power is cut, so there isn't much order which can go wrong.
That said, I think it's the third report somewhat related to libata and shutdown, so it smells fishy. Daniel, does the machine power off after that? Or does the problem prevent the machine from powering off?

Hi Tejun, the problem prevents the machine from powering off (it stays on after the problem). If you need more information or tests, just ask. Thanks.

Hi again! Now testing with 2.6.27-rc8 on the same machine, I get these lines (I can't see everything, since I only have 60 console text lines):

08c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
090: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
094: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
098: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
09c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0a0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0a4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0a8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0ac: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0b0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0b4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0b8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0bc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0c0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0c4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0c8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0cc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0d0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0d4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0d8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0dc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0e0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0e4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0e8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0ec: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0f0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0f4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0f8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
0fc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000

And the machine stays powered on, of course (otherwise I couldn't read this ;).

Created attachment 18169
nv-hardreset.patch
Can you please test the attached patch on top of 2.6.27-rc8? Thanks.
Of course! I applied your patch, but the result was the same as before. The screen is full of "00000000"... I can't see whether there's something before this (between "System halted" and the zeroes). If there is a way to prevent these zeroes from being displayed, it might help. Thanks.

Cc'ing Thomas and Rafael. The machine doesn't shut down and dumps strange messages. Any ideas?

Does turning off swncq make any difference?

Tejun, no. Passing "sata_nv.swncq=0" to the kernel does not help.

Please attach /proc/interrupts from the working system.

Here is /proc/interrupts, but please notice that the messages differ between 2.6.26 and 2.6.27-rc8 (above). The swncq error message happened in 2.6.26, while 2.6.27-rc8 gives lots of "00000000":

           CPU0       CPU1
  0:         24          2   IO-APIC-edge      timer
  1:          0        115   IO-APIC-edge      i8042
  7:          1          0   IO-APIC-edge
  9:          0          0   IO-APIC-fasteoi   acpi
 14:          1         51   IO-APIC-edge      pata_amd
 15:          0          0   IO-APIC-edge      pata_amd
 16:        112      16417   IO-APIC-fasteoi   nvidia
 18:          0          0   IO-APIC-fasteoi   cx88[0]
 20:         11       2131   IO-APIC-fasteoi   ohci_hcd:usb1
 21:          0          0   IO-APIC-fasteoi   sata_nv
 22:          0        730   IO-APIC-fasteoi   sata_nv, HDA Intel
 23:        215      45342   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb2
318:        259      29194   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:     118067     117814   Local timer interrupts
RES:      11173       4175   Rescheduling interrupts
CAL:       3111       2107   function call interrupts
TLB:        243        151   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
SPU:          0          0   Spurious interrupts
ERR:          1

If you need any tests or more information, feel free to ask.

I tested with 2.6.28-rc2 and the strange messages after halt are gone. But the machine isn't being shut down. I only see "System halted" and the machine stays powered on... Is this a kernel issue or something related to the BIOS? Thanks.

I answered too early. Yesterday I got a debug message (trace), but as my console had only 25 lines, I couldn't see the top line of the kernel trace.

Hi Tejun.
Now, with 2.6.28-rc2 (I managed to switch to 60 lines of console and could get the trace) I get the following (I had to copy more or less by hand):

WARNING: at net/sched/sch_generic.c:226 dev_watchdog...
Call Trace:
IRQ
....
warn_slowpath
dequeue_task
try_to_wake_up
__next_cpu
find_busiest_group
getnstimeoftheday
strlcpy
dev_Watchdog
cascade
...
...
apic_timer_interrupt
default_idle
c1e_idle
cpu_idle
***

Does it help in some way? If you need it complete, just ask. Thank you again!

That's from the network interface watchdog. A packet was scheduled to go out but it couldn't go out. I can't determine whether it's somehow related to the shutdown problem or not. If turning off swncq doesn't make any difference, it could be that the changes in libata aren't the cause of the problem. I don't have any idea here. The only sure way would be to bisect and find out exactly which commit broke power off. Thanks.

No bisect done so closing

I didn't do any bisect because it's too much work and, since the problem is harmless, I'm too lazy to do one. Since I want to change my computer, I'll avoid anything based on nvidia sata... AHCI is the way to go, much better.

(In reply to comment #16)
> No bisect done so closing
>

Well, never mind, it seems to be fixed in 2.6.29. No more strange messages after halt.
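
For readers who want to check a /proc/interrupts listing like the one posted earlier in this thread for shared IRQ lines, here is a minimal sketch. It is not part of the original report: the function name `parse_interrupts` and the `SAMPLE` string (three rows copied from the listing in this thread) are illustrative assumptions, and the parser assumes the 2.6-era column layout shown there (per-CPU counts, one interrupt-type column, then a comma-separated handler list). Newer kernels print additional columns.

```python
import re

# Three rows copied from the /proc/interrupts listing in this thread
# (assumption: 2.6-era layout, two CPUs).
SAMPLE = """\
           CPU0       CPU1
 21:          0          0   IO-APIC-fasteoi   sata_nv
 22:          0        730   IO-APIC-fasteoi   sata_nv, HDA Intel
 23:        215      45342   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb2
"""


def parse_interrupts(text):
    """Return {irq: (total_count, [handler names])} for numeric IRQ rows."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row has one label per CPU
    result = {}
    for line in lines[1:]:
        match = re.match(r"\s*(\d+):\s*(.*)", line)
        if not match:
            continue  # skip non-numeric rows such as NMI:, LOC:, ERR:
        irq = int(match.group(1))
        # Split into: per-CPU counts, interrupt type, remainder (handlers).
        fields = match.group(2).split(None, ncpus + 1)
        counts = [int(c) for c in fields[:ncpus]]
        handlers = []
        if len(fields) > ncpus + 1:
            handlers = [h.strip() for h in fields[ncpus + 1].split(",")]
        result[irq] = (sum(counts), handlers)
    return result


if __name__ == "__main__":
    for irq, (total, handlers) in sorted(parse_interrupts(SAMPLE).items()):
        kind = "shared" if len(handlers) > 1 else "exclusive"
        print(irq, total, kind, handlers)
```

On the sample rows, this reports IRQ 21 (the line named in the "nobody cared" message) as used by sata_nv alone with a zero count on the working system, while IRQs 22 and 23 are shared with other drivers. To run it against a live system, pass the contents of /proc/interrupts to `parse_interrupts` instead of `SAMPLE`.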