Most recent kernel where this bug did *NOT* occur: 2.6.17.13
Distribution: Gentoo 2006.1
Hardware Environment:
00:00.0 Host bridge: nVidia Corporation nForce3 Host Bridge (rev a4)
00:01.0 ISA bridge: nVidia Corporation nForce3 LPC Bridge (rev a6)
00:01.1 SMBus: nVidia Corporation nForce3 SMBus (rev a4)
00:02.0 USB Controller: nVidia Corporation nForce3 USB 1.1 (rev a5)
00:02.1 USB Controller: nVidia Corporation nForce3 USB 1.1 (rev a5)
00:02.2 USB Controller: nVidia Corporation nForce3 USB 2.0 (rev a2)
00:08.0 IDE interface: nVidia Corporation nForce3 IDE (rev a5)
00:0a.0 PCI bridge: nVidia Corporation nForce3 PCI Bridge (rev a2)
00:0b.0 PCI bridge: nVidia Corporation nForce3 AGP Bridge (rev a4)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:08.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24 [CrystalClear SoundFusion Audio Accelerator] (rev 01)
01:0a.0 Network controller: RaLink RT2500 802.11g Cardbus/mini-PCI (rev 01)
01:0c.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 50)
01:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
02:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4 MX 420] (rev a3)
Software Environment: gcc-4.1.1, glibc-2.5-r0
Problem Description: On kernel 2.6.18 and later, the machine won't boot. It reaches sata_via, starts probing, and takes a long time, saying that the device is being slow and to please wait. It then repeatedly reports that the device is acting abnormally (about 6 times) and pauses for about 30 seconds. It repeats this whole process about 3 times, taking about 5 minutes in all.
It eventually continues, but it hasn't actually detected any drives, so it can't complete the boot process. The controller works fine with kernel 2.6.17 and earlier, though.
Steps to reproduce: Try to use a VT6420 chipset with kernel 2.6.18 or 2.6.19.
I can't tell without the full dmesg, but it seems to be the VIA IRQ quirk problem. Ben, can you post the boot dmesg? Netconsole comes in handy when you can't mount the root fs; see Documentation/networking/netconsole.txt.
I can't, I'm afraid: the machine can only reach the network through the Serialmonkey rt2500 driver, and it's in a safe place where it won't get broken again anytime soon. I can, however, boot the old kernel and take a photo of the screen while it boots. It's not perfect, but it would show the same output dmesg would, including the complete error messages.
That will be good enough.
Created attachment 9797: Kernel output
Sorry this took so long; I only just managed to get hold of a camera to do it. Basically, what you see is repeated 3 times.
Does passing the 'irqpoll' option to the kernel change anything?
Nope, I tried things like that first: irqpoll, pci=routeirq, irqbalance, etc. I tried everything I could think of, in every combination, but nothing changed (well, the IRQ number in the output changed, but it still behaved the same).
Could it be the same situation as I hit (bug #7415)?
Any word on this? I need to update the kernel to (hopefully) get my TV card working, but at the moment I can't because of this issue (otherwise I won't be able to boot, heh).
Actually, I think the above may be right; I'm checking the patches now to see whether this is a duplicate of http://bugzilla.kernel.org/show_bug.cgi?id=7415.
Could you find the first bad commit in the git tree (following the post by Daniel Drake in #7415)? Or could you try simply reverting Tejun's commit with dsd's patch (also in #7415)? The solutions I mentioned work for me without problems on gentoo-sources 2.6.18, -r2, -r4, -r5, -r6, 2.6.19 and 2.6.19-r2, and on git 2.6.20-rc1 as well. It is also possible that there is more than one source of trouble, but my system suffers only from the ATA_NIEN problem.
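Finding the first bad commit is what `git bisect` automates: starting from a known-good revision (here 2.6.17) and a known-bad one (2.6.18), each test boot halves the remaining range. A minimal Python sketch of that binary search, assuming a linear history and a hypothetical `is_bad` callback that stands in for "build this commit, boot it, and check whether sata_via detects the drives":

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes a linear history ordered oldest to newest, where commits[0] is
    known good, commits[-1] is known bad, and every commit after the first
    bad one is also bad (the same precondition git bisect relies on).
    is_bad is a hypothetical "build, boot, check for drives" test.
    """
    if len(commits) < 2:
        raise ValueError("need at least a good and a bad endpoint")
    good, bad = 0, len(commits) - 1  # indices of known-good / known-bad commits
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]


if __name__ == "__main__":
    # Hypothetical history of 10 commits where the regression lands in "c5".
    history = [f"c{i}" for i in range(10)]
    tested = []

    def is_bad(commit: str) -> bool:
        tested.append(commit)
        return int(commit[1:]) >= 5

    print(first_bad_commit(history, is_bad), "found after", len(tested), "test boots")
```

Each iteration halves the range, so only about log2(n) kernel builds are needed. In practice, `git bisect start <bad> <good>` followed by `git bisect good` or `git bisect bad` after each test boot does this bookkeeping on the real (non-linear) history.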
There were a number of VIA detection failure reports caused by the VIA IRQ quirk problem, and others probably caused by the ->freeze() problem. Polling IDENTIFY also seemed to fix some cases of misdetection. Can you give 2.6.20-rc2 a shot?
No, it got worse: now it froze for a while before it even started printing the ATA error messages.
Tejun, I'm afraid things are even more complicated: Ben wrote that he uses the Gentoo distro, and all gentoo-sources kernels have been patched against the VIA quirk problem for a long, long time (dsd knows the details better, but all the gentoo-sources 2.6.17, 2.6.18 and 2.6.19 kernels I tried contain this patch). I think there must be another cause with identical symptoms.
I wasn't using gentoo-sources; I was using the vanilla sources from kernel.org.
Aaah, sorry, I was misled by your report (Distribution: Gentoo 2006.1). But then the VIA quirk problem is quite probable: I did not see a message like 'PCI: VIA IRQ fixup for 0000:00:0f.1, from 255 to 2' in your picture #4, immediately before the VIA SATA start-up. So, did you try any gentoo-sources >= 2.6.18, possibly with the .freeze patch?
OK, using gentoo-sources 2.6.19 with this patch (http://bugzilla.kernel.org/attachment.cgi?id=9893&action=view) FIXED the issue. Huzzah. :)
If THIS works, then you really are in the same situation as me. But using gentoo-sources (with the VIA quirk patch) AND my ata_irq_on(ap) patch should work as well. Since you're on .19, you should try the second part of the ata_irq_on patch (the part that patches libata-sff.c). Alternatively, you can take the vanilla sources and apply both the VIA quirk patch AND the ata_irq_on(ap) patch; that should also work.
This should have been fixed in v2.6.20 w/ svia_noop_freeze. Closing.