Bug 7415
Summary: | VIA SATA system cannot be initialized | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | j.taimr (j.taimr) |
Component: | Serial ATA | Assignee: | Tejun Heo (htejun) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | high | CC: | alan, albertcc, htejun, kernel, mehturt |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | >=2.6.18 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
dmesg after good boot of 2.6.17-gentoo-r8
dmesg from 'bad' boot of 2.6.18-gentoo
dmesg from 'bad' boot of vanilla-2.6.19-rc2
lspci output
lspci -n output
lspci -v output
patch
dmesg of 2.6.18-gentoo-r4 after unrolling of the problematic commit
if vt6420, don't use SCR at all during detection
Modified Tejun's patch
Set ATA_NIEN and reset ATA_NIEN patch
turn on irq on both devices |
Description
j.taimr
2006-10-25 09:27:53 UTC
Created attachment 9349 [details]
dmesg after good boot of 2.6.17-gentoo-r8
Created attachment 9350 [details]
dmesg from 'bad' boot of 2.6.18-gentoo
Created attachment 9351 [details]
dmesg from 'bad' boot of vanilla-2.6.19-rc2
Created attachment 9352 [details]
lspci output
Created attachment 9353 [details]
lspci -n output
Created attachment 9354 [details]
lspci -v output
I have the same problem with vanilla 2.6.18.1

This patch fixed the problem for me:
http://marc.theaimsgroup.com/?l=git-commits-head&m=116121959622812&q=raw

I tried the mentioned patch; it did NOT help for me, still the same situation.

Original report is at http://bugs.gentoo.org/150773

After a few weeks my situation remains unchanged; I tried the new kernels gentoo-sources-2.6.18-r1, gentoo-sources-2.6.18-r2, gentoo-sources-2.6.18-r3, gentoo-sources-2.6.19 and gentoo-sources-2.6.19-r1. The symptoms are still the same as I wrote above: none of my SATA disks is accessible after a boot attempt with any of the kernels mentioned.

Thanks for your continued interest here. Here's a process you could try which is likely to find the exact buggy commit, but it will probably be quite time consuming for you:
http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt

You need to start with 2.6.17 as the good kernel and 2.6.18 as the bad one. But it's not quite that simple: we know that your hardware will *not* work on upstream 2.6.17 or 2.6.18 because of another issue separate from this one: it will fail to quirk the IRQ, i.e. it will not show this message:

PCI: VIA IRQ fixup for 0000:00:0f.0, from 11 to 2

So, you need to ensure that one of the quirk fix patches is included in each iteration of the kernel that you test with this process. I'd suggest using this one as a safe bet:
http://dev.gentoo.org/~dsd/genpatches/trunk/2.6.17/2500_via-irq-quirk-revert.patch

I am not sure if you will have to revert that before marking a specific iteration as good or bad (then re-applying it before compiling the next iteration), or if it is safe to leave in place; this is for you to find out ;)

Here's another writeup on bisections which includes more of an introduction to obtaining the kernel tree via git:
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

Still trying, now with git.
It looks like a kind of torture; the information mentioned above is far too optimistic (isolating and identifying the single bad patch in 13 steps, etc.). I recompiled git kernels approx. 60 times during the last week; unfortunately things went bad many times (I noted 5 kernel oopses during boot of a new git kernel after recompilation, once within the SATA subsystem, 4 times in other places; some kernels did not compile at all because of undefined symbols). So far I only know that all 2.6.17 and 2.6.17-rc1..rc6 kernels work properly, while 2.6.18-rc1 and 2.6.18 suffer from this problem. The last version of libata which works for me is 1.20. I think the problem starts with libata 2.0; so far I have not found a way to revert the 1.20->2.0 patch for the master kernel (git complained that 'cannot revert this patch!'). I have to become more familiar with git and go manually from the last good revision, patch by patch. And this flip-flopping with the VIA quirks patch must be done at every step until it is a part of the current kernel (git is not happy otherwise...).

Then perhaps you could find the commit before the 1.20 --> 2.0 upgrade, check it out, and confirm whether it works. Then check out the commit after, and confirm that the breakage appears. If you are plagued by compilation errors then try building a minimal kernel only. I have done several bisections myself and have only had minor problems.

After simplifying the .config and cleaning up some mess, I have a good candidate for a troublemaker:

Author: Tejun Heo <htejun@gmail.com> 2006-06-16 08:13:53
Committer: Jeff Garzik <jeff@garzik.org> 2006-06-20 11:12:15
Parent: c5fa46e175ccd02803031ea071060cdb01521736 ([libata] sata_nv: s/spin_lock_irqsave/spin_lock/ in irq handler)
Branches: origin, master, bisect
Follows: v2.6.17
Precedes: v2.6.18-rc1

[PATCH] sata_via: convert to new EH, take #3

Convert sata_via to new EH. vt6420 used ATA_FLAG_SRST while vt6421 used ATA_FLAG_SATA_RESET. This difference seems to be an accident rather than intended.
This patch makes both flavors use ata_bmdma_error_handler() which makes use of both SRST and SATA hardreset. This behavior change is intended and if it breaks anything, it should be very easy to spot.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Commit #c5fa46e175ccd02803031ea071060cdb01521736 is good, commit #d7a80dad2fe19a2b8c119c8e9cba605474a75a2b is bad, and commit #40ef1d8d48e364dce689342adfdc475aa53f4808 is bad as well; this is the result of git bisect. Unfortunately, it cannot be reverted:

First trying simple merge strategy to revert.
Simple revert fails; trying Automatic revert.
ERROR: drivers/scsi/sata_via.c: Not handling case 322890b400a6000a9d627fa44d69fcabdfe9f131 -> -> c6975c5580ef8c9e62d3b6660e6841a5b9575c69
fatal: merge program failed

What should I try now?

It is definitely #40ef1d8d48e364dce689342adfdc475aa53f4808. I was able to revert this commit in #71d530cd1b6d97094481002a04c77fea1c8e1c22 (which has the mentioned troubles and was 'bad' in git bisect testing). After reverting, the kernel detects my SATA subsystem properly.

Created attachment 9854 [details]
patch
Many thanks for all the testing! Here is a patch which should apply to recent
kernels. It reverts commit 40ef1d8d48e364dce689342adfdc475aa53f4808
Tejun, what are your thoughts? This is a VT6420 device.
If there are no additional bindings and consequences - and there are (remember - I was unable to revert that commit against 2.6.18!)

I made that patch by hand, it definitely does apply. It was made from a kernel which is just-about-2.6.20-rc1.

Created attachment 9860 [details]
dmesg of 2.6.18-gentoo-r4 after unrolling of the problematic commit
So, I can confirm. It works well.

Created attachment 9884 [details]
if vt6420, don't use SCR at all during detection
Does this patch fix your problem? This is against v2.6.19.
Created attachment 9893 [details]
Modified Tejun's patch
Modified Tejun's patch - this helps, the original one does NOT solve the
problem
Tejun, even your patch works if and only if the line:

.freeze = ata_bmdma_freeze,

is NOT present. When it is commented out, the system boots. With this line it freezes during VIA SATA initialization, as mentioned above. And without the '.freeze' line, the original driver works as well, without any additional patching. The trouble is, I do not know the possible consequences (= is it possible to just remove the mentioned line without any disaster in future?)

Created attachment 9903 [details]
Set ATA_NIEN and reset ATA_NIEN patch
Apparently, my system does not like it when the ATA_NIEN bit is set and kept on
for too long. My system boots with the attached patch to libata-sff.c, in which
I clear the ATA_NIEN bit immediately after the ata_chk_status(ap) operation.
Just another info: libata uses IRQ 18 and this interrupt is not shared with
anything else. It does the same in non-MSI/MSI-X and in MSI/MSI-X mode, it has
no influence.
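
To make the ATA_NIEN discussion above easier to follow, here is a minimal userspace sketch of the masking logic, not kernel code. It assumes that freezing a port sets the nIEN bit (bit 1 of the ATA Device Control register) and that turning the IRQ back on clears it. The struct `model_port` and the `model_*` helpers are hypothetical names made up for this illustration and do not exist in libata.

```c
#include <assert.h>
#include <stdint.h>

/* nIEN is bit 1 of the ATA Device Control register (libata: ATA_NIEN). */
#define ATA_NIEN (1u << 1)

/* Hypothetical stand-in for a port: the cached control byte plus a flag
 * recording whether the device currently wants to raise an interrupt. */
struct model_port {
    uint8_t ctl;         /* last value written to Device Control */
    int     irq_pending; /* device has an interrupt condition */
};

/* Model of "freeze": mask the device interrupt by setting nIEN. */
static inline void model_freeze(struct model_port *p)
{
    p->ctl |= ATA_NIEN;
}

/* Model of "irq on": unmask the device interrupt by clearing nIEN. */
static inline void model_irq_on(struct model_port *p)
{
    p->ctl &= (uint8_t)~ATA_NIEN;
}

/* An interrupt reaches the host only if one is pending and nIEN is clear. */
static inline int model_irq_delivered(const struct model_port *p)
{
    return p->irq_pending && !(p->ctl & ATA_NIEN);
}
```

In this model a port that is frozen and never unmasked can no longer deliver interrupts, which is the state the workaround in this report avoids by clearing the bit again right after the status read. Why the VT6420 misbehaves while the bit stays set is not explained by this sketch.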
So, I used the following patches; so far, so good:

Patch for 2.6.18-gentoo-r4:
---------- snip ------------------------
diff -u libata-bmdma.c.orig libata-bmdma.c
--- libata-bmdma.c.orig 2006-12-21 08:53:43.000000000 +0100
+++ libata-bmdma.c 2006-12-21 09:02:25.000000000 +0100
@@ -671,6 +671,8 @@
 		writeb(ap->ctl, (void __iomem *)ioaddr->ctl_addr);
 	else
 		outb(ap->ctl, ioaddr->ctl_addr);
+	ata_wait_idle(ap);
+	ata_irq_on(ap);
 }
 
 /**
--------- snip -------------------------

Patch for linux-git-2.6.20-rc1:
--------- snip -------------------------
diff -u libata-sff.c.orig libata-sff.c
--- libata-sff.c.orig 2006-12-21 08:41:52.000000000 +0100
+++ libata-sff.c 2006-12-21 08:42:07.000000000 +0100
@@ -706,7 +706,7 @@
 	 * previously pending IRQ on ATA_NIEN assertion. Clear it.
 	 */
 	ata_chk_status(ap);
-
+	ata_irq_on(ap);
 	ap->ops->irq_clear(ap);
 }
 
-------- snip --------------------------

My system was used heavily in the last 24 hours, and I regularly tested the filesystem consistency (no errors so far). But I still have the same doubt: is it safe to use this modification from the long-term point of view?

Kernel 2.6.18-gentoo-r6: identical situation: it does not work as distributed, works well after patching as in #28.

OIC. Thanks for finding this out. It's very surprising tho. It may have something to do with detection failures we're seeing in other controllers too. I'll investigate it more. For the time being, your change is not gonna destroy your data, so don't worry.

I applied the "ata_irq_on(ap);" patch to the 0.9.20-rc1 GIT but it didn't work for me; it was still stuck at the same place, gave the same errors and eventually booted without detecting the drive :(

Created attachment 9957 [details]
turn on irq on both devices
Can you please test whether this patch fixes via detection problem? It's
against 2.6.19.
It does not work, Tejun. Sorry for the bad news.

I applied the patch from comment #32 against 2.6.19.1 vanilla and it works for me. But the patch in comment #28 against 2.6.18 vanilla and the Debian sources did not work. No self-compiled (unpatched) kernel since 2.6.18 has worked, although the Debian-supplied kernel image 2.6.18-1-k7 does work. I installed the SATA disk just recently, so I don't know if an older kernel would work. My mainboard is an Asus A7V600-X with the following VIA SATA controller (according to lspci):

00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.0 0104: 1106:3149 (rev 80)

acb, please report the result of 'dmesg' of successful and failed detection. Thanks.

Fixed in 2.6.20. Closing. Please test 2.6.20.1 and reopen if it's still broken.