Bug 9533 - 2.6.24-rc4: some ahci/acpi interaction causes delays during boot
2.6.24-rc4: some ahci/acpi interaction causes delays during boot
Status: CLOSED CODE_FIX
Product: ACPI
Classification: Unclassified
Component: Other
All Linux
: P1 normal
Assigned To: Tejun Heo
:
Depends on:
Blocks: 9243
  Show dependency treegraph
 
Reported: 2007-12-10 05:55 UTC by PaX Team
Modified: 2007-12-20 16:02 UTC (History)
5 users (show)

See Also:
Kernel Version: 2.6.24-rc4
Tree: Mainline
Regression: Yes


Attachments
bug9320-dbg4.patch (20.28 KB, patch)
2007-12-12 00:23 UTC, Tejun Heo
Details | Diff

Description PaX Team 2007-12-10 05:55:17 UTC
Most recent kernel where this bug did not occur: didn't test but 2.6.23 worked fine
Distribution: gentoo but doesn't matter as the kernel doesn't get to userland yet ;)
Hardware Environment: hp nc8430
Software Environment:
Problem Description:
booting with 2.6.24-rc4 causes the following errors/messages while initializing sata/ahci:

[   26.494962] ahci 0000:00:1f.2: version 3.0
[   26.494982] ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 17
[   26.495344] ahci 0000:00:1f.2: nr_ports (4) and implemented port map (0x1) don't match, using nr_ports
[   26.495566] ahci 0000:00:1f.2: forcing PORTS_IMPL to 0xf
[   27.498361] ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
[   27.498583] ahci 0000:00:1f.2: flags: 64bit ncq ilck stag pm led clo pmp pio slum part
[   27.498806] PCI: Setting latency timer of device 0000:00:1f.2 to 64
[   27.499023] scsi0 : ahci
[   27.499241] scsi1 : ahci
[   27.499456] scsi2 : ahci
[   27.499667] scsi3 : ahci
[   27.499892] ata1: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705100 irq 219
[   27.500112] ata2: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705180 irq 219
[   27.500332] ata3: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705200 irq 219
[   27.501822] ata4: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705280 irq 219
[   28.137999] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

next comes the first failure:

[   28.140561] ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 failed (Emask=0x1 Stat=0x51 Err=0x04)
[   28.141091] ata1: failed to recover some devices, retrying in 5 secs
[   34.786461] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   34.789004] ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 failed (Emask=0x1 Stat=0x51 Err=0x04)

then the second failure:

[   34.789534] ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5)
[   34.789753] ata1.00: revalidation failed (errno=1)
[   34.789914] ata1: failed to recover some devices, retrying in 5 secs
[   41.434924] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   41.437002] ata1.00: configured for UDMA/100

after this the driver works and boot proceeds fine. so it's not a show stopper for me but something isn't right. also i'm not sure if it's really an ACPI problem, but it was some ACPI command that was last issued/logged so i thought it'd put this under ACPI, feel free to reclassify. if you need any extra info, let me know but i'd like to avoid bisecting if possible.

Steps to reproduce: just compile/boot with 2.6.24-rc4.
Comment 1 Anonymous Emailer 2007-12-10 12:53:17 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 10 Dec 2007 05:55:20 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9533

Another box-killing regression to track, please.  Either ATA or ACPI.

Comment 2 Anonymous Emailer 2007-12-10 12:55:45 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 10 Dec 2007 12:52:43 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 10 Dec 2007 05:55:20 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=9533
> 
> Another box-killing regression to track, please.  Either ATA or ACPI.

er, no, not box-killing - just scary warnings.

It's not clear what "kernel doesn't get to userland yet" is referring to - 
something else I guess.

Your desire to avoid doing a bisection search is a good one - I've been
trying to do one for a couple of days on and off and there are so many
fatally buggy bisection points between 2.6.23 and 2.6.24-rc1 that I've
given up on the attempt.

Comment 3 PaX Team 2007-12-10 13:54:11 UTC
(In reply to comment #2)
> Reply-To: akpm@linux-foundation.org
> 
> It's not clear what "kernel doesn't get to userland yet" is referring to - 
> something else I guess.

it just means that the bug/phenomenon occurs during kernel boot/init when there's no userland yet at all, so it doesn't really matter what in userland is, it's clearly a kernel issue.

> Your desire to avoid doing a bisection search is a good one - I've been
> trying to do one for a couple of days on and off and there are so many
> fatally buggy bisection points between 2.6.23 and 2.6.24-rc1 that I've
> given up on the attempt.

for me it's just that i'm in the middle of setting up a fairly complex system here and i need all the time for that, not bisecting something that's not a show-stopper (else i'd have had to debug it already :). if you need to stuff in a printk somewhere or similar things, i'm glad to try.
Comment 4 Chuck Ebbert 2007-12-10 14:02:52 UTC
On 12/10/2007 03:52 PM, Andrew Morton wrote:
> On Mon, 10 Dec 2007 05:55:20 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=9533
> 
> Another box-killing regression to track, please.  Either ATA or ACPI.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Quite possibly the BIOS is buggy and handed a bad taskfile to the driver.

Comment 5 PaX Team 2007-12-10 14:41:13 UTC
(In reply to comment #4)
> Quite possibly the BIOS is buggy and handed a bad taskfile to the driver.

hi Chuck,

did you miss the part where i said that 2.6.23 had worked fine? somehow i doubt the BIOS would know which linux kernel is booting. not to mention that at this phase of booting the BIOS is long out of the picture, afaik.
Comment 6 Andrew Morton 2007-12-10 14:50:05 UTC
Since Chuck replied via email, he didn't see your comment.
Comment 7 Chuck Ebbert 2007-12-10 15:02:30 UTC
libata acpi was enabled by default in 2.6.24-rc1:

6ebe9d8667d3f449350cbc6686cd917ecf1f40dd

And yes the driver will ask ACPI (the BIOS) for taskfiles and send them to the devices.
Comment 8 PaX Team 2007-12-10 15:18:40 UTC
(In reply to comment #7)
> libata acpi was enabled by default in 2.6.24-rc1:
> 
> 6ebe9d8667d3f449350cbc6686cd917ecf1f40dd
> 
> And yes the driver will ask ACPI (the BIOS) for taskfiles and send them to the
> devices.

should libata.noacpi=1 fix it?
Comment 9 PaX Team 2007-12-10 15:41:09 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > libata acpi was enabled by default in 2.6.24-rc1:
> > 
> > 6ebe9d8667d3f449350cbc6686cd917ecf1f40dd
> > 
> > And yes the driver will ask ACPI (the BIOS) for taskfiles and send them to the
> > devices.
> 
> should libata.noacpi=1 fix it?

ok, it fixed my problem (well, apparently i got back the old behaviour which is good enough for me). consider updating Documentation/kernel-parameters.txt because right now it only says:

        libata.noacpi   [LIBATA] Disables use of ACPI in libata suspend/resume
                        when set.
                        Format: <int>

i.e., no mention of boot problems.
Comment 10 Tejun Heo 2007-12-12 00:20:57 UTC
Andrew Morton wrote:
> On Mon, 10 Dec 2007 12:52:43 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>> On Mon, 10 Dec 2007 05:55:20 -0800 (PST)
>> bugme-daemon@bugzilla.kernel.org wrote:
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=9533
>> Another box-killing regression to track, please.  Either ATA or ACPI.
> 
> er, no, not box-killing - just scary warnings.
> 
> It's not clear what "kernel doesn't get to userland yet" is referring to - 
> something else I guess.
> 
> Your desire to avoid doing a bisection search is a good one - I've been
> trying to do one for a couple of days on and off and there are so many
> fatally buggy bisection points between 2.6.23 and 2.6.24-rc1 that I've
> given up on the attempt.

This one will be fixed by ACPI fixes which are currently being tested.
I just need to get one more test result.  I'll post the patch to 9533 too.

Thanks.

Comment 11 Tejun Heo 2007-12-12 00:23:31 UTC
Created attachment 13996 [details]
bug9320-dbg4.patch

Please apply the attached patch on top of -rc5 and report the result.  Thanks.
Comment 12 PaX Team 2007-12-12 04:24:30 UTC
(In reply to comment #11)
> Created an attachment (id=13996) [details]
> bug9320-dbg4.patch
> 
> Please apply the attached patch on top of -rc5 and report the result.  

thanks, it works now. in case you're interested, the new dmesg i got:

[   25.573366] ahci 0000:00:1f.2: version 3.0
[   25.573387] ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 17
[   25.573751] ahci 0000:00:1f.2: nr_ports (4) and implemented port map (0x1) don't match, using nr_ports
[   25.573972] ahci 0000:00:1f.2: forcing PORTS_IMPL to 0xf
[   26.575477] ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
[   26.575697] ahci 0000:00:1f.2: flags: 64bit ncq ilck stag pm led clo pmp pio slum part
[   26.575920] PCI: Setting latency timer of device 0000:00:1f.2 to 64
[   26.576143] scsi0 : ahci
[   26.576362] scsi1 : ahci
[   26.576575] scsi2 : ahci
[   26.576787] scsi3 : ahci
[   26.577013] ata1: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705100 irq 219
[   26.577234] ata2: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705180 irq 219
[   26.577454] ata3: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705200 irq 219
[   26.577674] ata4: SATA max UDMA/133 abar m1024@0xf4705000 port 0xf4705280 irq 219
[   27.215083] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   27.216757] ata1: XXX evaluating _GTF
[   27.217492] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
[   27.217656] ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 filtered out
[   27.217968] ata1.00: ACPI cmd c6/00:10:00:00:00:a0 succeeded
[   27.218134] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 succeeded
[   27.218991] ata1.00: ATA-7: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, max UDMA/100
[   27.219160] ata1.00: yyyyyyyyyyy sectors, multi 16: LBA48 NCQ (depth 31/32)
[   27.220965] ata1: XXX evaluating _GTF
[   27.221695] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
[   27.221859] ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 filtered out
[   27.222170] ata1.00: ACPI cmd c6/00:10:00:00:00:a0 succeeded
[   27.222336] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 succeeded
[   27.223208] ata1.00: configured for UDMA/100
[   27.225281] ata1.00: configured for UDMA/100
[   27.225443] ata1: EH complete
[   27.542894] ata2: SATA link down (SStatus 0 SControl 0)
[   27.862708] ata3: SATA link down (SStatus 0 SControl 0)
[   28.182523] ata4: SATA link down (SStatus 0 SControl 0)

then it proceeds fine with the disk it found. the corresponding parts from the disassembled DSDT:

        Method (C24E, 3, NotSerialized)
        {
            If (LOr (LEqual (Arg0, Zero), LEqual (Arg0, One)))
            {
                Store (Buffer (0x07)
                    {
                        /* 0000 */    0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0xEF
                    }, Local0)
                Store (Arg1, Index (Local0, One))
            }

            If (LEqual (Arg0, 0x02))
            {
                Store (Buffer (0x07)
                    {
                        /* 0000 */    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xF5
                    }, Local0)
            }

            If (LEqual (Arg0, 0x03))
            {
                Store (Buffer (0x07)
                    {
                        /* 0000 */    0xC1, 0x00, 0x00, 0x00, 0x00, 0x00, 0xB1
                    }, Local0)
            }

            If (LEqual (Arg0, 0x04))
            {
                Store (Buffer (0x07)
                    {
                        /* 0000 */    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xC6
                    }, Local0)
                Store (Arg1, Index (Local0, One))
            }

            If (LEqual (Arg0, 0x05))
            {
                Store (Buffer (0x07)
                    {
                        /* 0000 */    0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0xEF
                    }, Local0)
                Store (Arg1, Index (Local0, Zero))
            }

            Store (Or (0xA0, ShiftLeft (Arg2, 0x04)), Index (Local0,
                0x05))
            Return (Local0)
        }

        Method (C24F, 2, NotSerialized)
        {
            Store (Zero, Local0)
            Store (Zero, Local4)
            While (LLess (Local0, SizeOf (Arg0)))
            {
                If (LNotEqual (DerefOf (Index (Arg0, Local0)), Zero))
                {
                    Store (C24E (Local0, DerefOf (Index (Arg0, Local0)), Arg1), Local2)
                    If (LEqual (Local4, Zero))
                    {
                        Store (Local2, Local1)
                    {
                        Store (Local2, Local1)
                        Store (One, Local4)
                    }
                    Else
                    {
                        Concatenate (Local1, Local2, Local3)
                        Store (Local3, Local1)
                    }
                }

                Increment (Local0)
            }

            Return (Local1)
        }

(i haven't found any references to method C24F)
Comment 13 Tejun Heo 2007-12-13 19:23:41 UTC
Thanks for testing.  I'll forward patches upstream soon.
Comment 14 Rafael J. Wysocki 2007-12-20 16:02:56 UTC
Fixed by:

commit ededa4d396b15c282aa60d6aacddfc07f0142dbf
Merge: 64396ac... 140b5e5...
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date:   Mon Dec 17 19:29:32 2007 -0800

    Merge branch 'upstream-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ededa4d396b15c282aa60d6aacddfc07f0142dbf

Note You need to log in before you can comment on or make changes to this bug.