Bug 11879 - Improper ATA cable detection after S2R _only_: "limited to UDMA/33 due to 40-wire cable"
Summary: Improper ATA cable detection after S2R _only_: "limited to UDMA/33 due to 40...
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Tejun Heo
URL:
Keywords:
: 12202 (view as bug list)
Depends on:
Blocks:
 
Reported: 2008-10-28 13:25 UTC by Andreas Mohr
Modified: 2012-05-22 15:07 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.27
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci -nnvvvxxx on Acer A110L (47.30 KB, text/plain)
2008-10-29 01:04 UTC, Andreas Mohr
Details
hdparm --Istdout /dev/sda on Acer A110L (1.26 KB, text/plain)
2008-10-29 01:09 UTC, Andreas Mohr
Details
ata_piix-maintain-iocfg-over-suspend.patch (1.62 KB, patch)
2008-11-09 23:28 UTC, Tejun Heo
Details | Diff
dmes after resume, containing IOCFG debug logs (55.95 KB, text/plain)
2008-11-10 07:17 UTC, Andreas Mohr
Details
dmesg of 2.6.28-rc9 with disabled pata_acpi module, after successful resume (UDMA/66) (76.72 KB, text/plain)
2008-12-24 00:43 UTC, Andreas Mohr
Details
decoded DSDT of BIOS v0.3109 (non-latest!) (217.71 KB, text/plain)
2008-12-24 00:54 UTC, Andreas Mohr
Details
ata_piix-save-iocfg (4.55 KB, patch)
2008-12-28 23:42 UTC, Tejun Heo
Details | Diff
2.6.28 dmesg of working ata_piix-save-iocfg patch after two resumes (85.01 KB, text/plain)
2008-12-31 01:36 UTC, Andreas Mohr
Details

Description Andreas Mohr 2008-10-28 13:25:08 UTC
Latest working kernel version:
Earliest failing kernel version:
Distribution: Ubuntu 8.10rc
Hardware Environment: Acer One A110L (Intel 945GSE / ICH7-M), SSD (Samsung P-SSD1800), BIOS v0.3109
Software Environment: Ubuntu 8.10rc, Linux 2.6.27-7-generic
Problem Description:

Bootup machine, no problem with ATA interfacing, full speed connection (leading to
~40MB/s with hdparm):

[    5.631156] ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0x60a0 irq 14
[    5.631168] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x60a8 irq 15
[    5.852225] usb 5-5: new high speed USB device using ehci_hcd and address 2
[    5.961673] ata2.00: ATA-5: P-SSD1800, Ver2.Y0C, max UDMA/66
[    5.961683] ata2.00: 15761088 sectors, multi 0: LBA 
[    5.969597] ata2.00: configured for UDMA/66
[    5.969915] scsi 1:0:0:0: Direct-Access     ATA      P-SSD1800        Ver2 PQ: 0 ANSI: 5


Suspend to RAM, then resume, witness:
[  968.980271] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 filtered out
[  968.980278] ata2.00: ACPI cmd ef/03:42:00:00:00:a0 filtered out
[  968.980315] ata2.00: limited to UDMA/33 due to 40-wire cable
[  968.989637] ata2.00: configured for UDMA/33
[  968.989781] sd 1:0:0:0: [sda] 15761088 512-byte hardware sectors (8070 MB)
[  968.989828] sd 1:0:0:0: [sda] Write Protect is off
[  968.989834] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[  968.989913] sd 1:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

and a severe hdparm downgrade of about 25MB/s (note that I tried about 5 cycles of boot and resume, always reliably resulting in proper and improper cable detection each).
(IOW, yet another area where Linux seems to lose massive heaps of speed lately - c.f. absolutely shocking Phoronix Ubuntu distro version testing results)
And with a lame SSD (low write performance), obviously this hurts plenty.

Driver is ata_piix, register PIIX_IOCFG (0x54, responsible for cable detection) is _identical_ (0x00) both after boot and after resume, in fact all PCI registers are identical in both cases:

00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 02)
00: 86 80 c4 27 05 00 b0 02 02 80 01 01 00 00 00 00
10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
20: a1 60 00 00 00 00 00 00 00 00 00 00 25 10 5b 01
30: 00 00 00 00 70 00 00 00 00 00 00 00 0b 02 00 00
40: 4c 80 07 e3 00 00 00 00 04 00 00 02 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 01 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00
80: 05 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 02 00 05 0c 80 01 80 da 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 0d 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 86 0f 02 00 00 00 00 00

Possibly the
[  968.980271] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 filtered out
[  968.980278] ata2.00: ACPI cmd ef/03:42:00:00:00:a0 filtered out
may be to blame here, since I'm quite sure this is a message about ACPI IDE setup resume method (Tejun Hejo elaborated a bugfix - #9530 - for another box of mine earlier in this area).

# lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GME Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 02)
03:00.0 Ethernet controller: Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter (rev 01)
04:00.0 System peripheral: JMicron Technologies, Inc. SD/MMC Host Controller
04:00.2 SD Host controller: JMicron Technologies, Inc. Standard SD Host Controller
04:00.3 System peripheral: JMicron Technologies, Inc. MS Host Controller
04:00.4 System peripheral: JMicron Technologies, Inc. xD Host Controller

Cross-verified on Inspiron 8000 (i815, 2.6.27-rc9, ata_piix), no such issue after resume there, cabling always detected properly.

OK, I could easily add an Acer One A110L entry to the ich_laptop quirk list,
but somehow hardcoding this doesn't sound at all like being an acceptable solution (note that this list is full of Acer entries though!! Possibly due to BIOS resume quirks??).

Any thoughts?

Thanks,

Andreas Mohr
Comment 1 Tejun Heo 2008-10-28 19:01:02 UTC
Can you post the result of "lspci -nnvvvxxx" and "hdparm --Istdout /dev/sda"?
Comment 2 Andreas Mohr 2008-10-29 01:04:21 UTC
Created attachment 18490 [details]
lspci -nnvvvxxx on Acer A110L
Comment 3 Andreas Mohr 2008-10-29 01:09:32 UTC
Created attachment 18491 [details]
hdparm --Istdout /dev/sda on Acer A110L

Side note: BIOS on A110L does not operate in AHCI mode, possibly relevant.

Thanks!
Comment 4 Andreas Mohr 2008-10-29 06:09:50 UTC
SORRY, missed one very important part:

 diff -uN hdparm_Istdout_after_boot.log hdparm_Istdout_after_resume.log 
--- hdparm_Istdout_after_boot.log	2008-10-29 13:12:05.000000000 +0100
+++ hdparm_Istdout_after_resume.log	2008-10-29 09:00:17.000000000 +0100
@@ -11,7 +11,7 @@
 0003 0078 0078 0078 0078 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
 0030 0000 7008 5004 4000 7009 1004 4000
-101f 0000 0000 0000 0000 0000 0000 0000
+041f 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
@@ -20,7 +20,7 @@
 0000 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
-1000 0000 0000 0012 0000 0000 0000 0000
+1000 0000 0000 0092 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0000 0000 0000 0000 0000 0000

Dito for diff of the SATA controller:

# diff -uN ls_after_boot.log ls_after_resume.log 
--- ls_after_boot.log	2008-10-29 14:07:54.000000000 +0100
+++ ls_after_resume.log	2008-10-29 14:08:12.000000000 +0100
@@ -18,12 +18,12 @@
 10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
 20: a1 60 00 00 00 00 00 00 00 00 00 00 25 10 5b 01
 30: 00 00 00 00 70 00 00 00 00 00 00 00 0b 02 00 00
-40: 00 80 07 e3 00 00 00 00 04 00 00 02 00 00 00 00
-50: 00 00 00 00 74 00 00 00 00 00 00 00 00 00 00 00
+40: 4c 80 07 e3 00 00 00 00 04 00 00 02 00 00 00 00
+50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 70: 01 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00
 80: 05 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-90: 02 00 05 00 80 01 80 da 00 00 00 00 00 00 00 00
+90: 02 00 05 0c 80 01 80 da 00 00 00 00 00 00 00 00
 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c0: 00 00 00 00 0d 00 00 00 00 00 00 00 00 00 00 00
Comment 5 Tejun Heo 2008-11-09 23:28:05 UTC
Created attachment 18770 [details]
ata_piix-maintain-iocfg-over-suspend.patch

Thanks for the nice reporting.  Does the attached patch fix the problem?  Thanks.
Comment 6 Andreas Mohr 2008-11-10 05:15:10 UTC
NOPE, unfortunately it didn't (patch _plus_ -rc4 as compared to 2.6.27 only).
Analyzing this was hampered by a 28ish resume issue anyway (no backlight, had to
write a script to get a dmesg log saved).

Note that this is a newly built -rc4 _with_ the patch actually successfully patched and used (dmesg log does show -rc4), I didn't make any mistake here.
I didn't record the SATA PCI register dump at that time to actually check for successful IOCFG restoration, though (will do).
Oh, and that's actually important, since:
- if the value _has_ been written and remains visible, that probably means that our action is incomplete
- if the value is NULL despite probably having written it, then something somewhere possibly does a hard-reset of the entire controller??

[  123.925163] sd 1:0:0:0: [sda] Starting disk
[  123.929706] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 filtered out
[  123.929718] ata2.00: ACPI cmd ef/03:42:00:00:00:a0 filtered out
[  123.929801] ata2.00: limited to UDMA/33 due to 40-wire cable
[  123.936671] ata2.00: configured for UDMA/33
[  123.937630] sd 1:0:0:0: [sda] 15761088 512-byte hardware sectors: (8.06 GB/7.51 GiB)
[  123.937708] sd 1:0:0:0: [sda] Write Protect is off
[  123.937718] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[  123.937838] sd 1:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  124.048163] usb 5-5: reset high speed USB device using ehci_hcd and address 2
[  124.336257] PM: resume devices took 0.616 seconds
[  124.340976] PM: Finishing wakeup.
[  124.340987] Restarting tasks ... done.
[  127.017124] ACPI: EC: non-query interrupt received, switching to interrupt mode

Thanks a lot!
Comment 7 Andreas Mohr 2008-11-10 05:29:34 UTC
Hmm, somehow this dump looks like nothing happened (absolutely identical to the dump further above):

--- lspci_ata_pre.log	2008-11-10 14:21:36.000000000 +0100
+++ lspci_ata_post.log	2008-11-10 14:22:01.000000000 +0100
@@ -3,12 +3,12 @@
 10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
 20: a1 60 00 00 00 00 00 00 00 00 00 00 25 10 5b 01
 30: 00 00 00 00 70 00 00 00 00 00 00 00 0b 02 00 00
-40: 00 80 07 e3 00 00 00 00 04 00 00 02 00 00 00 00
-50: 00 00 00 00 74 00 00 00 00 00 00 00 00 00 00 00
+40: 4c 80 07 e3 00 00 00 00 04 00 00 02 00 00 00 00
+50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 70: 01 00 02 40 00 00 00 00 00 00 00 00 00 00 00 00
 80: 05 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-90: 02 00 05 00 80 01 80 da 00 00 00 00 00 00 00 00
+90: 02 00 05 0c 80 01 80 da 00 00 00 00 00 00 00 00
 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c0: 00 00 00 00 0d 00 00 00 00 00 00 00 00 00 00 00


Yet the tree used for the build _does_ have the file patched, thus it was active and something else must be going on...
Comment 8 Tejun Heo 2008-11-10 05:29:43 UTC
Well, pci (rightfully) restores only the standard configuration area, so unless the BIOS reprograms the controller during resume, the low level driver is responsible for the rest.  Maybe ata_piix needs to restore other registers too but till now it worked fine, so I just added iocfg.  Can you please add printk()'s on resume path so that it prints out the value it's writing and take register dump after resume?

Thanks.
Comment 9 Andreas Mohr 2008-11-10 07:17:30 UTC
Created attachment 18780 [details]
dmes after resume, containing IOCFG debug logs

OK, even did that now.

# grep -i iocfg /root/dmesg_ata.log 
[    8.629609] checking IOCFG: 0x74
[   43.172382] saved_iocfg now: 0x00000074
[   44.488243] writing saved_iocfg: 0x00000074
[   44.488253] IOCFG register is now: 0x00000074
[   44.488287] IOCFG register is now: 0x00000074
[   45.609782] checking IOCFG: 0x04

The "checking IOCFG" thing is printed right nearby the cable check code.
As we can see, something is very, very wrong, since we write 0x74 properly, then I/O-read it (correct) and directly I/O-read it _again_ (correct) and later during resume's cable detection it's broken.
Who the ... is messing with our values???
So, we're definitely not dealing with a write-once register here; rather, something is actively messing it up, either by reprogramming or by triggering
some kind of re-_detection_ for this register value.
Comment 10 Tejun Heo 2008-11-10 18:40:33 UTC
Ah... strange.  Nothing else touches that register in ata_piix driver.  What's the value of iocfg before restoring saved_iocfg?  Is it all zero?  Maybe ata_piix should just use the iocfg value stored at the probing time.
Comment 11 Andreas Mohr 2008-11-25 12:12:43 UTC
OK, now that broken resume is debugged (bug #12100) one would be able to pursue this further. Do you believe pata_acpi.ko activity might be fumbling with this stuff? I cannot easily disable it (initrd!), but I'll eventually test it. Plus, I'm thinking of WARN_ONing on all PCI offset 0x54 accesses...
Comment 12 Tejun Heo 2008-11-25 19:51:28 UTC
pata_acpi could definitely throw a ranch into all this and must NOT be enabled unless you know what's going on.  Excluding a driver from initrd shouldn't be too difficult.  You can usually drop the module itself or add a blacklist entry to modules.conf inside the initrd.
Comment 13 Andreas Mohr 2008-12-24 00:43:55 UTC
Created attachment 19468 [details]
dmesg of 2.6.28-rc9 with disabled pata_acpi module, after successful resume (UDMA/66)
Comment 14 Andreas Mohr 2008-12-24 00:54:46 UTC
Created attachment 19469 [details]
decoded DSDT of BIOS v0.3109 (non-latest!)

OK, disabling pata_acpi resolves the issue, however that's hardly feasible for distro installs, and I wouldn't want to do this either here.
What I found is that the DSDT obviously _can_ zero IOCFG (and - judging from my success now with disabled pata_acpi - it does ;-), in Device SATA (member SDT3 of Field SACS). My SATA-switched controller is bus ID 1f.2, thus _ADR 001F0002 is the one to watch...
What is interesting is that fumbling of SDT3 happens in _STM of Device SECD (SECondary Disk?) _only_, _not_ in _STM of PRID!!!
IOW, are we calling an ACPI method that we shouldn't be calling for non-existent physical drives?? (should _STM for the secondary disk be skipped?)

Or... is simply this somewhat oldish BIOS version buggy? I'd be willing to update it if you'd like me to do that... (alternatively ask some other known owners of A110L/A150L with newer versions flashed).

Thanks a lot!
Comment 15 Tejun Heo 2008-12-24 01:01:27 UTC
Well, the same controller is being drive by two separate drivers.  Anything can happen.  For now, the only solution is to disable pata_acpi and the distro should either disable it from kernel configuration or black list it such that it's never loaded automatically.  Please report it to the specific distro.

Hmmm... maybe we should drop the wildcard PCI device match from that driver.
Comment 16 Tejun Heo 2008-12-24 01:09:07 UTC
Dang, I got confused.  You meant libata.acpi not the driver pata_acpi, right?  So, _STM clears the bit, alright.  I'll look into it.
Comment 17 Andreas Mohr 2008-12-24 01:43:15 UTC
Argh, me too (it's too confusing):

libata-$(CONFIG_ATA_ACPI)       += libata-acpi.o
is disabled now, yeah.

Well, to be exact, I removed _both_ settings in the last kernel rebuild:

-# CONFIG_ACPI_DEBUG is not set
+CONFIG_ACPI_DEBUG=y
+# CONFIG_ACPI_DEBUG_FUNC_TRACE is not set
 
-CONFIG_ATA_ACPI=y
+# CONFIG_ATA_ACPI is not set

 -CONFIG_PATA_ACPI=m


Hmm, so yeah, I'm wondering which driver was doing stuff here:
we have libata-acpi.c for Linux controller driver specific ACPI method handling,
or pata_acpi.c for ACPI-_only_ handling of the _entire_ controller operation.

Should I re-enable one of those two settings to figure out who exactly is doing IOCFG fiddling here?

BTW, the old CONFIG settings above were the distro default settings! (Ubuntu 8.10)


General observation: dang, you're too fast ;)
(boring Christmas winter morning, eh? ;)
Comment 18 Tejun Heo 2008-12-24 02:18:12 UTC
pata_acpi is harmless.  You don't have to turn it off.  Also, you don't need to turn off libata_acpi, it can be disabled with module parameter libata.noacpi=1.  Ubuntu needs to have its initrd regenerated to pass in the parameter tho.

Heh... I'm on GMT+0900 which makes now the evening of Christmas eve.  Maybe I should go out and try to hunt the deers down instead of sitting in front of computer.  Merry Christmas.
Comment 19 Andreas Mohr 2008-12-24 11:49:44 UTC
OK, I think I know where things need to be tweaked:

We probably need to modify ata_acpi_stm() to _not_ fill in drive ID blocks for non-existent drives! (I didn't see this suspicion mentioned explicitly in the spec yet, though)

3.0a sez:

. The IDE driver will call the _STM control method passing in transfer timing settings for the
channel, as well as the ATA drive ID block for each drive on the channel. The _STM control
method will configure the IDE channel based on this information.


Arguments:
Arg0
Arg1
Arg2
Buffer Channel timing information (formatted as described in Table 9-6)
Buffer ATA drive IDE block for drive 0
Buffer ATA drive IDE block for drive 1


My theory is that we should leave buffer.pointer _and_ buffer.length zeroed for non-existent drives, that's probably why my DSDT does:

                            If (LEqual (SizeOf (Arg2), 0x0200))  <==============
                            {
                                And (SECT, 0x3F0F, SECT)
                                Store (Zero, SSIT)
                                And (SYNC, 0x07, SYNC)
                                Store (Zero, SDT3)            <===============
 
...which contains an additional SizeOf() check which according to 17.5.108 (p.514) has distinct behaviour for each argument type - for a buffer (I take it this is ACPI_TYPE_BUFFER) it is said to return size in bytes of the data, IOW most likely the buffer.length value.

Thus I think we need to at least zero the length indicator, not sure about buffer pointer though.
And input.count most likely needs to remain as is (== 3) regardless of the number of connected drives, since spec above said that Arg1 _always_ is block for drive 0 and Arg2 _always_ is block for drive 1, thus there's no shifting of arguments.

To repeat, ata_acpi_stm()

        in_params[1].type = ACPI_TYPE_BUFFER;
        in_params[1].buffer.length = 512;
        in_params[1].buffer.pointer = (u8 *)ap->link.device[0].id;
 
probably isn't detailed enough.
Comment 20 Andreas Mohr 2008-12-25 00:16:16 UTC
Hohumm:

linux-2.6.28-rc9/drivers/ata# diff -uN libata-acpi.c.org libata-acpi.c
--- libata-acpi.c.org	2008-12-24 23:43:26.000000000 +0100
+++ libata-acpi.c	2008-12-24 23:56:09.000000000 +0100
@@ -403,17 +403,29 @@
 	struct ata_acpi_gtm		stm_buf = *stm;
 	struct acpi_object_list         input;
 	union acpi_object               in_params[3];
+	int i;
 
 	in_params[0].type = ACPI_TYPE_BUFFER;
 	in_params[0].buffer.length = sizeof(struct ata_acpi_gtm);
 	in_params[0].buffer.pointer = (u8 *)&stm_buf;
 	/* Buffers for id may need byteswapping ? */
-	in_params[1].type = ACPI_TYPE_BUFFER;
-	in_params[1].buffer.length = 512;
-	in_params[1].buffer.pointer = (u8 *)ap->link.device[0].id;
-	in_params[2].type = ACPI_TYPE_BUFFER;
-	in_params[2].buffer.length = 512;
-	in_params[2].buffer.pointer = (u8 *)ap->link.device[1].id;
+	for (i = 1; i <= 2; ++i) {
+		struct ata_device *dev = &ap->link.device[i-1];
+		in_params[i].type = ACPI_TYPE_BUFFER;
+		if (ata_dev_enabled(dev)) {
+			ata_dev_printk(dev, KERN_INFO, "_STM: enabled!\n");
+			in_params[i].buffer.length = 512;
+			in_params[i].buffer.pointer = (u8 *)dev->id;
+		} else {
+			/* at least certain DSDT implementations (Acer One)
+			 * rely on proper signalling of non-existent drives -
+			 * they check for length being exactly 512.
+			 */
+			ata_dev_printk(dev, KERN_INFO, "_STM: DISABLED!\n");
+			in_params[i].buffer.length = 0;
+			in_params[i].buffer.pointer = NULL;
+		}
+	}
 
 	input.count = 3;
 	input.pointer = in_params;



[    8.872815] ata2.00: ATA-5: P-SSD1800, Ver2.Y0C, max UDMA/66
[    8.872821] ata2.00: 15761088 sectors, multi 0: LBA 
[    8.884599] ata2.00: configured for UDMA/66
.
.
.
[  127.148264] pci 0000:00:1e.0: restoring config space at offset 0xf (was 0x0, 
writing 0xff)
[  127.148344] pci 0000:00:1e.0: setting latency timer to 64
[  127.164216] ata_piix 0000:00:1f.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[  127.164230] ata_piix 0000:00:1f.2: setting latency timer to 64
[  127.164242] ata_piix 0000:00:1f.2: PIIX_IOCFG5: 0x74
[  127.164279] ata_piix 0000:00:1f.2: PIIX_IOCFG6: 0x74
[  127.164347] ata1.00: _STM: DISABLED!
[  127.164355] ata1.01: _STM: DISABLED!
[  127.164398] ata2.00: _STM: enabled!
[  127.164402] ata2.01: _STM: DISABLED!
.
.
.
[  127.212762] pci 0000:04:00.4: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[  127.368169] sd 1:0:0:0: [sda] Starting disk
[  129.261683] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 filtered out
[  129.261689] ata2.00: ACPI cmd ef/03:42:00:00:00:a0 filtered out
[  129.261739] ata_piix 0000:00:1f.2: PIIX_IOCFG1: 0x34
[  129.261746] ata2.00: limited to UDMA/33 due to 40-wire cable
[  129.269619] ata2.00: configured for UDMA/33


ARGH. But IOCFG is _not_ 0x04 as it was in those previous problematic cases, but 0x34.
Oh, and this failure to retain the correct 0x74 value does _not_ mean that my fix is now fully incorrect anyway (the Acer DSDT does contain the 512-byte check after all, so it's probably going into the right direction).
Comment 21 Andreas Mohr 2008-12-26 10:56:47 UTC
OK, now I understand things more clearly:
my setup is SATA controller in SFF IDE (i.e., emulation) mode.
I was wondering why ich_pata_cable_detect() was being used for a device connected via non-PATA connector (ZIF cable). Then I found that piix_init_sata_map() showed the device being accessed via IDE emulation (combined mode):
 ata_piix 0000:00:1f.2: MAP [ P0 P2 IDE IDE ]
which thus does a
 pinfo[i / 2] = piix_port_info[ich_pata_100];
which thus invokes ich_pata_ops.
This is all fine and good I'm sure, but this means that the BIOS has to properly emulate the IDE_CONFIG / PIIX_IOCFG flag bits (the relevant 80-wire bits are a _scratch pad_ in ICH7, most likely meaning that the BIOS is supposed to always track them properly during combined-mode IDE emulation, to retain software compatibility with pure-IDE system driver environments).
And, well... it doesn't, so far. Thus something with _GTM/_STM/_SDD/_GTF invocation is weird and causing the bits to not get re-set properly after resume, I suppose.
Or they're accessing drive state itself somehow (the one "mis-programmed" by our driver, that is ;) and thus deduce wrong 80-wire flags).
I'll have a look at DSDT some more (BTW, SDT3 was _not_ the relevant setting, it's ICR0, ICR1, ... since the relevant offsets are the ones _following_ the "Offset (0x14)" statement).

BTW, using an SSD drive via AHCI instead of SFF is said to provide a 5%-10% transfer performance increase (probably due to NCQ?), thus it would be very nice to have our driver support a brute-force switch to AHCI (thus disabling DSDT method support, obviously), since my BIOS doesn't provide a BIOS-supported AHCI switch on its own, unfortunately. How much work would it be to extend the driver to offer a Linux-only switch to AHCI? (possibly even online-switchable back and forth?)
Comment 22 Andreas Mohr 2008-12-26 12:14:52 UTC
OK, I think it's simply the missing ATA IDENTIFY buffer byte-swap which is needed to make this work for me:

int ata_acpi_stm(struct ata_port *ap, const struct ata_acpi_gtm *stm)
{
        acpi_status status;
        struct ata_acpi_gtm             stm_buf = *stm;
        struct acpi_object_list         input;
        union acpi_object               in_params[3];

        in_params[0].type = ACPI_TYPE_BUFFER;
        in_params[0].buffer.length = sizeof(struct ata_acpi_gtm);
        in_params[0].buffer.pointer = (u8 *)&stm_buf;
        /* Buffers for id may need byteswapping ? */   <===============
        in_params[1].type = ACPI_TYPE_BUFFER;
        in_params[1].buffer.length = 512;
        in_params[1].buffer.pointer = (u8 *)ap->link.device[0].id;
 



        ATA_ID_HW_CONFIG        = 93,

static inline int ata_drive_40wire_relaxed(const u16 *dev_id)
{
        if ((dev_id[ATA_ID_HW_CONFIG] & 0x2000) == 0x2000)
                return 0;       /* 80 wire */
        return 1;
}



                    Method (_STM, 3, NotSerialized)
 
                                CreateWordField (Arg1, 0xBA, W930)

                                                                     ^^^^ oh BTW that one is 93, times two... ==> 186 == 0xBA

                                    If (And (W930, 0x2000))
                                    {
                                        Or (ICR1, 0x04, ICR1)
                                    }

And guess what ICR1 bit 0x04 would then be after this? The missing PIIX_IOCFG 0x34 --> 0x74 "upgrade" to achive 80-wire detection ;)

Verifying this slightly-more-than-mere-theory-I-guess will take a slight bit more time though... (will upgrade to .28 proper with this change plus logging of IDENTIFY buffer content, and that should make it work :)
Remains to be verified whether the 0x0200 buffer length change (see further above) really is needed, too...
Comment 23 Andreas Mohr 2008-12-26 13:52:47 UTC
Hmm:


# sg_sat_identify -v /dev/sda
    ATA pass through (16) cdb: 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 
Response for IDENTIFY DEVICE ATA command:
 00     044a 3d14 0000 0010 7e00 0200 003f 00f0     .J =. .. .. ~. .. .? .. 
 08     7ec0 0000 3032 3038 3237 3530 3331 3932     ~. .. 02 08 27 50 31 92 
 10     3020 2020 2020 2020 0002 0002 0004 5665     0           .. .. .. Ve 
 18     7232 2e59 3043 502d 5353 4431 3830 3020     r2 .Y 0C P- SS D1 80 0  
 20     2020 2020 2020 2020 2020 2020 2020 2020                             
 28     2020 2020 2020 2020 2020 2020 2020 8001                          .. 
 30     0000 2b00 4000 0200 0000 0007 3d14 0010     .. +. @. .. .. .. =. .. 
 38     003f 7ec0 00f0 0100 7ec0 00f0 0000 0007     .? ~. .. .. ~. .. .. .. 
 40     0003 0078 0078 0078 0078 0000 0000 0000     .. .x .x .x .x .. .. .. 
 48     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 50     0030 0000 7008 5004 4000 7009 1004 4000     .0 .. p. P. @. p. .. @. 
 58     041f 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 60     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 68     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 70     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 78     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 80     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 88     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 90     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 98     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 a0     1000 0000 0000 0092 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 a8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 b0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 b8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 c0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 c8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 d0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 d8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 e0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 e8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 f0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 f8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 


And at address 0xBA there is... nothing.

Currently rebuilding, let's see how it actually acts then. Possibly there's some IDENTIFY flag which says "hey, I'm not an IDE device, my HW_CONFIG is not valid" and which the DSDT code acts upon, but _only_ if the IDENTIFY buffer is swapped properly ;)
[day-dreaming]
Comment 24 Andreas Mohr 2008-12-27 09:25:08 UTC
I'm about to give up now (the search for a "proper", "politically correct" solution, that is).

Status quo is:
the Samsung P-SSD1800 is a ZIF drive, ZIF drives are PATA and _not_ SATA, ZIF cables definitely seem to qualify for 80-wire operation according to internet research (plus, the drive-side speed qualification counts as well, thus qualifying ZIF as UDMA66+ is ok I think).
Calling _GTM/_STM ACPI methods won't set IDE_CONFIG (PIIX_IOCFG) properly since P-SSD1800's IDENTIFY word 93 is zero (zero word 93 is commonly taken to mean SATA!), and thus a DSDT condition which would allow setting IDE_CONFIG properly fails.
IOW, we have a BIOS that has _GTM/_STM handling for usual old-style IDE drives and this fails since we do have a PATA (ZIF is a short 40-wire cable which qualifies for 80-wire operation) drive which is _not_ PATA compliant.
I shortly thought that maybe we're supposed to call _SDD only instead of _GTM/_STM, but the drive _is_ a PATA drive, driven in _IDE_ SATA mode!

Conclusion: better add a quirk to the ich_laptop list for the A110L to generically allow 80-wire operation for this ZIF connector and that's it. There cannot be done much else since it's a border-case (ZIF socket with a border-compliant PATA device) which should be treated accordingly, i.e. via quirk entry.

CC'd jgarzik since he messed with T13 stuff :)
Comment 25 Tejun Heo 2008-12-28 23:06:42 UTC
Yes, it seems ata_piix will need to remember IOCFG over _STM/_GTM.  Will prep a patch.
Comment 26 Tejun Heo 2008-12-28 23:42:28 UTC
Created attachment 19517 [details]
ata_piix-save-iocfg

Can you please verify the attached patch fixes the problem?
Comment 27 Andreas Mohr 2008-12-31 01:36:48 UTC
Created attachment 19563 [details]
2.6.28 dmesg of working ata_piix-save-iocfg patch after two resumes

Doh, right, that's of course a better way to go about it, simply maintain IOCFG register over the entire _STM cycle, better than quirk-listing single machines.
The only thing I'm feeling a bit uneasy about is that we don't report any IOCFG changes done by the BIOS, maybe we should report weird changes, but with the current solution adding this probably is a bit difficult.

Two resumes successfully maintain UDMA/66 configuration now.

# hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   1166 MB in  2.00 seconds = 583.39 MB/sec
 Timing buffered disk reads:  116 MB in  3.04 seconds =  38.18 MB/sec


Thanks a lot!

Some notes:
- kzalloc() really hates being such a lone child in this cold and cruel world
- BIOS comment doesn't reference "bug #11879", probably difficult to locate the origin of this work
- it's actually not a BIOS issue since the BIOS seems to correctly wire things as required for PATA disks, it's the SSD which is a _PATA_ disk yet doesn't support the pretty much mandatory non-zero word 93 (or maybe SSD does provide correct info but there's a _non-transparent_ bridge in the way).
Comment 28 Tejun Heo 2009-01-01 18:46:36 UTC
Happy new year!

(In reply to comment #27)
> Two resumes successfully maintain UDMA/66 configuration now.

Great.

> Some notes:
> - kzalloc() really hates being such a lone child in this cold and cruel world

Which one are you talking about?  If you're talking about devm_kzalloc(), it's a managed allocation which will automatically freed on driver detach.

> - BIOS comment doesn't reference "bug #11879", probably difficult to locate
> the
> origin of this work

I'll add it.

> - it's actually not a BIOS issue since the BIOS seems to correctly wire
> things
> as required for PATA disks, it's the SSD which is a _PATA_ disk yet doesn't
> support the pretty much mandatory non-zero word 93 (or maybe SSD does provide
> correct info but there's a _non-transparent_ bridge in the way).

The bit represents cable type seen by the controller.  The driver is responsible for combining the controller and drive side detection results and determine the cable type.  I think it's just quirky ACPI implementation.

Anyways, I'll forward the patch.  Thanks.
Comment 29 Tejun Heo 2009-01-01 19:19:22 UTC
*** Bug 12202 has been marked as a duplicate of this bug. ***
Comment 30 Andreas Mohr 2009-01-02 00:41:17 UTC
> Which one are you talking about?  If you're talking about devm_kzalloc(),
> it's
> a managed allocation which will automatically freed on driver detach.

Darn, suspected so, thanks.

About the non-announced duplicate case, two advantages I just figured out:
+ fully duplicate processing means duplicate and possibly entirely _different_, _better_ findings when trying to figure out the cause ("survival of the fittest")
+ no "duplicate? nice, The Other Guy will surely fix it for me" syndrome

Here again, thanks a lot for your hard work!
Comment 31 Tejun Heo 2009-01-02 00:53:07 UTC
I wrote on the mailing list but the late duplicate marking was mainly because I was traveling and lazy during the holiday season.  I usually collect dups as soon as they're identified.  Anyways, happy new year.  :-)

Note You need to log in before you can comment on or make changes to this bug.