Bug 9487

Summary: buggy firmware expects four-byte IRQ resource descriptor - 2.6.21 regression - (was: Serial port disappears after Suspend on Toshiba R25)
Product: ACPI Reporter: Tom Jaeger (ThJaeger)
Component: ACPICA-CoreAssignee: Robert Moore (Robert.Moore)
Status: CLOSED CODE_FIX    
Severity: normal CC: acpi-bugzilla, Mike.Thomas.Heath, Robert.Moore, shaohua.li, stefan.bader, ThJaeger
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.21-rc4 and newer Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg output
kernel config
dmesg output (latest kernel from git)
the DSDT
Buffer that is passed to acpi_set_current_resources
Buffer that was passed to acpi_set_current_resources before 2.6.21
patch that fixes the issue on my machine
new patch
patch
patch
a modified version
Alternate implementation of ACPICA-core changes
separated parts of the earlier patch independent from discriptor length issues
Linuxized version of the patch in #28
linuxified acpica resource changes b/w 20080123 and 20080321
required linux-side changes
rsparser.c changes
patch included in linux-2.6.25-rc16
PNPACPI: use _CRS IRQ descriptor length for _SRS

Description Tom Jaeger 2007-12-01 18:27:38 UTC
Most recent kernel where this bug did not occur: v2.6.21-rc3 (908e0a8a265fe8057604a9a30aec3f0be7bb5ebb)
Distribution: Ubuntu 7.10
Hardware Environment: Toshiba Satellite R25 Tablet PC (x86)
Software Environment: Linux tthomas 2.6.21-rc3 #13 SMP Sat Dec 1 20:38:13 EST 2007 i686 GNU/Linux
Problem Description:
Recent kernel versions fail to wake up the serial port after both Suspend to RAM and Suspend to Disk, which causes the tablet pen to be unusable.  Trying to access the serial port result in a "ttyS0: LSR safety check engaged!" error message, and operation cannot be restored via "sudo setserial /dev/ttyS0 uart 16550A port 0x0338 irq 4".

Using git bisect, I identified the kernel version where this problem first appeared, the offending patch is:

Author: Michael Karcher <bugzilla-kernel@mkarcher.dialup.fu-berlin.de>  2007-03-08 23:29:29
Committer: Len Brown <len.brown@intel.com>  2007-03-08 23:29:29
Parent: 908e0a8a265fe8057604a9a30aec3f0be7bb5ebb ([PATCH] ecryptfs: nested locking annotation)
Branches: remotes/origin/master, master, bisect
Follows: v2.6.21-rc3
Precedes: v2.6.21-rc4

    ACPI: fix parallel port IRQ after resume from S3
    
    The PNPACPI resource flags were broken.
    This would apply to re-enabling a device any-time after boot,
    not just after resume from S3.
    
    http://bugzilla.kernel.org/show_bug.cgi?id=6316
    
    Acked-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>


I'll attach the output of dmesg and my kernel configuration.  This uses the first kernel version where the bug occurs; if you prefer, I can repeat the process on the latest kernel.

Thank you.
Comment 1 Tom Jaeger 2007-12-01 18:28:37 UTC
Created attachment 13819 [details]
dmesg output
Comment 2 Tom Jaeger 2007-12-01 18:29:10 UTC
Created attachment 13820 [details]
kernel config
Comment 3 Tom Jaeger 2007-12-01 22:08:13 UTC
Created attachment 13821 [details]
dmesg output (latest kernel from git)
Comment 4 Tom Jaeger 2007-12-01 22:16:30 UTC
Created attachment 13822 [details]
the DSDT
Comment 5 Tom Jaeger 2007-12-02 19:00:50 UTC
Not sure if this is helpful, if I uncomment line 109

res->irq_resource[i].flags |= irq_flags(triggering, polarity);

from drivers/pnp/pnpacpi/rsparser.c in the latest kernel, the problem goes away.
Comment 6 Tom Jaeger 2007-12-20 18:08:49 UTC
Another observation:

rsparser.c, pnpacpi_encode_irq, around line 765:

        if (triggering == ACPI_EDGE_SENSITIVE)
                resource->data.irq.sharable = ACPI_EXCLUSIVE;
        else
                resource->data.irq.sharable = ACPI_SHARED;
 
It appears that the IRQ is considered to be shared iff it is level-triggered.  Is this a safe assumption?  Because if I delete the above code and always set resource->data.irq.sharable to be ACPI_SHARED, the serial port wakes up properly.  Thanks, Tom.
Comment 7 Michael Heath 2007-12-20 23:25:56 UTC
Just wanted to confirm this bug on a Toshiba M200 Tablet laptop with a serial-based Wacom tablet which doesn't work after suspend to RAM or suspend to disk.

Tom Jaegers modifications work for me on fixing this. 

Many Ubuntu users have confirmed this issue (https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/152187). 

While Tom's fixes work for tablets on TOshiba laptops, its probably not an ultimately safe or desirable way to fix this issue. Anyone more familiar with the hardware or issue at hand willing to commment?
Comment 8 Tom Jaeger 2007-12-22 20:15:47 UTC
Created attachment 14149 [details]
Buffer that is passed to acpi_set_current_resources
Comment 9 Tom Jaeger 2007-12-22 20:20:52 UTC
Created attachment 14150 [details]
Buffer that was passed to acpi_set_current_resources before 2.6.21

The serial line used to wake up in older kernels, but that was pure coincidence since the relevant part of the buffer was computed from uninitialized data (see bug 6316, referenced above).
Comment 10 Tom Jaeger 2007-12-25 07:46:18 UTC
The AML buffers are more interesting (cf. ACPI spec, p. 203):

Here's what _CRS returns:
47 01 38 03 38 03 01 08 // IO
23 10 00 01 // IRQ: IRQ 4, 3 bytes, High true, edge sensitive, non-shareable
79 8c // END TAG

And here's what the kernel passes to _SRS:
47 01 38 03 3f 03 00 08 // IO
22 10 00 // IRQ: IRQ 4, 2 bytes, hence it defaults to High true, edge sensitive, non-shareable
79 00 00 // END TAG

Apparently the firmware has problems interpreting 2-byte IRQ descriptors.  If I remove the code in drivers/acpi/resources/rsio.c that checks if the third byte is necessary, the following AML buffer is passed instead:

47 01 38 03 3f 03 00 08 23 10 00 01 79 00

and the device resumes properly.

I'm not sure what to do now,  since what the kernel does is correct the problem is buggy firmware.  Just disabling the check could potentially break devices that are not standard-compliant in the opposite direction (I'm not sure how likely this is though).
I think there is something to be said about being as faithful as possible to what _CRS returns, though.  So maybe adding a field u8 use_3_byte_descriptor to acpi_resource_irq that keeps track of whether _CRS returned a 3-byte descriptor even though this wasn't necessary would be a better idea?
Comment 11 Tom Jaeger 2007-12-25 13:04:03 UTC
Created attachment 14185 [details]
patch that fixes the issue on my machine

I took the second route, adding a field preserve_four_byte_descriptor to acpi_resource_irq and acpi_resource_extended_irq that keeps track of whether _CRS returned a four-byte descriptor, in which case such a descriptor is also passed to _SRS.
Sorry about talking about three-byte descriptors instead of four-byte ones in my earlier port, the first byte certainly does belong to the descriptor.

The patch also fixes an obvious bug in  pnpacpi_encode_irq where IRQ resource information is not properly restored if the IRQ is shared.  I didn't change the part where shareable is deduced from triggering which I'm pretty sure is also a bug.

TODO: Choose shorter variable names.
Comment 12 Tom Jaeger 2007-12-25 14:09:00 UTC
I wrote earlier that
> Just disabling the check could potentially break devices that are not 
> standard-compliant in the opposite direction
Make that "*will* break devices".  The Fujitsu Lifebook C1020 from bug 6316 always expects three-byte descriptors if they are allowed.
So it seems that the additional bookkeeping that I did in the patch cannot be avoided.
Comment 13 Tom Jaeger 2007-12-26 15:32:43 UTC
Created attachment 14193 [details]
new patch

* Fixes a number of bugs in the first patch
Comment 14 Tom Jaeger 2007-12-26 16:53:04 UTC
Created attachment 14195 [details]
patch

cosmetic changes
Comment 15 Shaohua 2007-12-27 19:26:05 UTC
Your patch looks reasonable. One comment:
I suggest not adding IORESOURCE_IRQ_DESCRIPTOR_4_BYTES in generic header, as this is quite ACPI specific. You can set resource's descriptor_length in 'pnpacpi_type_resources'.
Comment 16 Len Brown 2007-12-27 23:55:57 UTC
as a patch is attached that is intended to address the problem,
I'm setting this report to RESOLVED, so it will be noticed
for review and testing.  Also setting sub-category to ACPICA-Core
since this proposes a change to the generic ACPICA resource code.
Comment 17 Tom Jaeger 2007-12-30 06:00:49 UTC
Created attachment 14230 [details]
patch

* moved the definition of IORESOURCE_IRQ_DESCRIPTOR_4_BYTES to rsparser.c.

Thanks for your comments.  The only place where IORESOURCE_IRQ_DESCRIPTOR_4_BYTES is used is in rsparser.c, so I moved its definition there.  I don't understand your comment about pnpacpi_type_resources.
Comment 18 Shaohua 2008-01-01 17:59:45 UTC
Created attachment 14257 [details]
a modified version

Tom, how about the slightly modified version?
Moving ORESOURCE_IRQ_DESCRIPTOR_4_BYTES doesn't work, because this bit might be used by generic code later. Actually we don't need the extra bit.
Comment 19 Fu Michael 2008-01-02 00:25:32 UTC
mark bug as "patch available" according to comment# 16.
Comment 20 Tom Jaeger 2008-01-02 05:46:41 UTC
David, thanks, I see, that is much more elegant.  Just tested it and it works perfectly.
Comment 21 Len Brown 2008-01-10 21:03:28 UTC
patch in comment #18 applied to acpi test
Comment 22 Robert Moore 2008-01-11 07:50:43 UTC
I'd almost say to heck with trying to optimize a 3-byte IRQ descriptor to the 2-byte version and always output the 3-byte version. This would actually *remove* some ACPICA code, and I'll bet this is exactly what Windows does.
Comment 23 Tom Jaeger 2008-01-11 11:56:20 UTC
I don't think that is what Windows does since bug 6316 was caused by a device that couldn't deal with an unoptimized IRQ descriptor.
Comment 24 Robert Moore 2008-01-11 13:14:22 UTC
What we seem to be saying here is that there is some requirement that the _SRS buffer be the exact same size as the _CRS buffer, at least on some systems. The ACPI spec only says that the descriptors must be in the same order as listed in the _CRS buffer. I suppose that we might infer from this that the *same* descriptors must appear, therefore we cannot change IRQ to IRQ_NO_FLAGS.

I'll ask Phoenix what they think about this.
Comment 25 Robert Moore 2008-01-11 15:56:46 UTC
From Phoenix:

"The original PNP spec stated that the exact format of _CRS must be matched with the subsequent _SRS"

So, the patch above seems to be moving in the right direction. The ACPI spec could also use a few more words about this.

I think that there may be at least one more descriptor like this (DMA?)
Comment 26 Robert Moore 2008-01-17 15:43:46 UTC
The only other descriptor that has optional fields is the StartDependentFunctions descriptor.
Comment 27 Robert Moore 2008-01-17 16:00:46 UTC
I have an issue with:

if (data.irq.descriptor_length == ACPI_DESCRIPTOR_4_BYTES)

If the field is named "length", I think that it should hold the actual length of the descriptor, not an encoded length. If the length is indeed a flag, the field should be:

if (data.irq.flags & ACPI_DESCRIPTOR_4_BYTES)

Otherwise, the name "length" is misleading and could lead to bugs later.

The options for the StartDependentFunctions are length 0 and length 1, so it may be best to simply use the actual length, otherwise we end up with flags like this:

ACPI_DESCRIPTOR_IRQ_3_BYTES
ACPI_DESCRIPTOR_IRQ_4_BYTES
ACPI_DESCRIPTOR_SDF_0_BYTES
ACPI_DESCRIPTOR_SDF_1 BYTE
Comment 28 Robert Moore 2008-02-07 13:30:00 UTC
Created attachment 14744 [details]
Alternate implementation of ACPICA-core changes


This patch provides a more complete fix to the problem.
1. Handles both IRQ and StartDependentFn descriptors
2. For each descriptor type, handles both types of cases: where the descriptor should not be optimized, and where the descriptor should not be "de-optimized".
3. The DescriptorLength field is an actual length, not a "flags" byte.

Patch does not include Linux changes required, and is not the Linuxized version of the ACPICA code -- therefore, it is for review only at this time.
Comment 29 Tom Jaeger 2008-02-11 21:12:22 UTC
Looks good.  I've just got one question:  If an invalid DescriptorLength is passed, AcpiRsSet* falls back on the old behavior of optimizing the descriptor if possible.  Is there a reason for this?  Because if we fully trust DescriptorLength this shouldn't be necessary and if we don't it's probably safer to only optimize if the other flags allow it.
Comment 30 Robert Moore 2008-02-12 09:17:46 UTC
I was thinking that if a "valid" DescriptorLength is passed (2 or 3), then we use that value. Anything else means "don't care", let the code figure out what is best. This at least makes some attempt to keep the existing behavior without forcing existing resource code above acpica to change.
Comment 31 Tom Jaeger 2008-02-14 18:36:39 UTC
The problem I see concerns StartDependentFn descriptors more than it does IRQ descriptors.  If we take the linux kernel as an example, pnpacpi_build_resource_template in rsparser.c fills the resource template with zeros before populating it.  So in its current form, it would pass a descriptor_length of zero to the ACPICA code, which would mean it could never produce a StartDependentFn resource of length one, not even if the value of CompatibilityPriority or PerformanceRobustness requires this, which is the current behavior.
Comment 32 Tom Jaeger 2008-02-14 19:11:42 UTC
Created attachment 14848 [details]
separated parts of the earlier patch independent from discriptor length issues

The issues were:

* if the ACPI_SHARED flag is set, triggering and polarity are not properly set (pnp_acpi_encode_irq).
* sharable is deduced from triggering instead of using the appropriate IORESOURCE_IRQ_SHAREABLE flag
* in addition to that, pnpacpi_encode_ext_irq writes into resource->data.irq.sharable instead of resource->data.extended_irq.sharable.

These issues are independent from the current bug, as far as I can see.
Comment 33 Tom Jaeger 2008-02-26 00:45:47 UTC
Created attachment 15002 [details]
Linuxized version of the patch in #28

I've adapted the proposed ACPCIA changes to the current linux kernel.  I don't know if this is the way it's supposed to be done or if there's a script for this sort of thing;  anyway, it wasn't much work and it makes it possible to test the proposed changes.  I haven't noticed any problems with the patch and I can confirm that it fixes the original issue.

There was a minor issue in the earlier patch: Somewhere around line 350 in rsio.c,

    {ACPI_RSC_LENGTH,   0, 0, sizeof (AML_RESOURCE_IRQ)},

should be replaced by

    {ACPI_RSC_LENGTH,   0, 0, sizeof (AML_RESOURCE_START_DEPENDENT)},

Also, TotalSize could probably decremented for short START_DPF descriptors as well.
Comment 34 Robert Moore 2008-02-26 06:48:53 UTC
We have a utility called acpisrc that converts the code to Linux format. We then lindent the code and create the patch(es).

I agree with both changes above.
Comment 35 Zhang Rui 2008-03-24 00:31:28 UTC
Bob,
Is the fix in the recent ACPICA release?
Comment 36 Tom Jaeger 2008-03-24 00:36:01 UTC
Yes, it is.
Comment 37 Robert Moore 2008-03-24 09:46:09 UTC
The change was released in ACPICA 20080213:

Fixed a problem where resource descriptor size optimization could cause a problem when a _CRS resource template is passed to a _SRS method. The _SRS resource template must use the same descriptors (with the same size) as returned from _CRS. This change affects the following resource descriptors: IRQ / IRQNoFlags and StartDependendentFn / StartDependentFnNoPri. (BZ 9487)
Comment 38 Len Brown 2008-03-25 19:21:03 UTC
this is technically "RESOLVED" b/c we have a patch that works.
However, we need to:
1. attach the actual pach that made it into ACPICA (linuxized)
2. get that patch into Linux (when this bug will be CLOSED)
Comment 39 Tom Jaeger 2008-03-25 19:23:29 UTC
Created attachment 15439 [details]
linuxified acpica resource changes b/w 20080123 and 20080321
Comment 40 Tom Jaeger 2008-03-25 19:24:08 UTC
Created attachment 15440 [details]
required linux-side changes
Comment 41 Tom Jaeger 2008-03-25 19:34:47 UTC
The patch in comment #39 was created automatically using acpisrc and lindent. #40 contains the required declarations and code that transfers the information from _CRS to _SRS using the method suggested by Shaohua.

Also, the patch in comment #32 contains a few bugfixes that are unrelated to the issue at hand.
Comment 42 Tom Jaeger 2008-04-29 11:34:05 UTC
Created attachment 15978 [details]
rsparser.c changes

It seems that in order for the descriptor length information to be passed to _SRS and in order to avoid regressions related to start_dpf, the above patch is still needed.
Comment 43 Robert Moore 2008-04-29 15:06:07 UTC
Comment 42#:

Please explain in detail why this patch is needed, thanks.
Comment 44 Tom Jaeger 2008-04-29 18:02:25 UTC
Linux doesn't use acpi_resources throughout, but instead stores the information in generic resources (pnpacpi_parse_allocated_*), which are encoded to acpi_resources (pnpacpi_encode_*) right before acpi_set_current_resources is called in pnpacpi_set_resources.  Since we want to avoid keeping track of the descriptor_length by adding a flag to the generic struct resource, Shaohua suggested in comment #15 that the device's resource be bypassed by saving the descriptor_length information directly to the resource template when it is created.  This is what the patch in comment #42 does.

Unfortunately, I don't currently have working hardware to test this on.
Comment 45 Len Brown 2008-04-30 20:35:05 UTC
Created attachment 15993 [details]
patch included in linux-2.6.25-rc16

The attached patch shipped in linux-2.6.25-rc16.

1d5b285da1893b90507b081664ac27f1a8a3dc5b
ACPICA: Fix for resource descriptor optimization issues for _CRS/_SRC

Are we still broken?
Comment 46 Tom Jaeger 2008-05-02 09:32:06 UTC
> Are we still broken?
Yes, I think so, but that is based on looking at the code, not on actual testing since my R25 won't let me boot anymore and is lying disassembled on the floor of my apartment.  I might be able to get someone on the ubuntu bug tracker to test this if you feel that's necessary.
Comment 47 Len Brown 2008-05-13 19:00:08 UTC
yes, it would be great if we could find out if 2.6.25 fixed this
or if we still have this problem.
Comment 48 Bjorn Helgaas 2008-05-23 15:37:42 UTC
Created attachment 16262 [details]
PNPACPI: use _CRS IRQ descriptor length for _SRS

This bug is not fixed in 2.6.25.  I have tested 2.6.26-rc3 + this patch, and it fixes the problem on my Toshiba Portege 4000.  This is an extension of Tom Jaeger's patch in comment #42, which was not quite sufficient.
Comment 49 Len Brown 2008-06-12 22:06:48 UTC
shipped in 2.6.26-rc6 -- closed.

commit e9fe9e188118a0a34c6200d9b10ea6247f53592d
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Mon Jun 9 16:52:04 2008 -0700

    pnpacpi: fix IRQ flag decoding


commit a993273beae8022390e48fe9205480565ad470ab
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Mon Jun 9 16:52:05 2008 -0700

    pnpacpi: fix shareable IRQ encode/decode

commit 36d872a370d3d10e5a7faa9dcacce744260fb13b
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Mon Jun 9 16:52:06 2008 -0700

    PNPACPI: use _CRS IRQ descriptor length for _SRS