Bug 111261 - Lid close event does not generate acpi event - Sony Vaio VGN-SZ7
Summary: Lid close event does not generate acpi event - Sony Vaio VGN-SZ7
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: Platform_x86 (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_platform_x86@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-24 23:43 UTC by Yan Pas
Modified: 2016-06-08 02:54 UTC (History)
7 users (show)

See Also:
Kernel Version: 4.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump (184.01 KB, text/plain)
2016-02-01 12:53 UTC, Yan Pas
Details
dmesglog (894.01 KB, text/plain)
2016-04-05 20:23 UTC, Yan Pas
Details
dmesg.out2 (63.58 KB, application/octet-stream)
2016-04-13 08:24 UTC, Yan Pas
Details
dmesglog.15.04 (76.92 KB, application/octet-stream)
2016-04-15 19:17 UTC, Yan Pas
Details
acpitraces_sonylaptop (48.68 KB, application/zip)
2016-04-21 20:40 UTC, Yan Pas
Details
sony-laptop, fix (unrelated) unload bug (838 bytes, patch)
2016-04-22 14:46 UTC, Mattia Dongili
Details | Diff
sonylaptopmodule_log (67.59 KB, application/octet-stream)
2016-04-23 18:49 UTC, Yan Pas
Details
ec-gpe.log1 (70.04 KB, text/x-log)
2016-04-30 13:48 UTC, Yan Pas
Details

Description Yan Pas 2016-01-24 23:43:08 UTC
My notebook is sony vaio vgn-27w. 

When the lid is closed - acpi does not know about it (acpi_listen). But the kernel reads lid's state perfectly (while true; do sleep 1; cat
/proc/acpi/button/lid/LID0/state ;done).

 $ dmesg |grep -i lid
[    0.757188] input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input1
[    0.758805] ACPI: Lid Switch [LID0]
[    3.376265] input: AlpsPS/2 ALPS GlidePoint as /devices/platform/i8042/serio1/input/input4


Similar thread on forum http://ubuntuforums.org/showthread.php?t=2197872
Comment 1 Yan Pas 2016-01-25 06:11:28 UTC
Sorry, notebook model is Vaio VGN-SZ7 (or another name: PCG-6W4P)
Comment 2 Aaron Lu 2016-02-01 08:35:46 UTC
acpidump please:
# acpidump > acpidump.txt
Comment 3 Yan Pas 2016-02-01 12:53:04 UTC
Created attachment 202631 [details]
acpidump
Comment 4 Aaron Lu 2016-02-02 02:54:01 UTC
The notify is there:
                    Method (_Q66, 0, NotSerialized)  // _Qxx: EC Query
                    {
                        If ((ECOK == Zero))
                        {
                            Local0 = (PHSB (0xC0, 0x82) & 0x04)
                            Local0 >>= 0x02
                        }
                        Else
                        {
                            Local0 = LIDS /* \_SB_.PCI0.LPCB.EC0_.LIDS */
                        }

                        Notify (LID0, 0x80) // Status Change
                    }

so the only reason I can think of is, the EC _Q66 method doesn't get executed. I'm adding Lv and move the bug to ACPI/EC.
Comment 5 Lv Zheng 2016-04-05 05:51:21 UTC
1. What the version of the kernel to trigger this issue, probably this has already been fixed in the upstream kernel, so please give the upstream linus tree a try:
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

If this still persists, then let's do the next step:
2. please enable full EC debug by commenting the following line from the drivers/acpi/ec.c and rebuild the kernel:
/* #define DEBUG */
3. boot the kernel and perform the lid close.
4. upload the dmesg output here, it is better to mark the LID close begin/end in your dmesg for us.

Thanks
-Lv
Comment 6 Yan Pas 2016-04-05 20:23:13 UTC
Created attachment 211841 [details]
dmesglog

High!

I've tried latest kernel. The bug still exists.

Then I've rebuilt kernel with UNcommenting "#define DEBUG" directive. And then executed this bash script:
while true; do echo "LID STATE" >>dmesglog;cat /proc/acpi/button/lid/LID0/state >>dmesglog; dmesg -T >> dmesglog; sleep 1; done

Here is the output. The lid state is marked in the script. My dmesg version doesn't support "-w" (--follow) key, but I guess it's OK since it's a ring buffer.

Kind regards,
Yan
Comment 7 Lv Zheng 2016-04-13 05:46:42 UTC
Hi,

The log is not useful.
I couldn't see any EC entries in it.

Please:
1. Uncomment /* #define DEBUG */ from drivers/acpi/ec.c
2. Rebuild the kernel. (I'm not sure if you did this.)
3. Boot the rebuilt kernel. (I'm not sure if you did this.)
4. After booting into the new kernel and wait for 1 minute, please perform a LID close
5. After closing LID for 1 minute, please perform a LID open
6. After opening LID, execute "dmesg > dmesg.out"
7. It is better that you can make marks in the dmesg.out to let us know when do you start the LID close/open actions.
8. Upload the dmesg.out here.

Thanks
-Lv
Comment 8 Yan Pas 2016-04-13 08:24:51 UTC
Created attachment 212591 [details]
dmesg.out2

I did exactly what you did said. History of commands:
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git
ls
cd linux-lts-xenial-4.4.0/
ls
vi drivers/acpi/ec.c 
sudo apt-get build-dep linux-image-4.4.0-15-generic 
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic
cd ..
ls -lh
sudo dpkg -i linux-headers-4.4.0-15-generic_4.4.0-15.31~14.04.1_amd64.deb linux-image-4.4.0-15-generic_4.4.0-15.31~14.04.1_amd64.deb linux-image-extra-4.4.0-15-generic_4.4.0-15.31~14.04.1_amd64.deb 
sudo dpkg -i linux-firmware-image-4.6.0-rc2-custom_4.6.0-rc2-custom-1_amd64.deb linux-headers-4.6.0-rc2-custom_4.6.0-rc2-custom-1_amd64.deb 

I wasn't able to build deb packages from kernel.org git second time, so I've built kernel 4.4 from canonical for debug purposes. The line 29 in ec.c file looks like this:" #define DEBUG " (two spaces: one before pp directive, one after)

Now I booted to this kernel (checked with uname) and performed your instructions. If it is not what you've expected to see, how should dmesg log look with DEBUG directive?
Comment 9 Lv Zheng 2016-04-13 08:53:48 UTC
I think you should confirm if this fix is in your kernel:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5508df89

If not, please try to back port it to your kernel.

Thanks
-Lv
Comment 10 Yan Pas 2016-04-13 12:54:18 UTC
As I've reported earlier in comment #6 I've installed kernel from kernel.org git earlier and the bug still existed. Then I was unable to rebuild it with debug directive, so I've installed canonical's kernel. The kernel was dated by april, so it included this commit (which is dated by march). COnsequently bug still exists in master brach.

The interesting feature of my lid - it can wake up my notebook by opening it in ms windows. Maybe linux doesn't support such types of lids.
Comment 11 Lv Zheng 2016-04-14 07:50:05 UTC
> I've tried latest kernel. The bug still exists.
> 
> Then I've rebuilt kernel with UNcommenting "#define DEBUG" directive.

The problem is:
I couldn't see any EC debugging messages in it.
So I cannot debug the issue.
Please make a confirmation again.

> And then executed this bash script:
> while true; do echo "LID STATE" >>dmesglog;cat
> /proc/acpi/button/lid/LID0/state >>dmesglog; dmesg -T >> dmesglog; sleep 1;
> done



The log looks a bit confusing, maybe you should:

1. Prepare a script like this:
  
    OLD_LIDSTATE=unknown
    while true; do
        NEW_LIDSTATE=`cat /proc/acpi/button/lid/LID0/state`
        if [ "x$OLD_LIDSTATE" != "x$NEW_LIDSTATE" ]; then
            dmesg -T -c >> dmesglog;
            echo "LID STATE $NEW_LIDSTATE" >> dmesglog;
            OLD_LIDSTATE=$NEW_LIDSTATE;
        fi
    done

Execute the following commands/actions sequence:
# dmesg -T -c > dmesglog
# echo "LID test start" >> dmesglog
# start the script and close the LID
# wait for 1 minutes
# open the LID and stop the script
# dmesg -T -c >> dmesglog
# echo "LID test stop" >> dmesglog

Thanks and best regards
-Lv
Comment 12 Yan Pas 2016-04-15 19:17:44 UTC
Created attachment 213031 [details]
dmesglog.15.04

What I did:
1)Cloned kernel from git://git.kernel.org again and rebuilt it :
$ 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
	cd linux/
	vi drivers/acpi/ec.c <-- Here line #define DEBUG doesn't contain spaces at begin and end
	uname -a
	cp /boot/config-`uname -r` .config
	yes '' | make oldconfig
	make clean
	make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-custom
	cd ..
	sudo dpkg -i linux-libc-dev_4.6.0-rc3-custom-1_amd64.deb linux-image-4.6.0-rc3-custom-dbg_4.6.0-rc3-custom-1_amd64.deb linux-headers-4.6.0-rc3-custom_4.6.0-rc3-custom-1_amd64.deb linux-firmware-image-4.6.0-rc3-custom_4.6.0-rc3-custom-1_amd64.deb
	sudo update-grub
	sudo dpkg -i linux-image-4.6.0-rc3-custom_4.6.0-rc3-custom-1_amd64.deb
2) added grub parameter debug
3) Booted to this kernel, waited for 1 minute and followed instructions under root

Seems like to no avail again :(
Comment 13 Lv Zheng 2016-04-18 08:25:58 UTC
I found these EC messages:
	Line 239: [Пт. апр. 15 22:03:32 2016] ACPI : EC: +++++ Starting EC +++++
	Line 240: [Пт. апр. 15 22:03:32 2016] ACPI : EC: 1: Increase driver
	Line 241: [Пт. апр. 15 22:03:32 2016] ACPI : EC: +++++ EC started +++++
It looks EC debugging is working.

I also found these:
LID test start
LID STATE state:      open
LID STATE state:      closed
LID STATE state:      open
LID test stop
The LID state is changed without any significant EC transactions.

This confuses me, since I'm not the one who reads the bug first.
Let me root cause it again:

Let me post the code related to this LID device:
                    Scope (\_SB)
                    {
                        Device (LID0)
                        {
                            Name (_HID, EisaId ("PNP0C0D"))  // _HID: Hardware ID
                            Method (_LID, 0, NotSerialized)  // _LID: Lid Status
                            {
                                If (LEqual (^^PCI0.LPCB.EC0.ECOK, Zero))
                                {
                                    And (PHSB (0xC0, 0x82), 0x04, Local0)
                                    ShiftRight (Local0, 0x02, Local0)
=> This code path should have been executed.
                                }
                                Else
                                {
                                    Store (^^PCI0.LPCB.EC0.LIDS, Local0)
                                }

                                Return (Local0)
                            }
                        }
                    }

                    Field (ECR, ByteAcc, Lock, Preserve)
                    {
                        Offset (0x82), 
                            ,   2, 
                        LIDS,   1
                    }

                    Method (_Q66, 0, NotSerialized)  // _Qxx: EC Query
                    {
                        If (LEqual (ECOK, Zero))
                        {
                            And (PHSB (0xC0, 0x82), 0x04, Local0)
                            ShiftRight (Local0, 0x02, Local0)
                        }
                        Else
                        {
                            Store (LIDS, Local0)
=> This line can trigger EC transactions, but _Q66 is not triggered.
                        }

                        Notify (LID0, 0x80)
=> This line can trigger EC event that can be monitored by acpi_listen.
                    }
                }

So it looks EC transactions against LIDS EC operation region should only happen when ECOK is not Zero.
And according to the log, _Q66 is in fact not executed.
What you obtained from /proc/acpi/button/lid/LID0/state is the evalation result of _LID, which is not dependent on ECOK.

Then I wonder whether ECOK is set to non Zero.
By searching the table, I got:
                Device (EC0)
                {
                    Name (_HID, EisaId ("PNP0C09"))  // _HID: Hardware ID
                    Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
                    {
                        IO (Decode16,
                            0x0062,             // Range Minimum
                            0x0062,             // Range Maximum
                            0x01,               // Alignment
                            0x01,               // Length
                            )
                        IO (Decode16,
                            0x0066,             // Range Minimum
                            0x0066,             // Range Maximum
                            0x01,               // Alignment
                            0x01,               // Length
                            )
                    })
                    Name (_GPE, 0x17)  // _GPE: General Purpose Events
                    Name (ECOK, Zero)
                    Method (_REG, 2, NotSerialized)  // _REG: Region Availability
                    {
                        If (LEqual (Arg0, 0x03)) {}
                    }

                    Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                    {
                        0x18, 
                        0x05
                    })
                    OperationRegion (ECR, EmbeddedControl, Zero, 0xFF)
It looks ECOK is Zero by default and won't be set by _REG!

And ECOK related search result is:
	Line 71:         If (\_SB.PCI0.LPCB.EC0.ECOK)
	Line 101:         If (\_SB.PCI0.LPCB.EC0.ECOK)
	Line 126:                 If (LEqual (\_SB.PCI0.LPCB.EC0.ECOK, Zero))
	Line 137:                 If (LEqual (\_SB.PCI0.LPCB.EC0.ECOK, Zero))
	Line 344:                 If (LEqual (^PCI0.LPCB.EC0.ECOK, Zero))
	Line 355:                 If (LEqual (^PCI0.LPCB.EC0.ECOK, Zero))
	Line 746:                 If (LEqual (\_SB.PCI0.LPCB.EC0.ECOK, Zero))
	Line 759:                     If (LEqual (\_SB.PCI0.LPCB.EC0.ECOK, Zero))
	Line 776:                 If (LEqual (\_SB.PCI0.LPCB.EC0.ECOK, Zero))
	Line 808:                 If (LEqual (\_SB.PCI0.LPCB.EC0.ECOK, Zero))
	Line 5554:                     Name (ECOK, Zero)
	Line 5604:                         If (LEqual (ECOK, Zero))
	Line 5674:                             If (LEqual (ECOK, Zero))
	Line 5697:                             If (LEqual (ECOK, Zero))
	Line 5718:                             If (LEqual (ECOK, Zero))
	Line 5779:                             If (LEqual (ECOK, Zero))
	Line 5831:                                 If (LEqual (^^PCI0.LPCB.EC0.ECOK, Zero))
	Line 5855:                         If (LEqual (ECOK, Zero))
	Line 5925:                         If (LEqual (^^EC0.ECOK, Zero))
	Line 5937:                         If (LEqual (^^EC0.ECOK, Zero))
	Line 5975:                         If (LEqual (^^EC0.ECOK, Zero))
	Line 5984:                         If (LEqual (^^EC0.ECOK, Zero))
	Line 6044:                         If (LEqual (^^EC0.ECOK, Zero))
	Line 6063:                         If (LEqual (^^EC0.ECOK, Zero))
	Line 6143:                         If (^^EC0.ECOK)
	Line 6158:                                 If (^^EC0.ECOK)
	Line 6171:                                 If (^^EC0.ECOK)
	Line 6185:                                 If (^^EC0.ECOK)
	Line 6202:                                     If (^^EC0.ECOK)
	Line 6220:                                     If (^^EC0.ECOK)
	Line 6238:                         If (^^EC0.ECOK)
	Line 6271:                             If (^^LPCB.EC0.ECOK)
	Line 6334:                         If (^^EC0.ECOK)
	Line 6366:                         If (^^EC0.ECOK)
	Line 6409:                         If (^^EC0.ECOK)
	Line 6453:                                 If (LEqual (^^EC0.ECOK, Zero))
	Line 6470:                                 If (LEqual (^^EC0.ECOK, Zero))
	Line 6487:                                 If (LEqual (^^EC0.ECOK, Zero))
	Line 8077:                 If (ECOK)
	Line 8099:                 If (ECOK)
	Line 8118:                     If (ECOK)
	Line 8142:                 If (ECOK)
	Line 8176:                     If (LEqual (^^PCI0.LPCB.EC0.ECOK, Zero))
	Line 8189:                     If (LEqual (^^PCI0.LPCB.EC0.ECOK, Zero))
	Line 8206:                 If (LEqual (^^PCI0.LPCB.EC0.ECOK, Zero))
	Line 8230:                 If (^^PCI0.LPCB.EC0.ECOK)
	Line 8250:                         If (^^PCI0.LPCB.EC0.ECOK)
	Line 8269:                         If (^^PCI0.LPCB.EC0.ECOK)
	Line 8289:                         If (^^PCI0.LPCB.EC0.ECOK)
	Line 8311:                 If (^^PCI0.LPCB.EC0.ECOK)
So you see, there is no code setting ECOK to non Zero!

So IMO, the "Notify (LID0, 0x80)" won't be executed at all.
That's why you cannot see LID status change events via acpi_listen.

I'm curious that:
Can Windows be running on this platform and can Windows be suspended by closing LID?

Thanks and best regards
-Lv
Comment 14 Yan Pas 2016-04-18 10:19:23 UTC
Yes, Win Vista is installed in dualboot. Lid close event can suspend notebook and lid open event can wakeup notebook from sleep. The first is configured in windows settings, the second in some sony app.
Comment 15 Lv Zheng 2016-04-18 23:04:43 UTC
(In reply to Yan Pas from comment #14)
> Yes, Win Vista is installed in dualboot. Lid close event can suspend
> notebook and lid open event can wakeup notebook from sleep. The first is
> configured in windows settings, the second in some sony app.

According to the ACPI code, EC is not working on this platform.
All ACPI functionalities may be achieved using platform specific means.
So it is likely that there is a platform specific driver on Windows responsible for enable the EC.

Could you try boot the kernel with:
acpi.

And upload the GPE information by:
cat /sys/firmware/acpi/interrupts/
Comment 16 Lv Zheng 2016-04-18 23:20:48 UTC
Sorry, my browser suddently committed my reply.

Could you try boot the kernel with:
acpi.

And upload the GPE information by:
cat /sys/firmware/acpi/interrupts/

Should be:

Could you try to boot the kernel with the following boot parameter:
acpi.trace_method_name=_SB.PCI0.LPCB.EC0._REG acpi.trace_state=method
After booting, perform the following commands:
# echo "_SB._LID0._LID" > /sys/module/acpi/parameters/trace_method_name
# echo "opcode-once" > /sys/module/acpi/parameters/trace_state
# cat /proc/acpi/button/lid/LID0/state
# dmesg > dmesg-acpi-tracer.log
# cat /sys/firmware/acpi/interrupts/gpe17 >> dmesg-acpi-tracer.log
And upload the log here.

Thanks
-Lv
Comment 17 Lv Zheng 2016-04-18 23:23:58 UTC
I guess there is a platform driver responsible for enabling EC and setting ECOK to non Zero.
But on Linux we are lacking of the platform specific way to enable EC to have the functionalities working.
Let's check your log to see if it is the truth.
BTW, has this platform ever been working as expected with old Linux kernels?

Thanks
-Lv
Comment 18 Mattia Dongili 2016-04-19 14:45:11 UTC
For Vaio laptops, the platform specific way to enable EC is in sony-laptop.ko.
See the SNC and SPIC device definitions.

This model is very old though an my recollection of specifics about it are starting to fade away. It may help to load sony-laptop with debug=1 and retry the opening/closing of the lid although I don't see either of the devices that sony-laptop knows about in the GPE notification path.
Comment 19 Yan Pas 2016-04-19 15:39:39 UTC
kernel 4.6 has sony-laptop module out of the box and it is loaded automatically (lsmod shows: sony_laptop            57344  0). I've ran: "sudo modprobe sony-laptop debug=1" and then "acpi_listen". There were no output  when I'closed the lid or pressed power button.

Then I've booted with these paarmeters:
$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-4.6.0-rc3-custom root=UUID=3b47a7a2-a977-4e78-9d37-5e0fa5307ac8 ro quiet splash acpi.trace_method_name=_SB.PCI0.LPCB.EC0._REG acpi.trace_state=method crashkernel=384M-:128M vt.handoff=7

BUT, files "/sys/module/acpi/parameters/trace_method_name" and "/sys/module/acpi/parameters/trace_state" have not appeared. So bash said access denied and haven't created any files when I entered first two commands.
Comment 20 Lv Zheng 2016-04-20 00:37:41 UTC
(In reply to Yan Pas from comment #19)
> kernel 4.6 has sony-laptop module out of the box and it is loaded
> automatically (lsmod shows: sony_laptop            57344  0). I've ran:
> "sudo modprobe sony-laptop debug=1" and then "acpi_listen". There were no
> output  when I'closed the lid or pressed power button.
> 
> Then I've booted with these paarmeters:
> $ cat /proc/cmdline 
> BOOT_IMAGE=/boot/vmlinuz-4.6.0-rc3-custom
> root=UUID=3b47a7a2-a977-4e78-9d37-5e0fa5307ac8 ro quiet splash
> acpi.trace_method_name=_SB.PCI0.LPCB.EC0._REG acpi.trace_state=method
> crashkernel=384M-:128M vt.handoff=7
> 
> BUT, files "/sys/module/acpi/parameters/trace_method_name" and
> "/sys/module/acpi/parameters/trace_state" have not appeared. So bash said
> access denied and haven't created any files when I entered first two
> commands.

In order to enable ACPICA interpreter tracer, you need to configure the kernel with CONFIG_ACPI_DEBUG=y.
Please refer to Documentation/acpi/method-tracing.txt if you still encountered issues in using it.

Thanks and best regards
-Lv
Comment 21 Lv Zheng 2016-04-20 00:38:05 UTC
(In reply to Mattia Dongili from comment #18)
> For Vaio laptops, the platform specific way to enable EC is in
> sony-laptop.ko.
> See the SNC and SPIC device definitions.
> 
> This model is very old though an my recollection of specifics about it are
> starting to fade away. It may help to load sony-laptop with debug=1 and
> retry the opening/closing of the lid although I don't see either of the
> devices that sony-laptop knows about in the GPE notification path.

Thanks for the information.

Best regards
-Lv
Comment 22 Yan Pas 2016-04-20 10:07:55 UTC
Sounds like I should again recompile entire kernel, which takes 4-5 hours. Maybe you know something else which I should enable for FULL debug output? (I'm building kernel 3rd time). Or do you know how to build kernel on debian-based distros without recompiling entire kernel? (every make command executes make clean for some stupid reason...)
Comment 23 Mattia Dongili 2016-04-21 05:14:07 UTC
(In reply to Yan Pas from comment #19)
> kernel 4.6 has sony-laptop module out of the box and it is loaded
> automatically (lsmod shows: sony_laptop            57344  0). I've ran:
> "sudo modprobe sony-laptop debug=1" and then "acpi_listen". There were no
> output  when I'closed the lid or pressed power button.

You should first remove the module (rmmod sony-laptop) and the reload it (modprobe sony-lapop debug=1).
Apologies if this is what you did already, it wasn't clear from the previous message.

With debug=1, you should get a lot of extra output from sony-laptop in the kernel log (dmesg), I'd be interested in seeing that. Then you can also check if there is any additional output on LID events.

Thanks
Mattia
Comment 24 Lv Zheng 2016-04-21 15:21:00 UTC
(In reply to Yan Pas from comment #22)
> Sounds like I should again recompile entire kernel, which takes 4-5 hours.
It sounds like you are using a very old or slow machine.
Or your kernel is not customized for development.

> Maybe you know something else which I should enable for FULL debug output?
> (I'm building kernel 3rd time). Or do you know how to build kernel on
> debian-based distros without recompiling entire kernel? (every make command
> executes make clean for some stupid reason...)

You can also use dynamic debugging facility instead of rebuilding if the dynamic debugging is possible on your platform.
# mount -t debugfs none /sys/kernel/debug
# echo -n "file ec.c +p" > /sys/kernel/debug/dynamic_debug/control
Please refer to Documentation/dynamic-debug-howto.txt if I made mistake here.
Comment 25 Lv Zheng 2016-04-21 15:36:41 UTC
(In reply to Yan Pas from comment #22)
> Or do you know how to build kernel on
> debian-based distros without recompiling entire kernel? (every make command
> executes make clean for some stupid reason...)

I don't know what's happening there.
Normally we'll shrink kernel config items on our development platforms.
As we don't need most of the drivers and many features are not used.
That can help to save lots of time in building a kernel.

The test is meant to:
1. confirm if EC0._REG has been evaluated by the kernel.
2. confirm there is no EC GPE triggered.
If so, from EC driver's point of view, there is no bug in the EC driver.
Because EC should be enabled by external facilities.
For example, ACPI hardware enabling or some system management chip enabling.

After confirming that, we may start to figure out the platform specific way of handling the LID events of this platform.

Thanks and best regards
-Lv
Comment 26 Lv Zheng 2016-04-21 15:40:38 UTC
(In reply to Mattia Dongili from comment #23)
> (In reply to Yan Pas from comment #19)
> > kernel 4.6 has sony-laptop module out of the box and it is loaded
> > automatically (lsmod shows: sony_laptop            57344  0). I've ran:
> > "sudo modprobe sony-laptop debug=1" and then "acpi_listen". There were no
> > output  when I'closed the lid or pressed power button.
> 
> You should first remove the module (rmmod sony-laptop) and the reload it
> (modprobe sony-lapop debug=1).
> Apologies if this is what you did already, it wasn't clear from the previous
> message.
> 
> With debug=1, you should get a lot of extra output from sony-laptop in the
> kernel log (dmesg), I'd be interested in seeing that. Then you can also
> check if there is any additional output on LID events.

I can see LID_OPOENED/LID_CLOSED events are not correctly implemented in sony-laptop.c.
Can they correctly be handled by the driver?

Thanks and best regards
-Lv
Comment 27 Lv Zheng 2016-04-21 20:37:29 UTC
(In reply to Lv Zheng from comment #24)
> (In reply to Yan Pas from comment #22)
> > Sounds like I should again recompile entire kernel, which takes 4-5 hours.
> It sounds like you are using a very old or slow machine.
> Or your kernel is not customized for development.
> 
> > Maybe you know something else which I should enable for FULL debug output?
> > (I'm building kernel 3rd time). Or do you know how to build kernel on
> > debian-based distros without recompiling entire kernel? (every make command
> > executes make clean for some stupid reason...)
> 
> You can also use dynamic debugging facility instead of rebuilding if the
> dynamic debugging is possible on your platform.
> # mount -t debugfs none /sys/kernel/debug
> # echo -n "file ec.c +p" > /sys/kernel/debug/dynamic_debug/control
> Please refer to Documentation/dynamic-debug-howto.txt if I made mistake here.

I think you have to recompile the kernel with CONFIG_ACPI_DEBUG=y if you want to enable the interpreter tracer to confirm if _REG has been executed.
Dynamic debugging is just a useful replacement of enabling ec.c debugging while you should have already enabled it according to comment 13.

Thanks
-Lv
Comment 28 Yan Pas 2016-04-21 20:40:32 UTC
Created attachment 213601 [details]
acpitraces_sonylaptop

It was challenging, but I did managed to get these files :)
Comment 29 Mattia Dongili 2016-04-22 14:45:38 UTC
(In reply to Yan Pas from comment #28)
> Created attachment 213601 [details]
> acpitraces_sonylaptop
> 
> It was challenging, but I did managed to get these files :)

Oh shucks...

[   66.555207] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   66.555327] IP: [<ffffffffc02c2a1a>] sony_nc_function_cleanup+0x5a/0x3b0 [sony_laptop]

You'll need the attached patch to be able to successfully remove sony-laptop, else you can load it with debug=1 from the beginning by creating a file called /etc/modprobe.d/sony-laptop.conf that contains only this one line:
 
options sony-laptop debug=1

- - - 8< - - -
Then reboot.
Comment 30 Mattia Dongili 2016-04-22 14:46:24 UTC
Created attachment 213671 [details]
sony-laptop, fix (unrelated) unload bug
Comment 31 Mattia Dongili 2016-04-22 14:54:46 UTC
(In reply to Lv Zheng from comment #26)
> (In reply to Mattia Dongili from comment #23)
> > (In reply to Yan Pas from comment #19)
...
> > With debug=1, you should get a lot of extra output from sony-laptop in the
> > kernel log (dmesg), I'd be interested in seeing that. Then you can also
> > check if there is any additional output on LID events.
> 
> I can see LID_OPOENED/LID_CLOSED events are not correctly implemented in
> sony-laptop.c.
> Can they correctly be handled by the driver?

If there are events unknown to the driver, we should get here:

        dprintk("unknown event ([%.2x] [%.2x]) at port 0x%.4x(+0x%.2x)\n",       
                        ev, data_mask, dev->cur_ioport->io1.minimum,             
                        dev->evport_offset);  

and see the event details once the module is loaded with debug=1 and the event generated.
Then yes, they can be added to the driver.
Comment 32 Yan Pas 2016-04-23 18:49:07 UTC
Created attachment 213831 [details]
sonylaptopmodule_log

Created script in /etc/modprobe.d
Comment 33 Mattia Dongili 2016-04-24 03:01:47 UTC
(In reply to Yan Pas from comment #32)
> Created attachment 213831 [details]
> sonylaptopmodule_log
> 
> Created script in /etc/modprobe.d

Thank you for collecting this data.
Assuming that you closed and re-opened the LID, I don't see any event logged by sony-laptop.
We are back at the ACPI EC driver I suspect.
Comment 34 Yan Pas 2016-04-24 08:12:16 UTC
Yes, I've ran dmesg -c after that. Than again closed lid several times and then executed dmesg. The output was empty.
Comment 35 Lv Zheng 2016-04-25 05:11:03 UTC
(In reply to Yan Pas from comment #28)
> Created attachment 213601 [details]
> acpitraces_sonylaptop
> 
> It was challenging, but I did managed to get these files :)

Hi,

First I got:
       0   enabled
So it looks the EC is not running at all.

Second I got this as the first line:
[    1.084539] hub 4-0:1.0: USB hub found

So it is already too late for _REG tracer logs.
And I couldn't see "Command line" log in it.
So I cannot help to confirm if you have correctly enabled ACPICA tracer.
Could you append boot parameter of "log_buf_len=64M" to see if we can get the correct log stuffs appeared.

Thanks
-Lv
Comment 36 Lv Zheng 2016-04-25 05:22:16 UTC
(In reply to Mattia Dongili from comment #33)
> (In reply to Yan Pas from comment #32)
> > Created attachment 213831 [details]
> > sonylaptopmodule_log
> > 
> > Created script in /etc/modprobe.d
> 
> Thank you for collecting this data.
> Assuming that you closed and re-opened the LID, I don't see any event logged
> by sony-laptop.

The error log is only triggered when the index is exceeding array size of sony_laptop_input_index.
While we can see SONYPI_EVENT_LID_OPENED/CLOSED is in this array but marked as -1.
For scancode = -1 events, they need to be specially handled.
While there is no code and error/info logs related toe the 2 events.
Why don't you add 2 case for ONYPI_EVENT_LID_OPENED/CLOSED to see if they have been reported?

> We are back at the ACPI EC driver I suspect.

I don't think so.

ACPI Ec driver is not responsible for enabling EC hardware/firmeare.
They are normally enabled by other means:
For example:
1. GPIO drivers
2. system menagement drivers
3. ACPI hardware enabling
The driver only handles EC IO and is responsible for executing _REG.

So what I need to confirm is whether _REG is executed by the EC driver.
If it has executed and there is still no EC GPEs, the bug should not be related ACPI LID/EC and is still related to some platform specific stuffs.

Thanks
-Lv
Comment 37 Mattia Dongili 2016-04-25 07:12:36 UTC
(In reply to Lv Zheng from comment #36)
> (In reply to Mattia Dongili from comment #33)
> > (In reply to Yan Pas from comment #32)
> > > Created attachment 213831 [details]
> > > sonylaptopmodule_log
> > > 
> > > Created script in /etc/modprobe.d
> > 
> > Thank you for collecting this data.
> > Assuming that you closed and re-opened the LID, I don't see any event
> logged
> > by sony-laptop.
> 
> The error log is only triggered when the index is exceeding array size of
> sony_laptop_input_index.
> While we can see SONYPI_EVENT_LID_OPENED/CLOSED is in this array but marked
> as -1.
> For scancode = -1 events, they need to be specially handled.

Ok, but what you want to look at is the 

	struct sonypi_eventtypes type3_events[]

(notice that in the log, the model is identified as a Type3) and the fact that
this array doesn't have the sonypi_lidev events.
So the IRQ handler will fall off the end of the dev->event_types array and
print the warning.
Regardless, the other dprintk ~30 lines earlier should have been printed:

         dprintk("event ([%.2x] [%.2x]) at port 0x%.4x(+0x%.2x)\n",
                        ev, data_mask, dev->cur_ioport->io1.minimum,
                        dev->evport_offset);

> While there is no code and error/info logs related toe the 2 events.
> Why don't you add 2 case for ONYPI_EVENT_LID_OPENED/CLOSED to see if they
> have been reported?

Like I said, the IRQ handler (sony_pic_irq) already prints all events when the
module is loaded with debug=1.
But also, I'm not sure how they would be reported via any of the SNC/SPIC
devices since _Q66 notifies LID0.

> > We are back at the ACPI EC driver I suspect.
> 
> I don't think so.
> 
> ACPI Ec driver is not responsible for enabling EC hardware/firmeare.
> They are normally enabled by other means:
> For example:
> 1. GPIO drivers
> 2. system menagement drivers
> 3. ACPI hardware enabling
> The driver only handles EC IO and is responsible for executing _REG.

That's right. Ok.
I'm wondering if there is any other event that is correctly dispatched via EC?

> So what I need to confirm is whether _REG is executed by the EC driver.
> If it has executed and there is still no EC GPEs, the bug should not be
> related ACPI LID/EC and is still related to some platform specific stuffs.

The bottom line is that if there is a platform specific initialization
required, without reverse engineering what the Windows drivers do, there is no
easy way to know how to initialize the embedded controller.

So, back to the original request, there's two problems:
1. no LID events
2. no wake-up on LID open.

For (2), I think the SLRS method of the SNC device is the one responsible for
enabling wake-up.
It shouldn't be too difficult to add support for it if you are willing to try.

Disclaimer: reality is that I have no idea whether the code I provide will fry
your laptop or do anything close to what I think it will do. I won't be
responsible for any damage.
Comment 38 Yan Pas 2016-04-25 16:26:55 UTC
(In reply to Lv Zheng from comment #35)
> (In reply to Yan Pas from comment #28)
> > Created attachment 213601 [details]
> > acpitraces_sonylaptop
> > 
> > It was challenging, but I did managed to get these files :)
> 
> Hi,
> 
> First I got:
>        0   enabled
> So it looks the EC is not running at all.
> 
> Second I got this as the first line:
> [    1.084539] hub 4-0:1.0: USB hub found
> 
> So it is already too late for _REG tracer logs.
> And I couldn't see "Command line" log in it.
> So I cannot help to confirm if you have correctly enabled ACPICA tracer.
> Could you append boot parameter of "log_buf_len=64M" to see if we can get
> the correct log stuffs appeared.
> 
> Thanks
> -Lv

The kernel with acpi tracer was only 15mb and there were no support of FN keys, touchpad and a lot of things. A was frustrated when at the last step of building I got an error that LOCALVERSION can't start with letter A. So I've installed kernel via simple sudo checkinstall -D. Should I rebuild kernel again with acpi_tracer support? I've forgot where should I enable it in menuconfig...
Comment 39 Lv Zheng 2016-04-26 03:17:43 UTC
(In reply to Mattia Dongili from comment #37)

> But also, I'm not sure how they would be reported via any of the SNC/SPIC
> devices since _Q66 notifies LID0.

The _Q66 is not triggered.
In attachment 213601 [details], I asked Yan to obtain EC GPE count. It was 0. see comment 15 for the GPE count result:
       0   enabled
I also asked Yan to enable EC debugging logs, and there is no EC activity in the dmesg except the driver initialization. In comment 8, this means "EC GPE is enabled to me“:
[    0.186026] ACPI : EC: +++++ Starting EC +++++
[    0.186044] ACPI : EC: 1: Increase driver
[    0.186048] ACPI : EC: +++++ EC started +++++

EC has 2 kinds of interrupts - EC transactions (if EC operation regions are queried via AML tables), EC events (like _Q66). Normally, EC._REG only controls EC transaction GPEs, cannot control EC event GPEs. It is used to block EC operation region accesses from the AML tables.

So this looks suspecious to me that the platform firmware hasn't made the EC working or we are lacking of platform specific way to enable the EC.

Another proof is "by decoding DSDT, we got an empty EC._REG":
                    Name (ECOK, Zero)
                    Method (_REG, 2, NotSerialized)                    {
                        If (LEqual (Arg0, 0x03)) {}
                    }
It never sets ECOK to One.
And the entire table requires ECOK to be not Zero to trigger EC transactions.
For example _LID and _Q66.
Please see comment 13 for the details.

> > > We are back at the ACPI EC driver I suspect.
> > 
> > I don't think so.
> > 
> > ACPI Ec driver is not responsible for enabling EC hardware/firmeare.
> > They are normally enabled by other means:
> > For example:
> > 1. GPIO drivers
> > 2. system menagement drivers
> > 3. ACPI hardware enabling
> > The driver only handles EC IO and is responsible for executing _REG.
> 
> That's right. Ok.
> I'm wondering if there is any other event that is correctly dispatched via
> EC?

Nothing dispatched via EC as I said before.
See comment 15 for the conclusion about this line:
       0   enabled

> > So what I need to confirm is whether _REG is executed by the EC driver.
> > If it has executed and there is still no EC GPEs, the bug should not be
> > related ACPI LID/EC and is still related to some platform specific stuffs.
> 
> The bottom line is that if there is a platform specific initialization
> required, without reverse engineering what the Windows drivers do, there is
> no
> easy way to know how to initialize the embedded controller.

From the EC driver's point of view.
I only need to confirm that EC._REG has been executed.
EC driver doesn't ensure anything other than executing _REG for the initialization.
So probably we should move this bug to platform specific if we can confirm the execution of EC._REG.

> So, back to the original request, there's two problems:
> 1. no LID events
> 2. no wake-up on LID open.

That means, no _Q66 or some platform specific stuff other than EC events.

> For (2), I think the SLRS method of the SNC device is the one responsible for
> enabling wake-up.
> It shouldn't be too difficult to add support for it if you are willing to
> try.
> 
> Disclaimer: reality is that I have no idea whether the code I provide will
> fry
> your laptop or do anything close to what I think it will do. I won't be
> responsible for any damage.

That's beyond what I can help.
Sony professionals should be needed for this.

Thanks and best regards
-Lv
Comment 40 Lv Zheng 2016-04-26 04:14:23 UTC
> The kernel with acpi tracer was only 15mb and there were no support of FN
> keys, touchpad and a lot of things. A was frustrated when at the last step
> of building I got an error that LOCALVERSION can't start with letter A. So
> I've installed kernel via simple sudo checkinstall -D. Should I rebuild
> kernel again with acpi_tracer support? I've forgot where should I enable it
> in menuconfig...

The chanllenging steps for the 1st test are:

1. Enable ACPI tracer by setting CONFIG_ACPI_DEBUG=y via "make menuconfig":
Power management and ACPI options --->
  ACPI (Advanced Configuration and Power Interface) support --->
    Debug Statements
    ^^^^^^^^^^^^^^^^

2. Rebuild the kernel, this should be able to make the following files appearing after booting:
/sys/module/acpi/parameters/trace_method_name
/sys/module/acpi/parameters/trace_state
/sys/module/acpi/parameters/trace_debug_level
/sys/module/acpi/parameters/trace_debug_layer
We don't need to modify trace_debug_level/layer, but can use default trace_debug_level/layer for testing.

3. You need to boot the kernel with the following boot parameters:
acpi.trace_method_name=_SB.PCI0.LPCB.EC0._REG acpi.trace_state=method

4. After booting, you need to perform the following commands:
# echo "_SB._LID0._LID" > /sys/module/acpi/parameters/trace_method_name
# echo "opcode-once" > /sys/module/acpi/parameters/trace_state
# cat /proc/acpi/button/lid/LID0/state
# dmesg > dmesg-acpi-tracer.log
# cat /sys/firmware/acpi/interrupts/gpe17 >> dmesg-acpi-tracer.log

5. Please upload the dmesg-acpi-tracer.log here.

I can offer you another tests to confirm if EC is running:

1. Please build EC userspace tools (tools/power/acpi/tools/ec/ec) under tools/power/acpi. In the kernel source tree, do the following commands:
# cd tools/power/acpi
# make

2. Enable CONFIG_ACPI_EC_DEBUGFS=y and CONFIG_DYNAMIC_DEBUG via "make menuconfig",
Power management and ACPI options --->
  ACPI (Advanced Configuration and Power Interface) support --->
    EC read/write access through /sys/kernel/debug/ec
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Debug Statements
  Kernel hacking --->
    printk and dmesg options --->
      Enable dynamic printk() support
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3. Build and boot the kernel (you may skip this step because you can have this step done with the first test).

4. Enable dynamic debugging for EC driver:
# mount -t debugfs none /sys/kernel/debug
# echo -n "file ec.c +p" > /sys/kernel/debug/dynamic_debug/control

5. Obtain initial EC GPE counts with tools/power/acpi/tools/ec/ec you built before
# dmesg -T -c > ec-gpe.log
# echo "before accessing EC" >> ec-gpe.log
# cat /sys/firmware/acpi/interrupts/gpe17 >> ec-gpe.log
# tools/power/acpi/tools/ec/ec -r >> ec-gpe.log
# dmesg -T -c >> ec-gpe.log
# echo "after accessing EC" >> ec-gpe.log
# cat /sys/firmware/acpi/interrupts/gpe17 >> ec-gpe.log

6. Please upload the ec-gpe.log here.

Thanks and best regards
-Lv
Comment 41 Mattia Dongili 2016-04-26 14:08:12 UTC
(In reply to Lv Zheng from comment #39)
> (In reply to Mattia Dongili from comment #37)
...
> > For (2), I think the SLRS method of the SNC device is the one responsible
> for
> > enabling wake-up.
> > It shouldn't be too difficult to add support for it if you are willing to
> > try.
> > 
> > Disclaimer: reality is that I have no idea whether the code I provide will
> > fry
> > your laptop or do anything close to what I think it will do. I won't be
> > responsible for any damage.

The question above was for Yan.

> That's beyond what I can help.
> Sony professionals should be needed for this.

The whole sony-laptop driver is a community effort, reverse engineered from Windows code and traces of Windows drivers. I'm afraid I can't help much further, as I don't have the bandwidth to help you trace how to initialize the embedded controller using the SNC device io ports.

The SLRS offer above stands (as does the disclaimer), in my opinion the lack of LID events could be worked around with a script that polls the state of the LID0 button and suspends the machine when it's closed.

One last attempt here, I noticed that the SNC device initialization code does something slightly different if sony-laptop is loaded with compat=1.
If you want to give it a shot, set that option in the /etc/modprobe.d file that you created before:

options sony-laptop debug=1 compat=1

and reboot. Then please open/close the lid and post the kernel log here.

Lv, I think this is a platform driver issue, feel free to re-assign.

P.S.: Sony doesn't make Vaios anymore and never acknowledged the sony-laptop driver, good luck finding a Sony professional that is willing to help. ;-)
Comment 42 Lv Zheng 2016-04-27 08:49:26 UTC
Hi,

> The SLRS offer above stands (as does the disclaimer), in my opinion the lack
> of LID events could be worked around with a script that polls the state of
> the LID0 button and suspends the machine when it's closed.

It shouldn't be a script.
We are suffering from userspace behaviors of triggering suspend by reading LID status from proc directly or enforcing kernel returning initial LID state.
Because _LID control method can be implemented by cached value which can be defaulted to "close".
So false-using _LID returning value could break many machines.
It is better to have userspace always waiting for LID input events.
And implement such polling mechanism from kernel.

> Lv, I think this is a platform driver issue, feel free to re-assign.

Let's see if this is an EC issue according to the test result.
BTW, I may move it to LID to implement the polling mechanism mentioned above.
And move it to platform specific after that.

> P.S.: Sony doesn't make Vaios anymore and never acknowledged the sony-laptop
> driver, good luck finding a Sony professional that is willing to help. ;-)

That's a bad news for the distribution vendors rather than a bad news for me. :(

Thanks and best regards
-Lv
Comment 43 Yan Pas 2016-04-30 13:48:05 UTC
Created attachment 214781 [details]
ec-gpe.log1

I've chosen the second option. Now I was building kernel on more powerful machine. make muneconfig, make, checkinstall -D. It weights again 15mb, I don't know what do I wrong. The only thing I've understood is that linux has novice-unfriendly build system. There are a lot of hooks/workaround for building debian packages, all of them  throw errors if there were any change in menuconfig. It's weird that it uses raw makefiles instead something higher-level like cmake or at least autotools.

The thing that confuses me: "Linux already knows somehow that lid is closed (file in /proc/...), but it can't do anything with this"
Comment 44 Lv Zheng 2016-05-02 12:27:19 UTC
(In reply to Yan Pas from comment #43)
> Created attachment 214781 [details]
> ec-gpe.log1
> 
> I've chosen the second option.

I didn't mean two options.
I meant that the both tests should be required for confirming if there was an EC driver bug on your system.
So please try again the test 1.

If your test was correct, we could see:
before accessing EC
       0   enabled
after accessing EC
       0   enabled
Which means:
1. No tools/power/acpi/tools/ec -r output?
2. No significant EC GPEs.

For 1, I don't know what's going on there.
The ec command should at least prompt us errors.

Could you try the test again with the following command:
# tools/power/acpi/tools/ec/ec -r >> ec-gpe.log
replaced by the following commands:
# echo "obtaining io..." >> ec-gpe.log
# sudo tools/power/acpi/tools/ec/ec -r >> ec-gpe.log 2>&1
# echo "obtaining gpe..." >> ec-gpe.log
# sudo cat /sys/kernel/debug/ec/ec0/gpe >> ec-gpe.log 2>&1
# echo "obtaining dynamic debug..." >> ec-gpe.log
# sudo cat /sys/kernel/debug/dyamic_debug/control | grep ec.c >> ec-gpe.log 2>&1

> Now I was building kernel on more powerful
> machine. make muneconfig, make, checkinstall -D. It weights again 15mb, I
> don't know what do I wrong.

But the file you uploaded here was just 72kb.
Could you put the 15mb file on some network share site for us to access.
In the log file you uploaded, there was still nothing useful in it.
I still couldn't see the kernel boot parameters in it.

> The thing that confuses me: "Linux already knows somehow that lid is closed
> (file in /proc/...), but it can't do anything with this"

IMO, you shouldn't root cause bugs using user experiences...

Thanks and best regards
-Lv
Comment 46 Lv Zheng 2016-06-08 02:54:47 UTC
Hi,

Since EC0.ECOK has never been set in the AML.
There dosen't seem to be any EC related functionalities working.
We'd like to see how Windows behave on this platform.
Validators could install a dbg build of Windows and run it on the platform.
Then they can use "amli" (which is only available on dbg build of Windows) to obtain if EC0.ECOK was set on the platform.
Please let us know the Windows validation result first.

OTOH, implenting polling thread for _LID can only benefit one platform.
It is not sufficient.
We need to know this is a common issue and many users need such kind of quirk, otherwise we'd better not to implement such a quirk for just one single platform.

So re-assign it to platform drivers.

Thanks and best regards
-Lv

Note You need to log in before you can comment on or make changes to this bug.