Bug 44161

Summary: Samsung Series 9 np900x4b: ACPI does not seem to reflect charging/discharging status for the battery correctly.
Product: ACPI Reporter: Lori Holden (email)
Component: BIOSAssignee: Zhang Rui (rui.zhang)
Status: RESOLVED CODE_FIX    
Severity: high CC: 3daniele03, aaron.lu, adborden, alan, alessandro.crismani, andfagiani, avi.kivity, aviram, balazs4web, berend.de.schouwer, bjorn.snoen, ccrisan, cinemaapk.in, clancy.kieran+kernel, cribari, david.shirvanyants, dennis.jansen, dmitriy.trt, dutra.julio, el, equilibriumtr, frank, fringd, giannis.koutsou, grrwlf, h, jacco.ermers, jan, jlee, jonas, joseph.salisbury, jpcarlino, juanmanuel.cabo, jvpgomes, kernel, lenb, leon_ti, lops776, lv.zheng, lyderic.lefever, marc-kernel, marcus.m.maertens, mark, mauritiusdadd, miticotoby, mjg, mmmaaarrrnnneeesss, moraus, mr203010spam, mytools, nichtstuer, nicolasporcel06, nils, odi, pablo.caron, parchd+kernel, phemmer+kernel, psep.pentauc, reshad, rheluani, Robert.Moore, rui.zhang, san, sebastian.riemer, soda, teppot, thomas, tianyu.lan, webbino, yoeturner, yu.c.chen, zenderro, zygimantas.b
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.11.10 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg
lsmod
lspci
upower -d
ACPI DSDT DSL
dmesg
lshw
lsmod
lspci
upower
acpidump output, as requested by Robert in Comment #114
acpidump for SAMSUNG NP900x3f-k01de
acpidump output
acpidump NP900X3C
acpidump for Samsung NP530U3C-A03IT
acpidump for NP535U3C-A01PL
ec.patch
samsung_fix_ec_events patched to show count
acpi_ec_clear function, only run on affected hardware
acpi_ec_clear function, only run on affected hardware (fixed)
acpi_ec_clear function, only run on affected hardware
acpi_ec_clear function: void, show loops, warn
acpi_ec_clear function, only run on affected hardware
process rather than discard events in acpi_ec_clear
[PATCH] ACPI/EC: Fix a regression caused by conflict firmware behavior between Samsung and Acer.
[PATCH] ACPI/EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag.
[PATCH] ACPI/EC: Enhance the checks to apply to QR_EC transactions.
[PATCH] ACPI/EC: Add reference counting for query handlers.
[PATCH] ACPI/EC: Add reference counting for query handlers.
[PATCH] ACPI/EC: Add command flushing support.
[PATCH] ACPI/EC: Cleanup QR_SC command processing by adding a kernel thread to poll EC events.
[PATCH] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.
[PATCH 7/8] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.
[PATCH] ACPI/OSL: Add IRQ handler flushing support in the OSL.
proc/acpi/wakeup
dmesg - #DEBUG enabled
dmesg-right-after-resume_A
dmesg-right-after-resume_C
dmesg-right-after-resume_D2
dmesg on linux-next without quirk showing EC working before but not after triggering bug
dmesg on linux-next without quirk showing bug persisting over several suspend cycles
dmesg on linux-next without quirk showing bug persisting over power cycle
revert patch
revert-patch works
revert patch on top of 4.19-rc4

Description Lori Holden 2012-07-03 07:23:53 UTC
While the power supply device (ADP1) seems to always understand when the cord is plugged in and removed, the battery (BAT1) does not seem to know when it is discharging or not. 

Samsung Series 9 np900x4b with Fedora 17

$ uname -a
Linux Lethe 3.4.4-3.fc17.x86_64 #1 SMP Tue Jun 26 20:54:56 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

(Results of acpi -i -b -a)
*** AC PLUGGED IN ***
Battery 0: Full, 100%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** AC REMOVED ***
Battery 0: Charging, 100%,  until charged
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

*** System Suspended and woken back up ***

*** AC STILL REMOVED ***
Battery 0: Discharging, 100%, 06:33:49 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line

*** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** SUSPEND AND WAKE ***

*** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: on-line

*** AC REMOVED ***
Battery 0: Charging, 98%, charging at zero rate - will never fully charge.
Battery 0: design capacity 8400 mAh, last full capacity 8500 mAh = 100%
Adapter 0: off-line


The same information is reflected when looking directly at the sysfs enteries. 
/sys/class/power_supply/ADP1/online shows the correct status all the time. (1 for online, 0 offline)

/sys/class/power_supply/BAT1/status will show the correct status after a fresh suspend, reflecting the above results.
Comment 1 Lori Holden 2012-07-03 07:24:37 UTC
Created attachment 74581 [details]
dmesg
Comment 2 Lori Holden 2012-07-03 07:24:54 UTC
Created attachment 74591 [details]
lsmod
Comment 3 Lori Holden 2012-07-03 07:25:09 UTC
Created attachment 74601 [details]
lspci
Comment 4 Lori Holden 2012-07-03 07:25:32 UTC
Created attachment 74611 [details]
upower -d
Comment 5 Lori Holden 2012-07-03 07:26:13 UTC
Created attachment 74621 [details]
ACPI DSDT DSL
Comment 6 Lori Holden 2012-07-03 07:57:10 UTC
The effect this has on my system is that upowerd does not always know if the laptop is running on AC or Battery. This also means that things like the battery indicator in Gnome 3 and tuned do not end up in the right state.
Comment 7 Bryan Stine 2012-07-07 23:52:41 UTC
Also affects new Ivy Bridge Samsung Series 9 models (observed on 900X4C). Also tested this using mainline kernels without vendor changes and got the same behavior.
Comment 8 Mark Syms 2012-07-15 07:37:24 UTC
Just in case the NP400x4C (Ivy Bridge version) give different values, here are the results of acpi -i -b -a on this system.

** AC Off **

Battery 0: Discharging, 70%, 03:46:06 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** AC On **

Battery 0: Discharging, 70%, 01:55:16 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

** Suspend and wake **

** AC Still On **

Battery 0: Charging, 70%, 00:50:05 until charged
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

** AC Off **

Battery 0: Charging, 70%, 02:11:35 until charged
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** Suspend and wake in this state **

** AC Off **

Battery 0: Discharging, 70%, 05:06:01 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: off-line

** AC On **

Battery 0: Discharging, 70%, 406:00:00 remaining
Battery 0: design capacity 8400 mAh, last full capacity 8700 mAh = 100%
Adapter 0: on-line

What I find quite interesting about these is that the times change as the AC state changes and seem to reflect the "correct" time, i.e. time to discharge when disconnected and time to charged when connected. So something is detecting the change, it's just not propagating to all properties.
Comment 9 Mark Syms 2012-07-15 07:39:03 UTC
Should have said, the above came from an Ubuntu 12.04 machine using mainline kernel packages.

Linux XXXXXX 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Comment 10 Jason Robinson 2012-07-23 14:24:27 UTC
Also affected, Samsung 5 Series NP530U3C laptop, same symptoms:

Plugged in:
Battery 0: Charging, 79%, 00:27:20 until charged
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: on-line

Not plugged in:
Battery 0: Charging, 79%, 01:26:32 until charged
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: off-line

uname -a
Linux jaytec 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Also had Ubuntu Quantal 3.5 kernel before wiping and reinstalling 3.2. The bug existed there also.
Comment 11 Jason Robinson 2012-07-23 14:25:40 UTC
Created attachment 75961 [details]
dmesg
Comment 12 Jason Robinson 2012-07-23 14:26:00 UTC
Created attachment 75971 [details]
lshw
Comment 13 Jason Robinson 2012-07-23 14:26:16 UTC
Created attachment 75981 [details]
lsmod
Comment 14 Jason Robinson 2012-07-23 14:26:35 UTC
Created attachment 75991 [details]
lspci
Comment 15 Jason Robinson 2012-07-23 14:26:54 UTC
Created attachment 76001 [details]
upower
Comment 16 Jason Robinson 2012-07-31 10:38:46 UTC
This for me seems fixed, although my Samsung laptop brightness buttons cause the screen to flicker and thus are useless now :)

uname -a
Linux jaytec 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Kernel has not updated, not quite sure what update fixed it, possibly a driver update from Ubuntu  http://ppa.launchpad.net/ubuntu-x-swat/x-updates/ Ubuntu PPA which offers updated X drivers.
Comment 17 Lori Holden 2012-08-01 06:05:56 UTC
Interesting... though I don't see how it could be x drivers that would fix things. I can say though that the status updates seemed to work at *some* point for me... I remember it working quite clearly... but then it stopped working again for some reason and has not worked since. Almost makes me think it's an ACPI issue on the laptop itself... though things do work in windows so I am really not sure.

(Actually it *may* be broken in windows too, but Windows may be seeing the fact it's discharging and updating the icon based on that vs the ac being unplugged)
Comment 18 Mark Syms 2012-08-01 13:45:12 UTC
Just tried booting my NP400X4C from a USB stick built with the Ubuntu Quantal Alpha 3 build and it seems to be somewhat more successful in that the icon states change as the AC adapter is plugged and unplugged. However, acpi -i -b -a reflects the state absolutely fine immediately after applying the AC adapter (i.e. Charging) and then at some point later, fairly quickly - within a minute, the state goes from Charging to Unknown.

I'm not sure if there is a log or something I can monitor to see these state changes and what is triggering them?
Comment 19 Jason Robinson 2012-08-02 06:05:30 UTC
And now broken again for me, as Lori said had happened.

The only package changes I made were these via normal updates:
Start-Date: 2012-08-02  08:10:51
Commandline: apt-get upgrade
Upgrade: libcupsppdc1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsimage2:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsimage2:i386 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupscgi1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsdriver1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), empathy-common:amd64 (3.4.2.1-0ubuntu1, 3.4.2.3-0ubuntu1), cups-client:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), empathy:amd64 (3.4.2.1-0ubuntu1, 3.4.2.3-0ubuntu1), cups-ppdc:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), cups-common:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcups2:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcups2:i386 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), cups:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), cups-bsd:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), libcupsmime1:amd64 (1.5.3-0ubuntu1, 1.5.3-0ubuntu2), nautilus-sendto-empathy:amd64 (3.4.2.1-0ubuntu1, 3.4.2.3-0ubuntu1)
End-Date: 2012-08-02  08:11:11

No boot after upgrades so unlikely to have anything to do with these?

Yesterday I booted several times (have this freezing issue as well on this Samsung), then suspended when leaving from work. Restored today and it didn't work.

Maybe returning from suspend broke something? A full reboot though did not help to resolve situation. AC is plugged in but acpi shows:

jaywink@jaytec: pts/2: 4 files 72Kb -> acpi -a -i -b
Battery 0: Discharging, 27%, 00:33:27 remaining
Battery 0: design capacity 6100 mAh, last full capacity 5600 mAh = 91%
Adapter 0: on-line
Comment 20 Lori Holden 2012-08-03 04:52:36 UTC
Just updated to 2.5.0 and the issue is still there :)
Comment 21 Lori Holden 2012-08-03 04:57:50 UTC
Odd you are having a freezing issue Jason - Mine wakes and sleeps several times day for weeks without issue. I am however running Fedora and not Ubuntu for what that matters.

Is the suspend happening on resume?
Comment 22 Lori Holden 2012-08-03 05:01:37 UTC
Jason: Also very weird on the flickering! Do you have the samsung_laptop module loaded? (lsmod | grep samsung)
Comment 23 Lori Holden 2012-08-03 05:14:28 UTC
Jason, on your flickering issue it looks like there is a ticket in the system for that.

https://bugzilla.kernel.org/show_bug.cgi?id=43031
Comment 24 Jason Robinson 2012-08-03 06:15:42 UTC
Thanks Lori for a note about the bug with flickering. It's interesting that when my screen started flickering, the ACPI issue was fixed - until it broke again. Wondering if these are somehow related.

Also it seems this bug also concerns so far only Samsung users.

My laptop suspends and resumes ok - though it doesn't suspend on closing the lid even though it is not plugged in and power settings have been checked to be correct. Need to find/raise a bug about that too.

The freezing issue happens at random, 0-2 times a day. Suddenly everything just freezes, fan also goes on quite highly. Only way out is holding the power button for a long time. No solution from killing X or trying ctrl-alt-f1 etc.
Comment 25 Lori Holden 2012-08-03 16:38:06 UTC
Jason, 

What version of Ubuntu are you running? I was running 12.04 without crashes for a week or so before going back to Fedora.

Have you run Windows on the laptop for any length of time? I'm wondering if the freezing is being caused by an issue with the laptop itself... bad memory or something?
Comment 26 Marc MAURICE 2012-08-11 17:09:08 UTC
For information, some Ubuntu users reported that upgrading the BIOS fixed the problem:

https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061
Comment 27 Aviram Jenik 2012-08-12 07:36:16 UTC
Marc: Users that installed the BIOS upgrade a month or so ago, all reported the bug 'returning' after a while, myself included. It's unclear what causes the bug to re-manifest itself, but it always does.
Comment 28 Jan 2012-08-26 16:00:05 UTC
Can also verify this on NP530U3C with ubuntu and latest mainline kernel:

Linux notebook 3.5.2-030502-generic #201208151151 SMP Wed Aug 15 15:52:12 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Comment 29 nils 2012-09-01 08:34:58 UTC
I can also confirm this. Tested on 2 samsung 900x3c laptops running ubuntu 12.04 x86_64. upgraded kernel to 3.5.0-13.
Comment 30 j0lly 2012-09-04 21:42:12 UTC
i was creating acpid rules for my archlinux from my new np530uc3 and everything was working great, than acpi started to not log stuff like lid closing, brightness button and others so, is an acpi problem or, et least something related to it.
my config worked for at least a week without any issue...
i'm sure to not have installed any kernel update neither acpi, bios or somethig... so i can't figure out what can be the problem
Comment 31 Lee, Chun-Yi 2012-09-12 03:39:40 UTC
Please try:
echo 0 > /sys/module/battery/parameters/cache_time
Comment 32 Marcus Maertens 2012-09-12 08:28:39 UTC
Just stumbled upon a possible carrot:
http://www.fossfactory.org/project/p330
Comment 33 Jason Robinson 2012-09-14 10:15:25 UTC
(In reply to comment #31)
> Please try:
> echo 0 > /sys/module/battery/parameters/cache_time

Interesting, things change slightly after this:

/*plugged in */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Full, 100%
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: on-line

/*unplugged */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Discharging, 100%, 03:39:19 remaining
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: off-line

I guess that kind of is better. If I change the value back to 1000 which it was before, the following happens:

/*plugged in */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Unknown, 98%
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: on-line

/*unplugged */
jaywink@jaytec: pts/5: 24 files 828Kb -> acpi -abi
Battery 0: Discharging, 98%, 03:46:44 remaining
Battery 0: design capacity 6100 mAh, last full capacity 5900 mAh = 96%
Adapter 0: off-line

Anyway, even with acpi correctly saying whether it is discharding or charging, Ubuntu battery indicator doesn't notice the change. Anyone know how to trigger it manually?
Comment 34 Lee, Chun-Yi 2012-09-14 12:45:06 UTC
That's better also check the daemon on userspace, e.g. upower or HAL.

Different machine might have different situation.
Comment 35 j0lly 2012-09-17 13:57:42 UTC
If use a needle and reset the (battery/bios ?!) from the hole on the bottom of the laptop the acpi events work correctly for few days after falls again in acpi issues.

Have to shutdown and without powercord to perform the reset; have to plug the powercord again or the laptop will not wakeup.
Comment 36 Sebastian Parschauer 2012-10-30 17:01:06 UTC
It seems like everything related to the embedded controller "Device (H_EC)" doesn't work. Power source "ADP1", the battery "BAT1" and "LID0" are connected to it. Bug 45461 is related. LID0 should return the 1-bit value "LIDS" from the Controller registers with method "_LID". This method is described in the ACPI Spec: http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf .

The "_LID" method is only called by the following: "Notify (LID0, 0x80)" and this is part of "Method (_Q5E, 0, NotSerialized)" as well as "Method (_Q5F, 0, NotSerialized)" which are embedded controller queries.

So if nothing queries that embedded controller, then it can't be reported if the lid is closed or not. Perhaps, something has to activate that embedded Controller first.

It is the Intel HM76 chipset.

http://www.intel.com/content/www/us/en/chipsets/performance-chipsets/mobile-chipset-hm76.html

They have an OEM manual where they describe their embedded controller.

http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7-series-chipset-pch-datasheet.pdf

Look at page 197. The EC is connected to the SMBus via "SMLink1".
Comment 37 Sebastian Parschauer 2012-11-28 15:15:37 UTC
This seems to be difficult to fix.

The EC on SMLink1 is connected to "Intel ME SMBus Controller 3" and not to SMBus. This requires the "Advanced TCO Mode" and I guess this also requires the MEI/HECI driver. On Linux this driver is quite new. It needs CONFIG_INTEL_MEI and is located in "drivers/misc/mei/" since kernel 3.5.

ME = Management Engine
MEI = Management Engine Interface
HECI = Host Embedded Controller Interface
Comment 38 Leonid Titov 2013-01-04 17:32:19 UTC
I can confirm this issue for Samsung 535 laptop, which is AMD A6 platform, not Intel chipset.
No reaction for power supply plugging/unplugging.
CPU0: AMD A6-4455M APU with Radeon(tm) HD Graphics    stepping 01
Comment 39 Jason Robinson 2013-01-15 07:18:23 UTC
Bumped my laptop to Ubuntu 3.7.2-raring kernel (http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7.2-raring/) with no change:

/* not plugged in */
$ acpi -abi
Battery 0: Discharging, 67%, 01:36:05 remaining
Battery 0: design capacity 6100 mAh, last full capacity 3700 mAh = 60%
Adapter 0: off-line

/* plugged in */
$ acpi -abi
Battery 0: Discharging, 66%, 09:30:07 remaining
Battery 0: design capacity 6100 mAh, last full capacity 3700 mAh = 60%
Adapter 0: on-line

Laptop again is Samsung 530U3C.
Comment 40 Dmitriy 2013-02-17 08:07:21 UTC
Same problem here. Ubuntu 12.10 all updates installed, 32bit. Notebook Samsung 900X3C. Discharging or Charging always in both cases when AC plugged or not.
Same as in comments 39, 10, 8. Under Win8 all works fine even with no samsung drivers installed.

Some time ago when Ubuntu 12.10 just reliased I do not observe this problem and all works fine too. I think this happens after some update. But I don't know after what.
Comment 41 Lan Tianyu 2013-02-18 07:26:30 UTC
Hi all:
       It there someone that can check whether the problem also exits on the windows? This seems a bios issue. The battery state directly comes from bios.
The /sys/class/power_supply/BAT1/status only presents the value from bios.
Comment 42 Lori Holden 2013-02-18 08:19:25 UTC
Hi Lan,

I can't say for certain,... but for the couple of days I had Windows on the machine before installing Linux I didn't notice the problem. I did put my Windows image back on the machine attempting to troubleshoot things not long after running into the issue and Windows still seemed to act fine. Same again when I put windows on the machine to upgrade the firmware to see if that would address the issue. (Interestingly, it seemed to make the issue go away for a day or so...)

Having said that, it's also very possible that Window's may be handling battery status differently then we do in Linux.

In Linux, we do actually see when the transformer is plugged into the device... we just can't tell if the battery is discharging/recharging. Window's could just be inferring that because the transformer is unplugged, that we are discharging... again with the reverse.

If you could recommend a tool to get real diagnostic information on the battery state in Windows I will happily flash it back to the machine and get some logs.

I think it's pretty easy to just say "the machine / bios is just broke" (Though in all honesty, I'm very willing to believe such)... but the laptop has obviously passed Q/A at Samsung under Windows. Though this could also just mean there was a workaround implemented, or again that Windows is just not as picky about how it determines the battery state.

Is it possible that perhaps the firmware is reacting poorly to something we are doing? I can't help but to also think of this issue: http://www.pcworld.com/article/2027819/not-just-linux-windows-can-brick-samsung-laptops-too.html

Anyhow, please let me know if there is anything I can do to help.
Comment 43 Mark Syms 2013-02-18 08:43:33 UTC
Behaviourally Windows works correctly, I've actually restored the factory load onto my Series 9 as the behaviour of this bug means that Linux is just untenable. By behaviourally I mean that things like power options change correctly when the AC is attached and detached. One of the most impactful implications of this bug was that if the AC was attached after being started on battery the system would still consider itself to be discharging and do things like suspend/hibernate after 10 minutes of inactivity, not what you want if the machine is plugged in a playing music etc. This sort of switch works as expected in Windows. So, it may be that the Windows user-space just looks at different properties and "does the right thing" or it may be that the bug doesn't exist. Either way, the user experience "works" in Windows and doesn't work in Linux and that at the end of the day is the important thing.
Comment 44 Dmitriy 2013-02-19 08:00:52 UTC
I confirm comment #35. Also I think that this is BIOS/Firmware issue on partial Samsung models. But this is not affected by Windows OS (7/8). So I think linux should do the same as Windows.
Comment 45 Lan Tianyu 2013-02-19 08:31:51 UTC
Hi all:
         Can you test whether Bug 45461 is the same as this one? Seem they all happen on Samsung system and related with EC.
Comment 46 Jason Robinson 2013-02-19 08:45:48 UTC
(In reply to comment #45)
> Hi all:
>          Can you test whether Bug 45461 is the same as this one? Seem they
>          all
> happen on Samsung system and related with EC.

Yes can confirm this too, laptop has no idea lid is closed but suspend works very well indeed when triggered via OS.
Comment 47 Lan Tianyu 2013-02-19 08:54:32 UTC
Ok. Mark this dupicate

*** This bug has been marked as a duplicate of bug 45461 ***
Comment 48 Dmitriy 2013-02-19 09:43:05 UTC
I can not confirm this. When I close lid computer suspends well. Heh.

I think this also can be related how ubuntu (linux) have been installed.
My idea is:
1. If ubuntu (linux) installed just right after Win8 -> bugs happen.
2. If installed just right after Win7 -> bugs not happen.
3. If installed just right after Win8 and pressed reset button -> no bugs.

Maybe someone can confirm this...
Comment 49 nils 2013-02-19 10:20:44 UTC
For me this has been fixed on samsung 9 series 900X3C-A01_NL by the 
latest biosupdate provided by samsung (P04AAC). I think Windows doesn't 
suffer from this since it's using WMI.
Comment 50 alex 2013-02-19 11:21:59 UTC
(In reply to comment #41)
>        It there someone that can check whether the problem also exits on the
> windows? This seems a bios issue.
When I plug/unplug AC while in bios setup the laptop changes the brightness, but under Linux it doesn't.

(In reply to comment #35)
> If use a needle and reset the (battery/bios ?!) from the hole on the bottom
> of
> the laptop the acpi events work correctly for few days after falls again in
> acpi issues.
Works for me

(In reply to comment #48)
> I think this also can be related how ubuntu (linux) have been installed.
> My idea is:
> 1. If ubuntu (linux) installed just right after Win8 -> bugs happen.
> 2. If installed just right after Win7 -> bugs not happen.
> 3. If installed just right after Win8 and pressed reset button -> no bugs.
> Maybe someone can confirm this...
I can confirm this:
> 2. If installed just right after Win7 -> bugs not happen.
Comment 51 Lori Holden 2013-02-19 19:28:17 UTC
I am not sure why this would be a duplicate... closing the lid works just fine for me.
Comment 52 Lan Tianyu 2013-02-20 01:23:34 UTC
Oh. Sorry. I am not notice that. Reopen it. Many people suffer the issue on the Samsung machine. But seems the causes are different.
Comment 54 j0lly 2013-02-20 10:32:58 UTC
>
> (In reply to comment #35)
> > If use a needle and reset the (battery/bios ?!) from the hole on the
> bottom of
> > the laptop the acpi events work correctly for few days after falls again
> in
> > acpi issues.
> Does this fix you problem?


This is a temporary workaround that last untill something screw again the
bios // acpi connectivity. In general, the issue rise again after a X
resume (can be the first or after several resumes). This problem is closely
related to the LID issue because the Bios // acpi control also it; in fact
i have also this problem while the Bios is screwed... I'm on a Samsung
np530 ultrabook, but the issue is the same!


2013/2/20 <bugzilla-daemon@bugzilla.kernel.org>

> https://bugzilla.kernel.org/show_bug.cgi?id=44161
>
>
>
>
>
> --- Comment #53 from Lan Tianyu <tianyu.lan@intel.com>  2013-02-20
> 06:16:58 ---
> Hi Lori:
>
> (In reply to comment #35)
> > If use a needle and reset the (battery/bios ?!) from the hole on the
> bottom of
> > the laptop the acpi events work correctly for few days after falls again
> in
> > acpi issues.
> Does this fix you problem?
>
> (In reply to comment #49)
> > For me this has been fixed on samsung 9 series 900X3C-A01_NL by the
> > latest biosupdate provided by samsung (P04AAC). I think Windows doesn't
> > suffer from this since it's using WMI.
> Since there is a newer bios version. Please have a try.
> I check there is no samsung-wmi driver and maybe this  is the cause.
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.
>
Comment 55 Dmitriy 2013-02-20 17:22:52 UTC
In my Samsung 900X3C I have BIOS ver. P05AAC. After reset button I have NOT observe issue.
I can not test my idea about something happens only usage of Win8. But I think this can be some "interference".
Comment 56 alex 2013-02-20 17:46:39 UTC
(In reply to comment #55)
Same with np530u3b and pre-installed Win7.
Comment 57 Ram Dobson 2013-02-23 19:00:24 UTC
Is there some way to log the signals being sent to the battery? if so, i could reset the battery and mess with it until it breaks (suspend and resume a lot seem to be good starting places) and then mark when it fails. Anybody have any clues how to accomplish this?
Comment 58 alex 2013-03-14 07:59:01 UTC
After installing opensuse 12.3 the solution with needle doesn't work for me anymore both with suse and with linux mint 14.
Comment 59 dutra.julio 2013-04-10 10:07:23 UTC
After amount of time my system ACPI returns to reflect charging/discharging status for the battery correctly . I have a 530u3c with Linux 3.8.6.
Comment 60 j0lly 2013-04-10 10:10:45 UTC
#59
Are you saing you're not experiencing anymore theACPI problems with kernel 3.8.6?
What Os do you have?
Comment 61 dutra.julio 2013-04-10 10:31:03 UTC
My english is not so good. Sorry.

No. The problem persists. I turn on my computer without power cord. Then i plug the power cord and ACPI don't detects them. The power state remains in discharging. I continue working and after certain time the power state is detected as charging.

I have Archlinux.
Comment 62 Andrea Fagiani 2013-04-16 16:53:55 UTC
I have a samsung series 9 np900x3c affected by the same issue.
I've noticed a pattern in this bug's behavior and I would like to know if someone else can confirm the following:

The bug triggers ONLY if a suspend/resume (hibernate/thaw) event happens while the laptop is connected to the charger. I've tested this for about 3 weeks now; I always unplug the charger before suspending (heck, even before shutting down the system, although that's unlikely to cause any harm) and always plug it in (if needed) only after resume.

Can anyone confirm that I haven't just hit a lucky streak?
Comment 63 Jason Robinson 2013-05-09 07:29:06 UTC
Weird, the battery events just started working on my Samsung 530u3c - I didn't do anything really, specifically did not reset the battery via the pin on the bottom.

Maybe some update from the Ubuntu repos?

Kernel 3.8.8-030808-generic
Ubuntu 13.04
Comment 64 Reshad Patuck 2013-05-15 21:03:01 UTC
@Jason Is the battery and power monitor still working for you.

Can you find out what changed around the time of the update(which packages were updated, etc) Ubuntu stores these logs in /var/log/apt and should either be in the files history.log or in one of the history.log.*.tar.gz files.

I'm on Arch using the stock Arch Kernel 3.9.2-1 and both the battery/charger and lid still are not working for me.
Comment 65 Jason Robinson 2013-05-16 08:23:55 UTC
(In reply to comment #64)
> @Jason Is the battery and power monitor still working for you.

Sorry forgot to update here. A few days later it stopped working. I think it started working not due to updates but if I remember correctly around that time my laptop had ran out of battery over the weekend. Maybe it worked like the 'battery pin' temporary fix.

Anyway, didn't last :)
Comment 66 Marnes 2013-05-20 03:12:56 UTC
For me too, battery status, lid and etc do not work as expected. Resetting BIOS solves temporarily.

Samsung 530U
Ubuntu 12.10
kernel package 3.5.0-30-generic
Comment 67 psep 2013-06-02 02:39:39 UTC
(In reply to comment #65)
> (In reply to comment #64)
> > @Jason Is the battery and power monitor still working for you.
> 
> Sorry forgot to update here. A few days later it stopped working. I think it
> started working not due to updates but if I remember correctly around that
> time
> my laptop had ran out of battery over the weekend. Maybe it worked like the
> 'battery pin' temporary fix.
> 
> Anyway, didn't last :)

You forgive me for my terrible English.

I really did something similar with the ultrabook, this was without power for a day and the next I was running the battery widget until yesterday.

I use Debian GNU/Linux sid/experimental.
Kernel 3.8-2
Comment 68 Teppo Turtiainen 2013-06-02 06:43:41 UTC
Has anyone tried contacting Samsung over this? Do we have Samsung kernel developers who could maybe help?
Comment 69 psep 2013-06-02 15:06:19 UTC
(In reply to comment #68)
> Has anyone tried contacting Samsung over this? Do we have Samsung kernel
> developers who could maybe help?

From what I understand samsung developers together with the kernel would be "working" on the problem.
Comment 70 psep 2013-06-07 16:09:17 UTC
I have updated the kernel to 3.10 in Debian GNU/Linux sid/experimental and the problem still persists.
Comment 71 Gabriel 2013-06-08 09:15:55 UTC
I have a Samsung NP530U3C with Gentoo Linux and Kernel 3.8.13 and the problem is still present.
Comment 72 Mark Syms 2013-06-18 09:45:08 UTC
What can we do to get some progress on this issue?

Is there any further information that can be gathered from the kernel or interfaces to improve diagnosis of the rot cause of the problem?

Or this just not going to get any interest from anyone who is in a position to address it?
Comment 73 Jan Hoffmann 2013-08-04 19:57:21 UTC
This problem exists on a NP540U3C, as well.

When the system is in this "broken state" the same problems appear on Windows 8, too. Probably this is because I did a fresh install and only installed the Samsung drivers that were needed when the system was not in the "working state". So, apparently, this bug does happen using the standard Windows ACPI implementation, as well, but on the preinstalled system there must be some Samsung specific driver to "fix" this.

Another thing I noticed: On Windows, the power supply state gets updated when I click on the battery icon.

Now, back to Linux: The problem seems to be that there is no system control interrupt (SCI) when the power supply is plugged in or out. I checked this by looking at /sys/firmware/acpi/interrupts/sci before and after connecting / disconnecting it. That also explains the behaviour on Windows, as it seems to poll the value independently of a SCI when you click on the battery icon

What I don't know, is whether it is possible to cause the system to switch from "working" to "broken state" independently of the operating system. But I can't try that right now, as it would mean using Windows only for some time.
Comment 74 Sebastian Parschauer 2013-08-05 09:08:19 UTC
@Jan: No wonder, it is the same chipset, the "Intel HM76". Have you installed the HECI driver from Samsung to your Windows?
After installing it, it should work if the doc from Intel is correct that the embedded controller and therefore lid, power and battery are controlled via HECI/MEI instead of SMBus.

Can you check your Linux kernel >= 3.5 config for CONFIG_INTEL_MEI and CONFIG_INTEL_MEI_ME please?

Locations:
-> Device Drivers -> Misc devices -> Intel Management Engine Interface
-> Device Drivers -> Misc devices -> ME Enabled Intel Chipsets

You should have a /dev/mei device then. The other driver provides MEI support for the following Intel chipsets:

7 Series Chipset Family
6 Series Chipset Family
5 Series Chipset Family
4 Series Chipset Family
Mobile 4 Series Chipset Family
ICH9
82946GZ/GL
82G35 Express
82Q963/Q965
82P965/G965
Mobile PM965/GM965
Mobile GME965/GLE960
82Q35 Express
82G33/G31/P35/P31 Express
82Q33 Express
82X38/X48 Express

The HM76 belongs to the 7 Series Chipset Family.

There is also kernel documentation: Documentation/misc-devices/mei/
Sending an email to linux-mei@linux.intel.com could also help.

Intel AMT (Active Management Technology) is required in the user-space it seems:

http://software.intel.com/en-us/blogs/2009/08/21/working-with-intel-amt-and-linux

There are "Power Packages" and the LMS (Local Manageability Service). OpenAMT seems to be legacy.

This is the place for latest Open Source Intel AMT downloads:
http://software.intel.com/en-us/articles/download-the-latest-intel-amt-open-source-drivers
http://software.intel.com/en-us/articles/intel-active-management-technology-start-here-guide-intel-amt-9

Please check that out if you get it working with that!
Comment 75 david.shirvanyants 2013-08-30 17:57:01 UTC
The bug is still present in NP900X3E-A02US, BIOS version P07AKB.
Resetting with a pin fixes the problem for the short period, usually until charger is plugged a few times.
Comparing the dmidecode output in the "fixed" and "broken" states shows only one line difference:

"Fixed": 
Wake-up Type: Power Switch

"Broken":
Wake-up Type: Other

My sister has an NP900X3C with windows 7 installed, and that machine has no problems with lid or charger events. I suspect windows samsung drivers somehow reset some internal switch after system wakeup. I could do some tests on windows machine, but I just don't know how to debug system drivers.
Comment 76 Ayberk Özgür 2013-09-08 20:08:48 UTC
Adding 

acpi_osi='!Windows 2012' 

to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub seems to have fixed the problem for me (linked to the solution given in the comments on https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061 ). The fix seems permanent; after about 10 reboots and 15 sleeps, the battery's charge/discharge states are still detected correctly. On top of that, the keyboard backlight started and is still working too :) It adjusts the brightness automatically according to the ambient light. Adjustment from keyboard (Fn+F9, Fn+F10) still has no effect though. 

I am on the following configuration:

-Samsung NP900X4C
-Kubuntu 12.04.2
-Kernel 3.10.4-031004-generic
-BIOS P08ABK
-acpid-2.0.10
Comment 77 boris 2013-09-10 14:31:12 UTC
> acpi_osi='!Windows 2012' to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
> seems 

> to have fixed the problem

No it doesn't. On my NP900X3C it still does not work. linux-3.4.61.
Comment 78 psep 2013-09-10 14:36:25 UTC
(In reply to boris from comment #77)
> > acpi_osi='!Windows 2012' to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
> seems 
> 
> > to have fixed the problem
> 
> No it doesn't. On my NP900X3C it still does not work. linux-3.4.61.

A year has passed and still no solution. I preferred ultrabook change.
Comment 79 boris 2013-09-10 14:47:11 UTC
> A year has passed and still no solution. I preferred ultrabook change.

Yep. Right now we are picking notebook vendor for our employes. I'll recommend to buy Lenovo Carbon X1 instead. Samsung has very poor linux support.
Comment 80 Leho Kraav 2013-09-10 14:47:33 UTC
(In reply to psep from comment #78)
> A year has passed and still no solution. I preferred ultrabook change.

This and lack of mini-DP was part of the reason why Samsung got dumped for Dell here.
Comment 81 Gabriel 2013-09-10 16:43:59 UTC
(In reply to Sebastian Riemer from comment #74)
> @Jan: No wonder, it is the same chipset, the "Intel HM76". Have you
> installed the HECI driver from Samsung to your Windows?
> After installing it, it should work if the doc from Intel is correct that
> the embedded controller and therefore lid, power and battery are controlled
> via HECI/MEI instead of SMBus.
> 
> Can you check your Linux kernel >= 3.5 config for CONFIG_INTEL_MEI and
> CONFIG_INTEL_MEI_ME please?
> 
> Locations:
> -> Device Drivers -> Misc devices -> Intel Management Engine Interface
> -> Device Drivers -> Misc devices -> ME Enabled Intel Chipsets
> 
> You should have a /dev/mei device then. The other driver provides MEI
> support for the following Intel chipsets:
> 
> 7 Series Chipset Family
> 6 Series Chipset Family
> 5 Series Chipset Family
> 4 Series Chipset Family
> Mobile 4 Series Chipset Family
> ICH9
> 82946GZ/GL
> 82G35 Express
> 82Q963/Q965
> 82P965/G965
> Mobile PM965/GM965
> Mobile GME965/GLE960
> 82Q35 Express
> 82G33/G31/P35/P31 Express
> 82Q33 Express
> 82X38/X48 Express
> 
> The HM76 belongs to the 7 Series Chipset Family.
> 
> There is also kernel documentation: Documentation/misc-devices/mei/
> Sending an email to linux-mei@linux.intel.com could also help.
> 
> Intel AMT (Active Management Technology) is required in the user-space it
> seems:
> 
> http://software.intel.com/en-us/blogs/2009/08/21/working-with-intel-amt-and-
> linux
> 
> There are "Power Packages" and the LMS (Local Manageability Service).
> OpenAMT seems to be legacy.
> 
> This is the place for latest Open Source Intel AMT downloads:
> http://software.intel.com/en-us/articles/download-the-latest-intel-amt-open-
> source-drivers
> http://software.intel.com/en-us/articles/intel-active-management-technology-
> start-here-guide-intel-amt-9
> 
> Please check that out if you get it working with that!


I've followed your tips and compiled INTEL_MEI into the kernel. The lid event has been working properly ever since. For me the issue seems solved, but I can only speak for myself.
Comment 82 alex 2013-09-16 17:10:54 UTC
https://lkml.org/lkml/2013/6/29/98
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1196155
Since then I have MEI and MEI_ME compiled as modules in kernels 3.10-3.11 and the problem with Samsung still exists.
Comment 83 Ayberk Özgür 2013-09-26 09:31:55 UTC
Ignore my previous comment, after about 15 sleep/resumes, all of the errors came back.
Comment 84 Žygimantas Beručka 2013-09-29 13:01:19 UTC
I started having this on my SAMSUNG NP530U3C-A05EE machine this September (or at least I did not notice it before, although I had been using it with Quantal since June). Decided to upgrade to Saucy, however this did not fix the issue for me. No ACPI event is triggered after connecting/disconnecting the AC cable and/or closing the laptop lid.

Booted into Windows to see if there is a BIOS update avaible, as suggested by someone here, alas nothing has been provided by SAMSUNG so far. (And I do not believe it will be.)
Comment 85 Flavio 2013-09-29 13:10:06 UTC
I have the same problem !

My PC is a samsung serie 5 (np530u3c), I have upgrade the BIOS to latest version (P09ABH).
Comment 86 Žygimantas Beručka 2013-10-07 16:35:02 UTC
(In reply to Andrea Fagiani from comment #62)
> I have a samsung series 9 np900x3c affected by the same issue.
> I've noticed a pattern in this bug's behavior and I would like to know if
> someone else can confirm the following:
> 
> The bug triggers ONLY if a suspend/resume (hibernate/thaw) event happens
> while the laptop is connected to the charger. I've tested this for about 3
> weeks now; I always unplug the charger before suspending (heck, even before
> shutting down the system, although that's unlikely to cause any harm) and
> always plug it in (if needed) only after resume.
> 
> Can anyone confirm that I haven't just hit a lucky streak?

I have been following your advice for the past week and it indeed seems to be the case. If I unplug it before suspending AND resume with it being unplugged (in fact, you can actually plug it in *after* suspending, just be sure to unplug it before resuming), the bug DOES NOT get triggered.

However, I should I forget to unplug it before suspending or resume while it is plugged in, the bug occurs and the only way to get rid of it is to press the reset button underneath the laptop.

In my case it is a SAMSUNG Series 5 Ultra NP530U3C-A05EE machine.
Comment 87 Agustin Barto 2013-10-14 11:02:14 UTC
The same problem affects my Samsung NP550P5C.
Comment 88 Kieran Clancy 2013-10-25 14:50:58 UTC
Same deal on my NP9003XC-K01AU.

My main work-around is to just suspend and resume whenever I plug/unplug so that the state gets updated.
Comment 89 Žygimantas Beručka 2013-10-25 15:13:21 UTC
Kieran, what do you mean? Can you explain the sequence of your actions in more detail?
Comment 90 Kieran Clancy 2013-10-25 15:17:19 UTC
Žygimantas, I have found that the charging state is detected correctly if I suspend and resume.

So, if the state is incorrect (i.e. showing "charging" when unplugged, or vice versa), I press the power button gently to suspend the laptop. Then, one second later, I press the power button again and the charging state will then be correct.
Comment 91 Andrea Fagiani 2013-10-25 15:25:17 UTC
You can also run something like

# /usr/sbin/rtcwake -m standby -s 2

standby is somewhat faster than suspend to ram and using rtcwake wakes up your device automatically in 2 seconds.

Anyway, I can now confirm, several months later, than the issue does not trigger  if you follow my advice from comment 62.
Comment 92 Žygimantas Beručka 2013-10-26 20:06:37 UTC
Kieran, thanks! I will try that out once this bug is triggered. For the time being, I am following Andrea's method of avoiding this.

Andrea, I have been following you advice for nearly a month now and I can only second that. That is clearly a pattern. The bug gets triggered only when I suspend/resume with the charger being connected.
Comment 93 3daniele03 2013-11-04 10:04:06 UTC
(In reply to Žygimantas Beručka from comment #92)
> Kieran, thanks! I will try that out once this bug is triggered. For the time
> being, I am following Andrea's method of avoiding this.
> 
> Andrea, I have been following you advice for nearly a month now and I can
> only second that. That is clearly a pattern. The bug gets triggered only
> when I suspend/resume with the charger being connected.

I've been tried this trick for a couple of week, but today the bug has come back.
Comment 94 3daniele03 2013-11-04 10:13:10 UTC
A stupid question (I admit my ignorance). Why is the status of this bug "NEEDINFO"?
I would be happy to provide more info if needed...

I have a 900X4C with gentoo, kernel 3.10.7-r1 and, as far as I know, the only ways to fully detect the change of power state (including relevant change of /sys/devices/system/cpu/cpu*/cpufreq/bios_limit) are to suspend/resume hibernate/resume or reboot
Comment 95 Sebastian Parschauer 2013-11-04 12:10:18 UTC
Gabriel followed my instructions to setup HECI/MEI and he says that it works, now. What is required, is that more people test that. If enough people can confirm this, then we could talk to distributions to automatically integrate this.

I don't know why Intel doesn't care for this - they should know!

I'm just a kernel developer for storage with a hobby interest in ACPI and reverse engineering. My colleague with this issue has already sold his Samsung laptop again.
E.g. on LinuxTag in Berlin most speakers come with an IBM ThinkPad or a Mac. The ThinkPad guys have connections to ThinkPad ACPI developers from Japan. This is why they have so good Linux support. But in the thinkpad driver you can also see indication for reverse engineering in comments like "unknown magic constant".
Comment 96 Alessandro Crismani 2013-11-04 12:33:30 UTC
Wait,

is having MEI and MEI_ME compiled as kernel modules enough, or do we need them compiled in the kernel (y in the config)?

Second question, do they still break suspend, as other people pointed out earlier?

I have MEI and MEI_ME compiled as modules, and lid detection is broken. I can try to compile them in the kernel, but I need a working suspend!
Comment 97 alex 2013-11-04 12:35:27 UTC
(In reply to Sebastian Riemer from comment #95)

I have MEI and MEI_ME compiled as modules, but the problem still exists. Could you write a more detailed guide on how to compile the kernel with them? I'd really appreciate that.
Comment 98 Sebastian Parschauer 2013-11-04 13:30:40 UTC
Please compile the MEI and MEI_ME drivers into the kernel as Gabriel did (CONFIG_...=y)! This ensures that they are really available during early boot and loaded if the hardware is detected. Required modules are not always integrated into the initrd automatically.

Make sure that Intel AMT services are running in the user-space. Without them the MEI drivers are quite useless.

As I don't have access to such hardware and can only give recommendations, please contact Gabriel for details how he managed it. His email address is available here.
Comment 99 Sebastian Parschauer 2013-11-04 15:18:49 UTC
(In reply to alex from comment #97)
> I have MEI and MEI_ME compiled as modules, but the problem still exists.
> Could you write a more detailed guide on how to compile the kernel with
> them? I'd really appreciate that.

The issues you had in kernels < 3.12 should be fixed in 3.12.
Download the git source with

git clone https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable

The Google mirror is much faster than kernel.org. Then you can look at the MEI commits with:

git log drivers/misc/mei/

Looks like they've introduced a lot of bugs in 3.9 as they've sent many commits to linux-stable 3.9+. Ubuntu is not really known for picking the most stable kernels. So compiling a kernel yourself can help in some cases.

This should be the fix for suspend issues:
mei: me: clear interrupts on the resume path

And these 3.12 commits look interesting:
mei: make me client counters less error prone
mei: bus: stop wait for read during cl state transition
mei: cancel stall timers in mei_reset
Comment 100 Andrea Fagiani 2013-11-16 10:35:14 UTC
@Sebastian Riemer

I can confirm this is fixed on 3.12; I've been running it on ArchLinux since it entered testing a few days ago and I haven't been unable to reproduce the issue!
Comment 101 Marc MAURICE 2013-11-17 10:22:57 UTC
Tested on my Samsung np530u3b with kernel 3.12 : problem still here.
Comment 102 alex 2013-11-17 13:12:22 UTC
(In reply to Marc MAURICE from comment #101)
> Tested on my Samsung np530u3b with kernel 3.12 : problem still here.

Did you compile MEI and MEI_ME drivers and LMS (see Sebastian tips above)?
Comment 103 Marc MAURICE 2013-11-17 14:14:13 UTC
arf no. I just tested with the last ubuntu kernel 3.12.0
it seems to have MEI and MEI_ME, but no LMS.

CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m

I should try compiling LMS when I have some time.
Comment 104 Reshad Patuck 2013-11-17 14:37:52 UTC
(In reply to Andrea Fagiani from comment #100)
> I can confirm this is fixed on 3.12; I've been running it on ArchLinux since
> it entered testing a few days ago and I haven't been unable to reproduce the
> issue!

Did you compile the recompile the kenrnel.

I just upgraded to the stock arch kernel to 3.12 and the bug remains.

Let me know if you did anything differet to make it work.
Comment 105 alex 2013-11-17 14:38:40 UTC
(In reply to Marc MAURICE from comment #103)
Firstly, you need to compile kernel yourself with
CONFIG_INTEL_MEI=y
CONFIG_INTEL_MEI_ME=y
Ubuntu kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/ won't work.
Secondly, I faced some problems compiling LMS on Ubuntu. Here are instructions from readme-file and some ubuntu-specific tips:
> To build and install the LMS, call "./configure" with no arguments. Next,
> call "make install". 
If you got an error, add
#include <unistd.h>
header in tools/ATVersion.cpp before building.
> To start the LMS use "service lms start".
> To ensure that the LMS will load upon startup, use "chkconfig --level 35 lms
> on" command to activate the service in specific runlevels.
Yo'll probably need to use "sysv-rc-conf" instead of "chkconfig" and add "2" to levels, since Debian-Ubuntu use second as default the default multi-user mode.
Comment 106 Andrea Fagiani 2013-11-17 15:18:38 UTC
(In reply to Reshad Patuck from comment #104)
> Let me know if you did anything differet to make it work.

Actually no, I haven't done anything besides upgrading to the official Arch 3.12 package, which by the way has the INTEL_MEI stuff compiled as modules, i.e.
CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m

Also, I am *not* running LMS, the kernel update alone seems to be enough to solve the issue on my NP900X3C; I'll report back if the problem comes up again in the following days.
Comment 107 Maurizio D'Addona 2013-11-18 10:39:35 UTC
Hi,

I'm running Archlinux on a Samsung NP530U3C-A03IT, and I can also confirm that upgrading to kernel version 3.12 seems to solve this bug, in fact after few days of suspend/resume and hibernation/resume cycles I can't reproduce this bug anymore.

Like Andrea Fagiani (comment #106), I have the kernel provided form official Archlinux repositories that has Intel MEI built as module (CONFIG_INTEL_MEI=m and CONFIG_INTEL_MEI_ME=m) and no LMS.

Best regards.
Comment 108 alex 2013-11-18 15:36:50 UTC
Well, I can confirm that neither ubuntu nor vanilla kernel 3.12 solves the problem. I followed Sebastian's instructions (thanks a lot!) and only after compilation of BOTH kernel and LMS the problem seems to be gone (although I'm still not sure).
I have NP530U3B running under Ubuntu 13.04.
Comment 109 Sebastian Parschauer 2013-11-18 16:46:19 UTC
I've helped Daniele debugging the issue from LMS to kernel level. And on his laptop (Samsung NP900X4C-A03US) the LMS doesn't work.

syslog entry:
LMS: Cannot connect to Intel AMT via MEI driver

We've debugged it further down and the MEI ioctl "IOCTL_MEI_CONNECT_CLIENT" fails.

$ strace -e ioctl -p `pidof -s lms`
Process 6109 attached
ioctl(5, HIDIOCGRDESCSIZE or HIDIOCGVERSION or SNDRV_HWDEP_IOCTL_INFO,
0x7fff86321af0) = -1 ENODEV (No such device)

The kernel returns -ENODEV in drivers/misc/mei/main.c, function mei_ioctl().

We've brought this issue to Tomas Winkler as he is the driver maintainer:

$ scripts/get_maintainer.pl -f drivers/misc/mei/main.c
Tomas Winkler <tomas.winkler@intel.com> (supporter:INTEL MANAGEMENT...)

He says that the system is not a (by hardware) vPRO (AMT) enabled system. This is why the driver refuses the LMS to connect. There are other possible MEI usages with less functionality. These are displayed under:
<debugfs>/mei/meclients.

So it looks like Samsung has a special MEI client software running on the systems where AMT or LMS can't be used.

Tomas has sent this issue to the Intel support. But no response, yet. IMHO he should know best what's going on with this hardware. :-/

I would look at hardware access in the driver (setting activation bits. etc.). Perhaps it can be solved by custom kernel hacking.
Comment 110 Andrea Fagiani 2013-11-22 10:43:25 UTC
After a few days, the bug has appeared again on my system as well.
I tried using LMS but having a ULV ivy bridge processor (which is not vPro enabled) I encounter the same issue Daniele has.
Comment 111 lops 2013-11-27 13:03:47 UTC
Same Problem as in first post from "Lori Holden" 2012-07-03 and here with 

* Samsung Series 9 np900x3f
* Ubuntu 13.10
* Kernel 3.11.0-13-generic #20-Ubuntu SMP Wed Oct 23 07:38:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


Its remarkable that this problem has been existing for so long...
Comment 112 Leho Kraav 2013-11-27 13:16:56 UTC
(In reply to lops from comment #111)
> 
> Its remarkable that this problem has been existing for so long...

I hate to see all these good people's energy being wasted on Samsung's booboo. Went through similar back in the day with an Acer Travelmate 8172 and Broadcom wifi adapter. Wasted so much effort trying to debug the problem it's insane. At least the resulting life rule "never touch Broadcom again" has served me well from then on.

The solution is to support the right companies and right models, and while trying to not spam this bug, I'd like to influence the gatherers here to do the same. Dell and Lenovo do seem to be on the forefront with Linux so that's what I've chosen for my companies.

Maybe I should note I also purchased 900X3A for myself earlier. You have to hand it to Samsung's design team, the Series 9 is a hit look-and-feel wise. Luckily, 900X3A also worked flawlessly on Linux. But this bug single-handedly reminded me of the lurking Windows-company dangers. So when the battery started fading and I needed a higher-res screen + mini-DisplayPort, I dropped all newer S9-s out of the race. Dell XPS14, while weightier and mid-res screen, so far is doing quite well on all heavy duty Linux stuff I've thrown at it.

That being said, I hope someone gets this thing solved. Hate to see good hardware going to waste.
Comment 113 Agustin Barto 2013-11-27 13:56:03 UTC
I agree 100% that people should avoid Samsung hardware if they're planning on using it with Linux. The response upon support requests for Linux issues has always been "use Windows" and we shouldn't encourage that kind of toxic behavior from a manufacturer.

I have an NP550P5C which is also plagued with ACPI issues, including this one, and doing a little bit of research I found a workaround that works for me:

1) Remove the battery and leave it out for a minute or two (some people mentioned that I could use a reset button that's below the battery but my model doesn't have one)
2) Remove the AC plug before suspending or resuming.

If I do 1) once and keep doing 2), the AC detection seems to work. For some reason, suspending and resuming with the AC plugged seems to cause the issue, and the only way to revert it is doing 1). Booting on Windows does nothing if the issue has already occurred.

I hope this helps.
Comment 114 Robert Moore 2013-11-27 16:51:33 UTC
Please post the acpidump for the original Samsung machine.
Comment 115 3daniele03 2013-11-27 17:11:05 UTC
Created attachment 116461 [details]
acpidump output, as requested by Robert in Comment #114
Comment 116 lops 2013-11-27 17:18:40 UTC
Created attachment 116471 [details]
acpidump for SAMSUNG NP900x3f-k01de
Comment 117 lops 2013-11-27 17:21:56 UTC
Ok, I did it like in post #1:

Samsung Series 9 np900x3f with Ubuntu 13.10

$ uname -a
Linux mars 3.11.0-13-generic #20-Ubuntu SMP Wed Oct 23 07:38:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

(Results of acpi -i -b -a)

1.*** AC IS PLUGGED IN ***
Battery 0: Full, 100%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

2.*** AC REMOVED ***
Battery 0: Charging, 100%,  until charged
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: off-line

3.*** System Suspended and woken back up ***
  *** AC STILL REMOVED ***
Battery 0: Discharging, 100%, 05:56:22 remaining
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: off-line

4. *** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

5. *** SUSPEND AND WAKE ***
   *** AC STILL PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

6. *** AC REMOVED ***
Battery 0: Charging, 98%, 00:06:57 until charged
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: off-line

7. *** AC PLUGGED IN ***
Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line

8. *** AC STILL PLUGGED IN ***
   This time acpi -V

Battery 0: Unknown, 98%
Battery 0: design capacity 5880 mAh, last full capacity 6100 mAh = 100%
Adapter 0: on-line
Thermal 0: ok, 29.8 degrees C
Thermal 0: trip point 0 switches to mode critical at temperature 106.0 degrees C
Thermal 0: trip point 1 switches to mode passive at temperature 95.0 degrees C
Thermal 1: ok, 50.0 degrees C
Thermal 1: trip point 0 switches to mode critical at temperature 106.0 degrees C
Thermal 1: trip point 1 switches to mode active at temperature 87.0 degrees C
Thermal 1: trip point 2 switches to mode active at temperature 55.0 degrees C
Cooling 0: LCD 70 of 100
Cooling 1: pkg-temp-0 no state information available
Cooling 2: intel_powerclamp no state information available
Cooling 3: Processor 0 of 10
Cooling 4: Processor 0 of 10
Cooling 5: Processor 0 of 10
Cooling 6: Processor 0 of 10
Cooling 7: Fan 0 of 1
Cooling 8: Fan 0 of 1
Cooling 9: Fan 0 of 1
Cooling 10: Fan 0 of 1
Cooling 11: Fan 0 of 1

9. *** AC STILL PLUGGED ***

 sudo acpidump >acpidump.dat
 Wrong checksum for FADT!

The dump file i have attached

I hope that is the information you need. If not, tell me plz...

greetings
Comment 118 alex 2013-11-27 18:03:09 UTC
Created attachment 116481 [details]
acpidump output
Comment 119 Andrea Fagiani 2013-11-28 09:34:39 UTC
Created attachment 116521 [details]
acpidump NP900X3C
Comment 120 Maurizio D'Addona 2013-11-28 15:15:12 UTC
Created attachment 116571 [details]
acpidump for Samsung NP530U3C-A03IT
Comment 121 Maurizio D'Addona 2013-11-28 15:18:31 UTC
Please also ignore my previous comment #comment 107 because unfortunately today the bug appeared again and on my machine the intel LMS does not work.
best regards.
Comment 122 david.shirvanyants 2013-12-05 04:06:45 UTC
Recent Samsung laptop Book 9 Plus (NP940X3G-K01US) or recent Linux versions seem to have this bug fixed.
I am running Fedora 20 Beta and do not see any problem with lid detection, etc. I have tried to plug and unplug power in sleeping and active states, with lid open and closed and see no problem.

A side note - reset hole on the bottom seems to be gone, so if acpi/power problems arise again the old fix won't work.
Comment 123 lops 2013-12-18 13:25:32 UTC
(In reply to david.shirvanyants from comment #122)

On my Samsung Series 9 np900x3f I have started the Fedora live Image:

 "Fedora-Live-Desktop-x86_64-20-1.iso" from a USB stick.

The Kernel version was 3.11.10-301.fc20.x86_64. None of the issues seem to be solved.
Comment 124 San 2014-02-10 21:16:34 UTC
Created attachment 125541 [details]
acpidump for NP535U3C-A01PL
Comment 125 Kieran Clancy 2014-02-19 05:14:15 UTC
Possible fix found by juanmanuel at:

https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061

See comment numbers 100 through 103 there.

I can confirm that this fix is working since I tried it 20 minutes ago.

It seems that once a particular buffer fills up, no more events are generated. To fix this, it seems to be necessary to manually poll and clear this buffer so that new events are sent.

I wonder if this can be handled better in the kernel, so that an external program is not needed? Also, rather than blindly ignoring built-up events, should the kernel be processing them in some way?
Comment 126 lops 2014-02-19 09:54:07 UTC
(In reply to Kieran Clancy from comment #125)
> Possible fix found by juanmanuel at:


I can confirm that the fix is working on my Samsung Series 9 np900x3f Ubuntu 13.10...!

LID/BATTERY/AC detection is working now

nice work by juanmanuel
Comment 127 Agustin Barto 2014-02-19 11:35:38 UTC
(In reply to Kieran Clancy from comment #125)
> Possible fix found by juanmanuel at:
> 
> https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061
> 
> See comment numbers 100 through 103 there.
> 
> I can confirm that this fix is working since I tried it 20 minutes ago.
> 
> It seems that once a particular buffer fills up, no more events are
> generated. To fix this, it seems to be necessary to manually poll and clear
> this buffer so that new events are sent.
> 
> I wonder if this can be handled better in the kernel, so that an external
> program is not needed? Also, rather than blindly ignoring built-up events,
> should the kernel be processing them in some way?

It seems to fix the problem on my NP550P5C too.
Comment 128 Berend De Schouwer 2014-02-19 13:11:14 UTC
juanmanuel's workaround WorksForMe(tm)

Samsung NP900X3C
Fedora 20
Kernel 3.12.10-300.fc20.x86_64

Lid open/closed
Power cable plugged/unplugged

A note on deleting the events:

I unplugged/plugged the cable and opened/closed the lid while suspended.  Even though those events are now deleted instead of processed, the status icons in gnome-shell and files in /proc/acpi/ were correct pretty much instantaneous.

systemd uses:

/usr/lib/systemd/system-sleep/samsung.sh
Comment 129 Juan Manuel Cabo 2014-02-19 19:39:12 UTC
Hello! I'm Juan Manuel Cabo
I just registered here to reply to the last post:

> A note on deleting the events:
> I unplugged/plugged the cable and opened/closed the lid while suspended. 
> Even > though those events are now deleted instead of processed, the status
> icons in  > gnome-shell and files in /proc/acpi/ were correct pretty much
> instantaneous.

The reason that they AC plugged in/unplugged status is reported correctly on resume, is that the OS calls the _WAK method of the DSDT on resume, which checks to see if there were any changes in LID/AC/Battery status and generates the corresponding events through Notify(). This is independant of the EC.
    Also the EC events were discarded anyways on resume, since the GPE's are disabled during that critical stage of resuming.


Now let me try to summarize the bug here, which I already explained in 
     https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061
(posts #100 and #102).

The embedded controller accumulates events during sleep. If the events are too many (16 according to my "binary search" by hand), as in too many times plugging/unplugging the PSU, or battery change percentage decreasing or increasing 16 times, or LID open/close, a combination of them, during sleep, then the EC decides to not interrupt the OS with any more events (GPE 0x17) until they are queried.

This situation remains even after rebooting or shutting down, and there are only three ways to unstuck the EC:

     1) Force Query the events (which my little C program does, attached in the launchpad issue). The OS will never query the events by itself because the EC's GPE 0x17 is never produced again.

     2) Or remove the battery (these laptops don't have removable battery), which leaves us with hitting the reset button through the small hole in the back of the ultrabook.

     3) Or flash any bios, which seems to reset the EC.

What my little program does is the following:

    do {
        outb(ACPI_EC_COMMAND_QUERY, EC_COMMAND_PORT);
        status = inb(EC_COMMAND_PORT);
        data = inb(EC_DATA_PORT);
    } while (data != 0 && ++count < 10000);

The eventy type comes back in 'data'. When data==0, no more events are left in the EC. And the EC starts to produce them again, comming back to normal immediately.

("normal" meaning that the laptop starts to produce LID close events and AC and battery change events again).

So far, from the feedback in this and the other thread, the models which have the same problem, that can be fixed by this fix are:

     Samsung Series 5 (models NP530U3C, NP535U3C, NP530U3B, NP550P5C)

     Samsung Series 9 (models NP900X3F, NP900X4B, NP900X4C, NP900X4D, NP900X3C)

I personally have a series 5 NP530U3C. My DSDT is posted in the launchpad issue, in post #101.

My workaround fix is attached in post #103. It is a simple C program which reads the events from the EC until there are no more, so that the EC can start producing GPE 0x17 again.

https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/971061

--
Juan Manuel Cabo
Comment 130 Dennis Jansen 2014-02-19 20:58:41 UTC
I can confirm this. Now we only have to turn this into a generic kernel patch. Does it always make sense to read the EC until it's empty? Could someone confirm this? Maybe this will fix other problems we were not even aware of?
At least it seems the listed Samsung system should get a fix.
Comment 131 Juan Manuel Cabo 2014-02-19 21:42:35 UTC
(In reply to Dennis Jansen from comment #130)
> I can confirm this. Now we only have to turn this into a generic kernel
> patch. Does it always make sense to read the EC until it's empty? Could
> someone confirm this? Maybe this will fix other problems we were not even
> aware of?
> At least it seems the listed Samsung system should get a fix.

A proper fix would be to have some way to tell the EC to not accumulate events during S3 sleep, failing that, running the loop right after suspend to empty those useless events produced during sleep I think is the best option. A good place to do that would be in <linuxkernel>/drivers/acpi/ec.c function acpi_ec_unblock_transactions_early() or in some other function called from sleep.c at resume.

The events accumulated during S3 sleep are useless, so it is does make sense to clear them all. Again, the best fix would be to have a way to tell the EC not to try to report events during sleep.

After all, the DSDT method _WAK() does Notify() whether the batterycharge/AC/LID changed during sleep, and does this independently of the EC.

--
Juan Manuel Cabo
Comment 132 Aaron Lu 2014-02-20 01:13:43 UTC
Very nice analysis and it's pretty cool to know what happened underneath, thanks a lot.

(In reply to Juan Manuel Cabo from comment #131)
> (In reply to Dennis Jansen from comment #130)
> > I can confirm this. Now we only have to turn this into a generic kernel
> > patch. Does it always make sense to read the EC until it's empty? Could
> > someone confirm this? Maybe this will fix other problems we were not even
> > aware of?
> > At least it seems the listed Samsung system should get a fix.
> 
> A proper fix would be to have some way to tell the EC to not accumulate
> events during S3 sleep, failing that, running the loop right after suspend
> to empty those useless events produced during sleep I think is the best
> option. A good place to do that would be in <linuxkernel>/drivers/acpi/ec.c
> function acpi_ec_unblock_transactions_early() or in some other function
> called from sleep.c at resume.

Or add a resume callback for the EC driver and clear all the events gathered during sleep phase there.

> 
> The events accumulated during S3 sleep are useless, so it is does make sense
> to clear them all. Again, the best fix would be to have a way to tell the EC
> not to try to report events during sleep.

I've no idea how to do that, sounds like we need some way of manipulating the EC?

> 
> After all, the DSDT method _WAK() does Notify() whether the
> batterycharge/AC/LID changed during sleep, and does this independently of
> the EC.

Also, the LID/Battery/AC driver will also update the device's status on their resume callback so there doesn't seem to be any meaning to store these events to me.
Comment 133 Lan Tianyu 2014-02-20 02:11:38 UTC
Created attachment 126801 [details]
ec.patch

I write a debug patch to query EC event during system resume. Please have a try.
Comment 134 Kieran Clancy 2014-02-20 03:48:06 UTC
For what it's worth, Windows seems to use the "stop EC from generating events" approach rather than "clear events on wake". So it must be possible.

However, it seems prudent to do BOTH, as there are situations where just doing one still leads to problems.

That is, if we only clear events on wake and don't stop them on sleep, then the event buffer will still fill and potentially interfere with other operating systems (e.g. if something wrong happens during suspend/resume or the battery goes flat and next boot goes into Windows with a full event buffer, Windows doesn't clear the events and has most of the same problems described here).

Alternatively, if we only stop the EC from generating events during sleep, we won't clear the buffer if it has been filled by another operating system or by other yet unforeseen triggers (for example, if the kernel is locked up or indisposed the buffer could fill in the EC without ever going into sleep).

--

For the same reason, any function to clear the events on wake should also run on normal system boot.

--

I think the ideal solution would be to stop the EC from generating events during sleep in addition to clearing the events on wake and on boot (in case the buffer is full from a previous operating system or kernel).
Comment 135 Juan Manuel Cabo 2014-02-20 05:56:53 UTC
(In reply to Lan Tianyu from comment #133)
> Created attachment 126801 [details]
> ec.patch
> 
> I write a debug patch to query EC event during system resume. Please have a
> try.

After looking at the patch, a long recompile (and nice coffee), and testing thrice, I can confirm that this patch does the job perfectly. Thanks Lan Tianyu!!!!! 

One small thing: maybe it'd be reasonable to include one small comment on the source, so that it doesn't get deleted in the future by someone not understanding the issue, such as:

--
    /* This loop consumes events accumulated during sleep by the EC. Some EC's stop reporting GPE events until they are polled, if too many accumulated. (samsung series 5 and 9). Fixes LID close / AC status ignored problem. */
--
(and a 'thank you' note in the patch would not be unwelcome; uncovering this bug took me quite some nights  ;-) )


My laptop is the Samsung Series 5 NP530U3C-A01, I'm on Kubuntu 13.10, Here is exactly what I did:

    1) Disabled my workaround first.

    2) I applied the patch it to the source of ubuntu's linux kernel 3.11.0-14-generic_3.11.0-14.21, which is what I used in my previous tests. Recompiled and installed. Reboot into new kernel. 

    3) Started 'sudo acpi_listen' in a console and kept running.

    4) Suspended the laptop. Unplug/plug/unplug... (more than 8 times).

    5) Resumed. Observed events in acpi_listen, everything normal, all good. Tried unplugging, it was recognized. Run my little program to see if any event was left, printing each output. None really left ('status' didn't have 0x20 bit).

    6) Tried again, but without running my program afterwards to see if any left. All good.

    7) Restarted, tried again (plugging/unplugging like crazy). All good.

This is what I did to test potential solutions before comming up with the workaround. Nothing resisted the tests. This patch did, and it works like a charm. 

More details of my computer:

    Samsung Series 5 NP530U3C-A01

    # dmesg | grep -i dmi
    DMI: SAMSUNG ELECTRONICS CO., LTD. 530U3C/530U4C/SAMSUNG_NP1234567890, BIOS P14AAJ 04/15/2013

    # uname -a
    Linux varinia 3.11.0-14-generic #21 SMP Thu Feb 20 00:49:17 ART 2014 x86_64 x86_64 x86_64 GNU/Linux

    # cat /etc/issue
    Ubuntu 13.10 \n \l


Thank you again Lan Tianyu!!
--
Juan Manuel Cabo
Comment 136 Dennis Jansen 2014-02-22 18:08:08 UTC
Created attachment 127081 [details]
samsung_fix_ec_events patched to show count

I've patched the program to show the count. I often get over 800 EC events even if nothing happens (no AC replugging, no charging). Maybe this is specific to my system. But it seems the issues would appear more than expected. Maybe other people could also use the attached source to find out how many events are in the buffer during normal run time.
Comment 137 Dennis Jansen 2014-02-22 18:09:15 UTC
Also, for the kernel patch I propose these notes:

BUG: https://bugzilla.kernel.org/show_bug.cgi?id=57271
BUG: https://bugzilla.kernel.org/show_bug.cgi?id=44161

Signed-Off-By: Lan Tianyu <tianyu.lan@intel.com<
Signed-Off-By: Juan Manuel Cabo <juanmanuel.cabo@gmail.com>
Tested-By: Juan Manuel Cabo <juanmanuel.cabo@gmail.com>
Tested-By: Dennis Jansen <Dennis.Jansen@web.de>
Comment 138 Juan Manuel Cabo 2014-02-22 18:36:20 UTC
(In reply to Dennis Jansen from comment #136)
> Created attachment 127081 [details]
> samsung_fix_ec_events patched to show count
> 
> I've patched the program to show the count. I often get over 800 EC events
> [..]

They are not 800 events, its just that my little program doesn't sleep between queries, and gets its queries replied with the same event for a little while. I realized this after I first posted it.

Cheers!
--
Juan Manuel Cabo
Comment 139 Dennis Jansen 2014-02-22 18:43:29 UTC
Ah, OK. so even if I get a count of e.g. 3000, this does not mean there is more than one event? I've added a check if the data is the same as before. Let's see what count I get now. Of course if the EC is filled with the same data over and over again, this would remain unnoticed.

Thanks so much for your time spent on this, Juan!! Finally my Notebook is starting to really behave well in Linux.
Comment 140 Juan Manuel Cabo 2014-02-22 20:12:31 UTC
No, adding a check to see if the data is the same as before can be tricky. The best is to add a usleep() or sleep() call, to give time to the EC.

Because sometimes the same event _is_ actually queued many times.

Try a call to sleep(1) inside the loop, and you'll see better.

Anyways, the kernel patch in comment #133 doesn't avoids all this because it uses the actual ec.c functions to query the EC properly (as opposed to my little program which uses a more direct (ahem quick and dirty :-) ) approach).
--
Juan Manuel Cabo
Comment 141 Juan Manuel Cabo 2014-02-22 20:13:10 UTC
s/doesn't avoids/avoids/
Comment 142 Kieran Clancy 2014-02-23 04:27:17 UTC
Created attachment 127121 [details]
acpi_ec_clear function, only run on affected hardware

There is already a function which runs on resume, namely acpi_ec_unblock_transactions(). This patch uses that function rather than setting up a new hook to be run on wake.

To avoid potential side-effects on non-Samsung hardware, I have also added DMI info which is checked along with other quirks when the ACPI bus is initialised. If the system vendor is Samsung, the flag EC_FLAGS_CLEAR_ON_RESUME is set to 1.

When the system resumes and acpi_ec_unblock_transactions() is run, this flag is checked and acpi_ec_clear() is run if necessary.

Finally, acpi_ec_clear() is also run if needed when the EC is finalised in acpi_ec_add(), in case the EC was jammed when the system boots. Otherwise the EC wouldn't be cleared until a manual suspend+resume (which may not happen often, or at all, depending on the user).

I'd also like to reiterate that we should ideally find out how to stop the EC filling during suspend, since this:
a) would prevent problems if resume never happens and another operating system is booted which doesn't clear the EC (e.g. Windows, or a Linux kernel prior to this patch)
b) may reduce power consumption while in sleep - perhaps the EC or other hardware is supposed to be put in a lower power state as part of this process
Comment 143 Kieran Clancy 2014-02-23 06:28:30 UTC
Created attachment 127131 [details]
acpi_ec_clear function, only run on affected hardware (fixed)

Fix some problems with the previous patch. A function was used before its definition.
Comment 144 Kieran Clancy 2014-02-23 06:47:31 UTC
Created attachment 127141 [details]
acpi_ec_clear function, only run on affected hardware

Fixed newline in pr_info()
Comment 145 Juan Manuel Cabo 2014-02-23 08:04:36 UTC
(In reply to Kieran Clancy from comment #144)
> Created attachment 127141 [details]
> acpi_ec_clear function, only run on affected hardware
> 
> Fixed newline in pr_info()


It works!!! I like this patch better!!

I do see the new pr_info message "Clearing state EC events" in dmesg at startup.

The other message, I don't see. Because by the time I enable debug messages
for ec.c, that pr_debug("Detected sys..") message was already in the past.


The events get cleared both on resume and on reboot according to my tests with your patch.

'acpi_listen' shows events happening correctly at all times.

--------
PLEASE: More people should test this kernel patch. There is a very simple guide for compiling your own kernel if you use ubuntu/kubuntu/xubuntu: 

     https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

To apply the patch, just change directory to the place were the kernel got uncompressed, and do: 
     patch < thePachFile.patch
(you might need to point the patch util, if asked interactively, to:   drivers/acpi/ec.c )

One more thing: with kernel 3.11 a line of context is different in the patch than in the ec.c, so I had to apply hunk #6 by hand of Kieran's patch. I guess kieran generated the patch against the latest kernel (3.13?). Since the difference is in an unrelated context line, it is a harmless difference.
-------

Thanks Kieran Clancy!
--
Juan Manuel Cabo
Comment 146 Dennis Jansen 2014-02-23 08:52:51 UTC
Thanks Kieran!

I've tested whether 20 cycles are sufficient on my system and... they are for me. I get to a maximum of 9. 

I've modified the patch by Kieran 
- to change the ec_reset function into a void and 
- to show the number of loops. This way we can do some debugging on that number of 20 loops
- to warn if all loops are used up. On such systems there may be need to increase the number of loops.

I think it might be hard to get the patch in for *all* Samsung systems. We should probably try to collect the concrete systems (535,530,900) and restrict the fix to these systems. My system is
535U3C / NP535U3C-A04DE
(sudo dmidecode | grep Product)

But as there may be more affected systems - also non-Samsung systems, should we try to create a boot parameter which activates the EC clearing as well?

Attaching patch next.
Comment 147 Dennis Jansen 2014-02-23 08:56:28 UTC
Created attachment 127151 [details]
acpi_ec_clear function: void, show loops, warn
Comment 148 Dennis Jansen 2014-02-23 10:01:58 UTC
Ok, this also fixes https://bugzilla.kernel.org/show_bug.cgi?id=45461 for me.
Comment 149 Kieran Clancy 2014-02-23 10:20:12 UTC
Created attachment 127161 [details]
acpi_ec_clear function, only run on affected hardware

Thanks Dennis, some good suggestions there. I've attached a slightly cleaner version of the patch (hopefully).

1. Yes, acpi_ec_clear() should have been void - thanks.
2. I've #define'd a constant ACPI_EC_CLEAR_ITER to store the magical "20".
3. I've incorporated your extra messages

Lastly, I've been thinking about whether this should be more hardware specific. While it would be trivial to instead add a sequence of table entries such as:

        {
        ec_clear_on_resume, "Samsung hardware", {
        DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD."),
        DMI_MATCH(DMI_PRODUCT_NAME, "900X")}, NULL}

I currently feel that just picking up the system vendor is enough, on the basis that:

- this probably affects more Samsung systems than we realise
- it really shouldn't do any harm on systems without this bug
- if you look at the other entries in the table, they turn on quirks for things like "all ASUS systems", so it's not unprecedented

But, if it's knocked back for not being specific enough, here's a suggested list of product name substrings:

530U
535U
540U
550P
900X

(Mine is 900X3F/NP900X3F-K01AU)

As for adding another boot parameter, my personal opinion is that it's not necessary since we have a simple user space program which can be used to check if systems are affected. That is, I can't really think of a situation where the user space program couldn't work just as well as a stop-gap measure until other hardware was added to the table.
Comment 150 San 2014-02-23 10:51:34 UTC
I do not know if this question is right here, but is there anyone who can put patched and compiled kernel somewhere so person who want to test it can just install it? I can put it on my server if there is a problem.

Regards!
and thanks for all who supported this patch.
Comment 151 Dennis Jansen 2014-02-23 11:01:02 UTC
(In reply to Kieran Clancy from comment #149)

Thanks!

> Lastly, I've been thinking about whether this should be more hardware
> specific. While it would be trivial to instead add a sequence of table
> entries such as:
...

> I currently feel that just picking up the system vendor is enough, on the
> basis that:
> 
> - this probably affects more Samsung systems than we realise
> - it really shouldn't do any harm on systems without this bug
> - if you look at the other entries in the table, they turn on quirks for
> things like "all ASUS systems", so it's not unprecedented

Yes, actually personally I think it might even be a good idea to always do this. I'm just not sure if the maintainers will agree.

> 
> As for adding another boot parameter, my personal opinion is that it's not
> necessary since we have a simple user space program which can be used to
> check if systems are affected. That is, I can't really think of a situation
> where the user space program couldn't work just as well as a stop-gap
> measure until other hardware was added to the table.

Well, the issue with the user space program is currently that the EC status and data ports may have to be adjusted. At least for non-Samsung system I believe they would have to be. On the other hand - shouldn't it be possible to detect them automatically? You're right, a user space program might be a better way to test if this fixes any issues on different systems if we can make it work across different systems.

@San: Yes, that sounds like a good idea. We could use kernel.ubuntu.com/~kernel-ppa/mainline/ (and similar Fedora sources) as a basis and then create a .deb/rpm package for easy testing. Though I think there's already quite compelling evidence that this helps all around and does not cause any issues.

I'm still surprised how many things seem to work better now.
Comment 152 Dennis Jansen 2014-02-23 11:05:42 UTC
(In reply to Dennis Jansen from comment #151)
> > As for adding another boot parameter, my personal opinion is that it's not
> > necessary since we have a simple user space program which can be used to
> > check if systems are affected. That is, I can't really think of a situation
> > where the user space program couldn't work just as well as a stop-gap
> > measure until other hardware was added to the table.
> 
> Well, the issue with the user space program is currently that the EC status
> and data ports may have to be adjusted. At least for non-Samsung system I
> believe they would have to be. On the other hand - shouldn't it be possible
> to detect them automatically? You're right, a user space program might be a
> better way to test if this fixes any issues on different systems if we can
> make it work across different systems.

Maybe we can grab the information from dmesg? 
dmesg| grep ACPI | grep GPE
Comment 153 Maurizio D'Addona 2014-02-23 14:38:27 UTC
Hi,
I have tested the mainline kernel (3.14-rc3) with the patch applied and I can confirm that it works on my Samsung NP530U3C-A03IT.
Thank you very much!

@Dennis Jansen (In reply to Dennis Jansen from comment #152)

> Maybe we can grab the information from dmesg? 
> dmesg| grep ACPI | grep GPE

I think a better way to find the EC ports is to read the file /proc/ioports

For example on my system I have:

[daddona@530U3C ~]$ cat /proc/ioports | grep EC
  0062-0062 : EC data
  0066-0066 : EC cmd
Comment 154 Juan Manuel Cabo 2014-02-23 19:17:39 UTC
Hello again!
I updated the little program to read ports from /proc/ioports.

You can find it in:   
     http://zenstep.com.ar/samsung-linux

or in post #14 here: 
     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1283589

or in post #158 here:
     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/971061

the little program is also now GPL, and it also now micro-pauses between queries so that the EC tends to return each event only once.

Cheers!
--
Juan Manuel Cabo
Comment 155 San 2014-02-23 19:26:15 UTC
Works fine on Samsung NP535U3C-A01PL.

Here you can download deb packages:
http://kernel.zabałaganionemiejsce.pl/
or http://kernel.xn--zabaaganionemiejsce-8fd.pl

You can find patch and modified ec.c source.

Build made by this steps:
1. git clone git://kernel.ubuntu.com/ubuntu/ubuntu-saucy.git
2. apply patch from comment #149
3. apply manual what does not apply automagically
4. fakeroot debian/rules clean && fakeroot debian/rules binary-headers binary-generic

Nothing modified with config.
Some package did not install (i do not remember which one...).

san@sammie:~ > uname -a
Linux sammie 3.11.0-18-generic #32 SMP Sun Feb 23 13:43:26 CET 2014 x86_64 x86_64 x86_64 GNU/Linux

san@sammie:~ > cat /proc/ioports | grep EC
  0062-0062 : EC data
  0066-0066 : EC cmd

I use it for an half hour and works fine - it seems.

Regards!
Comment 156 cyclingsir 2014-02-24 12:37:05 UTC
Applied the patch from comment #149 to the most current Kernel available on openSUSE (12.3): linux-3.13.4-1.g6eda950-desktop

The patch applied smoothly without any hickups.

dmidecode | grep Product
        Product Name: 900X3C/900X3D/900X4C/900X4D
        Product Name: SAMSUNG_NP1234567890

On the casing it says NP900X3C-A07DE

Plugging in and unplugging power is recognized now, so is lid close.

Thanks ever so much to all who invested so much work and time to tackle this bug!

I just hope this patch makes it into the main kernel soon!
Comment 157 San 2014-02-24 16:35:44 UTC
cat /proc/acpi/button/lid/LID0/state 
is always open on Samsund NP535U3C with AMD A4.
Comment 158 Dennis Jansen 2014-02-24 17:51:24 UTC
@San: Have you tried keeping it closed for 10-20 seconds? In my case it simply took a long time. Otherwise you can try to reset the computer with the reset button on the bottom side of your notebook. It should keep working afterwards. If not, let us know.
I have an NP535U3C and it works for me. But it takes quite some time to be recognized.
Comment 159 Joseph Salisbury 2014-02-24 17:56:33 UTC
@Kieran Clancy, do you plan on submitting your patch for inclusion in the mainline kernel?
Comment 160 Kieran Clancy 2014-02-24 18:56:32 UTC
@Joseph Salisbury

At this stage it's only been tested on a few machines. I'd like to give it a few more days, especially to see what kind of numbers come back from the "... EC events polled" messages.

If all goes well I plan to submit the patch.
Comment 161 San 2014-02-24 20:04:10 UTC
san@sammie:~/Dokumenty > cat lid 
         #!/bin/bash 
         COUNTER=0
         while [  $COUNTER -lt 30 ]; do
             echo About $COUNTER s `cat /proc/acpi/button/lid/LID/state`
             let COUNTER=COUNTER+1 
                sleep 1
         done


Agree. After about 11 sec it says closed. Many thanks and best regards.
Comment 162 Juan Manuel Cabo 2014-02-24 21:02:22 UTC
(In reply to Kieran Clancy from comment #160)
> @Joseph Salisbury
> 
> At this stage it's only been tested on a few machines. I'd like to give it a
> few more days, especially to see what kind of numbers come back from the
> "... EC events polled" messages.
> 
> If all goes well I plan to submit the patch.

----------------
Testing results with San build of Kieran patch (comment #149 and comment #155)
----------------

*) Boot with an old kernel. Suspend. Unplug/plug/....(more than 8 times). Resume. Issue shows. Install new kernel (download, sudo dpkg -i *.deb). Reboot. Result in /var/log/syslog and dmesg: 
     Clearing stale EC events
     8 EC events polled.
(as previously tested, Kieran patch continues to poll at boot. Good!)
Everything returned to normal, issue resolved.

*) Suspend with lid. Open lid while suspended (1 action). 
Result: "1 EC events polled.". 
I guess it was "lid open event". Will test without the lid so I don't have to subtract its lid open event.

*) Suspend through menu (instead of lid). Resume without doing nothing. 
Result: "0 EC events polled."

*) Suspend through menu. Unplug/plug/unplug/plug (4 actions). 
Result: "4 EC events polled."

*) Suspend through menu. Unplug/plug... (6 actions). 
Result: "6 EC events polled."

*) Suspend through menu. Unplug/plug... (18 actions). 
Result: "8 EC events polled."


CONCLUSIONS: 

A) The patch keeps working perfectly, resolving all situations.

B) My Series 5 NP530U3C accumulates up to 8 events. Not more as I previously thought.

C) I think 20 is a good number for ACPI_EC_CLEAR_ITER.


So I was wrong about the 16 events number. It is actually definitely 8. I previously assumed it was double, because when unplugging with 'acpi_listen' open, one can see both ADP1 and BAT1 devices being triggered (the CPU devices are notified from the DSDT when the AC _Q51 or _Q52 methods are called, so they don't count). So I assumed that two events were generated per plug. But it seems not, or at least not during sleep.
    When I discovered that 8 unplug or plug actions were necessary to trigger the problem state, I assumed that each accumulated two events, so I thought 16 total. I was wrong there. Definitely it's 8 max accumulated events.

Cheers!
--
Juan Manuel Cabo
Comment 163 Dennis Jansen 2014-02-25 14:50:04 UTC
(In reply to Juan Manuel Cabo from comment #154)
> Hello again!
> I updated the little program to read ports from /proc/ioports.
> 
> You can find it in:   
>      http://zenstep.com.ar/samsung-linux

Great! And again: thanks so much! :)
Comment 164 toby 2014-02-26 05:19:07 UTC
thanks soo much! been annoying as hell this bug. 
can confirm this worked for me too with no issues (Model: NP900X3C)
Comment 165 Juan Manuel Cabo 2014-03-01 14:17:44 UTC
Breaking News!

A fix in the form of a kernel patch has now been posted to the linux-acpi and linux-kernel mailing list:

        "[PATCH v2] ACPI / EC: Clear stale EC events on Samsung systems"
        http://marc.info/?l=linux-acpi&m=139359680828880&w=2

That kernel patch was made by Kieran Clancy and tested by me and others.

--
Juan Manuel Cabo
http://zenstep.com.ar/samsung-linux
Comment 166 Kieran Clancy 2014-03-02 10:48:09 UTC
The patch has been accepted into Rafael J. Wysocki's linux-pm development tree:
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?id=27095111cbafd3212c7e9a4a8cef1099b7520ca8

From there it will hopefully land in the 3.15 stable release.
Comment 167 Joseph Salisbury 2014-03-13 16:04:11 UTC
Thanks so much for creating that patch, Kieran!  Would it be possible for you to also request that the patch be submitted for inclusion in upstream stable?
Comment 168 Agustin Barto 2014-03-13 16:07:26 UTC
For Fedora 20 users, the patch was merged into the latest kernel package. I would also like to thank everyone involved in fixing this bug.
Comment 169 Jacco 2014-03-26 19:14:32 UTC
Cool, I already got used to manually suspending my machine. Can't wait for it to make it into the Kernel (using Linux Mint).

So far we are stuck at 3.13...

Now the question remains, can we install using EFI or still legacy BIOS
Comment 170 Geert Janssens 2014-03-26 21:03:44 UTC
(In reply to Jacco from comment #169)
> Cool, I already got used to manually suspending my machine. Can't wait for
> it to make it into the Kernel (using Linux Mint).
> 
> So far we are stuck at 3.13...
> 
Fedora has ported this patch to their 3.13.6 kernel. So with Fedora 20 I don't have this issue anymore. Perhaps Linux Mint has a backport policy as well ?

> Now the question remains, can we install using EFI or still legacy BIOS

Indeed...
Comment 171 toby 2014-03-27 02:55:19 UTC
does anybody know if/when Ubuntu is trying to build in this patch? just wondering when I'll be able to run all the updates again without having to uncheck the kernel every time ;)
Comment 172 Juan Manuel Cabo 2014-03-27 03:06:17 UTC
(In reply to toby from comment #171)
> does anybody know if/when Ubuntu is trying to build in this patch? just
> wondering when I'll be able to run all the updates again without having to
> uncheck the kernel every time ;)

Hi!!! 

The patch has been merged recently to the stable kernel 3.13 in version 3.13.7. (look in http://kernel.org in the changelog of 3.13).

To my knowledge, its too fresh to be in regular ubuntu repositories yet, but you can get the most recent kernel for ubuntu/kubuntu/xubuntu from here:

   http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.7-trusty/

and it includes the patch. Just download the "generic" packages from there, and install with "sudo dpkg -i *.deb".

Cheers!
--
Juan Manuel Cabo
Comment 173 Kieran Clancy 2014-04-01 09:34:30 UTC
Created attachment 131151 [details]
process rather than discard events in acpi_ec_clear

Hi all,

If you have a machine that was (previously) affected by this bug, could you please take the time to try a new patch for this issue?

A user with an older Samsung machine found that the earlier patch introduced some issues for them, so this is a new patch designed to be a better solution to the problem.

Currently the new patch has only been tested on 2 affected machines (to my knowledge), so I would be very grateful if anyone else can test this patch (especially users with Samsung series 5 or 7 machines).

You will need to use a recent kernel source like 3. (one which includes the earlier acpi_ec_clear patch), or otherwise apply the previous patch first:

https://github.com/torvalds/linux/commit/ad332c8a45330d170bb38b95209de449b31cd1b4.patch

Then apply the attachment ec_process_stale_events.patch 

Please test to see if your lid closing / suspending / etc. still work as intended with this patch.

This new patch also provides some debug information after every resume/boot which will show up in dmesg something like:

[ 1192.208056] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.217844] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.218844] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.222835] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.223834] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.224833] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.227833] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.228834] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.229832] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1192.230828] ACPI : EC: 8 stale EC events cleared

Please report if you get any EC_SC bytes here _other than_ 0x28. I only get 0x28 on my machine but I am very interested to know if other affected machines produce something else.

If you test it, please also report your system's product code, e.g.:

# dmidecode -s system-product-name
900X3F
# dmidecode -s baseboard-product-name
NP900X3F-K01AU

Many thanks,
Kieran
Comment 174 Kieran Clancy 2014-04-01 09:52:27 UTC
Meant to say "You will need to use a recent kernel source like 3.14". Sorry for the extra bug email.
Comment 175 Juan Manuel Cabo 2014-04-01 12:26:44 UTC
(In reply to Kieran Clancy from comment #173)
> Created attachment 131151 [details]
> process rather than discard events in acpi_ec_clear

> [..] could you
> please take the time to try a new patch for this issue?
 
Will test tonight
Cheers!
--
Juan Manuel Cabo
Comment 176 Jacco 2014-04-01 20:49:30 UTC
NP900X3E here, upgraded to kernel 3.13.7, works like a charm. Had to re-initiate the trackpad after reboot, but all else is fine.
Comment 177 Maurizio D'Addona 2014-04-02 08:54:08 UTC
(In reply to Kieran Clancy from comment #173)

Hi, 

I have just built the mainline kernel 3.14 with the new patch applied, and it works perfectly.

# dmidecode -s system-product-name
530U3C/530U4C

# dmidecode -s baseboard-product-name
530U3C/530U4C

# dmesg | grep "ACPI : EC"
[    0.288950] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[    0.288954] ACPI : EC: acpi_ec_clear: EC_SC 0x00
[    0.290821] ACPI : EC: 0 stale EC events cleared
[   30.575003] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.577929] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.581249] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.584568] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.587893] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.591236] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.594559] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.597890] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.601229] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[   30.604569] ACPI : EC: 8 stale EC events cleared
Comment 178 Juan Manuel Cabo 2014-04-03 14:39:19 UTC
Hello!!

I tested last night. My computer is Samsung Series 5 NP530U3C:

    # dmidecode -s system-product-name
    530U3C/530U4C
    # dmidecode -s baseboard-product-name
    SAMSUNG_NP1234567890

for some reason, my baseboard-product-name seems generic.

On boot I got:

    [    0.226027] ACPI : EC: acpi_ec_clear: EC_SC 0x00
    [    0.226299] ACPI : EC: 0 stale EC events cleared

Then suspended, unplugged/replugged more than 10 times, and I got:

    [  101.449890] CPU3 is up
    [  101.454065] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.459620] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.463622] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.467604] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.471603] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.475605] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.479608] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.483609] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.487610] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  101.491610] ACPI : EC: 8 stale EC events cleared
    [  101.491899] ACPI: Waking up from system sleep state S3

Then suspended again, just closed the lid and didn't unplug or replug:

    [  225.274756] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  225.276221] ACPI : EC: acpi_ec_clear: EC_SC 0x28
    [  225.280200] ACPI : EC: 1 stale EC events cleared
    [  225.280478] ACPI: Waking up from system sleep state S3

at that time, after resume, "acpi_listen" showed the following:

    button/lid LID close
    button/lid LID open

Kernel is 3.13.8. I got the .tar.xz from www.kernel.org, applied the patch  that you attached in your comment, and compiled with:

     make menuconfig
     make
     make modules_install install


Everything is working fine as expected on my machine.

Cheers!
--
Juan Manuel Cabo
Comment 179 Giannis Koutsou 2014-04-03 22:09:56 UTC
(In reply to Kieran Clancy from comment #173)

Hi,

patched kernel 3.13.8 on Arch Linux using attachement from comment #173

So far everything is working perfectly; plug, unplug and lid close events are all being detected. I'm on a Series 9:

# dmidecode -s baseboard-product-name
NP900X3C-A05UK

# dmidecode -s system-product-name
900X3C/900X3D/900X3E/900X4C/900X4D

Up for almost 24hrs, suspended/resumed some five times or so:

[    0.707289] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[    0.707292] ACPI : EC: acpi_ec_clear: EC_SC 0x00
[    0.707534] ACPI : EC: 0 stale EC events cleared
[  234.145738] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[  234.148802] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[  234.152118] ACPI : EC: 1 stale EC events cleared
[ 1144.739072] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.742253] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.745590] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.748935] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[ 1144.752238] ACPI : EC: 3 stale EC events cleared
[39577.534490] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.537478] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.540795] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.544126] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.547465] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[39577.550821] ACPI : EC: 4 stale EC events cleared
[47768.342146] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[47768.357869] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[47768.361202] ACPI : EC: 1 stale EC events cleared
[48527.104686] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[48527.105043] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[48527.108386] ACPI : EC: 1 stale EC events cleared

Cheers, and thanks a lot for the fix!
Comment 180 Thomas Bächler 2014-04-04 08:42:00 UTC
Is the patch from comment #173 available in any git tree yet?
Comment 181 Michelle M. Ludvigsen 2014-04-21 16:12:54 UTC
Just wanted to report working acpi events (FINALLY!).
Plug/unplug power, suspend on lid-close etc.

Running Gentoo and just installed kernel from gentoo-sources 3.14.1

# dmidecode -s baseboard-product-name
NP900X3E-A02SE

# dmidecode -s system-product-name   
900X3C/900X3D/900X3E/900X4C/900X4D
Comment 182 Pablo Caron 2014-04-30 17:25:06 UTC
I just downloaded kernel 3.14.2 form kernel.org and patched it. I compiled and installed it. Everything is working as expected, plug/unplug power-cord is detected by KDE, and the system goes to suspend when I close the lid.

I'm running OpenSUSE 13.1 x86_64 and KDE 4.13.0.

# dmidecode -s baseboard-product-name
SAMSUNG_NP1234567890

# dmidecode -s system-product-name
530U3C/530U4C

# dmesg | grep "ACPI : EC"
[    0.163833] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[    0.163837] ACPI : EC: acpi_ec_clear: EC_SC 0x00
[    0.164158] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.164536] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.164848] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.165164] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.165540] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.165922] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.166236] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.166549] ACPI : EC: acpi_ec_clear: EC_SC 0x08
[    0.166862] ACPI : EC: 8 stale EC events cleared
[  546.002726] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[  546.003455] ACPI : EC: acpi_ec_clear: EC_SC 0x28
[  546.004445] ACPI : EC: 1 stale EC events cleared

Thank you for fixing it!!!!!

Cheers, Pablo
Comment 183 Kieran Clancy 2014-04-30 17:39:57 UTC
Thanks to everyone who tested patches!

A second patch has just been accepted to the kernel to fix a regression on earlier Samsung machines N150/N210/N220.

Thanks to those who tested to make sure the new patch didn't break anything on Samsung series 5/7/9.

The EC_SC data which you've provided should be of some help if future patches are required.
Comment 184 Andrea Fagiani 2014-10-04 10:20:55 UTC
On my NP900X3C, since kernel 3.16.3 (all the way up to 3.17rc7), I've been experiencing this bug again.

I've narrowed it down to the following commits:

"ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set"
https://github.com/torvalds/linux/commit/3afcf2ece453e1a8c2c6de19cdf06da3772a1b08

"ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
https://github.com/torvalds/linux/commit/558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4

On 3.17rc7, reverting the latter seems to solve the issue for acpi events related to the power supply, however LID switch is still broken. Reverting both seems to resolve the issue completely.

Anyone else experiencing similar behavior?
Comment 185 Kieran Clancy 2014-10-04 10:52:39 UTC
@Andrea

I have the same problem on my NP900X3C. I noticed it a week ago and started drafting a regression notice for lkml but things got busy and I forgot about it until now.

I'll post the notice and hopefully the author of those two patches will be able to offer some help, since he knows much more about handling the EC than I do.
Comment 186 parchd 2014-10-04 12:26:45 UTC
Since updating to kernel 3.16.3, same problem has arisen on a Samsung NP900X4D.
Comment 187 Žygimantas Beručka 2014-10-05 13:08:13 UTC
I can confirm that 3.17rc7 (earlier RCs too) on Samsung NP530U3C-A05EE exhibits the same bug. That is, it does not detect charging/discharging state changes and does not suspend on closing the laptop lid.
Comment 188 San 2014-10-06 06:27:49 UTC
Since i turn off suspend on lid close i do not know if the bug of lid persist (my LVDS does not wake up after return from suspend.

I belive changing form ac to dc and backwards does work but with huge delay (few minutes.

Just to recall: NP535U3C-A01PL.
Comment 189 Bjørn Snoen 2014-10-06 12:57:26 UTC
@Andrea
Both of those also affect the Samsung NP900X3E model as well. Started noticing the problem about a week ago, running on Arch Linux, but guessing from the build time of the kernel it's been there a bit longer than that.

With charger plugged in:
acpi -i -b -a
Battery 0: Charging, 92%, 00:08:30 until charged
Battery 0: design capacity 5880 mAh, last full capacity 5500 mAh = 93%
Adapter 0: on-line

After disconnecting charger:
acpi -i -b -a
Battery 0: Charging, 88%, 00:24:19 until charged
Battery 0: design capacity 5880 mAh, last full capacity 5500 mAh = 93%
Adapter 0: off-line

uname -a
Linux Geneva 3.16.3-1-ARCH #1 SMP PREEMPT Wed Sep 17 21:54:13 CEST 2014 x86_64 GNU/Linux
Comment 190 San 2014-10-07 17:32:44 UTC
No longer updates for me.

uname -a
Linux sammie 3.17.0-031700rc7-generic #201409281835 SMP Sun Sep 28 22:36:30 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Comment 191 Ortwin Glück 2014-10-25 11:11:02 UTC
(In reply to Andrea Fagiani from comment #184)
> Anyone else experiencing similar behavior?

Same here on 900X3D. Reverting the two patches fixes it.

I think you should open a new issue instead of commenting on this closed one, as it is a regression.
Comment 192 parchd 2014-10-25 14:32:40 UTC
@Ortwin, if you open a new issue please link to it from here so we can subscribe.
Comment 193 Ortwin Glück 2014-10-26 12:21:31 UTC
For now I have emailed the author and ACPI list:

http://thread.gmane.org/gmane.linux.kernel/1813529
Comment 194 Aaron Lu 2014-10-27 02:12:44 UTC
Assign to Lv.
Comment 195 Lv Zheng 2014-10-27 03:22:44 UTC
(In reply to Ortwin Glück from comment #193)
> For now I have emailed the author and ACPI list:
> 
> http://thread.gmane.org/gmane.linux.kernel/1813529

Thanks for the report.
I'll post permanent fixes here for Samsung laptops.
IRQs need to be both "polled" and "handled using interrupts" for a driver that tries to support hardware/firmware variants.

For me, it looks like that we need to implement "EC GPE polling" in the driver.
Currently we have "EC GPE polling" implemented for OBF/IBF interrupts.
But for SCI_EVT, we don't have such a polling mode.

I can see that a poller in the userspace can fix this.
So what I need to do is to implement the threaded IRQ (IRQ is polled in the thread and is also reported through interrupts) for EC GPEs in the kernel.

The patches are already ready as I mentioned in the mailing list.
I'll post them for you.
But please wait for a while because I need to do a rebase and basic unit test locally before posting.
Comment 196 Kieran Clancy 2014-10-27 06:48:19 UTC
Note that I also reported this on the mailing lists 4 weeks ago, not sure why that was missed.

Lv, I disagree that we need to implement GPE polling in a new kernel thread. It was working perfectly without that before the two recent patches. It simply needs to be cleared on boot or resume from suspend. After that, the interrupts work fine as intended.
Comment 197 Lv Zheng 2014-10-27 14:14:49 UTC
Created attachment 155351 [details]
[PATCH] ACPI/EC: Fix a regression caused by conflict firmware behavior between Samsung and Acer.

If Samsung firmware doesn't require driver to issue QR_EC until 0 (no event) is returned or Samsung firmware doesn't require driver to timely issue QR_EC to poll events, then this fix is sufficient.
Comment 198 Lv Zheng 2014-10-27 14:26:22 UTC
This patch is generated on top of the following git branch:
repository: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
branch: linux-next

Please first help to check if only this patch (attachment 155351 [details]) can quickly fix the regression.

If this is not sufficient, we will need to implement a QR_EC polling thread to timely poll events or poll events until firmware returning 0 (no event).
I'll post the QR_EC polling thread patches after this comment.
Comment 199 Lv Zheng 2014-10-27 14:29:20 UTC
Created attachment 155371 [details]
[PATCH] ACPI/EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag.

My SCI_EVT polling thread is implemented on top of previous patch and a flushing support.
This is the 1st patch to support flushing.
Comment 200 Lv Zheng 2014-10-27 14:39:38 UTC
Created attachment 155381 [details]
[PATCH] ACPI/EC: Enhance the checks to apply to QR_EC transactions.

The 2nd patch for flushing support.
Comment 201 Lv Zheng 2014-10-27 14:40:50 UTC
Created attachment 155391 [details]
[PATCH] ACPI/EC: Add reference counting for query handlers.

The 3rd patch for flushing support.
Comment 202 Lv Zheng 2014-10-27 14:41:53 UTC
Created attachment 155401 [details]
[PATCH] ACPI/EC: Add reference counting for query handlers.

The flushing support.
Comment 203 Lv Zheng 2014-10-27 14:44:23 UTC
Created attachment 155411 [details]
[PATCH] ACPI/EC: Add command flushing support.

The flushing support.
Comment 204 Lv Zheng 2014-10-27 14:46:18 UTC
Created attachment 155421 [details]
[PATCH] ACPI/EC: Cleanup QR_SC command processing by adding a kernel thread to poll EC events.

The polling event thread support.
Comment 205 Lv Zheng 2014-10-27 14:49:39 UTC
Created attachment 155431 [details]
[PATCH] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.

First let's try to use timely polling instead of clear_on_resume for Samsung laptops.
The other option is to let the polling thread to poll events until the firmware returning 0 (no event).

I'll add test requirement after this post.
Comment 206 Lv Zheng 2014-10-27 14:55:00 UTC
The test requests are:

1. Trying the regression fix as mentioned in comment 198.
2. If this cannot fix everything, please try:
   Using the same git branch and apply the following patches orderly:
   attachment 155351 [details]
   attachment 155371 [details]
   attachment 155381 [details]
   attachment 155391 [details]
   attachment 155411 [details]
   attachment 155421 [details]
   attachment 155431 [details]
   And try again. Since we have kernel polling thread implemented, the old clear_on_resume quirk code is deleted.
Comment 207 Ortwin Glück 2014-10-28 09:20:09 UTC
Somehow during my tests the power plug detection started working again with any kernel. I don't know what caused it. But the LID switch is fixed only with the complete patch series.

Test results:

vanilla 3.18-rc2:
 - LID switch not detected
 - power plug sometimes stops working

vanilla linux-pm/linux-next:
 - same as 3.18-rc2

vanilla linux-pm/linux-next + ec-flush1.patch:
 - same as 3.18-rc2

vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch:
 - LID switch works
 - power plug works
Comment 208 Lv Zheng 2014-10-28 14:00:23 UTC
Hi, Kieran

(In reply to Kieran Clancy from comment #196)
> Note that I also reported this on the mailing lists 4 weeks ago, not sure
> why that was missed.
> 
> Lv, I disagree that we need to implement GPE polling in a new kernel thread.
> It was working perfectly without that before the two recent patches. It
> simply needs to be cleared on boot or resume from suspend. After that, the
> interrupts work fine as intended.

Threaded IRQ is very common in Linux, please see kernel/irq/manage.c.
Software has reasons to do so.
IMO, Windows might already handle EC events in both "polling" and "interrupt" modes, so it doesn't suffer from such firmware variants.
If we switched our driver into this mode, then we needn't do additional "drain" after resuming.

OTOH, ec-flush1.patch ensures that the following commit won't affect Samsung laptops.
"ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set"
https://github.com/torvalds/linux/commit/3afcf2ece453e1a8c2c6de19cdf06da3772a1b08

If ec-flush1.patch cannot fix everything, we should check what the following commit brings to us:
"ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
https://github.com/torvalds/linux/commit/558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4
This commit only ensures there's only one query running. If it must be reverted, then it means Samsung laptops need multiple queries to be issued in an out of control way - which sounds like a "happen to work" environment.
We shouldn't go back to a "happen to work" environment where bugs could be hidden by the hidden logics.
Comment 209 Lv Zheng 2014-10-28 14:01:15 UTC
(In reply to Ortwin Glück from comment #207)
> Somehow during my tests the power plug detection started working again with
> any kernel. I don't know what caused it. But the LID switch is fixed only
> with the complete patch series.
> 
> Test results:
> 
> vanilla 3.18-rc2:
>  - LID switch not detected
>  - power plug sometimes stops working
> 
> vanilla linux-pm/linux-next:
>  - same as 3.18-rc2
> 
> vanilla linux-pm/linux-next + ec-flush1.patch:
>  - same as 3.18-rc2
> 
> vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch:
>  - LID switch works
>  - power plug works

Maybe we need more test results, also test results after a suspend/resume...
Comment 210 Lv Zheng 2014-10-29 02:44:25 UTC
Hi, Kieran and Ortwin

The 2nd commit need to be reverted means Samsung firmware behaves like:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT clear
Then without commit 2, Samsung can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
                           can prepare next QR_EC because SCI_EVT is set
                           event returned
[ -event 1 ] SCI_EVT clear
                           QR_EC
                           event returned
[ -event 2 ] SCI_EVT clear
And with commit 2, Samsung cannot work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT clear
                           cannot prepare next QR_EC because SCI_EVT is cleared
The Samsung specific behavior conflicts with Acer one:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT clear
[ +event 2 ] SCI_EVT set
Without commit 2, Acer cannot work when there is only 1 event:
[ +event 1 ] SCI_EVT set
                           QR_EC
                           can prepared next QR_EC because SCI_EVT is set
                           event returned
[ -event 1 ] SCI_EVT clear
                           QR_EC (but SCI_EVT is cleared!!!)
And with commit 2, Acer can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
                           QR_EC
                           event returned
[ -event 1 ] SCI_EVT set
                           can prepare next QR_EC because SCI_EVT is cleared

For Samsung behavior, reverting commit 2 will still have chances to have events lost:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
                           QR_EC
---
                           event returned
[ -event 1 ] SCI_EVT clear
                           can prepare next QR_EC because SCI_EVT is set
---
[ -event 2 ] SCI_EVT clear
The --- marked period is racy, after sending the QR_EC command, firmware can behave faster than driver, and we still cannot prepare another QR_EC when this happens.
So I believe this is a Samsung EC firmware specific bug. And the better solution for Samsung is to implement timely event polling on top of the polling thread commit so that when this race happens the lost events can still be found back.

But the polling thread is a new feature, and we need to be more careful with new features, it shouldn't be merged as a regression fix. So I'll propose them to the upstream after proposing a regression fix and seeing more test reports here.

As a regression fix, I'll propose 2 simpler patches:
The first patch will revert the 2nd commit - 558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4, since it is not required by Acer as Acer platform can be fixed the 1st commit.
The second patch will add a Acer specific quirk flag to the 1st commit - 3afcf2ece453e1a8c2c6de19cdf06da3772a1b08.

Thanks
Comment 211 Lv Zheng 2014-10-29 02:56:14 UTC
Hi, Ortwin

To be more careful, can you help to confirm:
1. the test result of vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch. Without ec-flush7.patch, can this still work for Samsung?
2. the post-resuming test result of vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch - originally Samsung laptops suffer from post-resuming event problems, and we need to confirm if this set can solve the old problem.

You can perform suspend/resume test in this way:
# echo mem > /sys/power/state
During this period, do something like opening/closing lid
Push the power button to resume

Thanks in advance
Comment 212 Kieran Clancy 2014-10-29 03:05:28 UTC
Thanks Lv, I will do some testing on my own laptop this coming weekend.

To others testing, the right way to ensure that the system is fixed or not with a given patch set is to:
Suspend and then while it is suspended, unplug and replug the AC adapter 8 times, leaving a couple of seconds between each plug/unplug.

If the bug is still present, the EC buffer will become full and cease to work properly upon resume (further AC plugs/unplugs will not be detected, amongst other problems).
Comment 213 Ortwin Glück 2014-10-29 20:43:40 UTC
(In reply to Lv Zheng from comment #211)
I tested both kernels and couldn't see any difference. The power plug detection always worked. But LID switch didn't.

However, due to another regression I get a dark screen after resume, so I shut down cleanly (hacking blindly) and rebooted into the same kernel and tested if the events work.
Comment 214 Lv Zheng 2014-10-30 02:42:05 UTC
(In reply to Ortwin Glück from comment #213)
> (In reply to Lv Zheng from comment #211)
> I tested both kernels and couldn't see any difference. The power plug
> detection always worked. But LID switch didn't.

What do you mean by both?
This - vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch - and which one?

Without timely polling, I think any kernel, even backing to where the Kieran's quirk is first introduced, will still have issues.
Because, Samsung EC will not flag SCI_EVT as long as there is already events queued up.

So IMO, during runtime, we can only allow up to 2 events queued up. While doing testing, frequent actions could cause many events to be queued up, and our EC driver will surely cannot prepare QR_EC from 3rd events because at that time, SCI_EVT surely will not be flagged:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
[ +event 3 ]
                           1st QR_EC
                           can prepare 2nd QR_EC because SCI_EVT=1
                           event returned
[ -event 1 ] SCI_EVT clear
                           2nd QR_EC
                           event returned
                           cannot prepare 3rd QR_EC because SCI_EVT=0
[ -event 2 ] SCI_EVT clear
So IMO, only sending QR_EC timely without checking SCI_EVT or draining events until 0x0 returned can fix this.

Thus I guess the test result depends on how frequently you trigger EC events.

> However, due to another regression I get a dark screen after resume, so I
> shut down cleanly (hacking blindly) and rebooted into the same kernel and
> tested if the events work.

Do you mean currently we are not able to confirm if the patchset can solve old problems because a regression prevents the suspend/resume from working?
I also noticed using linux-3.18-rc1, I could not perform suspend/resume any more. All my local tests are done using patched linux-3.17-rc4.
Maybe other guys here have platforms that are not suffered from this regression.
Comment 215 Lv Zheng 2014-10-30 07:19:02 UTC
Created attachment 155851 [details]
[PATCH 7/8] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.

(In reply to Lv Zheng from comment #205)
> Created attachment 155431 [details]
> [PATCH] ACPI/EC: Add timely SCI_EVT polling support for Samsung models.
> 
> First let's try to use timely polling instead of clear_on_resume for Samsung
> laptops.
> The other option is to let the polling thread to poll events until the
> firmware returning 0 (no event).
> 
> I'll add test requirement after this post.

I can see an issue in this patch.
We need to add acpi_ec_started(ec) check for the timely polling support, or it may prevent system from suspending.

I've updated the patch and please use the new one next time.

Thanks
-Lv
Comment 216 Ortwin Glück 2014-11-01 11:54:05 UTC
(In reply to Lv Zheng from comment #214)
> What do you mean by both?
> This - vanilla linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch - and which
> one?

The two that you asked me to test: linux-pm/linux-next + ec-flush{1,2,3,4,5,6}.patch and linux-pm/linux-next + ec-flush{1,2,3,4,5,6,7}.patch
 

> Do you mean currently we are not able to confirm if the patchset can solve
> old problems because a regression prevents the suspend/resume from working?

Yes, once the power plug detection started working again, I was unable to make it fail even if 3.17.y, using plug/unplug/lid switches during suspend. I think we still lack an actual test case...
Comment 217 Lv Zheng 2014-11-03 01:59:30 UTC
Hi, Ortwin

I'm sorry I cannot understand your test results clearly.
I was confused by what you said on the bugzilla.

The results have conflicts here:
You said in comment #207, without doing suspend/resume:
linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch
  - LID switch works
  - power plug works
But in comment 213, you said you cannot perform suspend/resume.
And in comment 216, you said this combination works differently:
linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch
  - LID switch not detected
  - power plug sometimes stops working

Can I confirm this again with you:
1. What's the actual test result of linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch before a suspend/resume test?
2. Using linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch, you cannot perform a suspend/resume test, can you?
3. Using linux-pm/linux-next, can you perform a suspend/resume test?
4. Using linux-pm/linux-next+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch, can you perform a suspend/resume test?
5. Then what's the actual test result of linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch after a suspend/resume test?

I think for some distribution, each time you've built a new kernel, you may need to cleanup old binaries by deleting /lib/modules/<kernel version> and /boot/vmlinuz-<kernel version> before typing "make modules_install install".
According to my experience, if I don't delete the old binaries, I can always see confused result to do with "git bisect" using the same kernel.

Thanks and best regards
Comment 218 Lv Zheng 2014-11-05 03:15:30 UTC
Created attachment 156591 [details]
[PATCH] ACPI/OSL: Add IRQ handler flushing support in the OSL.

This patchset was originally dependent on another "GPE dead lock" fix series.
After rebasing, the dead lock may happen during suspend.
And without the "GPE dead lock" fix series, we need this patch to be applied to prevent this dead lock as a work around.

Let me describe the patch applying order after posting.
Comment 219 Lv Zheng 2014-11-05 03:40:31 UTC
Hi, Ortwin:

Sorry for the inconvenience.

The patchset was originally based on another GPE fix series.
The fix has been discussed for more than 3 months, and didn't get merged.
So my environment is a bit different from yours.

You may suffer from a dead lock during suspending without the fix.
In order to make this patchset working for "system suspend" without the fix, the gpe-handle21.patch workaround is required.

For this test:
4. Using linux-pm/linux-next+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch, can you perform a suspend/resume test?
It should be updated as:
4. Using linux-pm/linux-next+gpe-handle21.patch+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch, can you perform a suspend/resume test?

linux-pm/linux-next maintainer has merged the regression fixes, so if you have synchronized the branch, you need to:
1. git revert 79149001
2. git revert df9ff918
3. git am gpe-handle21.patch (attachment 156591 [details])
4. git am ec-flush1.patch (attachment 155351 [details])
5. git am ec-flush2.patch (attachment 155371 [details])
6. git am ec-flush3.patch (attachment 155381 [details])
7. git am ec-flush4.patch (attachment 155391 [details])
8. git am ec-flush5.patch (attachment 155411 [details])
9. git am ec-flush6.patch (attachment 155421 [details])
10.git am ec-flush7.patch (use updated attachment 155851 [details], do not use attachment 155431 [details])
11. rebuild and boot the kernel.
12. try the suspend/resume test again.

For this test:
5. Then what's the actual test result of linux-pm/linux-next +ec-flush{1,2,3,4,5,6,7}.patch after a suspend/resume test?
It also should be updated as:
5. Then what's the actual test result of linux-pm/linux-next +gpe-handle21.patch+ec-flush{1,2,3,4,5,6}.patch+updated ec-flush7.patch after a suspend/resume test?

Sorry again.

Thanks and best regards
Comment 220 San 2014-11-05 11:08:07 UTC
Hi!

I'm not sure what i have to do. Some time ago i made kernel and uploaded it onto my server.
I would like to do it again. So I'm now compiling this way:

git clone git://kernel.ubuntu.com/ubuntu/ubuntu-utopic.git
cd ubuntu-utopic/
patch -p1 < ../155351.patch 
make mrproper
wget http://beta.paragrafówka.pl/eksport/config-3.16.3 && mv config-3.16.3 .config
fakeroot debian/rules clean && fakeroot debian/rules binary-headers binary-generic

I'm building as suggested in comment 206 (https://bugzilla.kernel.org/show_bug.cgi?id=44161#c206) for now and then I'll try whole set (as suggested in second point).

Is that right way?
Comment 221 Ortwin Glück 2014-11-05 20:46:00 UTC
Lv,

Sorry for the confusion. I am trying to clear the clouds:

1. Power plug detection is no longer a problem once it started working again. I am unable to make it fail on purpose, so I can't test if any kernel fixes that.

2. The lid switch is now non-functional with any kernel (state is always "open"):
- 3.17
- 3.18
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
- linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} + ec-flush7-updated
However during ~40 lid open/close cycle with the latter kernel it worked 2 times (state = "closed").

3. suspend: doesn't work by closing the lid, works fine by suspending manually (KDE start menu)

4. resume: always works, but there is a backlight regression which has nothing to do with this issue. See https://bugs.freedesktop.org/show_bug.cgi?id=80773

Don't worry about my kernel building skills. I do that very very often and have established a workflow that always works. They are all straight from git, hand-configured per the specific hardware and use no modules. They are loaded by a hand-configured grub2 straight without initramfs.
Comment 222 Lv Zheng 2014-11-07 02:09:25 UTC
Hi, San

Thanks for your help.

(In reply to San from comment #220)
> Hi!
> 
> I'm not sure what i have to do. Some time ago i made kernel and uploaded it
> onto my server.
> I would like to do it again. So I'm now compiling this way:
> 
> git clone git://kernel.ubuntu.com/ubuntu/ubuntu-utopic.git
> cd ubuntu-utopic/
> patch -p1 < ../155351.patch 
> make mrproper
> wget http://beta.paragrafówka.pl/eksport/config-3.16.3 && mv config-3.16.3
> .config
> fakeroot debian/rules clean && fakeroot debian/rules binary-headers
> binary-generic
> 
> I'm building as suggested in comment 206
> (https://bugzilla.kernel.org/show_bug.cgi?id=44161#c206) for now and then
> I'll try whole set (as suggested in second point).
> 
> Is that right way?

Ortwin is providing me the testing using private email.
I'll post some of the results here after the progress are made for the investigation.
Maybe you can hold a while until the problem is correctly figured out.

The 2 commits are now reverted in linux-pm, it will appear in the Linux mainline in 1 month.

Best regards
Comment 223 Lv Zheng 2014-11-07 03:13:22 UTC
Hi, Ortwin

(In reply to Ortwin Glück from comment #221)
> Lv,
> 
> Sorry for the confusion. I am trying to clear the clouds:
> 
> 1. Power plug detection is no longer a problem once it started working
> again. I am unable to make it fail on purpose, so I can't test if any kernel
> fixes that.

So the 2 commits didn't make a regression on the power plug?

> 2. The lid switch is now non-functional with any kernel (state is always
> "open"):
> - 3.17
> - 3.18
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6,7}
> - linux-pm /linux-next + gpe-handle21.patch + ec-flush{1,2,3,4,5,6} +
> ec-flush7-updated
> However during ~40 lid open/close cycle with the latter kernel it worked 2
> times (state = "closed").
> 
> 3. suspend: doesn't work by closing the lid, works fine by suspending
> manually (KDE start menu)
> 
> 4. resume: always works, but there is a backlight regression which has
> nothing to do with this issue. See
> https://bugs.freedesktop.org/show_bug.cgi?id=80773

Can you perform a resume by opening the LID?
Also please upload the output of /proc/acpi/wakeup here.

> Don't worry about my kernel building skills. I do that very very often and
> have established a workflow that always works. They are all straight from
> git, hand-configured per the specific hardware and use no modules. They are
> loaded by a hand-configured grub2 straight without initramfs.

Good to know this!
That makes every testing possible. :-)

Thanks and best regards
-Lv
Comment 224 Andrea Fagiani 2014-11-07 23:19:24 UTC
Created attachment 157011 [details]
proc/acpi/wakeup

I was finally able to do some tests, here are my results.
- Samsung NP900X3C
- using linux-pm/linux-next with patches from Comment 219.

Upon boot, I get the following behavior:
- Power plug detection appears to be working no matter what.
- Lid switch is recognized (open/close states are reported correctly using acpi_listen), but suspend does NOT trigger.

However, after a while, (maybe after a few ACPI events?), the LID switch actually triggers suspend correctly; I'm not sure the issue is related.
It does not seem to revert to the broken behavior even after some time. It works both on battery and AC power. 

Opening the LID does not trigger a resume, but tbh that was never the case on this machine.

Lv, if you have more tests in mind / need any logs just let me know.

Attaching my /proc/acpi/wakeup.

Thanks,
Andrea
Comment 225 San 2014-11-09 15:45:25 UTC
I'm unable to compile kernel. It patches ok, but i get:

== code ==
  Using /home/publiczny2/kernel/ubuntu-utopic as source for kernel
  /home/publiczny2/kernel/ubuntu-utopic is not clean, please run 'make mrproper'
  in the '/home/publiczny2/kernel/ubuntu-utopic' directory.
==/code ==
Comment 226 Lv Zheng 2014-11-10 03:41:28 UTC
Hi,

(In reply to Andrea Fagiani from comment #224)
> Created attachment 157011 [details]
> proc/acpi/wakeup
> 
> I was finally able to do some tests, here are my results.
> - Samsung NP900X3C
> - using linux-pm/linux-next with patches from Comment 219.
> 
> Upon boot, I get the following behavior:
> - Power plug detection appears to be working no matter what.
> - Lid switch is recognized (open/close states are reported correctly using
> acpi_listen), but suspend does NOT trigger.

Can you confirm if they still work after suspend/resume?
See test requirement A below.
 
> However, after a while, (maybe after a few ACPI events?), the LID switch
> actually triggers suspend correctly; I'm not sure the issue is related.
> It does not seem to revert to the broken behavior even after some time. It
> works both on battery and AC power. 
> 
> Opening the LID does not trigger a resume, but tbh that was never the case
> on this machine.

According to the /proc/acpi/wakeup - yes, only power button can wakeup the system.

> Lv, if you have more tests in mind / need any logs just let me know.

A. We want to confirm that this set of patches can fix old issues. We need to be careful before removing old quirks.
1. using linux-pm/linux-next with patches from Comment 219.
2. build/boot the kernel
3. do a suspend/resume
Can power plug and LID switch still work after resuming?

B. We want to see some proofs that Samsung EC firmware acts exactly as what we've inferred from the reversion requirements. So could you:
1. using linux-pm/linux-next with patches from Comment 219.
2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */
3. build/boot the kernel
4. do some LID state changes and make sure they are captured by acpi_listen.
5. upload the dmesg output here.

C. We want to confirm if the timely polling support is really required by Samsung EC firmware:
1. using linux-pm/linux-next
3. git am gpe-handle21.patch (attachment 156591 [details])
4. git am ec-flush1.patch (attachment 155351 [details])
5. git am ec-flush2.patch (attachment 155371 [details])
6. git am ec-flush3.patch (attachment 155381 [details])
7. git am ec-flush4.patch (attachment 155391 [details])
8. git am ec-flush5.patch (attachment 155411 [details])
9. git am ec-flush6.patch (attachment 155421 [details])
Do not apply the ec-flush7.patch, can power plug and LID switch still work, and can still work even after resuming?

D. We want to confirm if this is really a regression and current fix can:
1. Using linux-pm/linux-next
2. git revert 79149001
3. git revert df9ff918
Do not apply any of the posted patches, can power plug and LID switch still work? And
1. Using linux-pm/linux-next
Do not revert the "regression fix" commits and do not apply any of the posted patches, can power plug and LID switch still work?

Thanks in advance.
Comment 227 Andrea Fagiani 2014-11-10 09:03:42 UTC
> A. We want to confirm that this set of patches can fix old issues. We need
> to be careful before removing old quirks.
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. build/boot the kernel
> 3. do a suspend/resume
> Can power plug and LID switch still work after resuming?

Yes, both power plug and LID switch work and are correctly captured by acpi_listen.

> B. We want to see some proofs that Samsung EC firmware acts exactly as what
> we've inferred from the reversion requirements. So could you:
> 1. using linux-pm/linux-next with patches from Comment 219.
> 2. Uncomment the following line in the drivers/acpi/ec.c:
>      /* #define DEBUG */
> 3. build/boot the kernel
> 4. do some LID state changes and make sure they are captured by acpi_listen.
> 5. upload the dmesg output here.

Opened/closed 5 times, all of them showed up in acpi_listen.
Attaching dmesg.

> C. We want to confirm if the timely polling support is really required by
> Samsung EC firmware:
> 1. using linux-pm/linux-next
> 3. git am gpe-handle21.patch (attachment 156591 [details])
> 4. git am ec-flush1.patch (attachment 155351 [details])
> 5. git am ec-flush2.patch (attachment 155371 [details])
> 6. git am ec-flush3.patch (attachment 155381 [details])
> 7. git am ec-flush4.patch (attachment 155391 [details])
> 8. git am ec-flush5.patch (attachment 155411 [details])
> 9. git am ec-flush6.patch (attachment 155421 [details])
> Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> and can still work even after resuming?

Both work fine, even after a suspend/resume.

Note that the ec-flush patches depend on the reverts, so I applied them even if they weren't explicitly listed. I can test without them later today if required.

> D. We want to confirm if this is really a regression and current fix can:
> 1. Using linux-pm/linux-next
> 2. git revert 79149001
> 3. git revert df9ff918
> Do not apply any of the posted patches, can power plug and LID switch still
> work? And

Upon boot, it works correctly. After a suspend/resume, power plug detection does not work anymore (it is greatly delayed) and neither does the LID switch (doesn't trigger at all).

> 1. Using linux-pm/linux-next
> Do not revert the "regression fix" commits and do not apply any of the
> posted patches, can power plug and LID switch still work?

I wasn't able to do this last test due to time constraints, I'll try and do that later today.
Comment 228 Andrea Fagiani 2014-11-10 09:04:33 UTC
Created attachment 157111 [details]
dmesg - #DEBUG enabled
Comment 229 Lv Zheng 2014-11-11 03:19:50 UTC
Hi,

The test results are very useful for us to understand this issue. :-)

(In reply to Andrea Fagiani from comment #227)
> > B. We want to see some proofs that Samsung EC firmware acts exactly as what
> > we've inferred from the reversion requirements. So could you:
> > 1. using linux-pm/linux-next with patches from Comment 219.
> > 2. Uncomment the following line in the drivers/acpi/ec.c:
> >      /* #define DEBUG */
> > 3. build/boot the kernel
> > 4. do some LID state changes and make sure they are captured by
> acpi_listen.
> > 5. upload the dmesg output here.
> 
> Opened/closed 5 times, all of them showed up in acpi_listen.
> Attaching dmesg.

Yes. I also found the 0x5E (5 times) and 0x5F (4 times) in the dmesg.
TBH, I didn't find proofs for comment 210 in it.
So the issue isn't as what I imagined before.

> > C. We want to confirm if the timely polling support is really required by
> > Samsung EC firmware:
> > 1. using linux-pm/linux-next
> > 3. git am gpe-handle21.patch (attachment 156591 [details])
> > 4. git am ec-flush1.patch (attachment 155351 [details])
> > 5. git am ec-flush2.patch (attachment 155371 [details])
> > 6. git am ec-flush3.patch (attachment 155381 [details])
> > 7. git am ec-flush4.patch (attachment 155391 [details])
> > 8. git am ec-flush5.patch (attachment 155411 [details])
> > 9. git am ec-flush6.patch (attachment 155421 [details])
> > Do not apply the ec-flush7.patch, can power plug and LID switch still work,
> > and can still work even after resuming?
> 
> Both work fine, even after a suspend/resume.

Thus, in order to fix Samsung issue.
Either we need timely polling quirk (since test A works)
Or we need old CLEAR_ON_RESUME quirk (since test C works).

> Note that the ec-flush patches depend on the reverts, so I applied them even
> if they weren't explicitly listed. I can test without them later today if
> required.

Not necessary :-). I only want the kernel test results __wih__ the reverts (as they conflict with the patches posted here) so I don't need to post another set of rebased patches for current linux-pm/linux-next.

> > D. We want to confirm if this is really a regression and current fix can:
> > 1. Using linux-pm/linux-next
> > 2. git revert 79149001
> > 3. git revert df9ff918
> > Do not apply any of the posted patches, can power plug and LID switch still
> > work? And
> 
> Upon boot, it works correctly. After a suspend/resume, power plug detection
> does not work anymore (it is greatly delayed) and neither does the LID
> switch (doesn't trigger at all).

Thus the regression seems to be: "the 2 commits have broken the CLEAR_ON_RESUME quirk".

I guess SCI_EVT is set after resuming but GPE doesn't fire.
If this is the case, this isn't a silicon/firmware bug but a driver bug.
As EC GPE is edge triggered and GPE might have already been fired before suspending.
If so, we need to improve the driver and shouldn't fix the driver bug using any kind of quirk.

So I want to check if the SCI_EVT is set after resuming.
Let me think about a way to confirm this.
I'll add another test later.

> > 1. Using linux-pm/linux-next
> > Do not revert the "regression fix" commits and do not apply any of the
> > posted patches, can power plug and LID switch still work?
> 
> I wasn't able to do this last test due to time constraints, I'll try and do
> that later today.

According to your previous test results. The result might be "both can work".
Though we might still need to improve the driver, we'll also be glad to see this regression has been prevented in the mainline.
So it's better to confirm if the upstreamed regression fixes can also work.

Best regards
-Lv
Comment 230 Lv Zheng 2014-11-11 03:40:37 UTC
Hi, Andrea

Please use either of the following working combinations:
1. test A
2. test C
3. test D2 (which you haven't done yet)

If you'll do test D2 for us, you can just do it and capture the confirmation information at same time.

Please do:
1. suspend
2. do some LID opening/closing to trigger old issue
3. resume
4. dmesg > dmesg-right-after-resume.txt

And post the dmesg-right-after-resume.txt here.
If the kernel log buffer size is not big enough to contain all messages right after resuming, please increase it using the kernel boot parameter - log_buf_len=1M.

Thanks and best regards
Comment 231 Lv Zheng 2014-11-11 04:25:31 UTC
Hi, Andrea

Sorry that I forgot to ask you to enable the EC log to capture the debug information.

2. Uncomment the following line in the drivers/acpi/ec.c:
     /* #define DEBUG */

(In reply to Lv Zheng from comment #230)
> Hi, Andrea
> 
> Please use either of the following working combinations:
> 1. test A
> 2. test C
> 3. test D2 (which you haven't done yet)
> 
> If you'll do test D2 for us, you can just do it and capture the confirmation
> information at same time.
> 
> Please do:
> 1. suspend
> 2. do some LID opening/closing to trigger old issue
> 3. resume
> 4. dmesg > dmesg-right-after-resume.txt
> 
> And post the dmesg-right-after-resume.txt here.
> If the kernel log buffer size is not big enough to contain all messages
> right after resuming, please increase it using the kernel boot parameter -
> log_buf_len=1M.

Thanks
-Lv
Comment 232 Andrea Fagiani 2014-11-11 21:32:14 UTC
Created attachment 157351 [details]
dmesg-right-after-resume_A
Comment 233 Andrea Fagiani 2014-11-11 21:32:36 UTC
Created attachment 157361 [details]
dmesg-right-after-resume_C
Comment 234 Andrea Fagiani 2014-11-11 21:33:01 UTC
Created attachment 157371 [details]
dmesg-right-after-resume_D2
Comment 235 Andrea Fagiani 2014-11-11 21:36:10 UTC
Hi Lv,
I captured and uploaded the dmesg as requested.

Concerning test D2, both power plug and LID switch seem to work fine even after a suspend/resume. You'll notice I've played with it a little bit before performing the suspend test.

Hope it helps,
Andrea
Comment 236 Lv Zheng 2014-11-12 03:27:48 UTC
Hi, Andrea

The results make something clear to me.
Thanks for the great job.

For the 3 test results, SCI_EVT is always flagged after resuming:
Test A (regression fixes + cleanup + timely polling quirk):
[   42.608720] PM: Restoring platform NVS memory
...
[   42.649355] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test C (regression fixes + cleanup + CLEAR_ON_RESUME quirk):
[   21.764941] PM: Restoring platform NVS memory
...
[   21.811916] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
Test D2 (regression fixes + CLEAR_ON_RESUME quirk):
[  172.102165] PM: Restoring platform NVS memory
...
[  172.149860] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
This can explain the cause of the original bug identified in comment 129.
The EC behavior is not broekn here. We shouldn't implement a quirk for it since this is a driver bug - if the EC GPE is not fired, we don't have means to poll the events.

For the regression reported in comment 184, I only find the proof in test A that we need both the 2 commits to be reverted:
Test A (regression fixes + cleanup + timely polling quirk):
[   42.662695] ACPI : EC: ***** Command(QR_EC) started *****
[   42.662697] ACPI : EC: ===== TASK (2) =====
[   42.662702] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[   42.662703] ACPI : EC: EC_SC(W) = 0x84
[   42.662707] ACPI : EC: ***** Event running *****
[   42.662827] ACPI : EC: ##### Query(0x5e) stopped #####
[   42.666037] ACPI : EC: ===== TASK (2) =====
[   42.666042] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   42.666045] ACPI : EC: EC_DATA(R) = 0x5f
[   42.666046] ACPI : EC: ***** Command(QR_EC) hardware completion *****
[   42.666047] ACPI : EC: ***** Command(QR_EC) stopped *****
The event of 0x5F is returned when SCI_EVT=0.
So we need to revert the 2 commits to allow QR_EC to be sent when SCI_EVT=0 and there is really non 0x00 events need to be drained when SCI_EVT=0.
TBH, the EC behavior is broekn here, thus IMO this is a different issue than the one identified in comment 129.
It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way) can work it around. If this issue can be reproduced during runtime, then the CLEAR_ON_RESUME can not work it around at that time. According to your tests, the problem cannot be easily reproduced during runtime. So we can stay unchanged until someone else blaims.

If we have to assume this is only a problem happened during resuming and if we want to root cause why the SCI_EVT is suddently cleared after resuming, we will have to assume that something during the resuming steps has caused this issue because:
1. In test C, 2 stale events are returned with SCI_EVT=1:
[   21.814019] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[   21.814022] ACPI : EC: EC_DATA(R) = 0x5e
[   21.817333] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[   21.817351] ACPI : EC: EC_DATA(R) = 0x5f
So SCI_EVT will not be spontaneously cleared during acpi_ec_unblock_transactions().
2. In test A, 2 stale events are returned with different SCI_EVT value:
[   42.656421] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[   42.656424] ACPI : EC: EC_DATA(R) = 0x5e
[   42.666042] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   42.666045] ACPI : EC: EC_DATA(R) = 0x5f
I can see this happens in test A after:
[   42.656442] ACPI: Waking up from system sleep state S3
Perhaps one of the resuming steps in acpi_pm_finish() (most likely in acpi_hw_legacy_wake()) has cleared the SCI_EVT.
If it is caused by register accesses occurred in acpi_hw_legacy_wake(), this is an ACPICA bug.
If it happens in _WAK evaluation, then this is most likely a firmware bug. Samsung guys need to take care of such firmware bugs.

IMO it is not worthy to track down to the root cause there, given the fact that it would require very complex debugging steps to root cause it and I cannot do it myself. So I'm going to mark this issue as resolved.

Thanks and best regards
Comment 237 Lv Zheng 2014-11-12 07:59:21 UTC
Hi, Andrea

I checked the log again. And the following seems to be wrong.

> It seems both the quirks (drain way of CLEAR_ON_RESUME, timely polling way)
> can work it around. If this issue can be reproduced during runtime, then the
> CLEAR_ON_RESUME can not work it around at that time. According to your
> tests, the problem cannot be easily reproduced during runtime. So we can
> stay unchanged until someone else blaims.

Though you didn't face real functional issues, there are following entries in the test D2 log:
[   31.072607] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   31.072613] ACPI : EC: EC_DATA(R) = 0x66
[   77.835639] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   77.835647] ACPI : EC: EC_DATA(R) = 0x66
[   79.939044] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   79.939053] ACPI : EC: EC_DATA(R) = 0x66
[   82.006084] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[   82.006087] ACPI : EC: EC_DATA(R) = 0x66
Thanks for you long time test or we won't find them out.
The _Q66 is as follows:
                    Method (_Q66, 0, NotSerialized)
                    {
                        P8XH (Zero, 0x66)
                        If (LEqual (B1EX, One))
                        {
                            Notify (BAT1, 0x80)
                        }
                    }
It updates battery status.
Then it means the assumption in comment 210 might be real and we need the "timely polling quirk" for samsung to have everything wroked around...
So I'll prepare the whole patchset posted here for upstream according to this result.

In test A, I can even see such log entries:
[    1.788647] ACPI : EC: ***** Command(QR_EC) started *****
[    1.788651] ACPI : EC: ===== TASK (1) =====
[    1.788656] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[    1.788659] ACPI : EC: EC_SC(W) = 0x84
...
[    1.800289] ACPI : EC: ===== IRQ (0) =====
[    1.800295] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[    1.800300] ACPI : EC: EC_DATA(R) = 0x70
...
[    1.800385] ACPI : EC: ***** Command(QR_EC) stopped *****
It seems for Samsung EC, the firmware can even return event that happens after we send the query command. There can really be a big time slot between driver pushes the command byte and firmware fetches it.

Thanks and best regards
Comment 238 Kieran Clancy 2014-11-13 10:49:27 UTC
Hi Lv,

Thanks for all your work on this issue. It's certainly not easy, especially if you don't have the hardware available to test on yourself.

You are probably right about the racy behaviour. One important point though, is that you would need to miss either 8 (or was it 16?) events in a row during run time for the buffer to fill and for SCI_EVT to stop being given entirely.

Under normal usage, I think there's almost always going to be an event regularly enough that this "full" condition never occurs. It was only found to be a problem during suspend, where the EC controller seemed to continue logging, and after enough battery level or AC adapter changes it would stop triggering events even after resume.

My HDD was too full recently to build kernels, but I have just freed some space (a bit late, sorry) so if you want me to test anything please let me know.
Comment 239 Lv Zheng 2014-11-14 01:39:00 UTC
(In reply to Kieran Clancy from comment #238)
> Hi Lv,
> 
> Thanks for all your work on this issue. It's certainly not easy, especially
> if you don't have the hardware available to test on yourself.
> 
> You are probably right about the racy behaviour. One important point though,
> is that you would need to miss either 8 (or was it 16?) events in a row
> during run time for the buffer to fill and for SCI_EVT to stop being given
> entirely.
> 
> Under normal usage, I think there's almost always going to be an event
> regularly enough that this "full" condition never occurs. It was only found
> to be a problem during suspend, where the EC controller seemed to continue
> logging, and after enough battery level or AC adapter changes it would stop
> triggering events even after resume.
> 
> My HDD was too full recently to build kernels, but I have just freed some
> space (a bit late, sorry) so if you want me to test anything please let me
> know.

Hi,

Thanks a lot!
We failed to find this platform selling on China market.

I was thinking this issue doesn't exist during runtime. That's why I said in comment 236 that it might be a BIOS _WAK issue or something in acpi_hw_legacy_wake() has cleared SCI_EVT and we may need further investigation. But finally in comment 237, I found it occurred not only after resuming, but also runtime, so further investigation doesn't matter any more because this is definitely an EC firmware behavior according to the facts.

In our driver, we will queue 2nd QR_EC after sending the 1st QR_EC and before fetching the returned event. At that time, SCI_EVT should still be 1, so we can always send 2 QR_EC for 1 SCI_EVT=1 indication. This somewhat can reduce the reproduction ratio of the Samsung EC issue and that's why during runtime it can hardly be reproduced as we can drain events faster than they are accumulated in the firmware.

But the above code can only make a happen to work environment. If the host acts a bit slower than the target, this issue might still occurr during runtime when there are more than 3 events queued up and the driver will be unable to queue 3rd QR_EC if SCI_EVT is cleared after fetching the 1st event.

It seems the best way for Samsung is always sending QR_EC whatever SCI_EVT is until 0x00 returned. This driver behavior is reasonable from ACPI spec's point of view. The CLEAR_ON_RESUME quirk provided by you did this after resuming, we may need to extend this behavior to runtime.

But we do have many platforms act differently:
1. Some platforms never return 0x00 even when SCI_EVT=0, they return certain event value. Some of them even return the event value for which there is no _Qxx prepared in the namespace. So if we continously send QR_EC until 0x00 is returned, the process will never end on such platforms. The users of such platforms will sure blame us.
2. Some platforms (Acer) do not return anything if SCI_EVT=0, the EC query transaction will be blocked, and our driver cannot issue further EC transaction unless previous one is completed. So if we allow QR_EC to be sent without checking SCI_EVT, users of such platforms will complain.

What I'm going to do is:
1. extending the draining behavior - polling event until 0x00 is returned after seeing SCI_EVT indicator - to the runtime and
2. making it specific for Samsung.
So that we don't need to timely poll it and it can prevent this issue from happening after resuming and during runtime.

I can post the improvement after having done some local tests.

Best regards
Comment 240 Lv Zheng 2014-11-14 02:27:22 UTC
Hi, Kieran

Can you:
1. use current linux-pm/linux-next branch
2. enable EC debugging
3. disable quirk
4. trigger the bug
5. capture the dmesg right after resume
6. upload the dmesg here
for investigation?

Thanks in advance
Comment 241 Kieran Clancy 2014-11-14 16:05:05 UTC
Created attachment 157601 [details]
dmesg on linux-next without quirk showing EC working before but not after triggering bug

(In reply to Lv Zheng from comment #240)
> Hi, Kieran
> 
> Can you:
> 1. use current linux-pm/linux-next branch
> 2. enable EC debugging

Is it enough to just uncomment the #define DEBUG?

> 3. disable quirk

Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this what you meant?

> 4. trigger the bug
> 5. capture the dmesg right after resume
> 6. upload the dmesg here

See attached. I annotated the dmesg with a few extra > /dev/kmsg lines starting with 'KC'. They are:

[  107.787915] KC just booted and logged in
[  137.240937] KC screen close/open works as intended
[  203.943414] KC about to suspend, but not trigger bug this time
[  226.584995] KC back from suspend
[  265.000194] KC screen close/open works as intended
[  298.608956] KC AC change detected as intended
[  318.874279] KC about to suspend and trigger bug
[  348.834217] KC back from suspend
[  374.777099] KC screen open/close not detected
[  398.000040] KC AC change not detected
[  425.733615] KC about to manually clear EC events with script
[  442.663539] KC EC events cleared
[  469.766095] KC AC change detected again
[  496.219180] KC screen open/close detected again

So between 0 and 318 you can see the EC operating as intended including one suspend cycle.

After 318 I trigger the bug (by unplugging AC a bunch of times), and after that you can see that there is nothing at all logged when I either closed the screen or unplugged AC.

At 425 I manually simulated the quirk by sending 0x84 EC command queries and reading the data until the data is 0x00 using Juan Manuel's userspace program.

After that, EC events are detected properly again.

----

My analysis is below:

After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately after resume:

[  337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0

It does not seem as though we ever handle this event properly though. Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally different?

----

Further testing shows that this SCI_EVT=1 happens for the first resume after the bug is triggered, but not for the second or subsequent resumes.

That means that we are still going to need a wakeup quirk because if for some reason we fail to clear the EC state before the next suspend, we will never get another SCI_EVT=1 (even after a power cycle, I believe).
Comment 242 Kieran Clancy 2014-11-14 16:27:37 UTC
Created attachment 157611 [details]
dmesg on linux-next without quirk showing bug persisting over several suspend cycles

Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug during the first suspend time.

It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes do not have this. In a moment I will confirm if this persists across power cycles.

It seems the right behaviour for affected Samsung machines is to send QR_EC until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or resume.
Comment 243 Kieran Clancy 2014-11-14 16:43:58 UTC
Created attachment 157631 [details]
dmesg on linux-next without quirk showing bug persisting over power cycle

Confirming that once the bug is initially triggered, we don't get SCI_EVT=1 even on a power cycle, so we still need the boot/resume quirk. At least, that's my interpretation.

I hope the dmesgs were helpful. Unfortunately I'm not going to have internet access for the next week (going to the outback where there is not even any mobile reception), but I'm happy to test more things when I return.
Comment 244 Lv Zheng 2014-11-17 05:37:34 UTC
(In reply to Kieran Clancy from comment #241)
> Created attachment 157601 [details]
> dmesg on linux-next without quirk showing EC working before but not after
> triggering bug
> 
> (In reply to Lv Zheng from comment #240)
> > Hi, Kieran
> > 
> > Can you:
> > 1. use current linux-pm/linux-next branch
> > 2. enable EC debugging
> 
> Is it enough to just uncomment the #define DEBUG?

Yes.

> > 3. disable quirk
> 
> Is it enough to prevent EC_FLAGS_CLEAR_ON_RESUME from being set? Is this
> what you meant?

Yes.

> > 4. trigger the bug
> > 5. capture the dmesg right after resume
> > 6. upload the dmesg here
> 
> See attached. I annotated the dmesg with a few extra > /dev/kmsg lines
> starting with 'KC'. They are:
> 
> [  107.787915] KC just booted and logged in
> [  137.240937] KC screen close/open works as intended
> [  203.943414] KC about to suspend, but not trigger bug this time
> [  226.584995] KC back from suspend
> [  265.000194] KC screen close/open works as intended
> [  298.608956] KC AC change detected as intended
> [  318.874279] KC about to suspend and trigger bug
> [  348.834217] KC back from suspend
> [  374.777099] KC screen open/close not detected
> [  398.000040] KC AC change not detected
> [  425.733615] KC about to manually clear EC events with script
> [  442.663539] KC EC events cleared
> [  469.766095] KC AC change detected again
> [  496.219180] KC screen open/close detected again

Great information!
Thanks.

> So between 0 and 318 you can see the EC operating as intended including one
> suspend cycle.
> 
> After 318 I trigger the bug (by unplugging AC a bunch of times), and after
> that you can see that there is nothing at all logged when I either closed
> the screen or unplugged AC.
> 
> At 425 I manually simulated the quirk by sending 0x84 EC command queries and
> reading the data until the data is 0x00 using Juan Manuel's userspace
> program.
> 
> After that, EC events are detected properly again.

Great test cases!
Thanks.

> My analysis is below:
> 
> After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> after resume:
> 
> [  337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
> 
> It does not seem as though we ever handle this event properly though.
> Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> different?

This is a bug, in ec_poll(), there is no code to check SCI_EVT.
And event will thus be lost.

I think this has been fixed in the ec-flush6.patch.
+out:
+	if (status & ACPI_EC_FLAG_SCI &&
+	    (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
+		__acpi_ec_set_event(ec);

We will have this flag checked in the advance_transaction.
So you won't see this with ec-flush[1-6].patch applied.

> Further testing shows that this SCI_EVT=1 happens for the first resume after
> the bug is triggered, but not for the second or subsequent resumes.
> 
> That means that we are still going to need a wakeup quirk because if for
> some reason we fail to clear the EC state before the next suspend, we will
> never get another SCI_EVT=1 (even after a power cycle, I believe).

Yes, I think this is required.
What I'm going to do is to add a flag "EC_FLAGS_EVENT_DRAINED" in the driver.
It is cleard after boot/resume and when SCI_EVT=1 is detected.
When it is cleared, QR_EC will be continously issued until 0x00 is returned.
After seeing 0x00, it will be set again to indicate events have been drained.
I'll make this piece of code as Samsung specific quirk for now.

Thanks and best regards
-Lv
Comment 245 Lv Zheng 2014-11-17 05:39:54 UTC
(In reply to Kieran Clancy from comment #242)
> Created attachment 157611 [details]
> dmesg on linux-next without quirk showing bug persisting over several
> suspend cycles
> 
> Of the three suspend/resume cycles shown in this dmesg, I trigger the EC bug
> during the first suspend time.
> 
> It comes back with SCI_EVT=1 set the first time, but the 2nd and 3rd resumes
> do not have this. In a moment I will confirm if this persists across power
> cycles.
> 
> It seems the right behaviour for affected Samsung machines is to send QR_EC
> until data is 0x00 not just if we get SCI_EVT=1, but additionally on boot or
> resume.

Yes.
I was planning to support this in this way.

Thanks
Comment 246 Lv Zheng 2014-11-17 05:40:56 UTC
(In reply to Kieran Clancy from comment #243)
> Created attachment 157631 [details]
> dmesg on linux-next without quirk showing bug persisting over power cycle
> 
> Confirming that once the bug is initially triggered, we don't get SCI_EVT=1
> even on a power cycle, so we still need the boot/resume quirk. At least,
> that's my interpretation.
> 
> I hope the dmesgs were helpful. Unfortunately I'm not going to have internet
> access for the next week (going to the outback where there is not even any
> mobile reception), but I'm happy to test more things when I return.

Yes, it's very helpful.
Thanks for the help!

Best regards
-Lv
Comment 247 Lv Zheng 2014-11-17 06:08:06 UTC
(In reply to Lv Zheng from comment #244)
> (In reply to Kieran Clancy from comment #241)
> > My analysis is below:
> > 
> > After the bug is triggered, SCI_EVT=1 is set just ONE time, immediately
> > after resume:
> > 
> > [  337.319012] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0
> OBF=0
> > 
> > It does not seem as though we ever handle this event properly though.
> > Namely, there seems to be no corresponding "EC_SC(W) = 0x84". There is a
> > couple of "EC_DATA(W) = 0x84" but I'm pretty sure these are totally
> > different?
> 
> This is a bug, in ec_poll(), there is no code to check SCI_EVT.
> And event will thus be lost.
> 
> I think this has been fixed in the ec-flush6.patch.
> +out:
> +     if (status & ACPI_EC_FLAG_SCI &&
> +         (!t || t->flags & ACPI_EC_COMMAND_COMPLETE))
> +             __acpi_ec_set_event(ec);
> 
> We will have this flag checked in the advance_transaction.
> So you won't see this with ec-flush[1-6].patch applied.

I should take this back. :-)
This still need to be improved to indicate SCI_EVT even when there is a transaction running.

Thanks
-Lv
Comment 248 Len Brown 2015-07-21 15:51:11 UTC
commit 74443bbed72ab22ee005ecb6ecdc657a8018e1db
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Wed Jan 14 19:28:47 2015 +0800

    ACPI / EC: Fix issues related to the SCI_EVT handling


shipped in Linux-4.0-rc1
closed.
Comment 249 Elvis Pranskevichus 2018-01-22 04:12:14 UTC
commit 4c237371f290d1ed3b2071dd43554362137b1cce
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Wed Jan 4 11:17:17 2017 +0800

    ACPI / EC: Remove old CLEAR_ON_RESUME quirk

The above reintroduced this bug on my NP900x4c.  Reverting the commit makes things work again.
Comment 250 Balazs Varga 2018-02-10 20:13:46 UTC
(In reply to Elvis Pranskevichus from comment #249)
> commit 4c237371f290d1ed3b2071dd43554362137b1cce
> Author: Lv Zheng <lv.zheng@intel.com>
> Date:   Wed Jan 4 11:17:17 2017 +0800
> 
>     ACPI / EC: Remove old CLEAR_ON_RESUME quirk
> 
> The above reintroduced this bug on my NP900x4c.  Reverting the commit makes
> things work again.

Confirmed, the bug is back again on samsung latops :-(


https://bbs.archlinux.org/viewtopic.php?pid=1767061#p1767061
Comment 251 Mark Syms 2018-02-11 07:50:57 UTC
Not exactly surprising when the change that fixed then gets reverted out of the kernel. Seems like Intel trying to force obsolescence of these systems.
Comment 252 Balazs Varga 2018-02-11 11:02:19 UTC
(In reply to Mark Syms from comment #251)
> Not exactly surprising when the change that fixed then gets reverted out of
> the kernel. Seems like Intel trying to force obsolescence of these systems.

what has to be done to bring the fix back into the kernel code?
I mean at the first glance it was only for `Samsung hardware`, so I guess it does not affect any other vendors/hardwares.
The mentioned commit looks like just a `code cleanup` changes. but I'm not familiar with the kernel code. 

Shall I file a new issue or could this be opened again?
Comment 253 Zhang Rui 2018-02-12 03:09:31 UTC
Lv is not working on this now, and I'm new to the EC code, first let's confirm a quick revert patch.
Comment 254 Zhang Rui 2018-02-12 03:10:40 UTC
Created attachment 274121 [details]
revert patch

please confirm
1. the problem exists with the latest upstream kernel, say, 4.16-rc1
2. the problem is gone after applying the patch attached
Comment 255 Ortwin Glück 2018-02-12 21:17:17 UTC
I can confirm that the regression exists with 4.15.3 and that your patch fixes it. 4.16 not tested.
Comment 256 Francisco Cribari 2018-02-17 14:37:26 UTC
I am experiencing the same problem (Arch Linux, kernel 4.15.3, KDE). Hardware: Samsung laptop model 900X3L. Will the patch be included in the Linux kernel?
Comment 257 Balazs Varga 2018-02-25 10:29:59 UTC
Created attachment 274447 [details]
revert-patch works

confirmed, pacth is working on top of `4.15`
Comment 258 Francisco Cribari 2018-04-12 00:04:03 UTC
I am still facing this problem with Arch Linux + kernel 4.16.0-2. Hardware is a Samsung laptop (model: NP900X3L-KW1BR). Suggestions are welcome.
Comment 259 Francisco Cribari 2018-04-17 01:21:27 UTC
I compiled kernels 4.16.0 and 4.16.2 (in Arch Linux) using the patch in Comment #254 and I confirm that the patch fixes the problem. (Hardware: Samsung model NP900X3L-KW1BR.)
Comment 260 Chen Yu 2018-06-13 03:07:06 UTC
Hi @Rui,
may I know if the patch #Comment 254 will be pushed upstream ?
Comment 261 moraus 2018-07-13 08:37:46 UTC
Hi

Poll on status for this bug. Does anyone know if/when it will be fixed upstream?
Comment 262 moraus 2018-07-13 08:39:33 UTC
Hi

Poll on status for this bug. Does anyone know if/when it will be fixed upstream?
Comment 263 Francisco Cribari 2018-08-08 11:04:04 UTC
Has this been fixed upstream? Despite what we read at  

https://www.systutorials.com/linux-kernels/59801/acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling-linux-4-14-3/

some people still face the problem. Thank you.
Comment 264 Zhang Rui 2018-08-28 07:48:24 UTC
sorry that I thought I have already sent this patch upstream, but apparently I didn't.


(In reply to Francisco Cribari from comment #263)
> Has this been fixed upstream? Despite what we read at  
> 
> https://www.systutorials.com/linux-kernels/59801/acpi-ec-fix-regression-
> related-to-triggering-source-of-ec-event-handling-linux-4-14-3/
> 
I can not access this page.
But if you can confirm the problem still exists in 4.19-rc1, and the revert indeed fixes it, I will send the revert patch out.
Comment 265 Zhang Rui 2018-09-27 07:03:44 UTC
Created attachment 278797 [details]
revert patch on top of 4.19-rc4

please confirm this revert patch works for you on top of 4.19-rc kernel.
I will send it out after got your confirmation.
Comment 266 Ortwin Glück 2018-09-27 20:36:51 UTC
(In reply to Zhang Rui from comment #265)
> please confirm this revert patch works for you on top of 4.19-rc kernel.
> I will send it out after got your confirmation.

It does! Also the CPU max frequency is now back to 2.4GHz after unplug/replug cycle.

VANILLA 4.19-rc4

plugged:
bat ~ # cat /sys/class/power_supply/ADP1/online
1
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging

unplugged:
bat ~ # cat /sys/class/power_supply/ADP1/online
0
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging


PATCHED 4.19-rc4

plugged:
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging

unplugged:
bat ~ # cat /sys/class/power_supply/ADP1/online 
0
bat ~ # cat /sys/class/power_supply/BAT1/status 
Discharging

replugged:
bat ~ # cat /sys/class/power_supply/ADP1/online
1
bat ~ # cat /sys/class/power_supply/BAT1/status
Charging
Comment 267 moraus 2018-10-18 09:19:36 UTC
(In reply to Zhang Rui from comment #265)
> Created attachment 278797 [details]
> revert patch on top of 4.19-rc4
> 
> please confirm this revert patch works for you on top of 4.19-rc kernel.
> I will send it out after got your confirmation.

Hi Rui,

Any chance this will hit 4.19 rc before final release?
Comment 268 Ortwin Glück 2019-01-07 14:53:54 UTC
ping... what else is required to get this merged?
Comment 269 Pablo Caron 2019-01-22 13:19:54 UTC
The bug is still present in kernel 4.20. I am using Ubuntu and installed the "standard" kernel downloaded from:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20/
Comment 270 Michael J Gruber 2019-02-27 18:44:22 UTC
Same problem on 4.20.11 on Fedora 29 (which has a patched upower, so it's not that other bug).

With the revert patch fromm comment #265, everything works as expected (and as it used to before the regression).

Please let me know if you need more info.

In case someone wants to test on Fedora:

https://copr.fedorainfracloud.org/coprs/mjg/kernel-book9/
Comment 271 Zhang Rui 2019-02-28 00:06:49 UTC
The revert patch has been merged by Rafael, and it will show up in v5.1
Comment 272 Zhang Rui 2019-02-28 00:07:38 UTC
https://patchwork.kernel.org/patch/10791837/
Comment 273 Michael J Gruber 2019-04-23 14:21:38 UTC
(In reply to Zhang Rui from comment #271)
> The revert patch has been merged by Rafael, and it will show up in v5.1

Just for anyone wondering: it's in the stable kernel 5.0.9, also.
Comment 274 Pablo Caron 2019-04-23 17:13:35 UTC
(In reply to Michael J Gruber from comment #273)
> (In reply to Zhang Rui from comment #271)
> > The revert patch has been merged by Rafael, and it will show up in v5.1
> 
> Just for anyone wondering: it's in the stable kernel 5.0.9, also.

I can confirm that the bug has been fixed in the stable kernel 5.0.9 downloaded from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.9/

Thank you!

You can find some test I did to check the correct behavior.

~$ uname -sr
Linux 5.0.9-050009-generic
Plugged
~$ cat /sys/class/power_supply/BAT1/status
Charging

Unplugged
~$ cat /sys/class/power_supply/ADP1/online
0
~$ cat /sys/class/power_supply/BAT1/status 
Discharging

Replugged
~$ cat /sys/class/power_supply/ADP1/online
1
~$ cat /sys/class/power_supply/BAT1/status
Charging