Bug 77431
Summary: | ACPI Events not being reported to OS without intel_idle.max_cstate=0 - Notebook Clevo w350etq | ||
---|---|---|---|
Product: | ACPI | Reporter: | qbanin |
Component: | EC | Assignee: | Lv Zheng (lv.zheng) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | lenb, lv.zheng, rui.zhang, tianyu.lan |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 3.14.5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
acpidump log
Dmesg output
Dmesg with ec.c dirty patch applied
dmesg_broken_acpi
dmesg_broken_acpi updated
dmesg_broken_acpi_after_bios_reset
dmesg_broken_acpi_3
dmesg_ec_flag_msi
ec.patch
dmesg_patched_ec
Dmesg with applied patch from #37
dmicode output
ec.patch
[PATCH] Debugging if udelay is required by this hardware
[PATCH] Debugging if BURST mode can fix this issue
The EC event polling implementation |
Description
qbanin
2014-06-06 18:47:51 UTC
Please provide the output of acpidump.
http://pastebin.com/hKUPnDy4

After a lot of googling and trying different boot parameters I found out that this issue is somehow related to the CPU's C-states. I added "intel_idle.max_cstate=0" to the grub command line 2 days ago and ACPI hasn't got stuck since. Now I'm booting my PC with "intel_idle.max_cstate=0 processor.max_cstate=0 idle=mwait" and will keep testing. I believe the downside of this solution is higher power usage, because the CPU doesn't enter the C7 state? It'd be nice if this issue could be fixed by a software patch because there's no BIOS or EC firmware upgrade available for my notebook.

Regards
Qba

Please attach the acpidump to the bugzilla report, as I cannot access the pastebin page.

I don't know why this could be related to intel_idle. Len, have you seen similar problems before?

Created attachment 139061 [details]
acpidump log
Today "my" workaround for this issue stopped working :( and I have no idea why.

Could you apply the following patch to enable the EC debug option and send the dmesg?

    diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
    index ad11ba4..0708d71 100644
    --- a/drivers/acpi/ec.c
    +++ b/drivers/acpi/ec.c
    @@ -27,7 +27,7 @@
      */
     
     /* Uncomment next line to get verbose printout */
    -/* #define DEBUG */
    +#define DEBUG
     #define pr_fmt(fmt) "ACPI : EC: " fmt
     
     #include <linux/kernel.h>

Created attachment 140941 [details]
Dmesg output
Dmesg captured after fresh boot + few brightness changes and AC adapter plug/unplug. ACPI events reported correctly (so far).
Created attachment 140951 [details]
Dmesg with ec.c dirty patch applied
Regarding comment #8.

With debug enabled on vanilla ec.c I can change brightness and my notebook reacts to AC unplug, but "acpi_listen" output is empty, the brightness level bar in KDE is missing and the battery applet still shows "charging 100%" even if AC is unplugged.

3 days ago, right after my previous message, I applied a dirty patch to ec.c:

    *** linux-3.14.8/drivers/acpi/ec.c.orig 2014-06-16 22:41:19.000000000 +0200
    --- linux-3.14.8/drivers/acpi/ec.c 2014-06-26 00:34:17.489809373 +0200
    ***************
    *** 1032,1037 ****
    --- 1032,1046 ----
          {
          ec_clear_on_resume, "Samsung hardware", {
          DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD.")}, NULL},
    +     {
    +     ec_clear_on_resume, "CLEVO hardware", {
    +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
    +     {
    +     ec_skip_dsdt_scan, "CLEVO hardware", {
    +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
    +     {
    +     ec_flag_msi, "CLEVO hardware", {
    +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
          {},
      };

-------------------

This "patch" solved all my ACPI related issues (except ec_clear_on_resume, it doesn't change anything). Forgive me the mess in my reports. I hope they will help you to locate the bug. :) Please see the attached second debug-enabled dmesg output from my previous message with this patch applied.

Regards
Qba

(In reply to qbanin from comment #9)
> Regarding comment #8.
>
> With debug enabled vanilla ec.c I can change brightness and my notebook
> reacts to AC unplug

Do you mean these functions work just after enabling the debug option?

> but "acpi_listen" output is empty, brightness level bar
> in KDE is missing and battery applet is still showing "charging 100%" even
> if AC is unplugged.
>
> 3 days ago right after my previous message I applied a dirty patch to ec.c:
>
> *** linux-3.14.8/drivers/acpi/ec.c.orig 2014-06-16 22:41:19.000000000
> +0200
> --- linux-3.14.8/drivers/acpi/ec.c 2014-06-26 00:34:17.489809373 +0200
> ***************
> *** 1032,1037 ****
> --- 1032,1046 ----
>       {
>       ec_clear_on_resume, "Samsung hardware", {
>       DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD.")}, NULL},
> +     {
> +     ec_clear_on_resume, "CLEVO hardware", {
> +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
> +     {
> +     ec_skip_dsdt_scan, "CLEVO hardware", {
> +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
> +     {
> +     ec_flag_msi, "CLEVO hardware", {
> +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
>       {},
>   };
>
>
> -------------------
>
> This "patch" solved all my ACPI related issues (except ec_clear_on_resume,
> it doesn't change anything). Forgive me the mess in my reports. I hope they
> will help you to locate the bug. :) Please see attached second debug-enabled
> dmesg output from my previous message with this patch applied.

Could you try the following patchset?

http://marc.info/?l=linux-acpi&m=140279290812659&w=2
http://marc.info/?l=linux-acpi&m=140279291012660&w=2
http://marc.info/?l=linux-acpi&m=140279292712665&w=2
http://marc.info/?l=linux-acpi&m=140279300712697&w=2
http://marc.info/?l=linux-acpi&m=140279300012696&w=2
http://marc.info/?l=linux-acpi&m=140279299612694&w=2
http://marc.info/?l=linux-acpi&m=140279299312692&w=2

>
> Regards
> Qba

(In reply to Lan Tianyu from comment #10)
> Do you mean these functions work just after enabling debug option?

No :) By "enabling debug only" I mean without my dirty patch.

>
> Could you try the following patchset?
>
> http://marc.info/?l=linux-acpi&m=140279290812659&w=2
> http://marc.info/?l=linux-acpi&m=140279291012660&w=2
> http://marc.info/?l=linux-acpi&m=140279292712665&w=2
> http://marc.info/?l=linux-acpi&m=140279300712697&w=2
> http://marc.info/?l=linux-acpi&m=140279300012696&w=2
> http://marc.info/?l=linux-acpi&m=140279299612694&w=2
> http://marc.info/?l=linux-acpi&m=140279299312692&w=2
>

Ok, I'll try it and let you know.

Applied your patches except 6/7, which seems to be already included in 3.14.8. Everything was working fine for about 1h. Now the issue is back (no ACPI events reported, no reaction to brightness up/down buttons nor AC (un)plug).

(In reply to qbanin from comment #12)
> Applied your patches except 6/7 which seems to be already included in
> 3.14.8. Everything was working fine for about 1h. Now the issue is back (no
> ACPI events reported, no reaction to brightness up/down buttons nor AC
> (un)plug).

Do you mean it worked normally for 1 hour and then broke again?

(In reply to Lan Tianyu from comment #13)
> (In reply to qbanin from comment #12)
> > Applied your patches except 6/7 which seems to be already included in
> > 3.14.8. Everything was working fine for about 1h. Now the issue is back (no
> > ACPI events reported, no reaction to brightness up/down buttons nor AC
> > (un)plug).
>
> Do you mean it worked normally for 1 hour and then broke again?

Yes, but it wasn't exactly 1 hour. I was checking it for about 10 mins after a fresh boot and it was working fine, then rechecked after ~1h+ and it wasn't working anymore.

Could you show the dmesg with the EC debug option after it stops working? BTW, please add the kernel parameter "log_buf_len=10M" to increase the log buffer.

Created attachment 141081 [details]
dmesg_broken_acpi
This time it didn't work at all after boot.
From the log, there is an EC event for the thermal zone. Did you plug/unplug AC?

[ 47.439317] ACPI : EC: ===== TASK =====
[ 47.439320] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[ 47.439321] ACPI : EC: EC_SC(W) = 0x84
[ 47.440283] ACPI : EC: ===== TASK =====
[ 47.440289] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 47.440292] ACPI : EC: EC_DATA(R) = 0x1c <== event number.
[ 47.440294] ACPI : EC: push query execution (0x1c) on queue
[ 47.440296] ACPI : EC: transaction end
[ 47.440300] ACPI : EC: start query execution
[ 47.440544] ACPI : EC: transaction start (cmd=0x82, addr=0x00)

Method (_Q1C, 0, NotSerialized) // _Qxx: EC Query
{
    P8XH (Zero, 0x1C)
    Notify (\_TZ.TZ0, 0x80)
    Notify (\_TZ.TZ0, 0x81)
    ADJP ()
}

I think I did, but I'm not sure.

In the end, I don't see an EC event for AC. Could you double-check and attach the log?

Any update?

Created attachment 142071 [details]
dmesg_broken_acpi updated
Booted up my netbook -> logged into KDE -> tried to change brightness up/down a few times (no reaction) -> unplugged and replugged AC (no reaction) -> captured dmesg.
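For readers following the EC debug lines quoted in this report: each EC_SC(R) value is the EC status byte, and the flags printed next to it are simply its bits. A minimal Python sketch of that decoding follows. The bit masks come from the EC status register layout in the ACPI specification and agree with the values in the logs here (for example 0x20 is SCI_EVT, 0x09 is CMD plus OBF). The names decode_ec_sc and format_ec_sc are made up for illustration and are not kernel functions.

```python
# Bit masks of the EC status register (EC_SC), per the ACPI specification.
EC_SC_BITS = {
    "OBF": 0x01,      # output buffer full: data is ready for the host to read
    "IBF": 0x02,      # input buffer full: EC has not consumed the last write yet
    "CMD": 0x08,      # last byte written was a command (1) or data (0)
    "BURST": 0x10,    # EC is in burst mode
    "SCI_EVT": 0x20,  # an SCI event is pending, host should send QR_EC (0x84)
    "SMI_EVT": 0x40,  # an SMI event is pending
}


def decode_ec_sc(value):
    """Return a dict mapping each flag name to 0 or 1 for a status byte."""
    return {name: int(bool(value & mask)) for name, mask in EC_SC_BITS.items()}


def format_ec_sc(value):
    """Render a status byte in the same flag order as the debug log lines."""
    flags = decode_ec_sc(value)
    order = ["SCI_EVT", "BURST", "CMD", "IBF", "OBF"]
    parts = " ".join("%s=%d" % (name, flags[name]) for name in order)
    return "EC_SC = 0x%02x %s" % (value, parts)


if __name__ == "__main__":
    # Status values that appear in the dmesg excerpts in this report.
    for status in (0x20, 0x09, 0x18, 0x02):
        print(format_ec_sc(status))
```

Read this way, a log that never again shows a status byte with bit 0x20 set means the driver never saw a pending SCI event after that point.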
UPDATE: Unplugged and replugged AC twice

I see one EC event for thermal in the log and none for AC or backlight. Could you reset the BIOS? I don't know why there is no event for AC and backlight. This bug has different symptoms from the issue on the Samsung machine because the thermal event is available.

Created attachment 142271 [details]
dmesg_broken_acpi_after_bios_reset

BIOS reset + steps from comment #21. The only difference was that brightness changed ONCE to a lower value ~1 min after the button press. The rest was the same (no reaction to AC, brightness up/down).

Yes, I saw some events for backlight in the log.

[ 151.157294] ACPI : EC: push query execution (0x1c) on queue

Method (_Q11, 0, NotSerialized) // _Qxx: EC Query
{
    If (LEqual (^^^GFX0.CDDS (0x0410), 0x1F))
    {
        P8XH (Zero, 0x11)
        Notify (^^^GFX0.LCD0, 0x87) <== notify video driver.
        If (LEqual (ECOS, 0x02))
        {
            Store (0xE0, ^^^^WMI.EVNT)
            Notify (WMI, 0xD0)
        }
        Else
        {
            Add (OEM2, 0xE0, ^^^^WMI.EVNT)
            Notify (WMI, 0xD0)
        }
    }
}

So far, I have no idea, since the EC event is triggered by GPE and totally depends on hardware. Furthermore, the thermal event works. How about your previous quirk workaround from comment 9? Does it still work?

(In reply to Lan Tianyu from comment #25)
> Yes, I saw some events for backlight in the log.
>
> [ 151.157294] ACPI : EC: push query execution (0x1c) on queue

Sorry. I attached the wrong log line; the following one is for backlight.

[ 151.159291] ACPI : EC: push query execution (0x11) on queue

(In reply to Lan Tianyu from comment #25)
> How about your
> previous quirk workaround in the comment 9? Does it still work?

Yes, it works perfectly. No issues so far.

How about just adding the ec_skip_dsdt_scan quirk on top of the new patchset from comment 11, which has been merged into the linux-pm tree?

Created attachment 142441 [details]
dmesg_broken_acpi_3
Done. Compiled the kernel with your patchsets + the ec_skip_dsdt_scan quirk and performed the same testing procedure as usual. The system reacts to ~50% of events. I managed to change brightness down/up twice or so, and un(re)plug AC once. The system's reaction to ACPI events is either delayed by 15+ seconds or unregistered at all.
It looks like ec_skip_dsdt_scan is not a cure to my issue.
(In reply to qbanin from comment #29)
> Created attachment 142441 [details]
> dmesg_broken_acpi_3
>
> Done. Compiled kernel with your patchsets + ec_skip_dsdt_scan quirk and
> performed the same testing procedure as usual. System reacts to ~50% of
> events.

Thanks. This means ec_skip_dsdt_scan does something that makes some events work. But "ec_skip_dsdt_scan" just makes the EC device be probed later.

> I managed to change brightness down/up twice or so, and un(re)plug AC
> once. System reaction to ACPI events is either delayed by 15+ seconds or
> unregistered at all.

What do you mean by "unregistered at all"? The events were sent 15s later?

>
> It looks like ec_skip_dsdt_scan is not a cure to my issue.

Could you try adding just the "ec_flag_msi" quirk?

(In reply to Lan Tianyu from comment #30)
> What do you mean "unregistered at all"? The events were sent 15s later?

I mean that if I press the brightness down button, there's a 50% chance the screen will dim after ~15s, or nothing will happen :D

>
> Could you try just add "ec_flag_msi" quirk?

Will try soon and report back.

Created attachment 142591 [details]
dmesg_ec_flag_msi
Well... it looks like "ec_flag_msi" fixed my issue.
How about commenting out the following lines?

    diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
    index ff16132..23ee51e 100644
    --- a/drivers/acpi/ec.c
    +++ b/drivers/acpi/ec.c
    @@ -281,8 +281,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
     {
         unsigned long tmp;
         int ret = 0;
    -    if (EC_FLAGS_MSI)
    -        udelay(ACPI_EC_MSI_UDELAY);
    +//  if (EC_FLAGS_MSI)
    +//      udelay(ACPI_EC_MSI_UDELAY);
         /* start transaction */
         spin_lock_irqsave(&ec->lock, tmp);
         /* following two actions should be kept atomic */

(In reply to Lan Tianyu from comment #33)
> How about commenting the following lines?
>
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index ff16132..23ee51e 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -281,8 +281,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec
> *ec,
>  {
>      unsigned long tmp;
>      int ret = 0;
> -    if (EC_FLAGS_MSI)
> -        udelay(ACPI_EC_MSI_UDELAY);
> +//  if (EC_FLAGS_MSI)
> +//      udelay(ACPI_EC_MSI_UDELAY);
>      /* start transaction */
>      spin_lock_irqsave(&ec->lock, tmp);
>      /* following two actions should be kept atomic */

After commenting out these 2 lines the issue is back.

Created attachment 142891 [details]
ec.patch
Please try this patch.
Created attachment 142941 [details]
dmesg_patched_ec

Tried. Symptoms are more or less the same as in comment #31.

Thanks for testing. I found there is no EC interrupt any more when the bug takes place. Please try the following patch.

    diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
    index a66ab65..e30bfb1 100644
    --- a/drivers/acpi/ec.c
    +++ b/drivers/acpi/ec.c
    @@ -264,9 +264,9 @@ static int ec_poll(struct acpi_ec *ec)
                             msecs_to_jiffies(1)))
                         return 0;
                 }
    -            spin_lock_irqsave(&ec->lock, flags);
    -            (void)advance_transaction(ec);
    -            spin_unlock_irqrestore(&ec->lock, flags);
    +//          spin_lock_irqsave(&ec->lock, flags);
    +//          (void)advance_transaction(ec);
    +//          spin_unlock_irqrestore(&ec->lock, flags);
             } while (time_before(jiffies, delay));
             pr_debug("controller reset, restart transaction\n");
             spin_lock_irqsave(&ec->lock, flags);

This patch cannot be applied to 3.14.8

------------
patching file drivers/acpi/ec.c
patch unexpectedly ends in middle of line
Hunk #1 FAILED at 264.
1 out of 1 hunk FAILED -- saving rejects to file drivers/acpi/ec.c.rej
------------

This function is different in 3.14.8:

    static int ec_poll(struct acpi_ec *ec)
    {
        unsigned long flags;
        int repeat = 5; /* number of command restarts */
        while (repeat--) {
            unsigned long delay = jiffies +
                msecs_to_jiffies(ec_delay);
            do {
                /* don't sleep with disabled interrupts */
                if (EC_FLAGS_MSI || irqs_disabled()) {
                    udelay(ACPI_EC_MSI_UDELAY);
                    if (ec_transaction_done(ec))
                        return 0;
                } else {
                    if (wait_event_timeout(ec->wait,
                            ec_transaction_done(ec),
                            msecs_to_jiffies(1)))
                        return 0;
                }
                advance_transaction(ec, acpi_ec_read_status(ec));
            } while (time_before(jiffies, delay));
            pr_debug("controller reset, restart transaction\n");
            spin_lock_irqsave(&ec->lock, flags);
            start_transaction(ec);
            spin_unlock_irqrestore(&ec->lock, flags);
        }
        return -ETIME;
    }

You can comment out the advance_transaction() line directly. But it's better that you test the newest code from the bleeding-edge branch of the linux-pm tree.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
git checkout bleeding-edge

Hi: Any update?

Created attachment 145571 [details]
Dmesg with applied patch from #37
After reboot I unplugged and replugged AC twice (immediate reaction), lowered brightness by a few steps (immediate reaction), but then my system froze for ~20s (no screen updates, no reaction to input from keyboard or mouse). After 20s everything resumed but no more ACPI events were reported to the OS. Attached dmesg captured after resume from the softlock.
Hi, Please provide output of dmidecode. Created attachment 147181 [details]
dmicode output
Created attachment 147191 [details]
ec.patch
Please try this patch. It adds MSI quirk for your machine.
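The quirk entries discussed in this report are rows in a DMI match table in drivers/acpi/ec.c. As a rough illustration of how such a table is evaluated, here is a small Python model. It assumes substring matching on each DMI field, and that every matching row's callback runs unless one returns non-zero, which is how the kernel's dmi_check_system() is understood to behave. The dictionaries and Python functions are hypothetical stand-ins, and the rows mirror the dirty patch from comment 9, not the contents of the patch attached here.

```python
# Flags the quirk callbacks would set in the EC driver (modelled as a dict).
ec_flags = {"msi": False, "skip_dsdt_scan": False, "clear_on_resume": False}


def ec_flag_msi(entry):
    ec_flags["msi"] = True
    return 0  # zero: keep scanning the remaining table rows


def ec_skip_dsdt_scan(entry):
    ec_flags["skip_dsdt_scan"] = True
    return 0


def ec_clear_on_resume(entry):
    ec_flags["clear_on_resume"] = True
    return 0


# Rows modelled on the table in the dirty patch from comment 9.
EC_DMI_TABLE = [
    {"callback": ec_clear_on_resume, "ident": "Samsung hardware",
     "matches": [("sys_vendor", "SAMSUNG ELECTRONICS CO., LTD.")]},
    {"callback": ec_skip_dsdt_scan, "ident": "CLEVO hardware",
     "matches": [("sys_vendor", "CLEVO CO.")]},
    {"callback": ec_flag_msi, "ident": "CLEVO hardware",
     "matches": [("sys_vendor", "CLEVO CO.")]},
]


def dmi_matches(entry, system):
    """True if every (field, substring) pair is found in the system's DMI data."""
    return all(sub in system.get(field, "") for field, sub in entry["matches"])


def dmi_check_system(table, system):
    """Run callbacks of matching rows; return how many rows matched."""
    count = 0
    for entry in table:
        if dmi_matches(entry, system):
            count += 1
            if entry["callback"](entry):
                break  # a non-zero return stops the scan early
    return count


if __name__ == "__main__":
    machine = {"sys_vendor": "CLEVO CO."}  # vendor string taken from comment 9
    print(dmi_check_system(EC_DMI_TABLE, machine), ec_flags)
```

Matching on the vendor string alone applies the quirk to every Clevo machine; a row meant for a single model would add a second pair to "matches", for example on the product name.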
Hi,

I checked 141081, from these log entries onward:

[ 3.336688] ACPI : EC: transaction start (cmd=0x83, addr=0x00)
[ 3.336688] ACPI : EC: ===== TASK =====
[ 3.336691] ACPI : EC: EC_SC(R) = 0x10 SCI_EVT=0 BURST=1 CMD=0 IBF=0 OBF=0
[ 3.336692] ACPI : EC: EC_SC(W) = 0x83
[ 3.336698] ACPI : EC: ===== IRQ =====
[ 3.336701] ACPI : EC: EC_SC(R) = 0x18 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=0
[ 3.336706] ACPI : EC: EC_SC(R) = 0x18 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=0
[ 3.336713] ACPI : EC: EC_SC(R) = 0x18 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=0
[ 3.336714] ACPI : EC: transaction end
[ 3.336753] ACPI : EC: ===== IRQ =====
[ 3.336755] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[ 3.336758] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[ 3.336773] ACPI : EC: ===== IRQ =====
[ 3.336777] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[ 3.336779] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0

The last command handled using interrupt mode was a BD_EC command to disable EC BURST mode.

I also checked 142071, from these log entries onward:

[ 3.715777] ACPI : EC: transaction start (cmd=0x80, addr=0x10)
[ 3.715779] ACPI : EC: ===== TASK =====
[ 3.715782] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[ 3.715783] ACPI : EC: EC_SC(W) = 0x80
[ 3.716144] ACPI : EC: ===== TASK =====
[ 3.716147] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[ 3.716148] ACPI : EC: EC_DATA(W) = 0x10
[ 3.716151] ACPI : EC: ===== IRQ =====
[ 3.716156] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[ 3.716159] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[ 3.716175] ACPI : EC: ===== IRQ =====
[ 3.716180] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[ 3.716183] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[ 3.716280] ACPI : EC: ===== IRQ =====
[ 3.716283] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1
[ 3.716286] ACPI : EC: EC_DATA(R) = 0x05
[ 3.716289] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[ 3.716297] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[ 3.716298] ACPI : EC: transaction end
[ 3.716307] ACPI : EC: ===== IRQ =====
[ 3.716311] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[ 3.716314] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0

The last command handled using interrupt mode was a RD_EC to read from address 0x10.

In both cases, BURST mode has been enabled several times. After the listed log entries, no more GPE IRQs generated by the platform can be seen in the log. And since then, SCI_EVT has never been set.

By using the MSI quirk, we are forcing all commands to be issued in BURST mode, and there is code to handle some timing requirements for such firmware. This quirk has 3 things implemented:
1. delaying 500us between transactions in acpi_ec_transaction_unlocked()
2. delaying 500us instead of waiting 1ms in task context before advancing transaction in ec_poll()
3. forcing every command to be issued in BURST mode in acpi_ec_space_handler()

From comment 34, I learned that with 2 and 3 and without 1, this bug cannot be fixed.
From comment 36, I learned that with 1 and without 2 and 3, this bug cannot be fixed.
So why don't we try a combination of 1 + 2 to make sure this is just a timing issue? If it was fixed by forcing BURST mode, then the root cause could be something else.
Let me post a debug patch later after this comment.
We also need to check the SMI_EVT flag for this bug.

Thanks and best regards
-Lv

Created attachment 147201 [details]
[PATCH] Debugging if udelay is required by this hardware

This patch is generated to perform a test:
1. based on the working environment reported in comment 32
2. remove timing code
Please check if this patch can work for your platform.

Created attachment 147211 [details]
[PATCH] Debugging if BURST mode can fix this issue

If 147201 cannot fix this issue, please try this patch without 147201 applied.
This patch is generated to perform a test:
1. based on the working environment reported in comment 32
2. remove burst mode code
Please check if this patch can work for your platform.
It would be better if you could post dmesg for both tests.
Thanks in advance.

I'm sorry but I'm a bit confused. Which patch should I try first: 147191, 147201 or 147211, or all of them? :)

(In reply to qbanin from comment #49)
> I'm sorry but I'm a bit confused. Which patch should I try in first place:
> 147191, 147201 or 147211 or all of them? :)

Hi,

Please try all of them, but don't apply them all at once. For each try, apply only one patch and make sure the others are not applied.

Thanks and best regards
-Lv

(In reply to qbanin from comment #49)
> I'm sorry but I'm a bit confused. Which patch should I try in first place:
> 147191, 147201 or 147211 or all of them? :)

Could you test 147191 separately? I would upstream this one first.

Hi,

I'm afraid this problem might still need to be root caused.
We very much appreciate the testing you can do on such a platform to help us root cause the flag_msi quirk. The quirk seems to be wrong, as it implies there is EC firmware in the world that doesn't follow the specification and doesn't correctly implement the IBF/OBF controlled firmware<->driver communication protocol. I doubt how this could still be an ACPI product.

To root cause it, we need to:
1. trace whether the bug can be fixed by exactly the udelay() timing code alone, so I need you to try attachment 147201 [details] and attachment 147211 [details].
2. trace whether the GPE or SCI is disabled. For this, could you also provide the following information?
   A. When the system stops responding to the events, please do:
      # cat /sys/firmware/acpi/interrupts/gpe17
      According to your dmesg output, GPE 0x17 is used for the EC:
      ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
                       ^^^^
      You can replace 17 with another number according to the dmesg for your boot.
   B. Track whether the ACPI SCI interrupt has been disabled. We can see the irq count in the following files.
      # cat /proc/interrupts
      # cat /proc/irq/<x>/spurious
      The number "x" can be obtained from /proc/interrupts, if you can see such an entry in /proc/interrupts for "acpi":
        9:       5954          0   IO-APIC-fasteoi   acpi
        ^                                            ^^^^
      Then "x" is 9.
      You can cat these 2 files several times to see whether the "acpi" interrupt count still increases and whether the unhandled irq count increases after the system stops responding to the events.
      Please also report the result here by uploading 2 different outputs of these 2 files.

Also, please provide your full name in a reply so that we can fill in the Reported-and-tested-by field of the patch correctly. :-)

Thanks in advance and best regards
-Lv

(In reply to Lan Tianyu from comment #51)
> (In reply to qbanin from comment #49)
> > I'm sorry but I'm a bit confused. Which patch should I try in first place:
> > 147191, 147201 or 147211 or all of them? :)
>
> Could you test 147191 separately? I would upstream this one first.

Yes, please confirm this patch first so that the quirk can be upstreamed. And I hope you can still perform tests for us after that to root cause this issue.

Best regards
-Lv

Hi,

I got some clues about the root cause of this issue. It seems your platform is still able to handle EC events when the EC GPE is disabled:

[ 47.439317] ACPI : EC: ===== TASK =====
[ 47.439320] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[ 47.439321] ACPI : EC: EC_SC(W) = 0x84
[ 47.440283] ACPI : EC: ===== TASK =====
[ 47.440289] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 47.440292] ACPI : EC: EC_DATA(R) = 0x1c

So there is no special timing requirement for this issue. We just don't know why the GPE is disabled.
Note that in the current EC driver, event polling can only happen after an EC transaction is completed. So if your platform or the driver enters "GPE disabled" mode and no EC command is issued, the EC driver will not be able to handle EC events again.

I think this issue can be fixed by this patchset:
https://lkml.org/lkml/2014/7/21/43
In this patchset, we have a separate thread to poll EC events in a timely manner.
Let me prepare a tarball of this patchset for you to test.

Thanks and best regards
-Lv

Created attachment 147711 [details]
The EC event polling implementation

This patchset contains the SCI_EVT polling mode implementation. It's a big patchset based on some GPE API improvements, so you need to apply all of the patches. Sorry for the inconvenience.

This patchset is a revised version of the following 2 patchsets:
https://lkml.org/lkml/2014/7/14/901
https://lkml.org/lkml/2014/7/21/43
They are under internal review.
You need to apply the following patches in order:
lv-gpe01.patch lv-gpe02.patch lv-gpe03.patch lv-gpe04.patch lv-gpe05.patch
lv-gpe06.patch lv-gpe07.patch lv-gpe08.patch lv-gpe09.patch lv-gpe10.patch
lv-gpe11.patch lv-gpe12.patch lv-gpe13.patch
ec-event01.patch ec-event02.patch ec-event03.patch ec-event04.patch ec-event05.patch
ec-event06.patch ec-event07.patch ec-event08.patch ec-event09.patch ec-event10.patch
ec-event11.patch ec-event12.patch ec-event13.patch ec-event14.patch
The ec-event15.patch and ec-event16.patch are not necessary.

If you have quilt installed, then you can:
# cd linux
# ln -s <path to uncompressed folder>/ec-patches ./patches
# quilt push -a

Please give this a try and post the dmesg output here to see if the bugs can be fixed for your system.
If it's still not fixed, please try the boot parameter "acpi.ec_poll_events=Y" and post the dmesg output here.

Well, this is just a patchset that can make your platform work. We still haven't root caused it. To root cause it, we need the test results mentioned in comment 52.
For test 2 in comment 52, you can run the test using the kernel compiled with this patchset applied, so that more information, including the hardware GPE register's current settings, can be dumped from /sys/firmware/acpi/events/gpe17.

Thanks in advance.
Best regards
-Lv

(In reply to Lan Tianyu from comment #51)
> (In reply to qbanin from comment #49)
> > I'm sorry but I'm a bit confused. Which patch should I try in first place:
> > 147191, 147201 or 147211 or all of them? :)
>
> Could you test 147191 separately? I would upstream this one first.

Hi Qbanin:
Could you try this first? It's much simpler and just like the quirk patch you have tried.

I'm sorry it took so long. 147191 works fine, just like my dirty quirk. Will post results of the remaining patches soon.

I'm having a problem applying the patches from comment #55 on top of 3.16.1. A few hunks got rejected and the bleeding-edge kernel doesn't compile :(

(In reply to qbanin from comment #58)
> I'm having problem with applying patches from comment #55 on top 3.16.1. Few
> hunks got rejected and the bleeding-edge kernel doesn't compile :(

OK. I can help to rebase them on 3.16.1.
I can also merge some of them to make it easier.
Thanks for your testing.

Best regards
-Lv

What's the result of the test mentioned in comment 55? I just want to find out why the GPE is disabled.

The fix patch has been sent to the ACPI mailing list. Please help Lv to test his patchset; that will be very helpful for improving the EC driver. Thanks.

https://patchwork.kernel.org/patch/4808561/

(In reply to Lan Tianyu from comment #61)
> The fix patch has been sent to ACPI maillist. Please help LV to test his
> patchset and that will be very helpful to improve EC driver. Thanks.
>
> https://patchwork.kernel.org/patch/4808561/

Not a problem, but I need a working patchset for 3.16.1. I don't have enough skill to modify it myself :(

I think Lv can give you a branch and you just need to run "git pull". Lv, do you have such a branch?

Hi,

Sorry for the delay.
Could you please try the following git repository:
# git clone https://github.com/zetalog/linux
# git checkout ec-next
# copy <your old kernel directory>/.config ./linux/.config
I've merged all posted patches on top of Rafael's recent linux-pm.git/linux-next branch.
Please use this kernel and your previous kernel configuration to try again.

Thanks in advance
-Lv

Could you help by trying the upstream kernel with this quirk disabled on your platform?
We have another commit to ensure register access guarding in the wait polling mode:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9e295ac
So it should now be safe to use wait polling instead of busy polling for your platform.
Thanks and best regards
-Lv

The quirk was added for this machine in Linux-3.17-rc4:

commit 777cb382958851c88763253fe00a26529be4c0e9
Author: Lan Tianyu <tianyu.lan@intel.com>
Date: Fri Aug 29 10:50:08 2014 +0800

    ACPI / EC: Add msi quirk for Clevo W350etq

The quirk was removed and replaced, hopefully, by a fixed EC driver in Linux 4.2-rc1:

commit 3174abcfea6a05aa25038156d6722b6c8876fb36
Author: Lv Zheng <lv.zheng@intel.com>
Date: Fri May 15 14:37:11 2015 +0800

    ACPI / EC: Remove non-root-caused busy polling quirks.

Please re-open if this machine does not work with above kernels. |
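For anyone re-testing on this machine, the check requested in comment 52 (does the "acpi" count in /proc/interrupts keep increasing after events stop arriving?) can be sketched in Python as below. The helper names are hypothetical, and the sample line used for checking is the one quoted in comment 52. The column layout of /proc/interrupts varies a little between kernels, so treat this as a sketch, not a finished tool.

```python
import os


def acpi_irq_count(proc_interrupts_text):
    """Return (irq, total) for the 'acpi' line of /proc/interrupts, or None.

    total is the sum of the per-CPU counters on that line.
    """
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[-1] != "acpi" or not fields[0].endswith(":"):
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # first non-numeric column ends the per-CPU counters
            total += int(field)
        return fields[0].rstrip(":"), total
    return None


def acpi_irq_advanced(before_text, after_text):
    """True if the acpi interrupt total grew between two snapshots."""
    before = acpi_irq_count(before_text)
    after = acpi_irq_count(after_text)
    if before is None or after is None:
        raise ValueError("no 'acpi' line found in the /proc/interrupts text")
    return after[1] > before[1]


if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as f:
            print(acpi_irq_count(f.read()))
```

Taking one snapshot before pressing the brightness keys or replugging AC and one after, a result of False from acpi_irq_advanced() would suggest that no SCI reached the kernel at all in between.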