Bug 107191

Summary: Macbook ACPI EC intermittent boot stalls
Product: ACPI Reporter: Chris Bainbridge (chris.bainbridge)
Component: Power-BatteryAssignee: acpi_power-battery
Status: CLOSED CODE_FIX    
Severity: normal CC: aaron.lu, lv.zheng
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.3.0 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg.bad.gz

Description Chris Bainbridge 2015-11-04 22:24:48 UTC
Created attachment 192131 [details]
dmesg.bad.gz

It appears that the fix from bug #94651 was not sufficient to completely prevent boot stalls caused by the ACPI EC.

I have captured dmesg (with ec.c DEBUG) from an example stall. The dmesg log is annotated with the smb_hc_* calls from sbshc.c and with a stack trace at the end of the stall. If you highlight "smb_hc" you will see the start of each individual call in to the ec code.

The dmesg shows a smb_hc_write operation beginning at 0.377764 seconds which causes a stall until 232.4 seconds.

Note that, to save space in the dmesg log, I only capture the first 100,000 lines of EC debug output (ec.c appears to be stuck in a loop outputting the same repeated lines thousands of times per second).
Comment 1 Lv Zheng 2015-11-05 02:04:56 UTC
Hi,

I cannot see any EC failure in the log:
[    0.377764] ACPI: smb_hc_write ACPI_SMB_PROTOCOL(0)=09
[    0.377764] ACPI : EC: 2: Increase command
[    0.377765] ACPI : EC: ***** Command(WR_EC) started *****
[    0.377766] ACPI : EC: ===== TASK (3) =====
[    0.377771] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    0.377772] ACPI : EC: EC_SC(W) = 0x81
[    0.377790] ACPI : EC: ===== IRQ (2) =====
[    0.377796] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[    0.377797] ACPI : EC: EC_DATA(W) = 0x20
[    0.377815] ACPI : EC: ===== IRQ (2) =====
[    0.377821] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    0.377822] ACPI : EC: EC_DATA(W) = 0x09
[    0.377833] ACPI : EC: ===== TASK (3) =====
[    0.377843] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    0.377844] ACPI : EC: ===== IRQ (2) =====
[    0.377849] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    0.377850] ACPI : EC: ***** Command(WR_EC) stopped *****
[    0.377851] ACPI : EC: 1: Decrease command
...
[    1.497723] ACPI : EC: 2: Increase command
[    1.497724] ACPI : EC: ***** Command(RD_EC) started *****
[    1.497726] ACPI : EC: ===== TASK (1) =====
[    1.497744] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[    1.497745] ACPI : EC: EC_SC(W) = 0x80
[    1.497766] ACPI : EC: ===== IRQ (2) =====
[    1.497772] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[    1.497773] ACPI : EC: EC_DATA(W) = 0x21
[    1.497823] ACPI : EC: ===== IRQ (2) =====
[    1.497832] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1
[    1.497836] ACPI : EC: EC_DATA(R) = 0x00
[    1.497844] ACPI : EC: ***** Command(RD_EC) stopped *****
[    1.497846] ACPI : EC: 1: Decrease command
[    1.497848] ACPI : EC: 2: Increase command
[    1.497850] ACPI : EC: ***** Command(RD_EC) started *****
[    1.497852] ACPI : EC: ===== TASK (1) =====
[    1.497871] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[    1.497872] ACPI : EC: EC_SC(W) = 0x80
[    1.497891] ACPI : EC: ===== IRQ (2) =====

=CUT=

No IRQ loss, no transaction failure, no no-op IRQs...
This simply looks like a gap in the smart battery driver.
Some control methods that access the EC opregion field are evaluated at the wrong state.
IMO, this is a smart battery bug, EC driver has nothing to do here.

Thanks
-Lv
Comment 2 Lv Zheng 2015-11-05 02:09:46 UTC
If you looked into the details, you could see the smart battery driver was executing a control method that wait the EC register(0x21) to return none zero value.
While RD_EC(0x21) (EC read command) always return 0x00.
[    1.497723] ACPI : EC: 2: Increase command
[    1.497724] ACPI : EC: ***** Command(RD_EC) started *****
[    1.497726] ACPI : EC: ===== TASK (1) =====
[    1.497744] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[    1.497745] ACPI : EC: EC_SC(W) = 0x80
[    1.497766] ACPI : EC: ===== IRQ (2) =====
[    1.497772] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[    1.497773] ACPI : EC: EC_DATA(W) = 0x21
[    1.497823] ACPI : EC: ===== IRQ (2) =====
[    1.497832] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1
[    1.497836] ACPI : EC: EC_DATA(R) = 0x00
[    1.497844] ACPI : EC: ***** Command(RD_EC) stopped *****
[    1.497846] ACPI : EC: 1: Decrease command
This definitely indicates a gap in the smart battery driver.
Maybe a state machine problem in the smb driver or a feature gap.

Thanks and best regards
-Lv
Comment 3 Aaron Lu 2015-11-09 03:17:03 UTC
According to the comments here and the comments of your patch in mailing list:
https://lkml.org/lkml/2015/11/6/776

I suppose the next step is to fix it in sbs driver? I'll move it to battery category, feel free to change it if you think this is wrong.
Comment 4 Chris Bainbridge 2015-11-09 10:50:55 UTC
Yes, this needs to be fixed in sbshc.
Comment 6 Aaron Lu 2015-12-02 02:16:54 UTC
Thanks for the update.