Created attachment 192131 [details] dmesg.bad.gz It appears that the fix from bug #94651 was not sufficient to completely prevent boot stalls caused by the ACPI EC. I have captured dmesg (with ec.c DEBUG) from an example stall. The dmesg log is annotated with the smb_hc_* calls from sbshc.c and with a stack trace at the end of the stall. If you highlight "smb_hc" you will see the start of each individual call in to the ec code. The dmesg shows a smb_hc_write operation beginning at 0.377764 seconds which causes a stall until 232.4 seconds. Note that, to save space in the dmesg log, I only capture the first 100,000 lines of EC debug output (ec.c appears to be stuck in a loop outputting the same repeated lines thousands of times per second).
Hi, I cannot see any EC failure in the log: [ 0.377764] ACPI: smb_hc_write ACPI_SMB_PROTOCOL(0)=09 [ 0.377764] ACPI : EC: 2: Increase command [ 0.377765] ACPI : EC: ***** Command(WR_EC) started ***** [ 0.377766] ACPI : EC: ===== TASK (3) ===== [ 0.377771] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 0.377772] ACPI : EC: EC_SC(W) = 0x81 [ 0.377790] ACPI : EC: ===== IRQ (2) ===== [ 0.377796] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 0.377797] ACPI : EC: EC_DATA(W) = 0x20 [ 0.377815] ACPI : EC: ===== IRQ (2) ===== [ 0.377821] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 0.377822] ACPI : EC: EC_DATA(W) = 0x09 [ 0.377833] ACPI : EC: ===== TASK (3) ===== [ 0.377843] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 0.377844] ACPI : EC: ===== IRQ (2) ===== [ 0.377849] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 0.377850] ACPI : EC: ***** Command(WR_EC) stopped ***** [ 0.377851] ACPI : EC: 1: Decrease command ... [ 1.497723] ACPI : EC: 2: Increase command [ 1.497724] ACPI : EC: ***** Command(RD_EC) started ***** [ 1.497726] ACPI : EC: ===== TASK (1) ===== [ 1.497744] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0 [ 1.497745] ACPI : EC: EC_SC(W) = 0x80 [ 1.497766] ACPI : EC: ===== IRQ (2) ===== [ 1.497772] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0 [ 1.497773] ACPI : EC: EC_DATA(W) = 0x21 [ 1.497823] ACPI : EC: ===== IRQ (2) ===== [ 1.497832] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1 [ 1.497836] ACPI : EC: EC_DATA(R) = 0x00 [ 1.497844] ACPI : EC: ***** Command(RD_EC) stopped ***** [ 1.497846] ACPI : EC: 1: Decrease command [ 1.497848] ACPI : EC: 2: Increase command [ 1.497850] ACPI : EC: ***** Command(RD_EC) started ***** [ 1.497852] ACPI : EC: ===== TASK (1) ===== [ 1.497871] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0 [ 1.497872] ACPI : EC: EC_SC(W) = 0x80 [ 1.497891] ACPI : EC: ===== IRQ (2) ===== =CUT= No IRQ loss, no transaction failure, no no-op IRQs... This simply looks like a gap in the smart battery driver. Some control methods that access the EC opregion field are evaluated at the wrong state. IMO, this is a smart battery bug, EC driver has nothing to do here. Thanks -Lv
If you looked into the details, you could see the smart battery driver was executing a control method that wait the EC register(0x21) to return none zero value. While RD_EC(0x21) (EC read command) always return 0x00. [ 1.497723] ACPI : EC: 2: Increase command [ 1.497724] ACPI : EC: ***** Command(RD_EC) started ***** [ 1.497726] ACPI : EC: ===== TASK (1) ===== [ 1.497744] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0 [ 1.497745] ACPI : EC: EC_SC(W) = 0x80 [ 1.497766] ACPI : EC: ===== IRQ (2) ===== [ 1.497772] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0 [ 1.497773] ACPI : EC: EC_DATA(W) = 0x21 [ 1.497823] ACPI : EC: ===== IRQ (2) ===== [ 1.497832] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1 [ 1.497836] ACPI : EC: EC_DATA(R) = 0x00 [ 1.497844] ACPI : EC: ***** Command(RD_EC) stopped ***** [ 1.497846] ACPI : EC: 1: Decrease command This definitely indicates a gap in the smart battery driver. Maybe a state machine problem in the smb driver or a feature gap. Thanks and best regards -Lv
According to the comments here and the comments of your patch in mailing list: https://lkml.org/lkml/2015/11/6/776 I suppose the next step is to fix it in sbs driver? I'll move it to battery category, feel free to change it if you think this is wrong.
Yes, this needs to be fixed in sbshc.
Fixed by: https://github.com/torvalds/linux/commit/add68d6aa9e2492e51707ca603ada5b26c626757 Related: https://github.com/torvalds/linux/commit/a76032e0abef027df83f7f053580aa297de54818
Thanks for the update.