I run a long run test on ubuntu linux, this test will call ACPI method to write EC RAM through ACPI EC write command(0x81). At the fail case, ACPI EC write transaction is done, but the EC RAM data is no change. Write EC Ram flow: step 1. write 0x81 to port 66 step 2. write address byte to port 62 step 3. write data byte to port 62 The fail transaction cases can be grouped into 2 classes: 1. the time interval between step 1 and 2 is shorter than 10 us 2. the time interval between step 2 and 3 is shorter than 10 us For the successful transaction cases, the time interval between 1 and 2 and 3 is longer than 100 us. fail case 1: Jun 6 15:55:10 barton-A24J kernel: [ 377.863077] Barton EC write command =0x81 Jun 6 15:55:10 barton-A24J kernel: [ 377.863086] Barton EC write data = 0x1c Jun 6 15:55:10 barton-A24J kernel: [ 377.864948] Barton EC read status = 0x0 Jun 6 15:55:10 barton-A24J kernel: [ 377.864951] Barton EC write data = 0x7 Jun 6 15:55:10 barton-A24J kernel: [ 377.864982] Barton EC read status = 0x2 Jun 6 15:55:10 barton-A24J kernel: [ 377.866706] Barton EC read status = 0x0 fail case 2: Jun 7 13:44:35 barton-A24J kernel: [ 166.878303] bxudbg EC write command =0x81 Jun 7 13:44:35 barton-A24J kernel: [ 166.878668] bxudbg EC read status = 0x8 Jun 7 13:44:35 barton-A24J kernel: [ 166.878669] bxudbg EC write data = 0x1c Jun 7 13:44:35 barton-A24J kernel: [ 166.878671] bxudbg EC read status = 0x8 Jun 7 13:44:35 barton-A24J kernel: [ 166.878676] bxudbg EC write data = 0x7 Jun 7 13:44:35 barton-A24J kernel: [ 166.878697] bxudbg EC read status = 0x2 Jun 7 13:44:35 barton-A24J kernel: [ 166.879172] bxudbg EC read status = 0x0 successful case: Jun 6 12:30:36 barton-A24J kernel: [ 275.360203] Barton EC write command =0x81 Jun 6 12:30:36 barton-A24J kernel: [ 275.360655] Barton EC read status = 0x8 Jun 6 12:30:36 barton-A24J kernel: [ 275.360656] Barton EC write data = 0x1c Jun 6 12:30:36 barton-A24J kernel: [ 275.361494] Barton EC read status = 0x0 Jun 6 12:30:36 barton-A24J kernel: [ 275.361495] Barton EC write data = 0x7 Jun 6 12:30:36 barton-A24J kernel: [ 275.361524] Barton EC read status = 0x2 Jun 6 12:30:36 barton-A24J kernel: [ 275.361985] Barton EC read status = 0x0 I had tried the below change in ec.c to make sure the time sequence for the ACPI EC write transaction, and this workaround is OK for running my test. static void advance_transaction(struct acpi_ec *ec, u8 status) { unsigned long flags; struct transaction *t = ec->curr; u8 trusty_status; //bxudbg spin_lock_irqsave(&ec->lock, flags); if (!t) goto unlock; //bxudbg >>> udelay(100); trusty_status = acpi_ec_read_status(ec); //bxudbg <<< if (t->wlen > t->wi) { //bxudbg if ((status & ACPI_EC_FLAG_IBF) == 0) if ((trusty_status & ACPI_EC_FLAG_IBF) == 0) //bxudbg acpi_ec_write_data(ec, t->wdata[t->wi++]); else goto err; } else if (t->rlen > t->ri) { //bxudbg if ((status & ACPI_EC_FLAG_OBF) == 1) { if ((trusty_status & ACPI_EC_FLAG_OBF) == 1) { //bxudbg t->rdata[t->ri++] = acpi_ec_read_data(ec); if (t->rlen == t->ri) t->done = true; } else goto err; //bxudbg } else if (t->wlen == t->wi && (status & ACPI_EC_FLAG_IBF) == 0) } else if ((t->wlen == t->wi) && ((trusty_status & ACPI_EC_FLAG_IBF) == 0)) //bxudbg t->done = true; goto unlock; err: /* * If SCI bit is set, then don't think it's a false IRQ * otherwise will take a not handled IRQ as a false one. */ if (in_interrupt() && !(status & ACPI_EC_FLAG_SCI)) ++t->irq_count; unlock: spin_unlock_irqrestore(&ec->lock, flags); }
Created attachment 105351 [details] original ec.c
Created attachment 105361 [details] my change to ec.c
Hi: Sorry, I am not clear what's your problem? How do you check "EC RAM data is no change"? What's machine you are using? Please attach the output of acpidump.
Hi Tianyu, My test machine configuration: CPU: Haswell PCH: Lynx point GPU: Nvidia Geforce EC: ITE My test runtine will call method XXXX, and in method XXXX, set FBRQ 1 to inform EC to poll a GPIO pin status. Run this test many loops, there are fail rate. At the fail case, I found EC doesn't poll pin status. If I put the write FBRQ statement into a while loop, and break if read FBRQ = 1, the test will not fail. And this test will never fail under windows 8. OperationRegion(ERAM, EmbeddedControl, Zero, 0xFF) Field(ERAM, ByteAcc, NoLock, Preserve) { Offset(0x1C), FBRQ, 1, //Use this bit to inform EC to do something FBST, 1, } Method(XXXX, 0, Serialized) { CreateField(TGPC,0x06,0x02,ECOC) CreateField(TGPC,0x08,0x02,PRGX) Store(Zero, LNKD) SGPO(HLRS, One) Sleep(One) SGPO(PWEN, Zero) SGPO(0x34, Zero) While(LNotEqual(SGPI(PWOK), One)) { Store(Zero, Local0) While(LLess(Local0, 0x14)) { Add(Local0, One, Local0) Stall(0x32) } } Store(Zero, Local0) While(LLess(Local0, 0x1E)) { Add(Local0, One, Local0) Stall(0x32) } SGPO(HLRS, Zero) Sleep(0x05) Store(Zero, AFES) While(LLess(LNKS, 0x07)) { Stall(0x64) } Store(One, \_SB.PCI0.LPCB.EC0.FBRQ) //write EC RAM FBRQ bit Store(One, GS04) Store(Zero, GO04) And(GL01, 0xEF, GL01) Return(Zero) }
(In reply to comment #4) > Hi Tianyu, > > My test machine configuration: > CPU: Haswell > PCH: Lynx point > GPU: Nvidia Geforce > EC: ITE > Ok. If I understand correctly, you are developing a Bios for New product. Your workaround is to add a 100ms delay in the advance transaction. This also may happen in SCI handler and so looks not every good. I am not very familiar with the EC Firmware. But according ACPI EC spec, IBF=0, Signals that the embedded controller has read the last command or data from the input buffer. So when IBF=0, data should have been written into EC RAM? > My test runtine will call method XXXX, and in method XXXX, set FBRQ 1 > to inform EC to poll a GPIO pin status. > Run this test many loops, there are fail rate. > At the fail case, I found EC doesn't poll pin status. > If I put the write FBRQ statement into a while loop, and break if > read > FBRQ = 1, the test will not fail. > And this test will never fail under windows 8. > > OperationRegion(ERAM, EmbeddedControl, Zero, 0xFF) > Field(ERAM, ByteAcc, NoLock, Preserve) > { > Offset(0x1C), > FBRQ, 1, //Use this bit to inform EC to do something > FBST, 1, > } > > Method(XXXX, 0, Serialized) > { > CreateField(TGPC,0x06,0x02,ECOC) > CreateField(TGPC,0x08,0x02,PRGX) > Store(Zero, LNKD) > SGPO(HLRS, One) > Sleep(One) > SGPO(PWEN, Zero) > SGPO(0x34, Zero) > While(LNotEqual(SGPI(PWOK), One)) > { > Store(Zero, Local0) > While(LLess(Local0, 0x14)) > { > Add(Local0, One, Local0) > Stall(0x32) > } > } > Store(Zero, Local0) > While(LLess(Local0, 0x1E)) > { > Add(Local0, One, Local0) > Stall(0x32) > } > SGPO(HLRS, Zero) > Sleep(0x05) > Store(Zero, AFES) > While(LLess(LNKS, 0x07)) > { > Stall(0x64) > } > Store(One, \_SB.PCI0.LPCB.EC0.FBRQ) //write EC RAM FBRQ bit > Store(One, GS04) > Store(Zero, GO04) > And(GL01, 0xEF, GL01) > Return(Zero) > }
Hi Tianyu, > But according ACPI EC spec, IBF=0, Signals that the >embedded controller has read the last command or data from the input buffer. >So >when IBF=0, data should have been written into EC RAM? I think so too. But is there any chance that we read the status before the IBF change to 1? host write 0x81 to 66h port IBF = 0 -> 1 EC read command data IBF = 1 -> 0 host write address to 62h port And, do you know how does function wait_event_timeout work? static int ec_poll(struct acpi_ec *ec) { unsigned long flags; int repeat = 5; /* number of command restarts */ while (repeat--) { unsigned long delay = jiffies + msecs_to_jiffies(ec_delay); do { /* don't sleep with disabled interrupts */ if (EC_FLAGS_MSI || irqs_disabled()) { udelay(ACPI_EC_MSI_UDELAY); if (ec_transaction_done(ec)) return 0; } else { if (wait_event_timeout(ec->wait, ec_transaction_done(ec), msecs_to_jiffies(1))) return 0; } advance_transaction(ec, acpi_ec_read_status(ec)); } while (time_before(jiffies, delay)); pr_debug(PREFIX "controller reset, restart transaction\n"); spin_lock_irqsave(&ec->lock, flags); start_transaction(ec); spin_unlock_irqrestore(&ec->lock, flags); } return -ETIME; }
(In reply to comment #6) > Hi Tianyu, > > > But according ACPI EC spec, IBF=0, Signals that the > >embedded controller has read the last command or data from the input buffer. > So > >when IBF=0, data should have been written into EC RAM? > > I think so too. But is there any chance that we read the status before > the It depends on the EC. EC may deal with data/command too fast to catch "IBF 0->1" before IBF "1-> 0" and there is no interrupt for this. Could you do a compare with Linux and Windows EC transaction's time sequence? That will be helpful. > IBF change to 1? > > host write 0x81 to 66h port > IBF = 0 -> 1 > EC read command data > IBF = 1 -> 0 > host write address to 62h port > > And, do you know how does function wait_event_timeout work? > Please see acpi_ec_gpe_handler(), it's SCI interrupt handler for EC. When the transaction is done, ec->wait will be woken up. static u32 acpi_ec_gpe_handler(acpi_handle gpe_device, u32 gpe_number, void *data) { struct acpi_ec *ec = data; u8 status = acpi_ec_read_status(ec); pr_debug(PREFIX "~~~> interrupt, status:0x%02x\n", status); advance_transaction(ec, status); if (ec_transaction_done(ec) && (acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF) == 0) { wake_up(&ec->wait); ec_check_sci(ec, acpi_ec_read_status(ec)); } return ACPI_INTERRUPT_HANDLED | ACPI_REENABLE_GPE; } > static int ec_poll(struct acpi_ec *ec) > { > unsigned long flags; > int repeat = 5; /* number of command restarts */ > while (repeat--) { > unsigned long delay = jiffies + > msecs_to_jiffies(ec_delay); > do { > /* don't sleep with disabled interrupts */ > if (EC_FLAGS_MSI || irqs_disabled()) { > udelay(ACPI_EC_MSI_UDELAY); > if (ec_transaction_done(ec)) > return 0; > } else { > if (wait_event_timeout(ec->wait, > ec_transaction_done(ec), > msecs_to_jiffies(1))) > return 0; > } > advance_transaction(ec, acpi_ec_read_status(ec)); > } while (time_before(jiffies, delay)); > pr_debug(PREFIX "controller reset, restart transaction\n"); > spin_lock_irqsave(&ec->lock, flags); > start_transaction(ec); > spin_unlock_irqrestore(&ec->lock, flags); > } > return -ETIME; > }
Hi Tianyu, Refer to following information, I found that "wait_event_timeout" didn't sleep at the fail case. Except "wake_up(&ec->wait)", who can wake the "wait_event_timeout"? normal case: kernel: [ 87.715399] bxudbg EC read status = 0x0 kernel: [ 87.715407] bxudbg EC write command = 0x81 // start_transaction kernel: [ 87.715459] bxudbg EC read status = 0x8 // acpi_ec_gpe_handler kernel: [ 87.715464] bxudbg ec interrupt, status = 0x8 kernel: [ 87.715467] bxudbg EC write data = 0x8 // advance_transaction kernel: [ 87.715523] bxudbg EC read status = 0x0 // acpi_ec_gpe_handler kernel: [ 87.715540] bxudbg ec interrupt, status = 0x0 kernel: [ 87.715547] bxudbg EC write data = 0x80 // advance_transaction kernel: [ 87.715596] bxudbg EC read status = 0x0 // acpi_ec_gpe_handler kernel: [ 87.715600] bxudbg ec interrupt, status = 0x0 kernel: [ 87.715607] bxudbg EC read status = 0x0 // kernel: [ 87.715611] bxudbg ec wake up // wake up "wait_event_timeout" kernel: [ 87.715624] bxudbg EC read status = 0x0 // kernel: [ 87.715660] bxudbg ec transaction done // ec_poll kernel: [ 87.715676] bxudbg EC read status = 0x0 // fail case: kernel: [ 519.708513] bxudbg EC read status = 0x0 kernel: [ 519.708515] bxudbg EC write command = 0x81 kernel: [ 519.708556] bxudbg advance_transaction path 1 //who wake up the sleep? kernel: [ 519.708563] bxudbg EC read status = 0x8 kernel: [ 519.708567] bxudbg EC write data = 0x0 //ec_poll kernel: [ 519.708569] bxudbg EC read status = 0x8 kernel: [ 519.708571] bxudbg ec interrupt, status = 0x8 kernel: [ 519.708581] bxudbg EC write data = 0x40 //acpi_ec_gpe_handler kernel: [ 519.708617] bxudbg EC read status = 0x0 kernel: [ 519.708621] bxudbg ec interrupt, status = 0x0 kernel: [ 519.708627] bxudbg EC read status = 0x0 kernel: [ 519.708631] bxudbg ec wake up kernel: [ 519.708641] bxudbg EC read status = 0x0 kernel: [ 519.708672] bxudbg ec transaction done kernel: [ 519.708682] bxudbg EC read status = 0x0 static int ec_poll(struct acpi_ec *ec) { unsigned long flags; int repeat = 5; /* number of command restarts */ while (repeat--) { unsigned long delay = jiffies + msecs_to_jiffies(ec_delay); do { /* don't sleep with disabled interrupts */ if (EC_FLAGS_MSI || irqs_disabled()) { udelay(ACPI_EC_MSI_UDELAY); if (ec_transaction_done(ec)) return 0; } else { if (wait_event_timeout(ec->wait, ec_transaction_done(ec), msecs_to_jiffies(1))) { printk(KERN_WARNING "bxudbg ec transaction done\n"); return 0; } } printk(KERN_WARNING "bxudbg advance_transaction path 1\n"); advance_transaction(ec, acpi_ec_read_status(ec)); } while (time_before(jiffies, delay)); printk(KERN_WARNING "bxudbg controller reset, restart transaction\n"); spin_lock_irqsave(&ec->lock, flags); start_transaction(ec); spin_unlock_irqrestore(&ec->lock, flags); } return -ETIME; } static u32 acpi_ec_gpe_handler(acpi_handle gpe_device, u32 gpe_number, void *data) { struct acpi_ec *ec = data; struct transaction *t = ec->curr; u8 status = acpi_ec_read_status(ec); if (!t) printk(KERN_WARNING "bxudbg t = 0\n"); printk(KERN_WARNING "bxudbg ec interrupt, status = 0x%x\n", status); advance_transaction(ec, status); if (ec_transaction_done(ec) && (acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF) == 0) { printk(KERN_WARNING "bxudbg ec wake up\n"); wake_up(&ec->wait); ec_check_sci(ec, acpi_ec_read_status(ec)); } return ACPI_INTERRUPT_HANDLED | ACPI_REENABLE_GPE; }
(In reply to comment #8) > > static int ec_poll(struct acpi_ec *ec) > { > unsigned long flags; > int repeat = 5; /* number of command restarts */ > while (repeat--) { > unsigned long delay = jiffies + > msecs_to_jiffies(ec_delay); > do { > /* don't sleep with disabled interrupts */ > if (EC_FLAGS_MSI || irqs_disabled()) { Could you add a log here and see whether the code path fall here during failure case ? > udelay(ACPI_EC_MSI_UDELAY); > if (ec_transaction_done(ec)) > return 0; > } else { > if (wait_event_timeout(ec->wait, > ec_transaction_done(ec), > msecs_to_jiffies(1))) > { > printk(KERN_WARNING "bxudbg ec transaction done\n"); > return 0; > } > } > printk(KERN_WARNING "bxudbg advance_transaction path 1\n"); > advance_transaction(ec, acpi_ec_read_status(ec)); > } while (time_before(jiffies, delay)); > printk(KERN_WARNING "bxudbg controller reset, restart > transaction\n"); > spin_lock_irqsave(&ec->lock, flags); > start_transaction(ec); > spin_unlock_irqrestore(&ec->lock, flags); > } > return -ETIME; > } >
In the fail transaction, the time interval between 1 and 2 is 41 us, it looks like without executing udelay(ACPI_EC_MSI_UDELAY). ACPI_EC_MSI_UDELAY = 550 fail case: 1 kernel: [ 519.708515] bxudbg EC write command = 0x81 2 kernel: [ 519.708556] bxudbg advance_transaction path 1
Hi Tianyu, Follow your suggest, I added debug info, and ran test again. static int ec_poll(struct acpi_ec *ec) { unsigned long flags; int repeat = 5; /* number of command restarts */ while (repeat--) { unsigned long delay = jiffies + msecs_to_jiffies(ec_delay); do { /* don't sleep with disabled interrupts */ if (EC_FLAGS_MSI || irqs_disabled()) { printk(KERN_WARNING "bxudbg IRQ disabled\n"); udelay(ACPI_EC_MSI_UDELAY); if (ec_transaction_done(ec)) return 0; } else { if (wait_event_timeout(ec->wait, ec_transaction_done(ec), msecs_to_jiffies(1))) { printk(KERN_WARNING "bxudbg ec transaction done\n"); return 0; } } printk(KERN_WARNING "bxudbg advance_transaction path 1\n"); advance_transaction(ec, acpi_ec_read_status(ec)); } while (time_before(jiffies, delay)); printk(KERN_WARNING "bxudbg controller reset, restart transaction\n"); spin_lock_irqsave(&ec->lock, flags); start_transaction(ec); spin_unlock_irqrestore(&ec->lock, flags); } return -ETIME; } fail case log: kernel: [ 472.207836] bxudbg EC read status = 0x8 kernel: [ 472.207844] bxudbg EC write command = 0x81 kernel: [ 472.207919] bxudbg EC read status = 0x8 kernel: [ 472.207927] bxudbg ec interrupt, status = 0x8 kernel: [ 472.207939] bxudbg advance_transaction path 1 kernel: [ 472.207946] bxudbg EC read status = 0x8 kernel: [ 472.207955] bxudbg EC write data = 0x8 //abnormal kernel: [ 472.207963] bxudbg EC write data = 0x80 //abnormal kernel: [ 472.208006] bxudbg EC read status = 0x0 kernel: [ 472.208009] bxudbg ec interrupt, status = 0x0 kernel: [ 472.208013] bxudbg EC read status = 0x0 kernel: [ 472.208015] bxudbg ec wake up kernel: [ 472.208027] bxudbg EC read status = 0x0 kernel: [ 472.208036] bxudbg ec transaction done kernel: [ 472.208050] bxudbg EC read status = 0x0
Created attachment 106887 [details] debug.patch Sorry for no response for long time. Please try this patch.
Tianyu, Thank you! I have tried this patch, run my test over night without error, looks like this patch is OK. - advance_transaction(ec, acpi_ec_read_status(ec)); + + if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) + advance_transaction(ec, acpi_ec_read_status(ec)); + What does EC_FLAGS_GPE_STORM mean? Will this patch be put in linux kernel?
(In reply to barton xu from comment #13) > Tianyu, > > Thank you! > > I have tried this patch, run my test over night without error, looks like > this patch is OK. Thanks for test. > > - advance_transaction(ec, acpi_ec_read_status(ec)); > + > + if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) > + advance_transaction(ec, > acpi_ec_read_status(ec)); > + > > What does EC_FLAGS_GPE_STORM mean? On some machines, EC's GPE will storm sometimes. For this case, linux EC driver will change handle mode. There are two modes. One is interrupt mode and the other is polling mode. Normally, it use interrupt mode. But it will select polling mode when GPE storm happens. GPE will be disabled during transaction in this case. I think this issue is caused by mixing polling mode and interrupt mode. advance_transaction() may be called in the ec_poll() and acpi_ec_gpe_handler() in the same transaction. If they are called too nearly, the second one would overwrite the data the first one just wrote to EC before IBF 0->1. This patch is to separate two modes clearly. > > Will this patch be put in linux kernel? I will push this patch and include you in Cc list.
> I will push this patch and include you in Cc list. If you need more test to confirm the patch, I am pleasure to do it.
Hi Barton: Could you apply the following patch and attach the dmesg? I'd like to see how long IBF remain 0 after host write data or command to EC on your machine. I see the log you attached. kernel: [ 519.708515] bxudbg EC write command = 0x81 kernel: [ 519.708556] bxudbg advance_transaction path 1 //who wake up the sleep? kernel: [ 519.708563] bxudbg EC read status = 0x8 kernel: [ 519.708567] bxudbg EC write data = 0x0 //ec_poll kernel: [ 519.708569] bxudbg EC read status = 0x8 kernel: [ 519.708571] bxudbg ec interrupt, status = 0x8 kernel: [ 519.708581] bxudbg EC write data = 0x40 //acpi_ec_gpe_handler kernel: [ 519.708617] bxudbg EC read status = 0x0 kernel: [ 519.708621] bxudbg ec interrupt, status = 0x0 kernel: [ 519.708627] bxudbg EC read status = 0x0 The IBF wasn't set immediately after "write data" and this doesn't happen on my machine and IBF will remain 1 for 100us. diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 16ac3e8..cc6e7bc 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -181,9 +181,16 @@ static void advance_transaction(struct acpi_ec *ec, u8 status) if (!t) goto unlock; if (t->wlen > t->wi) { - if ((status & ACPI_EC_FLAG_IBF) == 0) - acpi_ec_write_data(ec, + if ((status & ACPI_EC_FLAG_IBF) == 0) { + acpi_ec_write_data(ec, t->wdata[t->wi++]); + status = acpi_ec_read_status(ec); + pr_info("EC: Status1 0x%0x \n", status); + while (!(status & ACPI_EC_FLAG_IBF)) + status = acpi_ec_read_status(ec); + pr_info("EC: Status2 0x%0x \n", status); + + } else goto err; } else if (t->rlen > t->ri) {
Created attachment 107014 [details] time of EC IBF change 0->1 after write command
Hi Tianyu, I found some cases that IBF wasn't set immediately. [ 1.728964] hub 1-1:1.0: USB hub found [ 1.729101] hub 1-1:1.0: 6 ports detected [ 1.734141] EC: Status 1 0x2 [ 1.734150] EC: Status 2 0x2 [ 1.749513] EC: Status 1 0x2 [ 1.749523] EC: Status 2 0x2 [ 1.750509] EC: Status 1 0x2 [ 1.750516] EC: Status 2 0x2 [ 1.751509] EC: Status 1 0x2 [ 1.751515] EC: Status 2 0x2 [ 1.752905] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Hi, I don't see such cases in the dmesg. Could you run your test and try to find whether the IBF is delay to be set when the issue happen? (In reply to barton xu from comment #18) > Hi Tianyu, > > I found some cases that IBF wasn't set immediately. > > [ 1.728964] hub 1-1:1.0: USB hub found > [ 1.729101] hub 1-1:1.0: 6 ports detected > [ 1.734141] EC: Status 1 0x2 > [ 1.734150] EC: Status 2 0x2 > [ 1.749513] EC: Status 1 0x2 > [ 1.749523] EC: Status 2 0x2 > [ 1.750509] EC: Status 1 0x2 > [ 1.750516] EC: Status 2 0x2 > [ 1.751509] EC: Status 1 0x2 > [ 1.751515] EC: Status 2 0x2 > [ 1.752905] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded "Status 1" shows the status just after write operation. If IBF wasn't set, the following loop would check the IBF until it's set. "Status 2" shows the status at that time. So from this log, the IBF is set immediately.
Hi Tianyu Sorry, I made a mistake there. I will try whether the IBF is delay to be set when the issue happen.
No problem :) Thanks for test.
Hi Tianyu, Test result: the IBF is set immediately without delay when the issue happen. And I think the IBF is set by Hardware automatically, is it right?
Sorry for later response. This seems conflict the before log. kernel: [ 519.708563] bxudbg EC read status = 0x8 kernel: [ 519.708567] bxudbg EC write data = 0x0 //ec_poll kernel: [ 519.708569] bxudbg EC read status = 0x8 kernel: [ 519.708571] bxudbg ec interrupt, status = 0x8 kernel: [ 519.708581] bxudbg EC write data = 0x40 //acpi_ec_gpe_handler kernel: [ 519.708617] bxudbg EC read status = 0x0 kernel: [ 519.708621] bxudbg ec interrupt, status = 0x0 This shows the IBF status wasn't set just after write operation. So there is a chance that the second operation may overwrite the data of first one. IBF should be set by Hardware. BTW, Does the ITE EC spec say one interval is required between two data/commands to EC register?
It's difficult to explain that log, maybe EC has read the data but doesn't handle completely. But from the below log, even if IBF is set just after write operation, there is a chance to fail, because the status for write data = 0x80 is readed before write data = 0x8 kernel: [ 472.207836] bxudbg EC read status = 0x8 kernel: [ 472.207844] bxudbg EC write command = 0x81 kernel: [ 472.207919] bxudbg EC read status = 0x8 kernel: [ 472.207927] bxudbg ec interrupt, status = 0x8 kernel: [ 472.207939] bxudbg advance_transaction path 1 kernel: [ 472.207946] bxudbg EC read status = 0x8 //status for write data = 0x80 kernel: [ 472.207955] bxudbg EC write data = 0x8 //abnormal kernel: [ 472.207963] bxudbg EC write data = 0x80 //abnormal kernel: [ 472.208006] bxudbg EC read status = 0x0 kernel: [ 472.208009] bxudbg ec interrupt, status = 0x0 kernel: [ 472.208013] bxudbg EC read status = 0x0 kernel: [ 472.208015] bxudbg ec wake up kernel: [ 472.208027] bxudbg EC read status = 0x0 kernel: [ 472.208036] bxudbg ec transaction done kernel: [ 472.208050] bxudbg EC read status = 0x0 I don't have ITE EC spec for reference.
Hi Tianyu, > I will push this patch and include you in Cc list. Will your patch be added into kernel? Or you have better idea for this issue?
Hi Barton: Sorry for later response due to busy on other things. I original think this issue caused by "IBF status wasn't set just after write operation". But the later test show it doesn't. One possible reason is that one interval is required between two data/commands to EC register. Could you provide the ec.c driver you used(attachment on comment 2 ?) and the test code.
Created attachment 107278 [details] ec.c
Hi, please recovery the advance_transaction() (especally the udelay) and then apply this patch. Let's see whether this makes the issue easily reproduces. diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 8dd2d4d..9ac8431 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -228,6 +228,9 @@ static int ec_poll(struct acpi_ec *ec) unsigned long delay = jiffies + msecs_to_jiffies(ec_delay); do { + advance_transaction(ec, acpi_ec_read_status(ec)); + advance_transaction(ec, acpi_ec_read_status(ec)); + /* don't sleep with disabled interrupts */ if (EC_FLAGS_MSI || irqs_disabled()) { udelay(ACPI_EC_MSI_UDELAY); @@ -239,7 +242,6 @@ static int ec_poll(struct acpi_ec *ec) msecs_to_jiffies(1))) return 0; } - advance_transaction(ec, acpi_ec_read_status(ec)); } while (time_before(jiffies, delay)); pr_debug(PREFIX "controller reset, restart transaction\n"); spin_lock_irqsave(&ec->lock, flags); @@ -844,6 +846,7 @@ static int acpi_ec_add(struct acpi_device *device) ec->gpe, ec->command_addr, ec->data_addr); ret = ec_install_handlers(ec); + set_bit(EC_FLAGS_GPE_STORM, &ec->flags); /* EC is fully operational, allow queries */ clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
Created attachment 107289 [details] ec.c
Created attachment 107290 [details] syslog
(In reply to Lan Tianyu from comment #28) > Hi, please recovery the advance_transaction() (especally the udelay) and then > apply this patch. Let's see whether this makes the issue easily reproduces. > > > diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c > index 8dd2d4d..9ac8431 100644 > --- a/drivers/acpi/ec.c > +++ b/drivers/acpi/ec.c > @@ -228,6 +228,9 @@ static int ec_poll(struct acpi_ec *ec) > unsigned long delay = jiffies + > msecs_to_jiffies(ec_delay); > do { > + advance_transaction(ec, acpi_ec_read_status(ec)); > + advance_transaction(ec, acpi_ec_read_status(ec)); > + > /* don't sleep with disabled interrupts */ > if (EC_FLAGS_MSI || irqs_disabled()) { > udelay(ACPI_EC_MSI_UDELAY); > @@ -239,7 +242,6 @@ static int ec_poll(struct acpi_ec *ec) > msecs_to_jiffies(1))) > return 0; > } > - advance_transaction(ec, acpi_ec_read_status(ec)); > } while (time_before(jiffies, delay)); > pr_debug(PREFIX "controller reset, restart transaction\n"); > spin_lock_irqsave(&ec->lock, flags); > @@ -844,6 +846,7 @@ static int acpi_ec_add(struct acpi_device *device) > ec->gpe, ec->command_addr, ec->data_addr); > > ret = ec_install_handlers(ec); > + set_bit(EC_FLAGS_GPE_STORM, &ec->flags); > > /* EC is fully operational, allow queries */ > clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags); Try this patch, test no fail. From the syslog, seems interrupt mode is disabled by setting "EC_FLAGS_GPE_STORM" flag.
Yes, it uses polling mode. But from the log, the delay between two write operations is still large. So please try the following patch. Thanks. diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index a06d983..8a9ef09 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -234,10 +234,14 @@ static int ec_poll(struct acpi_ec *ec) if (ec_transaction_done(ec)) return 0; } else { - if (wait_event_timeout(ec->wait, - ec_transaction_done(ec), - msecs_to_jiffies(1))) + udelay(5); + if (ec_transaction_done(ec)) return 0; + +// if (wait_event_timeout(ec->wait, +// ec_transaction_done(ec), +// msecs_to_jiffies(1))) +// return 0; } advance_transaction(ec, acpi_ec_read_status(ec)); } while (time_before(jiffies, delay)); @@ -844,6 +848,7 @@ static int acpi_ec_add(struct acpi_device *device) ec->gpe, ec->command_addr, ec->data_addr); ret = ec_install_handlers(ec); + set_bit(EC_FLAGS_GPE_STORM, &ec->flags); /* EC is fully operational, allow queries */ clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
Created attachment 107410 [details] syslog-comment32
Try this patch, no fail. Attach syslog-comment32. From the experiments I have done, I find the fail only occurs when the interrupt mode and polling mode are mixed. Can we find out why they are mixed?
From the log, the interval between two write operations is about 180~200us. The previous failure logs show the interval is about 10us. This seems to only happen when interrupt mode and polling mode are mixed. Further more, IBF lasts about 150us after data is written to EC from the log. [ 3484.705387] bxudbg EC write data = 0x1c [ 3484.705562] bxudbg EC write data = 0x7 [ 3484.748047] bxudbg EC write data = 0x1c [ 3484.748247] bxudbg EC read data = 0x7 [ 3484.753533] bxudbg EC write data = 0x1c [ 3484.753738] bxudbg EC read data = 0x7 [ 3484.761422] bxudbg EC write data = 0x1c [ 3484.761605] bxudbg EC write data = 0x6 [ 3487.273732] bxudbg EC write data = 0x1c [ 3487.273932] bxudbg EC read data = 0x6 [ 3487.301281] bxudbg EC write data = 0x1c [ 3487.301489] bxudbg EC read data = 0x6 [ 3487.307967] bxudbg EC write data = 0x1c [ 3487.308143] bxudbg EC write data = 0x7 [ 3487.371678] bxudbg EC write data = 0x1c [ 3487.371880] bxudbg EC read data = 0x7 [ 3487.380098] bxudbg EC write data = 0x1c [ 3487.380300] bxudbg EC read data = 0x7 [ 3487.388520] bxudbg EC write data = 0x1c [ 3487.388712] bxudbg EC write data = 0x6 Could you provide the whole log when failure happen? The log before the fail transaction maybe have some clues. > Can we find out why they are mixed? Following is that change log. commit 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Sun Aug 30 03:06:14 2009 +0400 ACPI: EC: Merge IRQ and POLL modes In general, EC transaction should complete in less than 1ms, thus it is possible to merge wait for 1ms in poll mode and 1ms of interrupt transaction timeout. Still, driver will wait 500ms for EC to complete transaction. This significantly simplifies driver and makes it immune to problematic EC interrupt implementations. It also may lessen kernel start-up time by 500ms. References: http://bugzilla.kernel.org/show_bug.cgi?id=12949
Hi Tianyu, Please refer to ec039.c, I replaced wait_event_timeout with its function body and added some track point. I run the test and found the fail case as follows: fail analysis: After writing command 0x81, the normal flow: The main process should sleep 1ms, and interrupt mode drive the transaction progress the fail case flow: The main process only sleep 127us, so the "advance_transaction" in "ec_poll" was executed(write address 0x1c) before EC trigger gpe, when gpe was triggered, the "advance_transaction" in "acpi_ec_gpe_handler" was executed(write data 0x7), so for EC, address 0x1c was missed. [ 419.762904] bxudbg EC read status = 0x0 [ 419.762904] bxudbg EC write command = 0x81 [ 419.762907] bxudbg track_point_0 [ 419.762908] bxudbg track_point_1 [ 419.763035] bxudbg track_point_3 ret = 0 //schedule only 127us [ 419.763037] bxudbg track_point_5 [ 419.763041] bxudbg EC read status = 0x8 [ 419.763041] bxudbg EC write data = 0x1c [ 419.763044] bxudbg EC read status = 0x8 [ 419.763045] bxudbg ec interrupt, status = 0x8 [ 419.763046] bxudbg track_point_0 [ 419.763047] bxudbg track_point_1 [ 419.763048] bxudbg EC write data = 0x7 [ 419.763065] bxudbg EC read status = 0x2 [ 419.763066] bxudbg ec interrupt, status = 0x2 [ 419.763253] bxudbg EC read status = 0x0 [ 419.763256] bxudbg ec interrupt, status = 0x0 [ 419.763260] bxudbg EC read status = 0x0 [ 419.763261] bxudbg ec wake up [ 419.763266] bxudbg EC read status = 0x0 [ 419.763278] bxudbg track_point_3 ret = 1 [ 419.763278] bxudbg track_point_1 [ 419.763279] bxudbg track_point_2 [ 419.763280] bxudbg track_point_4 [ 419.763283] bxudbg EC read status = 0x0
Created attachment 107412 [details] ec039.c
Created attachment 107413 [details] syslog for ec039.c
Hi Tianyu, I put fail cases into two categories. 1. unknown EC interrupt meanwhile host write command 0x81, then the IBF isn't set, so we assume EC has got the command 0x81 falsely, and we send the address 0x1C to EC 2. abnormal schedule sleep time, then host send the data 0x7 stick with send the address 0x1C CMD bit = 1, means in the data register is a command byte CMD bit = 0, means in the data register is a data byte so for fail case 1, if we see the CMD bit = 0, we know EC doesn't get the command 0x81, the unknown interrupt isn't for inform host write address 0x1c, we can skip write address 0x1c in advance_transaction. for fail case 2, if we see the CMD bit = 1, we know EC doesn't get the address 0x1C, we can skip write data 0x7 in advance_transaction. [ 108.824947] bxudbg EC read status = 0x0 [ 108.824948] bxudbg EC write command = 0x81 [ 108.824950] bxudbg EC read status = 0x0 //CMD bit = 0, abnormal [ 108.824950] bxudbg ec interrupt, status = 0x0 [ 108.824952] bxudbg EC write data = 0x1c [ 108.824955] bxudbg track_point_0 [ 108.824956] bxudbg track_point_1 [ 108.827442] bxudbg track_point_3 ret = 0 [ 108.827446] bxudbg track_point_5 [ 108.827451] bxudbg EC read status = 0x0 [ 108.827452] bxudbg EC write data = 0x7 [ 108.827455] bxudbg track_point_0 [ 108.827456] bxudbg track_point_1 [ 108.831449] bxudbg track_point_3 ret = 0 [ 108.831451] bxudbg track_point_5 [ 108.831455] bxudbg EC read status = 0x0 [ 108.831456] bxudbg track_point_4 [ 108.831459] bxudbg EC read status = 0x0 [ 419.762904] bxudbg EC read status = 0x0 [ 419.762904] bxudbg EC write command = 0x81 [ 419.762907] bxudbg track_point_0 [ 419.762908] bxudbg track_point_1 [ 419.763035] bxudbg track_point_3 ret = 0 //schedule only 127us [ 419.763037] bxudbg track_point_5 [ 419.763041] bxudbg EC read status = 0x8 [ 419.763041] bxudbg EC write data = 0x1c [ 419.763044] bxudbg EC read status = 0x8 //CMD bit = 1, abnormal [ 419.763045] bxudbg ec interrupt, status = 0x8 [ 419.763046] bxudbg track_point_0 [ 419.763047] bxudbg track_point_1 [ 419.763048] bxudbg EC write data = 0x7 [ 419.763065] bxudbg EC read status = 0x2 [ 419.763066] bxudbg ec interrupt, status = 0x2 [ 419.763253] bxudbg EC read status = 0x0 [ 419.763256] bxudbg ec interrupt, status = 0x0 [ 419.763260] bxudbg EC read status = 0x0 [ 419.763261] bxudbg ec wake up [ 419.763266] bxudbg EC read status = 0x0 [ 419.763278] bxudbg track_point_3 ret = 1 [ 419.763278] bxudbg track_point_1 [ 419.763279] bxudbg track_point_2 [ 419.763280] bxudbg track_point_4 [ 419.763283] bxudbg EC read status = 0x0 I add the CMD bit as condition in advance_transaction, test no fail static void advance_transaction(struct acpi_ec *ec, u8 status) { unsigned long flags; struct transaction *t = ec->curr; spin_lock_irqsave(&ec->lock, flags); if (!t) goto unlock; if (t->wlen > t->wi) { //bxudbg>>> if (t->wi == 0) { if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0x08)) acpi_ec_write_data(ec,t->wdata[t->wi++]); else goto err; } else { if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0)) acpi_ec_write_data(ec,t->wdata[t->wi++]); else goto err; } //bxudbg<<< // if ((status & ACPI_EC_FLAG_IBF) == 0) // acpi_ec_write_data(ec, // t->wdata[t->wi++]); // else // goto err; Could you review my change, is there any risk?
(In reply to barton xu from comment #39) > Hi Tianyu, > > > I put fail cases into two categories. > 1. unknown EC interrupt meanwhile host write command 0x81, then the IBF > isn't set, so we assume EC has got the command 0x81 falsely, and we send the > address 0x1C to EC When do you mean unknown interrupt? This maybe a SCI event. The operation of writing command or data in advance_transaction is under spin_lock_irq protection and irq will be disabled. Could you show some logs of this case? > 2. abnormal schedule sleep time, then host send the data 0x7 stick with > send the address 0x1C > > > CMD bit = 1, means in the data register is a command byte > CMD bit = 0, means in the data register is a data byte > > so for fail case 1, if we see the CMD bit = 0, we know EC doesn't get the > command 0x81, the unknown interrupt isn't for inform host write address 0x1c, > we can skip write address 0x1c in advance_transaction. For this case, the IBF also is not set to 1. So maybe we can try to check IGB after write cmd/data. If it's not set to 1, retry several times. How about following change? But I don't know why cmd wasn't written to register. lanlty@bee:/c/lanlty/kernel/linux-pm$ git diff drivers/acpi/ec.c diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 8dd2d4d..8770c68 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -176,15 +176,28 @@ static void advance_transaction(struct acpi_ec *ec, u8 status) { unsigned long flags; struct transaction *t = ec->curr; + int i; spin_lock_irqsave(&ec->lock, flags); if (!t) goto unlock; if (t->wlen > t->wi) { - if ((status & ACPI_EC_FLAG_IBF) == 0) + if ((status & ACPI_EC_FLAG_IBF) == 0) { +retry: acpi_ec_write_data(ec, - t->wdata[t->wi++]); - else + t->wdata[t->wi]); + + if (acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF) { + if (i < 3) { + i++; + goto retry; + + } else + pr_warn("write error/n"); + } + + t->wi++; + } else goto err; } else if (t->rlen > t->ri) { if ((status & ACPI_EC_FLAG_OBF) == 1) { > for fail case 2, if we see the CMD bit = 1, we know EC doesn't get the > address 0x1C, we can skip write data 0x7 in advance_transaction. This seems to be strange. Why CMD bit was set to 1 after data was written to register? > > [ 108.824947] bxudbg EC read status = 0x0 > [ 108.824948] bxudbg EC write command = 0x81 > [ 108.824950] bxudbg EC read status = 0x0 //CMD bit = 0, abnormal > [ 108.824950] bxudbg ec interrupt, status = 0x0 > [ 108.824952] bxudbg EC write data = 0x1c > [ 108.824955] bxudbg track_point_0 > [ 108.824956] bxudbg track_point_1 > [ 108.827442] bxudbg track_point_3 ret = 0 > [ 108.827446] bxudbg track_point_5 > [ 108.827451] bxudbg EC read status = 0x0 > [ 108.827452] bxudbg EC write data = 0x7 > [ 108.827455] bxudbg track_point_0 > [ 108.827456] bxudbg track_point_1 > [ 108.831449] bxudbg track_point_3 ret = 0 > [ 108.831451] bxudbg track_point_5 > [ 108.831455] bxudbg EC read status = 0x0 > [ 108.831456] bxudbg track_point_4 > [ 108.831459] bxudbg EC read status = 0x0 > > > > [ 419.762904] bxudbg EC read status = 0x0 > [ 419.762904] bxudbg EC write command = 0x81 > [ 419.762907] bxudbg track_point_0 > [ 419.762908] bxudbg track_point_1 > [ 419.763035] bxudbg track_point_3 ret = 0 //schedule only 127us > [ 419.763037] bxudbg track_point_5 > [ 419.763041] bxudbg EC read status = 0x8 > [ 419.763041] bxudbg EC write data = 0x1c > [ 419.763044] bxudbg EC read status = 0x8 //CMD bit = 1, abnormal > [ 419.763045] bxudbg ec interrupt, status = 0x8 > [ 419.763046] bxudbg track_point_0 > [ 419.763047] bxudbg track_point_1 > [ 419.763048] bxudbg EC write data = 0x7 > [ 419.763065] bxudbg EC read status = 0x2 > [ 419.763066] bxudbg ec interrupt, status = 0x2 > [ 419.763253] bxudbg EC read status = 0x0 > [ 419.763256] bxudbg ec interrupt, status = 0x0 > [ 419.763260] bxudbg EC read status = 0x0 > [ 419.763261] bxudbg ec wake up > [ 419.763266] bxudbg EC read status = 0x0 > [ 419.763278] bxudbg track_point_3 ret = 1 > [ 419.763278] bxudbg track_point_1 > [ 419.763279] bxudbg track_point_2 > [ 419.763280] bxudbg track_point_4 > [ 419.763283] bxudbg EC read status = 0x0 > > > > I add the CMD bit as condition in advance_transaction, test no fail > > > static void advance_transaction(struct acpi_ec *ec, u8 status) > { > unsigned long flags; > struct transaction *t = ec->curr; > > spin_lock_irqsave(&ec->lock, flags); > if (!t) > goto unlock; > if (t->wlen > t->wi) { > //bxudbg>>> > if (t->wi == 0) { > if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & > ACPI_EC_FLAG_CMD) == > 0x08)) > acpi_ec_write_data(ec,t->wdata[t->wi++]); > else > goto err; > } > else { > if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & > ACPI_EC_FLAG_CMD) == > 0)) > acpi_ec_write_data(ec,t->wdata[t->wi++]); > else > goto err; > } > //bxudbg<<< > // if ((status & ACPI_EC_FLAG_IBF) == 0) > // acpi_ec_write_data(ec, > // t->wdata[t->wi++]); > // else > // goto err; > > > > Could you review my change, is there any risk?
Created attachment 107801 [details] ec.patch Sorry. Please ignore previous patch and try this one.
Hi Tianyu, > > When do you mean unknown interrupt? This maybe a SCI event. > The operation of writing command or data in advance_transaction is under > spin_lock_irq protection and irq will be disabled. > > Could you show some logs of this case? > log like below, irq will be enabled after I/O-write instruction, I think interrupt can reach CPU faster than I/O transmission reach EC. This maybe a SCI event, but it is surely not a interrupt to inform host doing advance_transaction. [ 108.824947] bxudbg EC read status = 0x0 [ 108.824948] bxudbg EC write command = 0x81 [ 108.824950] bxudbg EC read status = 0x0 //CMD bit = 0, abnormal [ 108.824950] bxudbg ec interrupt, status = 0x0 //unknown interrupt [ 108.824952] bxudbg EC write data = 0x1c [ 108.824955] bxudbg track_point_0 [ 108.824956] bxudbg track_point_1 [ 108.827442] bxudbg track_point_3 ret = 0 [ 108.827446] bxudbg track_point_5 [ 108.827451] bxudbg EC read status = 0x0 [ 108.827452] bxudbg EC write data = 0x7 [ 108.827455] bxudbg track_point_0 [ 108.827456] bxudbg track_point_1 [ 108.831449] bxudbg track_point_3 ret = 0 [ 108.831451] bxudbg track_point_5 [ 108.831455] bxudbg EC read status = 0x0 [ 108.831456] bxudbg track_point_4 [ 108.831459] bxudbg EC read status = 0x0 > > This seems to be strange. Why CMD bit was set to 1 after data was written to > register? > This means that EC doesn't got the data. Once fail system on my hand, I will try your patch.
hi, anyupdate?
Hi Tianyu, Sorry for late reply, I just got the fail system. Try your patch, test fail but no retry fail in advance_transaction. code: if ((status & ACPI_EC_FLAG_IBF) == 0) { retry: acpi_ec_write_data(ec, t->wdata[t->wi++]); if(!(acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF)) { if(i<3) { i++; goto retry; } else pr_warn("bxuflag write error\n"); } } else goto err; syslog: kernel: [ 139.362786] bxudbg EC read status = 0x0 kernel: [ 139.362787] bxudbg EC write command = 0x81 kernel: [ 139.362788] bxudbg EC read status = 0x0 kernel: [ 139.362789] bxudbg ec interrupt, status = 0x0 kernel: [ 139.362791] bxudbg EC write data = 0x1c //abnormal kernel: [ 139.362796] bxudbg EC read status = 0x2 kernel: [ 139.362796] bxudbg track_point_0 kernel: [ 139.362797] bxudbg track_point_1 kernel: [ 139.362992] bxudbg track_point_3 ret = 0 kernel: [ 139.362996] bxudbg track_point_5 kernel: [ 139.363002] bxudbg EC read status = 0x0 kernel: [ 139.363003] bxudbg EC write data = 0x7 kernel: [ 139.363008] bxudbg EC read status = 0x2 kernel: [ 139.363009] bxudbg track_point_0 kernel: [ 139.363009] bxudbg track_point_1 kernel: [ 139.367032] bxudbg track_point_3 ret = 0 kernel: [ 139.367040] bxudbg track_point_5 kernel: [ 139.367046] bxudbg EC read status = 0x0 kernel: [ 139.367050] bxudbg track_point_4 kernel: [ 139.367055] bxudbg EC read status = 0x0 Any other suggestion?
Created attachment 110431 [details] ec.patch Sorry, just come back from vacation. We should check IBG and do retry after writing command rather than writing data. So please try this patch. Thanks.
Any update?
Hi Tianyu, Sorry for late reply. I have tried your patch, still fail. Fail log as follows: A24J kernel: [ 261.141934] bxudbg EC write command = 0x81 A24J kernel: [ 261.141941] bxudbg EC read status = 0xa A24J kernel: [ 261.141943] bxudbg track_point_0 A24J kernel: [ 261.141943] bxudbg track_point_1 A24J kernel: [ 261.142238] bxudbg track_point_3 ret = 0 //schedule only 295us A24J kernel: [ 261.142240] bxudbg EC read status = 0x8 A24J kernel: [ 261.142241] bxudbg track_point_5 A24J kernel: [ 261.142243] bxudbg ec interrupt, status = 0x8 A24J kernel: [ 261.142244] bxudbg EC read status = 0x8 A24J kernel: [ 261.142245] bxudbg EC write data = 0x1c //bxuflag A24J kernel: [ 261.142250] bxudbg EC read status = 0x2 A24J kernel: [ 261.142251] bxudbg EC write data = 0x7 //bxuflag A24J kernel: [ 261.142256] bxudbg EC read status = 0x2 A24J kernel: [ 261.142257] bxudbg track_point_0 A24J kernel: [ 261.142258] bxudbg track_point_1 A24J kernel: [ 261.142275] bxudbg EC read status = 0x2 A24J kernel: [ 261.142276] bxudbg ec interrupt, status = 0x2 A24J kernel: [ 261.142469] bxudbg EC read status = 0x0 A24J kernel: [ 261.142471] bxudbg ec interrupt, status = 0x0 A24J kernel: [ 261.142475] bxudbg EC read status = 0x0 A24J kernel: [ 261.142476] bxudbg ec wake up A24J kernel: [ 261.142482] bxudbg EC read status = 0x0 A24J kernel: [ 261.142494] bxudbg track_point_3 ret = 1 A24J kernel: [ 261.142495] bxudbg track_point_1 A24J kernel: [ 261.142496] bxudbg track_point_2 A24J kernel: [ 261.142497] bxudbg track_point_4 A24J kernel: [ 261.142500] bxudbg EC read status = 0x0 A24J kernel: [ 262.381359] bxudbg EC read status = 0x20 A24J kernel: [ 262.381361] bxudbg ec interrupt, status = 0x20 A24J kernel: [ 262.381365] bxudbg EC read status = 0x20 A24J kernel: [ 262.381366] bxudbg ec wake up A24J kernel: [ 262.381369] bxudbg EC read status = 0x20
Hi Tianyu, Till now, seems checking CMD bit in advance_transaction can sequence the transaction. Can we use CMD bit? ACPI spec said CMD bit only used by controller, SMI_EVT: 1 – Indicates SMI event is pending (requesting SMI query). 0 – No SMI events are pending. SCI_EVT: 1 – Indicates SCI event is pending (requesting SCI query). 0 – No SCI events are pending. BURST: 1 – Controller is in burst mode for polled command processing. 0 – Controller is in normal mode for interrupt-driven command processing. CMD: 1 – Byte in data register is a command byte (only used by controller). 0 – Byte in data register is a data byte (only used by controller). IBF: 1 – Input buffer is full (data ready for embedded controller). 0 – Input buffer is empty. OBF: 1 – Output buffer is full (data ready for host). 0 – Output buffer is empty.
Created attachment 115371 [details] ec.patch It's better to check the bit in the start_transaction(), could you check whether this patch can fix your issue. But I am not sure we can do this since it said "only used by controller" and there maybe different behaviour on the other machines.
Since no response for long time, close this bug and feel free reopen it if there is any update.
Hi Tianyu, I had returned the fail laptop to customer earlier, so no update till now. My colleague got a repro system now, could we open the bug again?
Hi Tianyu, Could you still give your great help for this issue? Thanks.
Hi Tianyu: If there is no reposone from you tomorrow, we will file new bug for tracking, thanks for help.
I am busy on the other things. So reassign to default assignee
Thanks,Lan. Hi Barton: can you attach last patch that we confirmed it from our side without issue for the review ?
I have shown my change in comment 39, copy to here again static void advance_transaction(struct acpi_ec *ec, u8 status) { unsigned long flags; struct transaction *t = ec->curr; spin_lock_irqsave(&ec->lock, flags); if (!t) goto unlock; if (t->wlen > t->wi) { //bxudbg>>> if (t->wi == 0) { if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0x08)) acpi_ec_write_data(ec,t->wdata[t->wi++]); else goto err; } else { if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0)) acpi_ec_write_data(ec,t->wdata[t->wi++]); else goto err; } //bxudbg<<< // if ((status & ACPI_EC_FLAG_IBF) == 0) // acpi_ec_write_data(ec, // t->wdata[t->wi++]); // else // goto err; Hi tianyu, You patch in comment 49 can lower the fail rate, but still fail.
Could we have someone to review the patch at #57 ? Thanks!
Hi Tianyu: Any chance you can get this bug back and help to do final review ?
Sorry. For later response, I still think the patch in the Comment 12 should be right solution and it works without no failure, right?
Created attachment 139361 [details] ec.patch Please try this patch.
Hi Tianyu: We are going to verify it , we should provide some update by early next week. Thanks Arthur
Hi Tianyu: We've tried your new patch but it didn't help , Is it possible that you can review advance_transaction() modification at comment 57 ? that is the pass condition we have right now. Thanks for help! -Arthur
Yes, I think it complies with ACPI spec. But from original log, I think the root case is race condition issue. There is a new patchset to fix race condition issue in the EC driver, could you please test it? http://marc.info/?l=linux-acpi&m=140279288812650&w=2
Hi Tianyu: I couldn't find the patch from the link http://marc.info/?l=linux-acpi&m=140279288812650&w=2, could you help to attach the new patchset to this bug for my validation ? Thanks for help! -Arthur
Sure. http://marc.info/?l=linux-acpi&m=140279290812659&w=2 http://marc.info/?l=linux-acpi&m=140279291012660&w=2 http://marc.info/?l=linux-acpi&m=140279292712665&w=2 http://marc.info/?l=linux-acpi&m=140279300712697&w=2 http://marc.info/?l=linux-acpi&m=140279300012696&w=2 http://marc.info/?l=linux-acpi&m=140279299612694&w=2 http://marc.info/?l=linux-acpi&m=140279299312692&w=2
Hi Tinayu: Should I apply all of patches to do the validation ? just want to make sure we are doing right here ? or the patch from #65 is good enough ? Thanks!
Please apply all patches in the Comment 66. The patch in the #65 is an introduction for the patchset. Thanks.
OK, I got it, we will try that. thanks!
Hi Tianyu: We merged the patches from #66 manually and confirmed test result is positive. Can you help to confirm if all the patches are check-in already ? we would like to confirm with official release. Thanks!
It will be merged into v3.16 and backport to some older stable kernel.
Hi Tianyu: Can you give us update once those patches merged to v3.16, we would like to double check with official kernel. Thanks for help! -Arthur
Ok. No problem.
Thanks for the reporting. I'll add you guys into the patchset. Marking this bug as RESOLVED... Best regards -Lv
Hi Tianyu/Lv, I checked the v3.16-rc3 but couldn't find this ec patch. Can you help to check? Thanks.
No, ACPI maintainer has yet come back from vacation. So it will be merged later.
The patchset will appear in v3.17 according to our schedule.
I see. Thanks for prompt response!
The patchset has been merged into linux-pm tree. http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/?h=bleeding-edge
Hi LV, Can you confirm this patch (http://marc.info/?l=linux-acpi&m=140279299612694&w=2) has check-in already or we should wait for v3.17? Thanks, Pan
(In reply to Pan Kuo from comment #80) > Hi LV, > > Can you confirm this patch > (http://marc.info/?l=linux-acpi&m=140279299612694&w=2) has check-in already > or we should wait for v3.17? It's upstreamed since v3.17-rc1. Thanks -Lv
Thanks for reply, LV.