Bug 59911 - acpi ec write ec ram fail
Summary: acpi ec write ec ram fail
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-19 09:21 UTC by barton xu
Modified: 2014-09-16 01:54 UTC (History)
5 users (show)

See Also:
Kernel Version: 3.9.4
Tree: Mainline
Regression: No


Attachments
original ec.c (27.99 KB, text/plain)
2013-06-19 09:38 UTC, barton xu
Details
my change to ec.c (28.38 KB, text/plain)
2013-06-19 09:39 UTC, barton xu
Details
debug.patch (551 bytes, patch)
2013-07-15 07:31 UTC, Lan Tianyu
Details | Diff
time of EC IBF change 0->1 after write command (57.83 KB, application/octet-stream)
2013-07-25 05:50 UTC, barton xu
Details
ec.c (28.38 KB, text/plain)
2013-08-22 02:20 UTC, barton xu
Details
ec.c (28.23 KB, text/plain)
2013-08-23 11:16 UTC, barton xu
Details
syslog (880.21 KB, application/octet-stream)
2013-08-23 11:18 UTC, barton xu
Details
syslog-comment32 (45.25 KB, application/octet-stream)
2013-09-04 03:39 UTC, barton xu
Details
ec039.c (29.19 KB, text/plain)
2013-09-04 10:21 UTC, barton xu
Details
syslog for ec039.c (847.91 KB, text/plain)
2013-09-04 10:33 UTC, barton xu
Details
ec.patch (827 bytes, patch)
2013-09-09 09:43 UTC, Lan Tianyu
Details | Diff
ec.patch (622 bytes, patch)
2013-10-08 07:22 UTC, Lan Tianyu
Details | Diff
ec.patch (940 bytes, patch)
2013-11-21 08:00 UTC, Lan Tianyu
Details | Diff
ec.patch (1.29 KB, patch)
2014-06-12 02:31 UTC, Lan Tianyu
Details | Diff

Description barton xu 2013-06-19 09:21:06 UTC
I run a long run test on ubuntu linux, this test will call ACPI method to write EC RAM through ACPI EC write command(0x81). At the fail case, ACPI EC write transaction is done, but the EC RAM data is no change.
Write EC Ram flow:
step 1. write 0x81 to port 66
step 2. write address byte to port 62
step 3. write data byte to port 62

The fail transaction cases can be grouped into 2 classes:
1. the time interval between step 1 and 2 is shorter than 10 us
2. the time interval between step 2 and 3 is shorter than 10 us

For the successful transaction cases, the time interval between 1 and 2 and 3 is longer than 100 us.
 
fail case 1:
Jun  6 15:55:10 barton-A24J kernel: [  377.863077] Barton EC write command =0x81
Jun  6 15:55:10 barton-A24J kernel: [  377.863086] Barton EC write data = 0x1c
Jun  6 15:55:10 barton-A24J kernel: [  377.864948] Barton EC read status = 0x0
Jun  6 15:55:10 barton-A24J kernel: [  377.864951] Barton EC write data = 0x7
Jun  6 15:55:10 barton-A24J kernel: [  377.864982] Barton EC read status = 0x2
Jun  6 15:55:10 barton-A24J kernel: [  377.866706] Barton EC read status = 0x0

fail case 2:
Jun  7 13:44:35 barton-A24J kernel: [  166.878303] bxudbg EC write command =0x81
Jun  7 13:44:35 barton-A24J kernel: [  166.878668] bxudbg EC read status = 0x8
Jun  7 13:44:35 barton-A24J kernel: [  166.878669] bxudbg EC write data = 0x1c
Jun  7 13:44:35 barton-A24J kernel: [  166.878671] bxudbg EC read status = 0x8
Jun  7 13:44:35 barton-A24J kernel: [  166.878676] bxudbg EC write data = 0x7
Jun  7 13:44:35 barton-A24J kernel: [  166.878697] bxudbg EC read status = 0x2
Jun  7 13:44:35 barton-A24J kernel: [  166.879172] bxudbg EC read status = 0x0

successful case:
Jun  6 12:30:36 barton-A24J kernel: [  275.360203] Barton EC write command =0x81
Jun  6 12:30:36 barton-A24J kernel: [  275.360655] Barton EC read status = 0x8
Jun  6 12:30:36 barton-A24J kernel: [  275.360656] Barton EC write data = 0x1c
Jun  6 12:30:36 barton-A24J kernel: [  275.361494] Barton EC read status = 0x0
Jun  6 12:30:36 barton-A24J kernel: [  275.361495] Barton EC write data = 0x7
Jun  6 12:30:36 barton-A24J kernel: [  275.361524] Barton EC read status = 0x2
Jun  6 12:30:36 barton-A24J kernel: [  275.361985] Barton EC read status = 0x0

I had tried the below change in ec.c to make sure the time sequence for the ACPI EC write transaction, and this workaround is OK for running my test.

static void advance_transaction(struct acpi_ec *ec, u8 status)
{
      unsigned long flags;
      struct transaction *t = ec->curr;
      u8 trusty_status;   //bxudbg

      spin_lock_irqsave(&ec->lock, flags);
      if (!t)
             goto unlock;
//bxudbg >>>
       udelay(100);
       trusty_status = acpi_ec_read_status(ec);
//bxudbg <<<

      if (t->wlen > t->wi) {
//bxudbg       if ((status & ACPI_EC_FLAG_IBF) == 0)
             if ((trusty_status & ACPI_EC_FLAG_IBF) == 0)   //bxudbg
                    acpi_ec_write_data(ec,
                           t->wdata[t->wi++]);
             else
                    goto err;
      } else if (t->rlen > t->ri) {
//bxudbg       if ((status & ACPI_EC_FLAG_OBF) == 1) {
             if ((trusty_status & ACPI_EC_FLAG_OBF) == 1) { //bxudbg
                    t->rdata[t->ri++] = acpi_ec_read_data(ec);
                    if (t->rlen == t->ri)
                           t->done = true;
             } else
                    goto err;
//bxudbg } else if (t->wlen == t->wi && (status & ACPI_EC_FLAG_IBF) == 0)
      } else if ((t->wlen == t->wi) && ((trusty_status & ACPI_EC_FLAG_IBF) == 0)) //bxudbg
             t->done = true;
      goto unlock;
err:
      /*
       * If SCI bit is set, then don't think it's a false IRQ
       * otherwise will take a not handled IRQ as a false one.
       */
      if (in_interrupt() && !(status & ACPI_EC_FLAG_SCI))
             ++t->irq_count;

unlock:
      spin_unlock_irqrestore(&ec->lock, flags);
}
Comment 1 barton xu 2013-06-19 09:38:28 UTC
Created attachment 105351 [details]
original ec.c
Comment 2 barton xu 2013-06-19 09:39:14 UTC
Created attachment 105361 [details]
my change to ec.c
Comment 3 Lan Tianyu 2013-06-20 13:01:45 UTC
Hi:
    Sorry, I am not clear what's your problem? How do you check "EC RAM data is no change"? What's machine you are using?
    Please attach the output of acpidump.
Comment 4 barton xu 2013-06-21 02:44:47 UTC
Hi Tianyu,

        My test machine configuration:
        CPU: Haswell
        PCH: Lynx point
        GPU: Nvidia Geforce
        EC: ITE

        My test runtine will call method XXXX, and in method XXXX, set FBRQ 1 to inform EC to poll a GPIO pin status. 
        Run this test many loops, there are fail rate.
        At the fail case, I found EC doesn't poll pin status.
        If I put the write FBRQ statement into a while loop, and break if read FBRQ = 1, the test will not fail.
        And this test will never fail under windows 8. 
 
	OperationRegion(ERAM, EmbeddedControl, Zero, 0xFF)
	Field(ERAM, ByteAcc, NoLock, Preserve)
	{
                Offset(0x1C),
		FBRQ, 1,            //Use this bit to inform EC to do something 
		FBST, 1,	
        }

	Method(XXXX, 0, Serialized)
	{
		CreateField(TGPC,0x06,0x02,ECOC)
		CreateField(TGPC,0x08,0x02,PRGX)
		Store(Zero, LNKD)
		SGPO(HLRS, One)
		Sleep(One)
		SGPO(PWEN, Zero)
		SGPO(0x34, Zero)
		While(LNotEqual(SGPI(PWOK), One))
		{
			Store(Zero, Local0)
			While(LLess(Local0, 0x14))
			{
				Add(Local0, One, Local0)
				Stall(0x32)
			}
		}
		Store(Zero, Local0)
		While(LLess(Local0, 0x1E))
		{
			Add(Local0, One, Local0)
			Stall(0x32)
		}
		SGPO(HLRS, Zero)
		Sleep(0x05)
		Store(Zero, AFES)
		While(LLess(LNKS, 0x07))
		{
			Stall(0x64)
		}
		Store(One, \_SB.PCI0.LPCB.EC0.FBRQ)   //write EC RAM FBRQ bit
		Store(One, GS04)
		Store(Zero, GO04)
		And(GL01, 0xEF, GL01)
		Return(Zero)
	}
Comment 5 Lan Tianyu 2013-06-21 03:39:14 UTC
(In reply to comment #4)
> Hi Tianyu,
> 
>         My test machine configuration:
>         CPU: Haswell
>         PCH: Lynx point
>         GPU: Nvidia Geforce
>         EC: ITE
> 
Ok. If I understand correctly, you are developing a Bios for New product.

Your workaround is to add a 100ms delay in the advance transaction. This also may happen in SCI handler and so looks not every good. I am not very familiar with the  EC Firmware. But according ACPI EC spec, IBF=0, Signals that the embedded controller has read the last command or data from the input buffer. So when IBF=0, data should have been written into EC RAM?


>         My test runtine will call method XXXX, and in method XXXX, set FBRQ 1
> to inform EC to poll a GPIO pin status. 
>         Run this test many loops, there are fail rate.
>         At the fail case, I found EC doesn't poll pin status.
>         If I put the write FBRQ statement into a while loop, and break if
>         read
> FBRQ = 1, the test will not fail.
>         And this test will never fail under windows 8. 
> 
>     OperationRegion(ERAM, EmbeddedControl, Zero, 0xFF)
>     Field(ERAM, ByteAcc, NoLock, Preserve)
>     {
>                 Offset(0x1C),
>         FBRQ, 1,            //Use this bit to inform EC to do something 
>         FBST, 1,    
>         }
> 
>     Method(XXXX, 0, Serialized)
>     {
>         CreateField(TGPC,0x06,0x02,ECOC)
>         CreateField(TGPC,0x08,0x02,PRGX)
>         Store(Zero, LNKD)
>         SGPO(HLRS, One)
>         Sleep(One)
>         SGPO(PWEN, Zero)
>         SGPO(0x34, Zero)
>         While(LNotEqual(SGPI(PWOK), One))
>         {
>             Store(Zero, Local0)
>             While(LLess(Local0, 0x14))
>             {
>                 Add(Local0, One, Local0)
>                 Stall(0x32)
>             }
>         }
>         Store(Zero, Local0)
>         While(LLess(Local0, 0x1E))
>         {
>             Add(Local0, One, Local0)
>             Stall(0x32)
>         }
>         SGPO(HLRS, Zero)
>         Sleep(0x05)
>         Store(Zero, AFES)
>         While(LLess(LNKS, 0x07))
>         {
>             Stall(0x64)
>         }
>         Store(One, \_SB.PCI0.LPCB.EC0.FBRQ)   //write EC RAM FBRQ bit
>         Store(One, GS04)
>         Store(Zero, GO04)
>         And(GL01, 0xEF, GL01)
>         Return(Zero)
>     }
Comment 6 barton xu 2013-06-21 07:40:56 UTC
Hi Tianyu,

>     But according ACPI EC spec, IBF=0, Signals that the
>embedded controller has read the last command or data from the input buffer.
>So
>when IBF=0, data should have been written into EC RAM?

     I think so too. But is there any chance that we read the status before the IBF change to 1?

host write 0x81 to 66h port
   IBF = 0 -> 1
EC read command data
   IBF = 1 -> 0
host write address to 62h port

    And, do you know how does function wait_event_timeout work?

static int ec_poll(struct acpi_ec *ec)
{
	unsigned long flags;
	int repeat = 5; /* number of command restarts */
	while (repeat--) {
		unsigned long delay = jiffies +
			msecs_to_jiffies(ec_delay);
		do {
			/* don't sleep with disabled interrupts */
			if (EC_FLAGS_MSI || irqs_disabled()) {
				udelay(ACPI_EC_MSI_UDELAY);
				if (ec_transaction_done(ec))
					return 0;
			} else {
				if (wait_event_timeout(ec->wait,
						ec_transaction_done(ec),
						msecs_to_jiffies(1)))
					return 0;
			}
			advance_transaction(ec, acpi_ec_read_status(ec));
		} while (time_before(jiffies, delay));
		pr_debug(PREFIX "controller reset, restart transaction\n");
		spin_lock_irqsave(&ec->lock, flags);
		start_transaction(ec);
		spin_unlock_irqrestore(&ec->lock, flags);
	}
	return -ETIME;
}
Comment 7 Lan Tianyu 2013-06-21 08:05:32 UTC
(In reply to comment #6)
> Hi Tianyu,
> 
> >     But according ACPI EC spec, IBF=0, Signals that the
> >embedded controller has read the last command or data from the input buffer.
> So
> >when IBF=0, data should have been written into EC RAM?
> 
>      I think so too. But is there any chance that we read the status before
>      the
It depends on the EC. EC may deal with data/command too fast to catch "IBF 0->1"
before IBF "1-> 0" and there is no interrupt for this.

Could you do a compare with Linux and Windows EC transaction's time sequence? That will be helpful. 

> IBF change to 1?
> 
> host write 0x81 to 66h port
>    IBF = 0 -> 1
> EC read command data
>    IBF = 1 -> 0
> host write address to 62h port
> 
>     And, do you know how does function wait_event_timeout work?
> 
Please see acpi_ec_gpe_handler(), it's SCI interrupt handler for EC.
When the transaction is done, ec->wait will be woken up.

static u32 acpi_ec_gpe_handler(acpi_handle gpe_device,
	u32 gpe_number, void *data)
{
	struct acpi_ec *ec = data;
	u8 status = acpi_ec_read_status(ec);

	pr_debug(PREFIX "~~~> interrupt, status:0x%02x\n", status);

	advance_transaction(ec, status);
	if (ec_transaction_done(ec) &&
	    (acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF) == 0) {
		wake_up(&ec->wait);
		ec_check_sci(ec, acpi_ec_read_status(ec));
	}
	return ACPI_INTERRUPT_HANDLED | ACPI_REENABLE_GPE;
}
> static int ec_poll(struct acpi_ec *ec)
> {
>     unsigned long flags;
>     int repeat = 5; /* number of command restarts */
>     while (repeat--) {
>         unsigned long delay = jiffies +
>             msecs_to_jiffies(ec_delay);
>         do {
>             /* don't sleep with disabled interrupts */
>             if (EC_FLAGS_MSI || irqs_disabled()) {
>                 udelay(ACPI_EC_MSI_UDELAY);
>                 if (ec_transaction_done(ec))
>                     return 0;
>             } else {
>                 if (wait_event_timeout(ec->wait,
>                         ec_transaction_done(ec),
>                         msecs_to_jiffies(1)))
>                     return 0;
>             }
>             advance_transaction(ec, acpi_ec_read_status(ec));
>         } while (time_before(jiffies, delay));
>         pr_debug(PREFIX "controller reset, restart transaction\n");
>         spin_lock_irqsave(&ec->lock, flags);
>         start_transaction(ec);
>         spin_unlock_irqrestore(&ec->lock, flags);
>     }
>     return -ETIME;
> }
Comment 8 barton xu 2013-06-26 12:18:27 UTC
Hi Tianyu,

   Refer to following information, I found that "wait_event_timeout" didn't sleep at the fail case. Except "wake_up(&ec->wait)", who can wake the "wait_event_timeout"?

normal case:
kernel: [   87.715399] bxudbg EC read status = 0x0               
kernel: [   87.715407] bxudbg EC write command = 0x81    // start_transaction
kernel: [   87.715459] bxudbg EC read status = 0x8       // acpi_ec_gpe_handler
kernel: [   87.715464] bxudbg ec interrupt, status = 0x8  
kernel: [   87.715467] bxudbg EC write data = 0x8        // advance_transaction 
kernel: [   87.715523] bxudbg EC read status = 0x0       // acpi_ec_gpe_handler
kernel: [   87.715540] bxudbg ec interrupt, status = 0x0
kernel: [   87.715547] bxudbg EC write data = 0x80       // advance_transaction
kernel: [   87.715596] bxudbg EC read status = 0x0       // acpi_ec_gpe_handler
kernel: [   87.715600] bxudbg ec interrupt, status = 0x0
kernel: [   87.715607] bxudbg EC read status = 0x0       //
kernel: [   87.715611] bxudbg ec wake up           // wake up "wait_event_timeout"
kernel: [   87.715624] bxudbg EC read status = 0x0       //
kernel: [   87.715660] bxudbg ec transaction done        // ec_poll
kernel: [   87.715676] bxudbg EC read status = 0x0       //

fail case:
kernel: [  519.708513] bxudbg EC read status = 0x0
kernel: [  519.708515] bxudbg EC write command = 0x81
kernel: [  519.708556] bxudbg advance_transaction path 1 //who wake up the sleep?
kernel: [  519.708563] bxudbg EC read status = 0x8
kernel: [  519.708567] bxudbg EC write data = 0x0    //ec_poll
kernel: [  519.708569] bxudbg EC read status = 0x8
kernel: [  519.708571] bxudbg ec interrupt, status = 0x8
kernel: [  519.708581] bxudbg EC write data = 0x40     //acpi_ec_gpe_handler
kernel: [  519.708617] bxudbg EC read status = 0x0
kernel: [  519.708621] bxudbg ec interrupt, status = 0x0
kernel: [  519.708627] bxudbg EC read status = 0x0
kernel: [  519.708631] bxudbg ec wake up
kernel: [  519.708641] bxudbg EC read status = 0x0
kernel: [  519.708672] bxudbg ec transaction done
kernel: [  519.708682] bxudbg EC read status = 0x0


static int ec_poll(struct acpi_ec *ec)
{
	unsigned long flags;
	int repeat = 5; /* number of command restarts */
	while (repeat--) {
		unsigned long delay = jiffies +
			msecs_to_jiffies(ec_delay);
		do {
			/* don't sleep with disabled interrupts */
			if (EC_FLAGS_MSI || irqs_disabled()) {
				udelay(ACPI_EC_MSI_UDELAY);
				if (ec_transaction_done(ec))
					return 0;
			} else {
				if (wait_event_timeout(ec->wait,
						ec_transaction_done(ec),
						msecs_to_jiffies(1)))
				{
					printk(KERN_WARNING "bxudbg ec transaction done\n");
					return 0;
				}
			}
			printk(KERN_WARNING "bxudbg advance_transaction path 1\n");
			advance_transaction(ec, acpi_ec_read_status(ec));
		} while (time_before(jiffies, delay));
		printk(KERN_WARNING "bxudbg controller reset, restart transaction\n");
		spin_lock_irqsave(&ec->lock, flags);
		start_transaction(ec);
		spin_unlock_irqrestore(&ec->lock, flags);
	}
	return -ETIME;
}


static u32 acpi_ec_gpe_handler(acpi_handle gpe_device,
	u32 gpe_number, void *data)
{
	struct acpi_ec *ec = data;
	struct transaction *t = ec->curr;
	u8 status = acpi_ec_read_status(ec);

	if (!t)
		printk(KERN_WARNING "bxudbg t = 0\n");

	printk(KERN_WARNING "bxudbg ec interrupt, status = 0x%x\n", status);

	advance_transaction(ec, status);
	if (ec_transaction_done(ec) &&
	    (acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF) == 0) {
                printk(KERN_WARNING "bxudbg ec wake up\n");
		wake_up(&ec->wait);
		ec_check_sci(ec, acpi_ec_read_status(ec));
	}
	return ACPI_INTERRUPT_HANDLED | ACPI_REENABLE_GPE;
}
Comment 9 Lan Tianyu 2013-06-27 08:02:24 UTC
(In reply to comment #8)
> 
> static int ec_poll(struct acpi_ec *ec)
> {
>     unsigned long flags;
>     int repeat = 5; /* number of command restarts */
>     while (repeat--) {
>         unsigned long delay = jiffies +
>             msecs_to_jiffies(ec_delay);
>         do {
>             /* don't sleep with disabled interrupts */
>             if (EC_FLAGS_MSI || irqs_disabled()) {
Could you add a log here and see whether the code path fall here during failure case ? 

>                 udelay(ACPI_EC_MSI_UDELAY);
>                 if (ec_transaction_done(ec))
>                     return 0;
>             } else {
>                 if (wait_event_timeout(ec->wait,
>                         ec_transaction_done(ec),
>                         msecs_to_jiffies(1)))
>                 {
>                     printk(KERN_WARNING "bxudbg ec transaction done\n");
>                     return 0;
>                 }
>             }
>             printk(KERN_WARNING "bxudbg advance_transaction path 1\n");
>             advance_transaction(ec, acpi_ec_read_status(ec));
>         } while (time_before(jiffies, delay));
>         printk(KERN_WARNING "bxudbg controller reset, restart
>         transaction\n");
>         spin_lock_irqsave(&ec->lock, flags);
>         start_transaction(ec);
>         spin_unlock_irqrestore(&ec->lock, flags);
>     }
>     return -ETIME;
> }
>
Comment 10 barton xu 2013-06-27 10:38:10 UTC
 In the fail transaction, the time interval between 1 and 2 is 41 us, it looks like without executing udelay(ACPI_EC_MSI_UDELAY).

ACPI_EC_MSI_UDELAY = 550

fail case:
1  kernel: [  519.708515] bxudbg EC write command = 0x81
2  kernel: [  519.708556] bxudbg advance_transaction path 1
Comment 11 barton xu 2013-07-02 10:26:50 UTC
Hi Tianyu,

     Follow your suggest, I added debug info, and ran test again.

static int ec_poll(struct acpi_ec *ec)
{
	unsigned long flags;
	int repeat = 5; /* number of command restarts */
	while (repeat--) {
		unsigned long delay = jiffies +
			msecs_to_jiffies(ec_delay);
		do {
			/* don't sleep with disabled interrupts */
			if (EC_FLAGS_MSI || irqs_disabled()) {
				printk(KERN_WARNING "bxudbg IRQ disabled\n");
				udelay(ACPI_EC_MSI_UDELAY);
				if (ec_transaction_done(ec))
					return 0;
			} else {
				if (wait_event_timeout(ec->wait,
						ec_transaction_done(ec),
						msecs_to_jiffies(1)))
				{
					printk(KERN_WARNING "bxudbg ec transaction done\n");
					return 0;
				}
			}
			printk(KERN_WARNING "bxudbg advance_transaction path 1\n");
			advance_transaction(ec, acpi_ec_read_status(ec));
		} while (time_before(jiffies, delay));
		printk(KERN_WARNING "bxudbg controller reset, restart transaction\n");
		spin_lock_irqsave(&ec->lock, flags);
		start_transaction(ec);
		spin_unlock_irqrestore(&ec->lock, flags);
	}
	return -ETIME;
}


fail case log:

kernel: [  472.207836] bxudbg EC read status = 0x8
kernel: [  472.207844] bxudbg EC write command = 0x81
kernel: [  472.207919] bxudbg EC read status = 0x8
kernel: [  472.207927] bxudbg ec interrupt, status = 0x8
kernel: [  472.207939] bxudbg advance_transaction path 1
kernel: [  472.207946] bxudbg EC read status = 0x8
kernel: [  472.207955] bxudbg EC write data = 0x8      //abnormal
kernel: [  472.207963] bxudbg EC write data = 0x80     //abnormal
kernel: [  472.208006] bxudbg EC read status = 0x0
kernel: [  472.208009] bxudbg ec interrupt, status = 0x0
kernel: [  472.208013] bxudbg EC read status = 0x0
kernel: [  472.208015] bxudbg ec wake up
kernel: [  472.208027] bxudbg EC read status = 0x0
kernel: [  472.208036] bxudbg ec transaction done
kernel: [  472.208050] bxudbg EC read status = 0x0
Comment 12 Lan Tianyu 2013-07-15 07:31:05 UTC
Created attachment 106887 [details]
debug.patch

Sorry for no response for long time. Please try this patch.
Comment 13 barton xu 2013-07-18 03:24:01 UTC
Tianyu,

   Thank you!

I have tried this patch, run my test over night without error, looks like this patch is OK.

-			advance_transaction(ec, acpi_ec_read_status(ec));
+			
+			if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags))
+				advance_transaction(ec, acpi_ec_read_status(ec));
+

What does EC_FLAGS_GPE_STORM mean?

Will this patch be put in linux kernel?
Comment 14 Lan Tianyu 2013-07-18 06:54:30 UTC
(In reply to barton xu from comment #13)
> Tianyu,
> 
>    Thank you!
> 
> I have tried this patch, run my test over night without error, looks like
> this patch is OK.

Thanks for test.

> 
> -                     advance_transaction(ec, acpi_ec_read_status(ec));
> +                     
> +                     if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags))
> +                             advance_transaction(ec,
> acpi_ec_read_status(ec));
> +
> 
> What does EC_FLAGS_GPE_STORM mean?

On some machines, EC's GPE will storm sometimes. For this case, linux EC driver will change handle mode. There are two modes. One is interrupt mode and the other is polling mode. Normally, it use interrupt mode. But it will select polling mode when GPE storm happens. GPE will be disabled during transaction in this case.

I think this issue is caused by mixing polling mode and interrupt mode. advance_transaction() may be called in the ec_poll() and acpi_ec_gpe_handler() in the same transaction. If they are called too nearly, the second one would overwrite the data the first one just wrote to EC before IBF 0->1. This patch is to separate two modes clearly.

> 
> Will this patch be put in linux kernel?

I will push this patch and include you in Cc list.
Comment 15 barton xu 2013-07-18 10:22:38 UTC
> I will push this patch and include you in Cc list.

If you need more test to confirm the patch, I am pleasure to do it.
Comment 16 Lan Tianyu 2013-07-25 03:27:25 UTC
Hi Barton:
      Could you apply the following patch and attach the dmesg? I'd like to see how long IBF remain 0 after host write data or command to EC on your machine. I see the log you attached.

kernel: [  519.708515] bxudbg EC write command = 0x81
kernel: [  519.708556] bxudbg advance_transaction path 1 //who wake up the sleep?
kernel: [  519.708563] bxudbg EC read status = 0x8
kernel: [  519.708567] bxudbg EC write data = 0x0    //ec_poll
kernel: [  519.708569] bxudbg EC read status = 0x8
kernel: [  519.708571] bxudbg ec interrupt, status = 0x8
kernel: [  519.708581] bxudbg EC write data = 0x40     //acpi_ec_gpe_handler
kernel: [  519.708617] bxudbg EC read status = 0x0
kernel: [  519.708621] bxudbg ec interrupt, status = 0x0
kernel: [  519.708627] bxudbg EC read status = 0x0

The IBF wasn't set immediately after "write data" and this doesn't happen on my machine and IBF will remain 1 for 100us.
        

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 16ac3e8..cc6e7bc 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -181,9 +181,16 @@ static void advance_transaction(struct acpi_ec *ec, u8 status)
        if (!t)
                goto unlock;
        if (t->wlen > t->wi) {
-               if ((status & ACPI_EC_FLAG_IBF) == 0)
-                       acpi_ec_write_data(ec,
+               if ((status & ACPI_EC_FLAG_IBF) == 0) {
+                       acpi_ec_write_data(ec, 
                                t->wdata[t->wi++]);
+                       status = acpi_ec_read_status(ec);
+                       pr_info("EC: Status1 0x%0x \n", status);
+                       while (!(status & ACPI_EC_FLAG_IBF))
+                               status = acpi_ec_read_status(ec);
+                       pr_info("EC: Status2 0x%0x \n", status);
+
+               }
                else
                        goto err;
        } else if (t->rlen > t->ri) {
Comment 17 barton xu 2013-07-25 05:50:25 UTC
Created attachment 107014 [details]
time of EC IBF change 0->1 after write command
Comment 18 barton xu 2013-07-25 05:56:56 UTC
Hi Tianyu,

    I found some cases that IBF wasn't set immediately.

[    1.728964] hub 1-1:1.0: USB hub found
[    1.729101] hub 1-1:1.0: 6 ports detected
[    1.734141] EC: Status 1 0x2 
[    1.734150] EC: Status 2 0x2 
[    1.749513] EC: Status 1 0x2 
[    1.749523] EC: Status 2 0x2 
[    1.750509] EC: Status 1 0x2 
[    1.750516] EC: Status 2 0x2 
[    1.751509] EC: Status 1 0x2 
[    1.751515] EC: Status 2 0x2 
[    1.752905] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Comment 19 Lan Tianyu 2013-07-25 06:09:27 UTC
Hi, I don't see such cases in the dmesg. Could you run your test and try to find whether the IBF is delay to be set when the issue happen?

(In reply to barton xu from comment #18)
> Hi Tianyu,
> 
>     I found some cases that IBF wasn't set immediately.
> 
> [    1.728964] hub 1-1:1.0: USB hub found
> [    1.729101] hub 1-1:1.0: 6 ports detected
> [    1.734141] EC: Status 1 0x2 
> [    1.734150] EC: Status 2 0x2 
> [    1.749513] EC: Status 1 0x2 
> [    1.749523] EC: Status 2 0x2 
> [    1.750509] EC: Status 1 0x2 
> [    1.750516] EC: Status 2 0x2 
> [    1.751509] EC: Status 1 0x2 
> [    1.751515] EC: Status 2 0x2 
> [    1.752905] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded

"Status 1" shows the status just after write operation. If IBF wasn't set, the following loop would check the IBF until it's set. "Status 2" shows the status at that time. So from this log, the IBF is set immediately.
Comment 20 barton xu 2013-07-25 13:48:22 UTC
Hi Tianyu

     Sorry, I made a mistake there. I will try whether the IBF is delay to be set when the issue happen.
Comment 21 Lan Tianyu 2013-07-26 05:36:33 UTC
No problem :) Thanks for test.
Comment 22 barton xu 2013-07-26 07:36:32 UTC
Hi Tianyu,

    Test result: the IBF is set immediately without delay when the issue happen.

    And I think the IBF is set by Hardware automatically, is it right?
Comment 23 Lan Tianyu 2013-08-01 08:16:34 UTC
Sorry for later response. This seems conflict the before log.

kernel: [  519.708563] bxudbg EC read status = 0x8
kernel: [  519.708567] bxudbg EC write data = 0x0    //ec_poll
kernel: [  519.708569] bxudbg EC read status = 0x8
kernel: [  519.708571] bxudbg ec interrupt, status = 0x8
kernel: [  519.708581] bxudbg EC write data = 0x40     //acpi_ec_gpe_handler
kernel: [  519.708617] bxudbg EC read status = 0x0
kernel: [  519.708621] bxudbg ec interrupt, status = 0x0

This shows the IBF status wasn't set just after write operation. So there is a chance that the second operation may overwrite the data of first one.

IBF should be set by Hardware.

BTW, Does the ITE EC spec say one interval is required between two data/commands to EC register?
Comment 24 barton xu 2013-08-05 10:03:44 UTC
It's difficult to explain that log, maybe EC has read the data but doesn't handle completely.

But from the below log, even if IBF is set just after write operation, there is a chance to fail, because the status for write data = 0x80 is readed before write data = 0x8

kernel: [  472.207836] bxudbg EC read status = 0x8
kernel: [  472.207844] bxudbg EC write command = 0x81
kernel: [  472.207919] bxudbg EC read status = 0x8  
kernel: [  472.207927] bxudbg ec interrupt, status = 0x8
kernel: [  472.207939] bxudbg advance_transaction path 1
kernel: [  472.207946] bxudbg EC read status = 0x8  //status for write data = 0x80
kernel: [  472.207955] bxudbg EC write data = 0x8      //abnormal
kernel: [  472.207963] bxudbg EC write data = 0x80     //abnormal
kernel: [  472.208006] bxudbg EC read status = 0x0
kernel: [  472.208009] bxudbg ec interrupt, status = 0x0
kernel: [  472.208013] bxudbg EC read status = 0x0
kernel: [  472.208015] bxudbg ec wake up
kernel: [  472.208027] bxudbg EC read status = 0x0
kernel: [  472.208036] bxudbg ec transaction done
kernel: [  472.208050] bxudbg EC read status = 0x0

I don't have ITE EC spec for reference.
Comment 25 barton xu 2013-08-20 01:07:43 UTC
Hi Tianyu,

 > I will push this patch and include you in Cc list.

    Will your patch be added into kernel? Or you have better idea for this issue?
Comment 26 Lan Tianyu 2013-08-21 03:25:37 UTC
Hi Barton:
     Sorry for later response due to busy on other things. I original think this issue caused by  "IBF status wasn't set just after write operation". But the later  test show it doesn't. One possible reason is that one interval is required between 
two data/commands to EC register.

Could you provide the ec.c driver you used(attachment on comment 2 ?) and the test code.
Comment 27 barton xu 2013-08-22 02:20:09 UTC
Created attachment 107278 [details]
ec.c
Comment 28 Lan Tianyu 2013-08-22 13:02:56 UTC
Hi, please recovery the advance_transaction() (especally the udelay) and then
apply this patch. Let's see whether this makes the issue easily reproduces.


diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 8dd2d4d..9ac8431 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -228,6 +228,9 @@ static int ec_poll(struct acpi_ec *ec)
                unsigned long delay = jiffies +
                        msecs_to_jiffies(ec_delay);
                do {
+                       advance_transaction(ec, acpi_ec_read_status(ec));
+                       advance_transaction(ec, acpi_ec_read_status(ec));
+
                        /* don't sleep with disabled interrupts */
                        if (EC_FLAGS_MSI || irqs_disabled()) {
                                udelay(ACPI_EC_MSI_UDELAY);
@@ -239,7 +242,6 @@ static int ec_poll(struct acpi_ec *ec)
                                                msecs_to_jiffies(1)))
                                        return 0;
                        }
-                       advance_transaction(ec, acpi_ec_read_status(ec));
                } while (time_before(jiffies, delay));
                pr_debug(PREFIX "controller reset, restart transaction\n");
                spin_lock_irqsave(&ec->lock, flags);
@@ -844,6 +846,7 @@ static int acpi_ec_add(struct acpi_device *device)
                          ec->gpe, ec->command_addr, ec->data_addr);

        ret = ec_install_handlers(ec);
+       set_bit(EC_FLAGS_GPE_STORM, &ec->flags);

        /* EC is fully operational, allow queries */
        clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
Comment 29 barton xu 2013-08-23 11:16:25 UTC
Created attachment 107289 [details]
ec.c
Comment 30 barton xu 2013-08-23 11:18:45 UTC
Created attachment 107290 [details]
syslog
Comment 31 barton xu 2013-08-23 11:21:48 UTC
(In reply to Lan Tianyu from comment #28)
> Hi, please recovery the advance_transaction() (especally the udelay) and then
> apply this patch. Let's see whether this makes the issue easily reproduces.
> 
> 
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index 8dd2d4d..9ac8431 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -228,6 +228,9 @@ static int ec_poll(struct acpi_ec *ec)
>                 unsigned long delay = jiffies +
>                         msecs_to_jiffies(ec_delay);
>                 do {
> +                       advance_transaction(ec, acpi_ec_read_status(ec));
> +                       advance_transaction(ec, acpi_ec_read_status(ec));
> +
>                         /* don't sleep with disabled interrupts */
>                         if (EC_FLAGS_MSI || irqs_disabled()) {
>                                 udelay(ACPI_EC_MSI_UDELAY);
> @@ -239,7 +242,6 @@ static int ec_poll(struct acpi_ec *ec)
>                                                 msecs_to_jiffies(1)))
>                                         return 0;
>                         }
> -                       advance_transaction(ec, acpi_ec_read_status(ec));
>                 } while (time_before(jiffies, delay));
>                 pr_debug(PREFIX "controller reset, restart transaction\n");
>                 spin_lock_irqsave(&ec->lock, flags);
> @@ -844,6 +846,7 @@ static int acpi_ec_add(struct acpi_device *device)
>                           ec->gpe, ec->command_addr, ec->data_addr);
> 
>         ret = ec_install_handlers(ec);
> +       set_bit(EC_FLAGS_GPE_STORM, &ec->flags);
> 
>         /* EC is fully operational, allow queries */
>         clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);



    Try this patch, test no fail.
    From the syslog, seems interrupt mode is disabled by setting "EC_FLAGS_GPE_STORM" flag.
Comment 32 Lan Tianyu 2013-09-02 08:04:37 UTC
Yes, it uses polling mode. But from the log, the delay between two write operations is still large. So please try the following patch. Thanks.

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index a06d983..8a9ef09 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -234,10 +234,14 @@ static int ec_poll(struct acpi_ec *ec)
                                if (ec_transaction_done(ec))
                                        return 0;
                        } else {
-                               if (wait_event_timeout(ec->wait,
-                                               ec_transaction_done(ec),
-                                               msecs_to_jiffies(1)))
+                               udelay(5);
+                               if (ec_transaction_done(ec))
                                        return 0;
+
+//                             if (wait_event_timeout(ec->wait,
+//                                             ec_transaction_done(ec),
+//                                             msecs_to_jiffies(1)))
+//                                     return 0;
                        }
                        advance_transaction(ec, acpi_ec_read_status(ec));
                } while (time_before(jiffies, delay));
@@ -844,6 +848,7 @@ static int acpi_ec_add(struct acpi_device *device)
                          ec->gpe, ec->command_addr, ec->data_addr);
 
        ret = ec_install_handlers(ec);
+       set_bit(EC_FLAGS_GPE_STORM, &ec->flags);
 
        /* EC is fully operational, allow queries */
        clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
Comment 33 barton xu 2013-09-04 03:39:38 UTC
Created attachment 107410 [details]
syslog-comment32
Comment 34 barton xu 2013-09-04 03:40:17 UTC
Try this patch, no fail. Attach syslog-comment32.
From the experiments I have done, I find the fail only occurs when the interrupt mode and polling mode are mixed.
Can we find out why they are mixed?
Comment 35 Lan Tianyu 2013-09-04 07:21:39 UTC
From the log, the interval between two write operations is about 180~200us. The previous failure logs show the interval is about 10us. This seems to only happen   when interrupt mode and polling mode are mixed.

Further more, IBF lasts about 150us after data is written to EC from the log.

[ 3484.705387] bxudbg EC write data = 0x1c
[ 3484.705562] bxudbg EC write data = 0x7
[ 3484.748047] bxudbg EC write data = 0x1c
[ 3484.748247] bxudbg EC read data = 0x7
[ 3484.753533] bxudbg EC write data = 0x1c
[ 3484.753738] bxudbg EC read data = 0x7
[ 3484.761422] bxudbg EC write data = 0x1c
[ 3484.761605] bxudbg EC write data = 0x6
[ 3487.273732] bxudbg EC write data = 0x1c
[ 3487.273932] bxudbg EC read data = 0x6
[ 3487.301281] bxudbg EC write data = 0x1c
[ 3487.301489] bxudbg EC read data = 0x6
[ 3487.307967] bxudbg EC write data = 0x1c
[ 3487.308143] bxudbg EC write data = 0x7
[ 3487.371678] bxudbg EC write data = 0x1c
[ 3487.371880] bxudbg EC read data = 0x7
[ 3487.380098] bxudbg EC write data = 0x1c
[ 3487.380300] bxudbg EC read data = 0x7
[ 3487.388520] bxudbg EC write data = 0x1c
[ 3487.388712] bxudbg EC write data = 0x6


Could you provide the whole log when failure happen? The log before the fail transaction maybe have some clues.

> Can we find out why they are mixed?
Following is that change log.

commit 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Sun Aug 30 03:06:14 2009 +0400

    ACPI: EC: Merge IRQ and POLL modes
    
    In general, EC transaction should complete in less than 1ms, thus it is possible to merge wait for
    1ms in poll mode and 1ms of interrupt transaction timeout.
    Still, driver will wait 500ms for EC to complete transaction.
    
    This significantly simplifies driver and makes it immune to problematic EC interrupt
    implementations.
    
    It also may lessen kernel start-up time by 500ms.
    
    References: http://bugzilla.kernel.org/show_bug.cgi?id=12949
Comment 36 barton xu 2013-09-04 10:18:51 UTC
Hi Tianyu,

    Please refer to ec039.c, I replaced wait_event_timeout with its function body and added some track point.

    I run the test and found the fail case as follows:

fail analysis:
    After writing command 0x81, 
    the normal flow:
        The main process should sleep 1ms, and interrupt mode drive the transaction progress
    the fail case flow: 
        The main process only sleep 127us, so the "advance_transaction" in "ec_poll" was executed(write address 0x1c) before EC trigger gpe, when gpe was triggered, the "advance_transaction" in "acpi_ec_gpe_handler" was executed(write data 0x7), so for EC, address 0x1c was missed.


[  419.762904] bxudbg EC read status = 0x0
[  419.762904] bxudbg EC write command = 0x81
[  419.762907] bxudbg track_point_0
[  419.762908] bxudbg track_point_1
[  419.763035] bxudbg track_point_3 ret = 0 //schedule only 127us
[  419.763037] bxudbg track_point_5
[  419.763041] bxudbg EC read status = 0x8
[  419.763041] bxudbg EC write data = 0x1c
[  419.763044] bxudbg EC read status = 0x8
[  419.763045] bxudbg ec interrupt, status = 0x8
[  419.763046] bxudbg track_point_0
[  419.763047] bxudbg track_point_1
[  419.763048] bxudbg EC write data = 0x7
[  419.763065] bxudbg EC read status = 0x2
[  419.763066] bxudbg ec interrupt, status = 0x2
[  419.763253] bxudbg EC read status = 0x0
[  419.763256] bxudbg ec interrupt, status = 0x0
[  419.763260] bxudbg EC read status = 0x0
[  419.763261] bxudbg ec wake up
[  419.763266] bxudbg EC read status = 0x0
[  419.763278] bxudbg track_point_3 ret = 1
[  419.763278] bxudbg track_point_1
[  419.763279] bxudbg track_point_2
[  419.763280] bxudbg track_point_4
[  419.763283] bxudbg EC read status = 0x0
Comment 37 barton xu 2013-09-04 10:21:22 UTC
Created attachment 107412 [details]
ec039.c
Comment 38 barton xu 2013-09-04 10:33:07 UTC
Created attachment 107413 [details]
syslog for ec039.c
Comment 39 barton xu 2013-09-06 11:13:15 UTC
Hi Tianyu,

   
   I put fail cases into two categories.
   1. unknown EC interrupt meanwhile host write command 0x81, then the IBF isn't set, so we assume EC has got the command 0x81 falsely, and we send the address 0x1C to EC
   2. abnormal schedule sleep time, then host send the data 0x7 stick with send the address 0x1C


   CMD bit = 1, means in the data register is a command byte
   CMD bit = 0, means in the data register is a data byte

   so for fail case 1, if we see the CMD bit = 0, we know EC doesn't get the command 0x81, the unknown interrupt isn't for inform host write address 0x1c,
we can skip write address 0x1c in advance_transaction.
   for fail case 2, if we see the CMD bit = 1, we know EC doesn't get the address 0x1C, we can skip write data 0x7 in advance_transaction.

[  108.824947] bxudbg EC read status = 0x0
[  108.824948] bxudbg EC write command = 0x81      
[  108.824950] bxudbg EC read status = 0x0      //CMD bit = 0, abnormal
[  108.824950] bxudbg ec interrupt, status = 0x0   
[  108.824952] bxudbg EC write data = 0x1c        
[  108.824955] bxudbg track_point_0
[  108.824956] bxudbg track_point_1
[  108.827442] bxudbg track_point_3 ret = 0
[  108.827446] bxudbg track_point_5
[  108.827451] bxudbg EC read status = 0x0
[  108.827452] bxudbg EC write data = 0x7
[  108.827455] bxudbg track_point_0
[  108.827456] bxudbg track_point_1
[  108.831449] bxudbg track_point_3 ret = 0
[  108.831451] bxudbg track_point_5
[  108.831455] bxudbg EC read status = 0x0
[  108.831456] bxudbg track_point_4
[  108.831459] bxudbg EC read status = 0x0



[  419.762904] bxudbg EC read status = 0x0
[  419.762904] bxudbg EC write command = 0x81
[  419.762907] bxudbg track_point_0
[  419.762908] bxudbg track_point_1
[  419.763035] bxudbg track_point_3 ret = 0 //schedule only 127us
[  419.763037] bxudbg track_point_5
[  419.763041] bxudbg EC read status = 0x8
[  419.763041] bxudbg EC write data = 0x1c
[  419.763044] bxudbg EC read status = 0x8     //CMD bit = 1, abnormal
[  419.763045] bxudbg ec interrupt, status = 0x8
[  419.763046] bxudbg track_point_0
[  419.763047] bxudbg track_point_1
[  419.763048] bxudbg EC write data = 0x7
[  419.763065] bxudbg EC read status = 0x2
[  419.763066] bxudbg ec interrupt, status = 0x2
[  419.763253] bxudbg EC read status = 0x0
[  419.763256] bxudbg ec interrupt, status = 0x0
[  419.763260] bxudbg EC read status = 0x0
[  419.763261] bxudbg ec wake up
[  419.763266] bxudbg EC read status = 0x0
[  419.763278] bxudbg track_point_3 ret = 1
[  419.763278] bxudbg track_point_1
[  419.763279] bxudbg track_point_2
[  419.763280] bxudbg track_point_4
[  419.763283] bxudbg EC read status = 0x0



I add the CMD bit as condition in advance_transaction, test no fail


static void advance_transaction(struct acpi_ec *ec, u8 status)
{
	unsigned long flags;
	struct transaction *t = ec->curr;

	spin_lock_irqsave(&ec->lock, flags);
	if (!t)
		goto unlock;
	if (t->wlen > t->wi) {
//bxudbg>>>
		if (t->wi == 0) {
			if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0x08))
				acpi_ec_write_data(ec,t->wdata[t->wi++]);
			else
				goto err;
		}
		else {
			if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0))
				acpi_ec_write_data(ec,t->wdata[t->wi++]);
			else
				goto err;
		}
//bxudbg<<<
//		if ((status & ACPI_EC_FLAG_IBF) == 0)
//			acpi_ec_write_data(ec,
//				t->wdata[t->wi++]);
//		else
//			goto err;



    Could you review my change, is there any risk?
Comment 40 Lan Tianyu 2013-09-09 08:54:51 UTC
(In reply to barton xu from comment #39)
> Hi Tianyu,
> 
>    
>    I put fail cases into two categories.
>    1. unknown EC interrupt meanwhile host write command 0x81, then the IBF
> isn't set, so we assume EC has got the command 0x81 falsely, and we send the
> address 0x1C to EC

When do you mean unknown interrupt? This maybe a SCI event.
The operation of writing command or data in advance_transaction is under spin_lock_irq protection and irq will be disabled.

Could you show some logs of this case?

>    2. abnormal schedule sleep time, then host send the data 0x7 stick with
> send the address 0x1C
> 
> 
>    CMD bit = 1, means in the data register is a command byte
>    CMD bit = 0, means in the data register is a data byte
> 
>    so for fail case 1, if we see the CMD bit = 0, we know EC doesn't get the
> command 0x81, the unknown interrupt isn't for inform host write address 0x1c,
> we can skip write address 0x1c in advance_transaction.

For this case, the IBF also is not set to 1. So maybe we can try to check IGB
after write cmd/data. If it's not set to 1, retry several times.

How about following change?

But I don't know why cmd wasn't written to register.

lanlty@bee:/c/lanlty/kernel/linux-pm$ git diff drivers/acpi/ec.c
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 8dd2d4d..8770c68 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -176,15 +176,28 @@ static void advance_transaction(struct acpi_ec *ec, u8 status)
 {
        unsigned long flags;
        struct transaction *t = ec->curr;
+       int i;
 
        spin_lock_irqsave(&ec->lock, flags);
        if (!t)
                goto unlock;
        if (t->wlen > t->wi) {
-               if ((status & ACPI_EC_FLAG_IBF) == 0)
+               if ((status & ACPI_EC_FLAG_IBF) == 0) {
+retry:
                        acpi_ec_write_data(ec,
-                               t->wdata[t->wi++]);
-               else
+                               t->wdata[t->wi]);
+
+                       if (acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF) {
+                               if (i < 3) {
+                                       i++;    
+                                       goto retry;
+
+                               } else
+                                       pr_warn("write error/n");
+                       }
+               
+                       t->wi++;
+               } else
                        goto err;
        } else if (t->rlen > t->ri) {
                if ((status & ACPI_EC_FLAG_OBF) == 1) {



>    for fail case 2, if we see the CMD bit = 1, we know EC doesn't get the
> address 0x1C, we can skip write data 0x7 in advance_transaction.

This seems to be strange. Why CMD bit was set to 1 after data was written to register?

> 
> [  108.824947] bxudbg EC read status = 0x0
> [  108.824948] bxudbg EC write command = 0x81      
> [  108.824950] bxudbg EC read status = 0x0      //CMD bit = 0, abnormal
> [  108.824950] bxudbg ec interrupt, status = 0x0   
> [  108.824952] bxudbg EC write data = 0x1c        
> [  108.824955] bxudbg track_point_0
> [  108.824956] bxudbg track_point_1
> [  108.827442] bxudbg track_point_3 ret = 0
> [  108.827446] bxudbg track_point_5
> [  108.827451] bxudbg EC read status = 0x0
> [  108.827452] bxudbg EC write data = 0x7
> [  108.827455] bxudbg track_point_0
> [  108.827456] bxudbg track_point_1
> [  108.831449] bxudbg track_point_3 ret = 0
> [  108.831451] bxudbg track_point_5
> [  108.831455] bxudbg EC read status = 0x0
> [  108.831456] bxudbg track_point_4
> [  108.831459] bxudbg EC read status = 0x0
> 
> 
> 
> [  419.762904] bxudbg EC read status = 0x0
> [  419.762904] bxudbg EC write command = 0x81
> [  419.762907] bxudbg track_point_0
> [  419.762908] bxudbg track_point_1
> [  419.763035] bxudbg track_point_3 ret = 0 //schedule only 127us
> [  419.763037] bxudbg track_point_5
> [  419.763041] bxudbg EC read status = 0x8
> [  419.763041] bxudbg EC write data = 0x1c
> [  419.763044] bxudbg EC read status = 0x8     //CMD bit = 1, abnormal
> [  419.763045] bxudbg ec interrupt, status = 0x8
> [  419.763046] bxudbg track_point_0
> [  419.763047] bxudbg track_point_1
> [  419.763048] bxudbg EC write data = 0x7
> [  419.763065] bxudbg EC read status = 0x2
> [  419.763066] bxudbg ec interrupt, status = 0x2
> [  419.763253] bxudbg EC read status = 0x0
> [  419.763256] bxudbg ec interrupt, status = 0x0
> [  419.763260] bxudbg EC read status = 0x0
> [  419.763261] bxudbg ec wake up
> [  419.763266] bxudbg EC read status = 0x0
> [  419.763278] bxudbg track_point_3 ret = 1
> [  419.763278] bxudbg track_point_1
> [  419.763279] bxudbg track_point_2
> [  419.763280] bxudbg track_point_4
> [  419.763283] bxudbg EC read status = 0x0
> 
> 
> 
> I add the CMD bit as condition in advance_transaction, test no fail
> 
> 
> static void advance_transaction(struct acpi_ec *ec, u8 status)
> {
>       unsigned long flags;
>       struct transaction *t = ec->curr;
> 
>       spin_lock_irqsave(&ec->lock, flags);
>       if (!t)
>               goto unlock;
>       if (t->wlen > t->wi) {
> //bxudbg>>>
>               if (t->wi == 0) {
>                       if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status &
> ACPI_EC_FLAG_CMD) ==
> 0x08))
>                               acpi_ec_write_data(ec,t->wdata[t->wi++]);
>                       else
>                               goto err;
>               }
>               else {
>                       if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status &
> ACPI_EC_FLAG_CMD) ==
> 0))
>                               acpi_ec_write_data(ec,t->wdata[t->wi++]);
>                       else
>                               goto err;
>               }
> //bxudbg<<<
> //            if ((status & ACPI_EC_FLAG_IBF) == 0)
> //                    acpi_ec_write_data(ec,
> //                            t->wdata[t->wi++]);
> //            else
> //                    goto err;
> 
> 
> 
>     Could you review my change, is there any risk?
Comment 41 Lan Tianyu 2013-09-09 09:43:28 UTC
Created attachment 107801 [details]
ec.patch

Sorry. Please ignore previous patch and try this one.
Comment 42 barton xu 2013-09-12 10:41:39 UTC
Hi Tianyu,



> 
> When do you mean unknown interrupt? This maybe a SCI event.
> The operation of writing command or data in advance_transaction is under
> spin_lock_irq protection and irq will be disabled.
> 
> Could you show some logs of this case?
> 

log like below, irq will be enabled after I/O-write instruction, I think interrupt can reach CPU faster than I/O transmission reach EC.
This maybe a SCI event, but it is surely not a interrupt to inform host doing advance_transaction.

[  108.824947] bxudbg EC read status = 0x0
[  108.824948] bxudbg EC write command = 0x81      
[  108.824950] bxudbg EC read status = 0x0      //CMD bit = 0, abnormal
[  108.824950] bxudbg ec interrupt, status = 0x0    //unknown interrupt
[  108.824952] bxudbg EC write data = 0x1c        
[  108.824955] bxudbg track_point_0
[  108.824956] bxudbg track_point_1
[  108.827442] bxudbg track_point_3 ret = 0
[  108.827446] bxudbg track_point_5
[  108.827451] bxudbg EC read status = 0x0
[  108.827452] bxudbg EC write data = 0x7
[  108.827455] bxudbg track_point_0
[  108.827456] bxudbg track_point_1
[  108.831449] bxudbg track_point_3 ret = 0
[  108.831451] bxudbg track_point_5
[  108.831455] bxudbg EC read status = 0x0
[  108.831456] bxudbg track_point_4
[  108.831459] bxudbg EC read status = 0x0
 


> 
> This seems to be strange. Why CMD bit was set to 1 after data was written to
> register?
> 

This means that EC doesn't got the data.


Once fail system on my hand, I will try your patch.
Comment 43 Lan Tianyu 2013-09-23 01:32:34 UTC
hi, anyupdate?
Comment 44 barton xu 2013-09-27 10:44:41 UTC
Hi Tianyu,

    Sorry for late reply, I just got the fail system.

    Try your patch, test fail but no retry fail in advance_transaction.


code:
		if ((status & ACPI_EC_FLAG_IBF) == 0) {
retry:
			acpi_ec_write_data(ec, t->wdata[t->wi++]);
			if(!(acpi_ec_read_status(ec) & ACPI_EC_FLAG_IBF)) {
				if(i<3) {
					i++;
					goto retry;
				} else
					pr_warn("bxuflag write error\n");
			}
		} else
			goto err;

syslog:
 kernel: [  139.362786] bxudbg EC read status = 0x0
 kernel: [  139.362787] bxudbg EC write command = 0x81
 kernel: [  139.362788] bxudbg EC read status = 0x0
 kernel: [  139.362789] bxudbg ec interrupt, status = 0x0
 kernel: [  139.362791] bxudbg EC write data = 0x1c   //abnormal
 kernel: [  139.362796] bxudbg EC read status = 0x2
 kernel: [  139.362796] bxudbg track_point_0
 kernel: [  139.362797] bxudbg track_point_1
 kernel: [  139.362992] bxudbg track_point_3 ret = 0
 kernel: [  139.362996] bxudbg track_point_5
 kernel: [  139.363002] bxudbg EC read status = 0x0
 kernel: [  139.363003] bxudbg EC write data = 0x7
 kernel: [  139.363008] bxudbg EC read status = 0x2
 kernel: [  139.363009] bxudbg track_point_0
 kernel: [  139.363009] bxudbg track_point_1
 kernel: [  139.367032] bxudbg track_point_3 ret = 0
 kernel: [  139.367040] bxudbg track_point_5
 kernel: [  139.367046] bxudbg EC read status = 0x0
 kernel: [  139.367050] bxudbg track_point_4
 kernel: [  139.367055] bxudbg EC read status = 0x0


Any other suggestion?
Comment 45 Lan Tianyu 2013-10-08 07:22:40 UTC
Created attachment 110431 [details]
ec.patch

Sorry, just come back from vacation. We should check IBG and do retry after writing command rather than writing data. So please try this patch. Thanks.
Comment 46 Lan Tianyu 2013-10-23 02:17:01 UTC
Any update?
Comment 47 barton xu 2013-11-08 04:08:16 UTC
Hi Tianyu,

    Sorry for late reply. 

    I have tried your patch, still fail.

    Fail log as follows:

A24J kernel: [  261.141934] bxudbg EC write command = 0x81
A24J kernel: [  261.141941] bxudbg EC read status = 0xa
A24J kernel: [  261.141943] bxudbg track_point_0
A24J kernel: [  261.141943] bxudbg track_point_1
A24J kernel: [  261.142238] bxudbg track_point_3 ret = 0  //schedule only 295us
A24J kernel: [  261.142240] bxudbg EC read status = 0x8
A24J kernel: [  261.142241] bxudbg track_point_5
A24J kernel: [  261.142243] bxudbg ec interrupt, status = 0x8
A24J kernel: [  261.142244] bxudbg EC read status = 0x8
A24J kernel: [  261.142245] bxudbg EC write data = 0x1c   //bxuflag
A24J kernel: [  261.142250] bxudbg EC read status = 0x2
A24J kernel: [  261.142251] bxudbg EC write data = 0x7    //bxuflag
A24J kernel: [  261.142256] bxudbg EC read status = 0x2
A24J kernel: [  261.142257] bxudbg track_point_0
A24J kernel: [  261.142258] bxudbg track_point_1
A24J kernel: [  261.142275] bxudbg EC read status = 0x2
A24J kernel: [  261.142276] bxudbg ec interrupt, status = 0x2
A24J kernel: [  261.142469] bxudbg EC read status = 0x0
A24J kernel: [  261.142471] bxudbg ec interrupt, status = 0x0
A24J kernel: [  261.142475] bxudbg EC read status = 0x0
A24J kernel: [  261.142476] bxudbg ec wake up
A24J kernel: [  261.142482] bxudbg EC read status = 0x0
A24J kernel: [  261.142494] bxudbg track_point_3 ret = 1
A24J kernel: [  261.142495] bxudbg track_point_1
A24J kernel: [  261.142496] bxudbg track_point_2
A24J kernel: [  261.142497] bxudbg track_point_4
A24J kernel: [  261.142500] bxudbg EC read status = 0x0
A24J kernel: [  262.381359] bxudbg EC read status = 0x20
A24J kernel: [  262.381361] bxudbg ec interrupt, status = 0x20
A24J kernel: [  262.381365] bxudbg EC read status = 0x20
A24J kernel: [  262.381366] bxudbg ec wake up
A24J kernel: [  262.381369] bxudbg EC read status = 0x20
Comment 48 barton xu 2013-11-08 09:08:33 UTC
Hi Tianyu,

   Till now, seems checking CMD bit in advance_transaction can sequence the transaction.
   Can we use CMD bit?

ACPI spec said CMD bit only used by controller,

SMI_EVT: 1 – Indicates SMI event is pending (requesting SMI query).
0 – No SMI events are pending.
SCI_EVT: 1 – Indicates SCI event is pending (requesting SCI query).
0 – No SCI events are pending.
BURST: 1 – Controller is in burst mode for polled command processing.
0 – Controller is in normal mode for interrupt-driven command processing.
CMD: 1 – Byte in data register is a command byte (only used by controller).
0 – Byte in data register is a data byte (only used by controller).
IBF: 1 – Input buffer is full (data ready for embedded controller).
0 – Input buffer is empty.
OBF: 1 – Output buffer is full (data ready for host).
0 – Output buffer is empty.
Comment 49 Lan Tianyu 2013-11-21 08:00:31 UTC
Created attachment 115371 [details]
ec.patch

It's better to check the bit in the start_transaction(), could you check whether this patch can fix your issue. 

But I am not sure we can do this since it said "only used by controller" and there maybe different behaviour on the other machines.
Comment 50 Lan Tianyu 2013-12-11 09:47:44 UTC
Any update?
Comment 51 Lan Tianyu 2014-01-27 07:36:17 UTC
Since no response for long time, close this bug and feel free reopen it if there is any update.
Comment 52 barton xu 2014-04-17 02:12:45 UTC
Hi Tianyu,

     I had returned the fail laptop to customer earlier, so no update till now.
     My colleague got a repro system now, could we open the bug again?
Comment 53 barton xu 2014-05-15 12:11:26 UTC
Hi Tianyu,

     Could you still give your great help for this issue? Thanks.
Comment 54 Arthur Chen 2014-05-19 01:44:56 UTC
Hi Tianyu:

If there is no reposone from you tomorrow, we will file new bug for tracking, thanks for help.
Comment 55 Lan Tianyu 2014-05-19 03:36:56 UTC
I am busy on the other things. So reassign to default assignee
Comment 56 Arthur Chen 2014-05-19 03:55:24 UTC
Thanks,Lan.

Hi Barton: can you attach last patch that we confirmed it from our side without issue for the review ?
Comment 57 barton xu 2014-05-19 06:03:43 UTC
I have shown my change in comment 39, copy to here again


static void advance_transaction(struct acpi_ec *ec, u8 status)
{
	unsigned long flags;
	struct transaction *t = ec->curr;

	spin_lock_irqsave(&ec->lock, flags);
	if (!t)
		goto unlock;
	if (t->wlen > t->wi) {
//bxudbg>>>
		if (t->wi == 0) {
			if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0x08))
				acpi_ec_write_data(ec,t->wdata[t->wi++]);
			else
				goto err;
		}
		else {
			if (((status & ACPI_EC_FLAG_IBF) == 0) && ((status & ACPI_EC_FLAG_CMD) == 0))
				acpi_ec_write_data(ec,t->wdata[t->wi++]);
			else
				goto err;
		}
//bxudbg<<<
//		if ((status & ACPI_EC_FLAG_IBF) == 0)
//			acpi_ec_write_data(ec,
//				t->wdata[t->wi++]);
//		else
//			goto err;



Hi tianyu,

     You patch in comment 49 can lower the fail rate, but still fail.
Comment 58 Arthur Chen 2014-05-23 06:21:10 UTC
Could we have someone to review the patch at #57 ? Thanks!
Comment 59 Arthur Chen 2014-06-06 10:10:16 UTC
Hi Tianyu:

Any chance you can get this bug back and help to do final review ?
Comment 60 Lan Tianyu 2014-06-12 02:26:11 UTC
Sorry. For later response, I still think the patch in the Comment 12 should be right solution and it works without no failure, right?
Comment 61 Lan Tianyu 2014-06-12 02:31:00 UTC
Created attachment 139361 [details]
ec.patch

Please try this patch.
Comment 62 Arthur Chen 2014-06-12 10:13:36 UTC
Hi Tianyu:

We are going to verify it , we should provide some update by early next week. 

Thanks
Arthur
Comment 63 Arthur Chen 2014-06-17 06:48:31 UTC
Hi Tianyu:

We've tried your new patch but it didn't help , Is it possible that you can review  advance_transaction() modification at comment 57 ? that is the pass condition we have right now.

Thanks for help!
-Arthur
Comment 64 Lan Tianyu 2014-06-17 07:44:23 UTC
Yes, I think it complies with ACPI spec. But from original log, I think the root case is race condition issue.

There is a new patchset to fix race condition issue in the EC driver, could you please test it?
http://marc.info/?l=linux-acpi&m=140279288812650&w=2
Comment 65 Arthur Chen 2014-06-17 08:05:11 UTC
Hi Tianyu:

I couldn't find the patch from the link http://marc.info/?l=linux-acpi&m=140279288812650&w=2, could you help to attach the new patchset to this bug for my validation ? 

Thanks for help!
-Arthur
Comment 67 Arthur Chen 2014-06-17 08:23:50 UTC
Hi Tinayu:

Should I apply all of patches to do the validation ? just want to make sure we are doing right here ? or the patch from #65 is good enough ? Thanks!
Comment 68 Lan Tianyu 2014-06-17 08:31:38 UTC
Please apply all patches in the Comment 66. The patch in the #65 is an introduction for the patchset. Thanks.
Comment 69 Arthur Chen 2014-06-17 08:38:25 UTC
OK, I got it, we will try that. thanks!
Comment 70 Arthur Chen 2014-06-18 12:30:09 UTC
Hi Tianyu:

We merged the patches from #66 manually and confirmed test result is positive.

Can you help to confirm if all the patches are check-in already ? we would like to confirm with official release. Thanks!
Comment 71 Lan Tianyu 2014-06-19 05:54:01 UTC
It will be merged into v3.16 and backport to some older stable kernel.
Comment 72 Arthur Chen 2014-06-19 06:03:55 UTC
Hi Tianyu:

Can you give us update once those patches merged to v3.16, we would like to double check with official kernel.  

Thanks for help!
-Arthur
Comment 73 Lan Tianyu 2014-06-19 06:05:08 UTC
Ok. No problem.
Comment 74 Lv Zheng 2014-06-20 02:20:29 UTC
Thanks for the reporting.
I'll add you guys into the patchset.
Marking this bug as RESOLVED...

Best regards
-Lv
Comment 75 Pan Kuo 2014-07-04 08:45:11 UTC
Hi Tianyu/Lv,

I checked the v3.16-rc3 but couldn't find this ec patch.
Can you help to check?
Thanks.
Comment 76 Lan Tianyu 2014-07-04 08:47:05 UTC
No, ACPI maintainer has yet come back from vacation. So it will be merged later.
Comment 77 Lv Zheng 2014-07-04 08:48:13 UTC
The patchset will appear in v3.17 according to our schedule.
Comment 78 Pan Kuo 2014-07-04 08:51:24 UTC
I see. Thanks for prompt response!
Comment 79 Lan Tianyu 2014-07-08 03:31:10 UTC
The patchset has been merged into linux-pm tree. 

http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/?h=bleeding-edge
Comment 80 Pan Kuo 2014-09-15 07:37:47 UTC
Hi LV,

Can you confirm this patch (http://marc.info/?l=linux-acpi&m=140279299612694&w=2) has check-in already or we should wait for v3.17?

Thanks,
Pan
Comment 81 Lv Zheng 2014-09-16 00:06:33 UTC
(In reply to Pan Kuo from comment #80)
> Hi LV,
> 
> Can you confirm this patch
> (http://marc.info/?l=linux-acpi&m=140279299612694&w=2) has check-in already
> or we should wait for v3.17?

It's upstreamed since v3.17-rc1.

Thanks
-Lv
Comment 82 Pan Kuo 2014-09-16 01:54:21 UTC
Thanks for reply, LV.

Note You need to log in before you can comment on or make changes to this bug.