Bug 197047 - Average power consumption in idle increased by 1.3-1.5 times due to events by INT3432 - Dell Venue 11 Pro 7140
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C
Hardware: Intel Linux
Importance: P1 normal
Assignee: Mika Westerberg
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-26 20:58 UTC by RussianNeuroMancer
Modified: 2018-10-26 08:20 UTC
CC List: 5 users

See Also:
Kernel Version: 4.13.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with Linux 4.13.3 (58.18 KB, text/plain)
2017-09-26 20:58 UTC, RussianNeuroMancer
turbostat on Linux 4.9.3 normal boot (13.72 KB, text/plain)
2018-06-14 16:01 UTC, RussianNeuroMancer
powertop on Linux 4.9.3 normal boot (94.67 KB, text/html)
2018-06-14 16:01 UTC, RussianNeuroMancer
turbostat on Linux 4.9.3 boot after Linux 4.17.0 (68.59 KB, text/plain)
2018-06-14 16:02 UTC, RussianNeuroMancer
powertop on Linux 4.9.3 boot after Linux 4.17.0 (94.86 KB, text/html)
2018-06-14 16:02 UTC, RussianNeuroMancer
acpidump output on Linux 4.9.3 normal boot (585.47 KB, text/plain)
2018-06-29 08:51 UTC, RussianNeuroMancer
/proc/interrupts content on Linux 4.9.3 normal boot (3.09 KB, text/plain)
2018-06-29 08:52 UTC, RussianNeuroMancer
/sys/firmware/acpi/interrupts/ content on Linux 4.9.3 normal boot (9.94 KB, text/plain)
2018-06-29 08:53 UTC, RussianNeuroMancer
acpidump output on Linux 4.9.3 boot after Linux 4.17.0 (585.47 KB, text/plain)
2018-06-29 08:54 UTC, RussianNeuroMancer
/proc/interrupts content on Linux 4.9.3 boot after Linux 4.17.0 (3.09 KB, text/plain)
2018-06-29 08:55 UTC, RussianNeuroMancer
/sys/firmware/acpi/interrupts/ content on Linux 4.9.3 boot after Linux 4.17.0 (9.94 KB, text/plain)
2018-06-29 08:56 UTC, RussianNeuroMancer
i2c-ls (1.24 KB, text/plain)
2018-09-02 08:39 UTC, RussianNeuroMancer
/proc/interrupts content on Linux 4.18.6 normal boot with i2c_hid module blacklisted (3.01 KB, text/plain)
2018-09-13 04:42 UTC, RussianNeuroMancer
iio devices list (162 bytes, text/plain)
2018-10-12 07:55 UTC, RussianNeuroMancer

Description RussianNeuroMancer 2017-09-26 20:58:32 UTC
Created attachment 258613 [details]
dmesg with Linux 4.13.3

On Dell Venue 11 Pro 7140, average power consumption at idle increased by 1.3-1.5 times due to events coming from INT343A. According to powertop, since Linux 4.10 INT3432:00 generates around two hundred events on average. In /sys/devices/pci0000:00/INT3432:00/i2c-6 there are two devices: INT343A and SMO91D0. AFAIK INT343A is rt286.

With Linux 4.9.0-4.9.45 and Linux 4.11.0-4.11.12, there are around 100 wakeups per second in total at idle, and the battery discharge rate is around 3-3.5 W.
But with Linux 4.9.46-4.9.51, Linux 4.10.0-4.10.17 and Linux 4.12.0rc1-4.13.3, there are around 300 wakeups per second on average, due to events coming from INT3432:00. With Linux 4.13.3 the battery discharge rate is around 4.5 W.
Probably some commit was backported to Linux 4.9 between the .45 and .46 releases.
I have no idea why the issue is not reproducible on any Linux 4.11 release I tried.

Sometimes events rate fall from two hundred to one hundred for shorts period of time (for example I observe this right now on Linux 4.10.0 while removing/installing packages).

Messages like this sometimes appear in dmesg:
[  731.226730] i2c_hid i2c-SMO91D0:00: i2c_hid_get_input: incomplete report (53/13568)

Complete dmesg with Linux 4.13.3 is attached.
Comment 1 RussianNeuroMancer 2017-09-26 21:02:55 UTC
> Sometimes events rate fall from two hundred to one hundred for shorts period
> of time

Correction: here I talk about events coming especially from INT343A, not total events rate.
Comment 2 RussianNeuroMancer 2017-09-26 21:05:08 UTC
Sorry, another correction, just to be sure:

> Sometimes events rate fall from two hundred to one hundred for shorts period
> of time

Here I talk about events coming especially from *INT3432*, not total events rate.
Comment 3 Kai-Heng Feng 2017-10-03 05:33:13 UTC
Can you do a bisect between 4.9.45 and 4.9.46?
Comment 4 Zhang Rui 2017-12-18 03:07:10 UTC
since there are not too many changes between 4.9.45 and 4.9.46, please do git bisect to find out which commit introduces the problem.
Comment 5 RussianNeuroMancer 2017-12-18 11:59:28 UTC
> Can you do a bisect between 4.9.45 and 4.9.46?

> since there are not too many changes between 4.9.45 and 4.9.46, please do git
> bisect to find out which commit introduces the problem.

Thanks for the advice! I'll try to do so as soon as possible. (There are some issues with the hardware I usually use for building kernels.)
Comment 6 Zhang Rui 2018-01-15 03:44:13 UTC
any updates?
Comment 7 RussianNeuroMancer 2018-01-15 11:35:19 UTC
Not yet, as issues mentioned above remain unresolved, so I still can't rebuild kernel.
Comment 8 RussianNeuroMancer 2018-02-06 16:41:06 UTC
Hardware I usually use for building kernels is operational again, so I hope to do git bisect between 4.9.45 and 4.9.46 in next couple of weeks.
Comment 9 Zhang Rui 2018-04-02 01:28:48 UTC
ping ...
Comment 10 RussianNeuroMancer 2018-04-03 15:19:16 UTC
Hello!

Although I started with an (unexpected for me) delay and it is going slowly (due to the age of the hardware I use for building kernels), the bisect is in progress right now.
Comment 11 RussianNeuroMancer 2018-04-10 09:55:49 UTC
At every step I tested three times, but unfortunately the commit I arrived at doesn't seem to make any sense for this issue:

5f81b1f51b9cfcbfbe7a1abea09962c91bf485e7 is the first bad commit
commit 5f81b1f51b9cfcbfbe7a1abea09962c91bf485e7
Author: Florian Westphal <fw@strlen.de>
Date:   Fri Jul 7 13:07:17 2017 +0200

    netfilter: nat: fix src map lookup
    
    commit 97772bcd56efa21d9d8976db6f205574ea602f51 upstream.
    
    When doing initial conversion to rhashtable I replaced the bucket
    walk with a single rhashtable_lookup_fast().
...


I will redo the bisect, doing ten tests at every step this time.
Comment 12 Zhang Rui 2018-05-07 06:02:13 UTC
so I suppose we will have some update about the bisect?
Comment 13 RussianNeuroMancer 2018-05-07 11:15:31 UTC
Yes, with additional tests I found that the assumption about Linux 4.9.45 was wrong: 4.9.45 is affected too. The offending commit seems to be somewhere between 4.9.0 and 4.9.8. (Sorry for the slow progress on this; the build hardware is old and slow, and every test takes much more time now.)
Comment 14 Zhang Rui 2018-05-30 04:41:59 UTC
any updates?
Comment 15 RussianNeuroMancer 2018-06-09 20:22:44 UTC
As I proceed with bisect (due to various reasons now I have to build inside virtual machine instead of bare metal hardware, so this slow down building by few times) I have difficulties with determining what build have to be marked as good, and what build have to be marked as bad. For example, with 4.9.44 I seen issue reproduced couple of times, but most of the time it doesn't happen with this release. With 4.9.3 issue seems like doesn't happen at all, but if I boot 4.17.0 and then reboot to 4.9.3 - it's there. With 4.9.8 it's seems like the same, but I can't be 100% sure, as situation could be the similar to 4.9.44, where issue is rare but happen sometimes.
But with 4.9.46 issue happen every time.

In your opinion, do I need to chase the commit that makes the issue reproduce on every boot, or is it better to search for the commit that makes the issue reproduce at least once? Should I care about reboots from an affected kernel into an unaffected one, like 4.17.0->4.9.3, or is it possible that the newer kernel somehow puts the hardware into a failed state which makes the issue happen on unaffected kernels too?
Comment 16 Zhang Rui 2018-06-10 01:24:21 UTC
(In reply to RussianNeuroMancer from comment #15)
> As I proceed with bisect (due to various reasons now I have to build inside
> virtual machine instead of bare metal hardware, so this slow down building
> by few times) I have difficulties with determining what build have to be
> marked as good, and what build have to be marked as bad. For example, with
> 4.9.44 I seen issue reproduced couple of times, but most of the time it
> doesn't happen with this release. With 4.9.3 issue seems like doesn't happen
> at all, but if I boot 4.17.0 and then reboot to 4.9.3 - it's there.

this is important, please attach the output of "turbostat --debug" and "powertop --html=foo" for both good and bad case, in 4.9.3 kernel.

As the problem can also be reproduced on 4.9.3, remove the regression flag for now.
Comment 17 RussianNeuroMancer 2018-06-14 16:01:09 UTC
Created attachment 276553 [details]
turbostat on Linux 4.9.3 normal boot
Comment 18 RussianNeuroMancer 2018-06-14 16:01:28 UTC
Created attachment 276555 [details]
powertop on Linux 4.9.3 normal boot
Comment 19 RussianNeuroMancer 2018-06-14 16:02:02 UTC
Created attachment 276557 [details]
turbostat on Linux 4.9.3 boot after Linux 4.17.0
Comment 20 RussianNeuroMancer 2018-06-14 16:02:20 UTC
Created attachment 276559 [details]
powertop on Linux 4.9.3 boot after Linux 4.17.0
Comment 22 RussianNeuroMancer 2018-06-29 08:50:39 UTC
> hmm, can you please attach the acpidump output, and also the output of
> "cat /proc/interrupts" and "grep . /sys/firmware/acpi/interrupts/*" for both
> good and bad case.

Sure, all data is uploaded below:
Comment 23 RussianNeuroMancer 2018-06-29 08:51:47 UTC
Created attachment 277031 [details]
acpidump output on Linux 4.9.3 normal boot
Comment 24 RussianNeuroMancer 2018-06-29 08:52:39 UTC
Created attachment 277033 [details]
/proc/interrupts content on Linux 4.9.3 normal boot
Comment 25 RussianNeuroMancer 2018-06-29 08:53:24 UTC
Created attachment 277035 [details]
/sys/firmware/acpi/interrupts/ content on Linux 4.9.3 normal boot
Comment 26 RussianNeuroMancer 2018-06-29 08:54:20 UTC
Created attachment 277037 [details]
acpidump output on Linux 4.9.3 boot after Linux 4.17.0
Comment 27 RussianNeuroMancer 2018-06-29 08:55:00 UTC
Created attachment 277039 [details]
/proc/interrupts content on Linux 4.9.3 boot after Linux 4.17.0
Comment 28 RussianNeuroMancer 2018-06-29 08:56:17 UTC
Created attachment 277041 [details]
/sys/firmware/acpi/interrupts/ content on Linux 4.9.3 boot after Linux 4.17.0
Comment 29 RussianNeuroMancer 2018-07-31 17:42:15 UTC
If continuing bisect could be helpful please clarify how to proceed with it, relevant question is in Comment 15.
Comment 30 Zhang Rui 2018-08-28 08:05:32 UTC
for Linux 4.9.3 normal boot
  7:       4945       1476       1820        275  IR-IO-APIC   7-fasteoi   INT3432:00, INT3433:00

for Linux 4.9.3 boot after Linux 4.17.0
 7:     327339      48015     802942      21344  IR-IO-APIC   7-fasteoi   INT3432:00, INT3433:00

yes. there is indeed an interrupt storm, and this could increase the power consumption easily.

It is very likely that the I2C bus is not powered off cleanly during reboot.

so, when you say normal boot, you mean a cold boot, say, in 4.17.0 kernel, shutdown the machine, and then power on the machine manually to boot into 4.9.3 kernel, right?

If this is true, we are still able to confirm the good and bad kernel, by do cold boot every time, right?

As this seems to be a driver issue, reassign to I2C experts anyway.
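
The comparison above can be repeated on any saved copy of /proc/interrupts. The following is a minimal sketch, not part of the original analysis: it sums the per-CPU counters of the first line that mentions a given device. The function name irq_total is made up for illustration.

```shell
# Sum the per-CPU counters of the first /proc/interrupts line that
# mentions a device. irq_total is a hypothetical helper name.
# $1: device name as shown in the last column (e.g. INT3432:00)
# $2: file to read; defaults to /proc/interrupts (a saved copy works too)
irq_total() {
    awk -v dev="$1" '
        index($0, dev) {
            total = 0
            # Field 1 is the IRQ label ("7:"). The per-CPU counters follow
            # until the first non-numeric field (the controller name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            print total
            exit
        }
    ' "${2:-/proc/interrupts}"
}
```

For the two lines quoted in this comment, the totals are 8516 (normal boot) and 1199640 (boot after 4.17.0). The counters are cumulative since boot, so two samples taken a known number of seconds apart are needed to turn them into a rate.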
Comment 31 RussianNeuroMancer 2018-08-28 13:48:20 UTC
Thank you for looking into logs!

> so, when you say normal boot, you mean a cold boot, say, in 4.17.0 kernel,
> shutdown the machine, and then power on the machine manually to boot into
> 4.9.3 kernel, right?

Yes.

> If this is true, we are still able to confirm the good and bad kernel, by do
> cold boot every time, right?

So, if I cold boot some build, for example, 10-20 times, and the interrupt storm happens at least once, then I should mark it as bad?

Does it count if I reboot (instead of cold boot) the same build again and again, and then get an interrupt storm after many attempts?
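
Nobody in this thread answers that question directly. One common convention for intermittent problems, shown here only as a sketch, is to call a build bad if the storm shows up in any trial and good only if it never does. The helper below prints the exit code that "git bisect run" expects (0 for good, 1 for bad, 125 for skip). The function name and the threshold of 200 wakeups per second are assumptions, not values from this report.

```shell
# Print a "git bisect run" style exit code for one kernel build.
# bisect_verdict is a hypothetical helper name.
# $1: threshold in wakeups per second (integer; 200 below is an assumption)
# remaining args: one measured integer wakeups/s value per cold boot
bisect_verdict() {
    threshold=$1
    shift
    # No measurements at all: tell git bisect to skip this commit.
    [ "$#" -gt 0 ] || { echo 125; return; }
    for rate in "$@"; do
        if [ "$rate" -gt "$threshold" ]; then
            echo 1    # at least one trial showed the storm: bad
            return
        fi
    done
    echo 0            # the storm never appeared: good
}
```

For example, bisect_verdict 200 95 110 310 prints 1 and bisect_verdict 200 95 110 prints 0. The more trials per build, the lower the chance of marking an intermittently affected build as good.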
Comment 32 Mika Westerberg 2018-08-30 09:00:15 UTC
Can you also attach contents of /sys/bus/i2c/devices/*? It would be nice to know all devices connected to I2C buses.
Comment 33 RussianNeuroMancer 2018-09-02 08:39:59 UTC
Created attachment 278237 [details]
i2c-ls

> Can you also attach contents of /sys/bus/i2c/devices/*? 

Please see the attached file.

Is the "ls /sys/bus/i2c/devices/*" output sufficient, or is some additional info required?
Comment 34 Jarkko Nikula 2018-09-03 12:49:41 UTC
I see SMO91D0:00 (Sensor Hub) is also generating some amount of interrupts. Maybe something is generating a lot of events from there and that causes a lot of I2C traffic from drivers?
Comment 35 Mika Westerberg 2018-09-03 14:34:55 UTC
Indeed. I wonder if you can unload (or blacklist) those drivers and see if the interrupt count goes low?
Comment 36 RussianNeuroMancer 2018-09-13 04:42:48 UTC
Created attachment 278485 [details]
/proc/interrupts content on Linux 4.18.6 normal boot with i2c_hid module blacklisted

On boot with i2c_hid blacklisted there are 40-60 wakeups per second instead of 300+; /proc/interrupts content is attached.
Comment 37 Mika Westerberg 2018-09-13 09:05:22 UTC
OK, thanks. I kind of suspect that the sensor hub is the one generating those interrupts. Could you blacklist just hid-sensor-hub and see if you still see the interrupt storm?
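
A minimal sketch of one way to do the blacklisting, assuming the usual modprobe.d mechanism: a file containing "blacklist <module>" lines. The function name and the target file name are hypothetical.

```shell
# Write a modprobe.d fragment that blacklists the given modules.
# write_blacklist is a hypothetical helper name.
# $1: target file (e.g. /etc/modprobe.d/sensor-test.conf, a made-up name)
# remaining args: module names
write_blacklist() {
    conf=$1
    shift
    : > "$conf"                                  # start with an empty file
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod" >> "$conf"
    done
}
```

Run as root, write_blacklist /etc/modprobe.d/sensor-test.conf hid_sensor_hub would create the fragment. A blacklist entry only stops automatic loading by alias, so a reboot is the simplest way to test it, and removing the file undoes the change.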
Comment 38 RussianNeuroMancer 2018-09-14 05:42:22 UTC
> Could you blacklist just hid-sensor-hub and see if you still see the
> interrupt storm?

I blacklisted hid-sensor-hub and got the same result as with blacklisting i2c_hid: no interrupt storm. Power consumption is below 3 W at idle.
Comment 39 Mika Westerberg 2018-09-14 08:02:30 UTC
Thanks. I guess this is not related to the I2C host controller driver then. Sensors generate lots of traffic if they are enabled (not sure if there is a way to disable certain sensors from the UI).
Comment 40 Mika Westerberg 2018-09-14 08:02:58 UTC
Added Srinivas who knows this area better.
Comment 41 Srinivas Pandruvada 2018-10-03 23:35:43 UTC
You can disable the iio-sensor-proxy service and reboot. Then look at the
value of /sys/bus/iio/devices/iio:device*/buffer/enable
They should all be 0. It is also better to note the sensor name corresponding to each iio:device*. There is an attribute called "name" under each iio:device*.

Now measure power and see if you still have issue.

If you don't see issue, we can adjust some settings for sensor report interval.
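
The check described above can be sketched as a small loop that prints the device, its "name" attribute and its buffer/enable value on one line. The function name list_iio is made up for illustration. The directory argument exists only so the function can be tried against a copy of the tree.

```shell
# Print "<device> <name> <buffer/enable>" for every IIO device.
# list_iio is a hypothetical helper name.
# $1: IIO devices directory; defaults to /sys/bus/iio/devices
list_iio() {
    root=${1:-/sys/bus/iio/devices}
    for dev in "$root"/iio:device*; do
        [ -d "$dev" ] || continue
        name=$(cat "$dev/name" 2>/dev/null)
        enable=$(cat "$dev/buffer/enable" 2>/dev/null)
        # A "?" marks an attribute that is missing or unreadable.
        printf '%s %s %s\n' "${dev##*/}" "${name:-?}" "${enable:-?}"
    done
}
```

With iio-sensor-proxy stopped, every line is expected to end in 0.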
Comment 42 RussianNeuroMancer 2018-10-12 07:55:50 UTC
Created attachment 279003 [details]
iio devices list

Thank you for looking into this issue.

The names and status of the sensors with iio-sensor-proxy enabled are attached.

> Now measure power and see if you still have issue.

The issue is not reproducible with iio-sensor-proxy removed (for some reason disabling iio-sensor-proxy.service does not work: it remains enabled and starts after reboot, so I removed it).

> If you don't see issue, we can adjust some settings for sensor report
> interval.

Is there a patch that I could test?
Comment 43 Srinivas Pandruvada 2018-10-12 16:55:14 UTC
I don't think you need a patch. Enable iio-sensor-proxy again. You probably want to change the hysteresis. This determines how much the sample data must change before data is sent to user space. Alternatively, reduce the sampling frequency.

Most probably this is accel_3d, which in your case is /sys/bus/iio/devices/iio:device3. Recheck with the "name" attribute.

Try adjusting
in_accel_hysteresis to some higher value and read back if this is accepted by the sensor.
For example
#echo 0.000010 > in_accel_hysteresis

Also try to reduce in_accel_sampling_frequency.
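
A minimal sketch of the "write, then read back" step, since the sensor may not keep the requested value. The function name set_and_check is made up, and the comparison is a plain string comparison, so a value the sensor reports in a different format would show up as a mismatch.

```shell
# Write a value to a sysfs attribute and report what was actually kept.
# set_and_check is a hypothetical helper name.
# $1: attribute file, $2: requested value
set_and_check() {
    printf '%s\n' "$2" > "$1" || return 1
    actual=$(cat "$1")
    if [ "$actual" = "$2" ]; then
        echo "accepted: $actual"
    else
        echo "requested $2, sensor reports $actual"
    fi
}
```

Run as root from inside the device directory, set_and_check in_accel_hysteresis 0.000010 covers the example above.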
Comment 44 RussianNeuroMancer 2018-10-13 08:41:22 UTC
I tried 0.000010 in_accel_hysteresis, then tried 0.000005 and 0.000001. I also tried to reduce in_accel_sampling_frequency from 10 to 10. Unfortunately, all of this doesn't make noticeable difference.
Comment 45 Srinivas Pandruvada 2018-10-13 16:38:00 UTC
"in_accel_sampling_frequency from 10 to 10", may be you mean something.

You have two other devices also. First try this:

For all the devices where
/sys/bus/iio/devices/iio:device*/buffer/enable is 1,
set it to 0.

echo 0 > /sys/bus/iio/devices/iio:device*/buffer/enable

Then I think you will be fine. Then enable 1 by 1 and see which device has issue.
Then play with those parameters in problem device.
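
Note that the single echo with a wildcard above only works when the pattern matches exactly one file. With several IIO devices the shell typically rejects the redirection (bash reports an ambiguous redirect). A sketch that writes to each file in turn follows. The function name is made up, and the directory argument exists only so it can be tried on a copy of the tree.

```shell
# Write 0 to buffer/enable of every IIO device, one file at a time.
# disable_iio_buffers is a hypothetical helper name.
# $1: IIO devices directory; defaults to /sys/bus/iio/devices
disable_iio_buffers() {
    root=${1:-/sys/bus/iio/devices}
    for f in "$root"/iio:device*/buffer/enable; do
        [ -e "$f" ] || continue    # pattern matched nothing
        echo 0 > "$f"
    done
}
```

After running it as root, individual sensors can be re-enabled one by one with echo 1 > /sys/bus/iio/devices/iio:deviceN/buffer/enable, where N is the device number, to find the one that causes the interrupts.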
Comment 46 RussianNeuroMancer 2018-10-18 14:19:52 UTC
> "in_accel_sampling_frequency from 10 to 10", may be you mean something.

Sorry, I mean 10 to 1.

> Then I think you will be fine. 

Yes, events stopped, power consumption back to normal.

> Then enable 1 by 1 and see which device has issue.

magn_3d and accel_3d

> Then play with those parameters in problem device.

An in_accel_hysteresis of 0.010000 and an in_accel_sampling_frequency of 2 produce reasonable power consumption (below 3 watts at idle with screen and wifi enabled) and don't seem to affect the tablet's automatic screen rotation. With in_accel_sampling_frequency set to 1, rotation is noticeably slower. The default in_accel_sampling_frequency of 10 consumes more power (above 3 watts most of the time) without noticeable improvement to automatic screen rotation.

With the magnetometer it's more difficult. I find that an in_magn_hysteresis of 1.000000 and an in_magn_sampling_frequency of 0.1 are good for power consumption, but I have no idea how to verify whether the magnetometer is still usable.
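
Since the iio:deviceN numbering can differ between machines, a sketch that picks the device by its "name" attribute may be handier for reapplying these values. The function name set_by_name is made up for illustration.

```shell
# Write one attribute on every IIO device whose "name" matches.
# set_by_name is a hypothetical helper name.
# $1: sensor name (e.g. accel_3d), $2: attribute, $3: value
# $4: IIO devices directory; defaults to /sys/bus/iio/devices
set_by_name() {
    root=${4:-/sys/bus/iio/devices}
    for dev in "$root"/iio:device*; do
        [ "$(cat "$dev/name" 2>/dev/null)" = "$1" ] || continue
        printf '%s\n' "$3" > "$dev/$2"
    done
}
```

Run as root, the values from this comment would be applied with set_by_name accel_3d in_accel_hysteresis 0.010000, set_by_name accel_3d in_accel_sampling_frequency 2, set_by_name magn_3d in_magn_hysteresis 1.000000 and set_by_name magn_3d in_magn_sampling_frequency 0.1. Values written through sysfs are not stored anywhere by the kernel, so they would have to be reapplied after each boot.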
Comment 47 Srinivas Pandruvada 2018-10-18 14:57:23 UTC
These parameters should be set by user space based on the application requirement, kernel can't set. 

I think geoclue is some service uses magnetometer.
Comment 48 RussianNeuroMancer 2018-10-19 13:52:20 UTC
> These parameters should be set by user space based on the application
> requirement, kernel can't set.

Then why does the kernel version make a difference? And how can this bug actually be solved?

> I think geoclue is some service uses magnetometer.

I'm not sure what correct behaviour should look like, but with untouched Linux kernel magn_3d parameters GNOME Maps shows my location as if the laptop is being constantly rotated.
Comment 49 Srinivas Pandruvada 2018-10-19 15:54:10 UTC
These settings go directly to firmware, and this part of the code has not been touched for a long time. Did you update the BIOS recently?
Try to revert commits 6f92253024d9d947a4f454654840ce479e251376
and f1664eaacec31035450132c46ed2915fd2b2049a.

They should have been backported to older kernels too. If this fixes the issue, it means that the sensors were not powered up in your other builds, as the user space program iio-sensor-proxy has a race condition and failed to power up the sensors.

I think you are able to reproduce the condition even during cold boot not just reboots.
Comment 50 RussianNeuroMancer 2018-10-26 08:20:19 UTC
You are right: on Linux 4.9.0, where power consumption was low and there were no interrupts coming from INT343A, monitor-sensor can't detect orientation and can't get light sensor data.
