Bug 106221 - Surface 3 sensor hub isn't enumerated under Linux
Summary: Surface 3 sensor hub isn't enumerated under Linux
Status: CLOSED INVALID
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Aaron Lu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-19 11:07 UTC by Bastien Nocera
Modified: 2015-11-16 02:13 UTC (History)
6 users (show)

See Also:
Kernel Version: 4.3 rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
[PATCH] ACPICA: Namespace: Add support to allow OSPMs to invoke acpi_get_object_info() several times (9.69 KB, patch)
2015-10-29 07:13 UTC, Lv Zheng
Details | Diff
acpidump for surface 3 (325.45 KB, text/plain)
2015-11-09 08:05 UTC, Aaron Lu
Details
dmesg with device probe errors (130.12 KB, text/plain)
2015-11-12 10:13 UTC, Bastien Nocera
Details

Description Bastien Nocera 2015-10-19 11:07:36 UTC
attachment 187171 [details] contains the decompiled DSDT.

Oct 19 15:00:15 localhost.localdomain kernel: ACPI Error: No handler for Region [SNOP] (ffff88013f4b6048) [GenericSerialBus] (20150818/evregion-163)
Oct 19 15:00:15 localhost.localdomain kernel: ACPI Error: Region GenericSerialBus (ID=9) has no handler (20150818/exfldio-297)
Oct 19 15:00:15 localhost.localdomain kernel: ACPI Error: Method parse/execution failed [\_SB.PCI0.I2C4.SAM._SUB] (Node ffff88013f4b5e88), AE_NOT_EXIST (20150818/psparse-542)
Oct 19 15:00:15 localhost.localdomain kernel: ACPI Error: Method execution failed [\_SB.PCI0.I2C4.SAM._SUB] (Node ffff88013f4b5e88), AE_NOT_EXIST (20150818/uteval-103)
Comment 1 Aaron Lu 2015-10-21 02:25:06 UTC
IIRC, _SUB will be executed pretty early while the operation region handler for the GPIO controllers are not installed yet, so the error message is pretty much expected.

Do you have any functionality problem except the above error messages, regarding to I2C/GPIO?
Comment 2 Bastien Nocera 2015-10-21 08:24:01 UTC
(In reply to Aaron Lu from comment #1)
> IIRC, _SUB will be executed pretty early while the operation region handler
> for the GPIO controllers are not installed yet, so the error message is
> pretty much expected.
> 
> Do you have any functionality problem except the above error messages,
> regarding to I2C/GPIO?

The HID sensor hub on the end of it isn't getting probed, and isn't available. I can't see anything relevant in the dmesg.

Chen Yu should be able to confirm on the Surface 3.
Comment 3 Zhang Rui 2015-10-26 05:58:29 UTC
Lv will check if it's possible to move _SUB evaluation out of ACPICA.
And then, we will check how to handle this in Linux/ACPI.
Comment 4 Robert Moore 2015-10-27 14:19:08 UTC
I only see _SUB being executed in acpi_get_object_info, which is executed only on demand by a driver/etc.
Comment 5 Lv Zheng 2015-10-29 03:19:17 UTC
Hi, Bob

It seems we could allow OSPMs to specify what to be returned by acpi_get_object_info(), this need to be done in ACPICA before OSPMs proceed to re-order the evaluations.
I'll provide a mechanism to achieve this.

Thanks and best regards
-Lv
Comment 6 Robert Moore 2015-10-29 03:42:21 UTC
Wasn't support for _SUB introduced relatively recently?

I think that we must be very careful about changing interfaces and behaviors based upon a single machine. There is no way we can achieve stable software this way.
Comment 7 Robert Moore 2015-10-29 03:45:38 UTC
I think we should take a look at how Linux is using the object info interface. Perhaps it should simply not be calling it until all _HIDs and _ADRs have been executed, and all drivers and region handlers have been loaded.
Comment 8 Lv Zheng 2015-10-29 06:34:46 UTC
Maybe OSPMs need first to obtain the _HID/_CID, otherwise driver matching is not possible.
And if driver is not matched and started, _DEP cannot be handled.
So _SUB looks like a control method executed after resolving _DEP.
As a conclusion, it is likely OSPMs need to invoke this function multiple times.

Thanks and best regards
-Lv
Comment 9 Lv Zheng 2015-10-29 07:13:54 UTC
Created attachment 191451 [details]
[PATCH] ACPICA: Namespace: Add support to allow OSPMs to invoke acpi_get_object_info() several times

The enhanced acpi_get_object_info().
Someone else may work on top of this linuxized ACPICA commit to tune Linux ACPI device scan algorithms.

Thanks and best regards
-Lv
Comment 10 Bastien Nocera 2015-10-29 09:36:01 UTC
(In reply to Lv Zheng from comment #9)
> Created attachment 191451 [details]
> [PATCH] ACPICA: Namespace: Add support to allow OSPMs to invoke
> acpi_get_object_info() several times
> 
> The enhanced acpi_get_object_info().
> Someone else may work on top of this linuxized ACPICA commit to tune Linux
> ACPI device scan algorithms.

Which tree is this against? Seems it depends on code that hasn't landed in the linus tree yet.
Comment 11 Lv Zheng 2015-10-30 05:54:18 UTC
No, you shouldn't test this commit.
This commit is just a linuxized version an ACPICA enhancement.
There should still be someting to do in Linux side.
For example, Linux side acpi_get_object_info() invocations should be modified accordingly.
So there is still something need to be done in Linux side.

Thanks and best regards
-Lv
Comment 12 Bastien Nocera 2015-10-30 10:57:19 UTC
Noted, thanks.
Comment 13 Robert Moore 2015-10-30 17:31:41 UTC
We forgot to ask the basic question: Does this machine work properly under Windows?
Comment 14 Bastien Nocera 2015-10-30 17:58:01 UTC
(In reply to Robert Moore from comment #13)
> We forgot to ask the basic question: Does this machine work properly under
> Windows?

It certainly does. It's made by Microsoft, and it's not damaged in any way.

I know the sensor works because the display rotates when rotating the tablet, and the backlight brightness changes depending on the ambient light.
Comment 15 Robert Moore 2015-10-30 19:13:39 UTC
I'm starting to get the feeling that the "real" problem may be that Linux apparently does not attempt to use the _SUB method, regardless of the error message from acpi_get_object_info.

Whatever driver is involved with this should be invoking _SUB directly and using the results.

Also, it may be that in fact the problem you are seeing with the sensor hub may be completely unrelated to the error messages from ACPICA. I think Aaron was alluding to this.

I think this needs to be fully root-caused before we can proceed further.
Comment 16 Aaron Lu 2015-11-02 01:42:35 UTC
(In reply to Robert Moore from comment #15)
> I'm starting to get the feeling that the "real" problem may be that Linux
> apparently does not attempt to use the _SUB method, regardless of the error
> message from acpi_get_object_info.
> 
> Whatever driver is involved with this should be invoking _SUB directly and
> using the results.
> 
> Also, it may be that in fact the problem you are seeing with the sensor hub
> may be completely unrelated to the error messages from ACPICA. I think Aaron
> was alluding to this.

It looks like to be the case, we need to find out the real problem.
Comment 17 Lv Zheng 2015-11-09 05:58:49 UTC
Hi,

There is an ACPICA upstream fix around this.
https://github.com/acpica/acpica/commit/e474395
You could wait this to be merged by Linux.

Thanks
-Lv
Comment 18 Aaron Lu 2015-11-09 08:03:25 UTC
(In reply to Bastien Nocera from comment #2)
> The HID sensor hub on the end of it isn't getting probed, and isn't
> available. I can't see anything relevant in the dmesg.

Do you mean the MSHW0037, i.e. the NTRG SPI device node?

> 
> Chen Yu should be able to confirm on the Surface 3.

I have borrowed the Surface 3 from Yu.

With the patch from(force status to 0 instead of fail during scan time):
http://www.spinics.net/lists/linux-acpi/msg60480.html

And the patch of:
commit c103a10f690cc49054c52f493eeeff143d5f59e7
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Fri Oct 30 12:02:05 2015 +0200

    gpio / ACPI: Allow shared GPIO event to be read via operation region

I'm able to successfully add NTRG SPI device, but what then? something is supposed to start working? I don't know much about these sensor things.

Hello Mika,

The device selection number of the NTRG SPI device is 1, which is the same as the master->num_chipselect(which is hard coded for ACPI enumerated SPI controllers) and thus cause the spi_add_device failed for NTRG. I have blindly set the spi->chip_select to be one less than device_selection number which is from the resource descriptor to work around this, but I doubt if this is simply wrong?
Comment 19 Aaron Lu 2015-11-09 08:04:06 UTC
Again, forgot to add Mika...

Mika,
Please take a look at comment #18, thanks.
Comment 20 Aaron Lu 2015-11-09 08:05:32 UTC
Created attachment 192481 [details]
acpidump for surface 3
Comment 21 Mika Westerberg 2015-11-09 08:10:50 UTC
I doubt that we even have suitable driver for the SPI connected touchpanel on Surface3.
Comment 22 Aaron Lu 2015-11-09 08:27:36 UTC
(In reply to Mika Westerberg from comment #21)
> I doubt that we even have suitable driver for the SPI connected touchpanel
> on Surface3.

OK, thanks. Then it doesn't seem we can do much here.
Comment 23 Bastien Nocera 2015-11-09 11:48:36 UTC
(In reply to Aaron Lu from comment #22)
> (In reply to Mika Westerberg from comment #21)
> > I doubt that we even have suitable driver for the SPI connected touchpanel
> > on Surface3.
> 
> OK, thanks. Then it doesn't seem we can do much here.

Well, as long as the device's interface is actually visible, that means that a driver _could_ be written. I didn't expect you to write a driver if none was available, but simply to deal with this bug so that we could access it.

FWIW, this bug was about the _sensor_ hub device (the accelerometer, ambient light sensor, etc.) not about the touchscreen. The touchscreen bug is:
https://bugzilla.kernel.org/show_bug.cgi?id=104291
Comment 24 Aaron Lu 2015-11-09 12:42:06 UTC
(In reply to Bastien Nocera from comment #23)
> (In reply to Aaron Lu from comment #22)
> > (In reply to Mika Westerberg from comment #21)
> > > I doubt that we even have suitable driver for the SPI connected
> touchpanel
> > > on Surface3.
> > 
> > OK, thanks. Then it doesn't seem we can do much here.
> 
> Well, as long as the device's interface is actually visible, that means that
> a driver _could_ be written. I didn't expect you to write a driver if none
> was available, but simply to deal with this bug so that we could access it.

Fair enough, but I need to know which device node the sensor hub device is.

> 
> FWIW, this bug was about the _sensor_ hub device (the accelerometer, ambient
> light sensor, etc.) not about the touchscreen. The touchscreen bug is:
> https://bugzilla.kernel.org/show_bug.cgi?id=104291

I thought MSHW0027 is the device node, but taking a look at bug 104291, MSHW0027 is obviously the touchscreen, not the sensor hub. So do you know which device node it is?
Comment 25 Bastien Nocera 2015-11-09 14:14:18 UTC
(In reply to Aaron Lu from comment #24)
> (In reply to Bastien Nocera from comment #23)
> > (In reply to Aaron Lu from comment #22)
> > > (In reply to Mika Westerberg from comment #21)
> > > > I doubt that we even have suitable driver for the SPI connected
> touchpanel
> > > > on Surface3.
> > > 
> > > OK, thanks. Then it doesn't seem we can do much here.
> > 
> > Well, as long as the device's interface is actually visible, that means
> that
> > a driver _could_ be written. I didn't expect you to write a driver if none
> > was available, but simply to deal with this bug so that we could access it.
> 
> Fair enough, but I need to know which device node the sensor hub device is.

Looks like a standard HID I2C device, which should get enumerated once the bus on which it is becomes available (fingers crossed).

> > FWIW, this bug was about the _sensor_ hub device (the accelerometer,
> ambient
> > light sensor, etc.) not about the touchscreen. The touchscreen bug is:
> > https://bugzilla.kernel.org/show_bug.cgi?id=104291
> 
> I thought MSHW0027 is the device node, but taking a look at bug 104291,
> MSHW0027 is obviously the touchscreen, not the sensor hub. So do you know
> which device node it is?

WinAudit, running on that machine reports a few devices that could reference the sensor device:

HID\VEN_MSHW&DEV_0102&SUBSYS_MSHWTSAM&REV_100C15C&COL01\5&A9BD982&0&0000
HID\VEN_MSHW&DEV_0102&SUBSYS_MSHWTSAM&REV_100C15C&COL02\5&A9BD982&0&0001
HID\VEN_MSHW&DEV_0102&SUBSYS_MSHWTSAM&REV_100C15C&COL03\5&A9BD982&0&0002
HID\VEN_MSHW&DEV_0102&SUBSYS_MSHWTSAM&REV_100C15C&COL04\5&A9BD982&0&0003

A child of the "I2C HID Device":
ACPI\VEN_MSHW&DEV_0102&SUBSYS_MSHWTSAM&REV_100C15C\4&E940C90&0

which I believe is a standard I2C HID device, which the HID and then IIO subsystems should pick up on.
Comment 26 Bastien Nocera 2015-11-10 11:07:37 UTC
(In reply to Lv Zheng from comment #17)
> Hi,
> 
> There is an ACPICA upstream fix around this.
> https://github.com/acpica/acpica/commit/e474395
> You could wait this to be merged by Linux.

Is somebody working on reworking that patch to integrate it in the kernel?
Comment 27 Mika Westerberg 2015-11-10 15:01:41 UTC
Regarding the sensor-hub, I can see it enumerated on my Surface3:

[    3.653086] i2c_hid i2c-MSHW0102:00: Fetching the HID descriptor
[    3.654967] i2c_hid i2c-MSHW0102:00: __i2c_hid_command: cmd=01 00
[    3.660722] i2c_hid i2c-MSHW0102:00: HID Descriptor: 1e 00 00 01 8e 06 02 00 03 00 3f 00 04 00 3f 00 05 00 06 00 5e 04 bd 07 03 00 00 00 00 00
[    3.664608] i2c_hid i2c-MSHW0102:00: entering i2c_hid_parse
[    3.666552] i2c_hid i2c-MSHW0102:00: i2c_hid_hwreset
[    3.668363] i2c_hid i2c-MSHW0102:00: i2c_hid_set_power
[    3.670171] i2c_hid i2c-MSHW0102:00: __i2c_hid_command: cmd=05 00 00 08
[    3.671182] i2c_hid i2c-MSHW0102:00: input: 34 00 23 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    3.680029] i2c_hid i2c-MSHW0102:00: resetting...
[    3.681796] i2c_hid i2c-MSHW0102:00: __i2c_hid_command: cmd=05 00 00 01
[    3.688776] i2c_hid i2c-MSHW0102:00: __i2c_hid_command: waiting...
[    3.692758] i2c_hid i2c-MSHW0102:00: __i2c_hid_command: finished.

It also exposes the sensor devices:

# ls -1 /sys/bus/platform/devices/
...
HID-SENSOR-200001.1.auto
HID-SENSOR-200041.4.auto
HID-SENSOR-200073.2.auto
HID-SENSOR-200076.3.auto
HID-SENSOR-20008a.5.auto
HID-SENSOR-ff040001.6.auto
HID-SENSOR-ff040002.7.auto
HID-SENSOR-ff040050.8.auto
Comment 28 Bastien Nocera 2015-11-10 17:42:00 UTC
(In reply to Mika Westerberg from comment #27)
> Regarding the sensor-hub, I can see it enumerated on my Surface3:
<snip>
> It also exposes the sensor devices:
> 
> # ls -1 /sys/bus/platform/devices/
> ...
> HID-SENSOR-200001.1.auto
> HID-SENSOR-200041.4.auto
> HID-SENSOR-200073.2.auto
> HID-SENSOR-200076.3.auto
> HID-SENSOR-20008a.5.auto
> HID-SENSOR-ff040001.6.auto
> HID-SENSOR-ff040002.7.auto
> HID-SENSOR-ff040050.8.auto

*mind boggles*

I have a git master from about a week ago, and the 2 patches mentioned in comment 18. Am I missing something?
/me launches a newer build
Comment 29 Aaron Lu 2015-11-11 02:20:28 UTC
The two patches mentioned in comment #18 is for the NTRG device node, not the sensor hub.
Comment 30 Mika Westerberg 2015-11-11 10:34:26 UTC
I just tried with vanilla 4.3 and I can still see the sensor-hub. Do you have CONFIG_I2C_HID=y in your .config? That should give you the HID device. In addition to that you still need to enable CONFIG_HID_SENSOR_HUB=y and the individual sensor drivers.
Comment 31 Bastien Nocera 2015-11-11 16:37:29 UTC
(In reply to Mika Westerberg from comment #30)
> I just tried with vanilla 4.3 and I can still see the sensor-hub. Do you
> have CONFIG_I2C_HID=y in your .config?

It's there as a module.

> That should give you the HID device.
> In addition to that you still need to enable CONFIG_HID_SENSOR_HUB=y and the
> individual sensor drivers.

That's also there as a module.

Looks like the problem is related to the i2c_designware module (log copied by hand):
i2c_designware: probe of 808622C1:00 failed with error -2
and on for :01, :02, etc. until :06.

The Intel intel-lpss-acpi drivers are also built as modules, and didn't get autoloaded, and the dw-apb-uart driver (also instantiated through intel-lpss) failed to load:
dw-apb-uart 8086228A:00: clock rate not defined
dw-apb-uart: probe of 8086228A:00 failed with error -22
dw-apb-uart 8086228A:01: clock rate not defined
dw-apb-uart: probe of 8086228A:01 failed with error -22

Ideas? Should a few of those drivers be builtin? Do you want me to make the full .config available?
Comment 32 Aaron Lu 2015-11-12 03:03:42 UTC
(In reply to Bastien Nocera from comment #31)
> i2c_designware: probe of 808622C1:00 failed with error -2

should be -22? I don't see -ENOENT(-2) anywhere in designware drivers...
Please attach full dmesg instead.

> and on for :01, :02, etc. until :06.
> 
> The Intel intel-lpss-acpi drivers are also built as modules, and didn't get
> autoloaded, and the dw-apb-uart driver (also instantiated through
> intel-lpss) failed to load:
> dw-apb-uart 8086228A:00: clock rate not defined
> dw-apb-uart: probe of 8086228A:00 failed with error -22
> dw-apb-uart 8086228A:01: clock rate not defined
> dw-apb-uart: probe of 8086228A:01 failed with error -22
> 
> Ideas? Should a few of those drivers be builtin? Do you want me to make the
> full .config available?

It doesn't fail here either, but I never use module, so perhaps you can try built them in.
Comment 33 Mika Westerberg 2015-11-12 08:40:52 UTC
The LPSS on surface3 is the older one so you need to have CONFIG_X86_INTEL_LPSS=y to get the LPSS ACPI devices enumerated. intel-lpss-* are for Skylake and newer.
Comment 34 Bastien Nocera 2015-11-12 10:13:16 UTC
(In reply to Aaron Lu from comment #32)
> (In reply to Bastien Nocera from comment #31)
> > i2c_designware: probe of 808622C1:00 failed with error -2
> 
> should be -22? I don't see -ENOENT(-2) anywhere in designware drivers...
> Please attach full dmesg instead.

Not sure that's really going to help, because that's definitely -2 (I double-checked the code when I saw it meant "ENOENT").

I also get the nice:
dw_dmac: probe of INTL9C60:00 failed with error -2

> > and on for :01, :02, etc. until :06.
> > 
> > The Intel intel-lpss-acpi drivers are also built as modules, and didn't get
> > autoloaded, and the dw-apb-uart driver (also instantiated through
> > intel-lpss) failed to load:
> > dw-apb-uart 8086228A:00: clock rate not defined
> > dw-apb-uart: probe of 8086228A:00 failed with error -22
> > dw-apb-uart 8086228A:01: clock rate not defined
> > dw-apb-uart: probe of 8086228A:01 failed with error -22
> > 
> > Ideas? Should a few of those drivers be builtin? Do you want me to make the
> > full .config available?
> 
> It doesn't fail here either, but I never use module, so perhaps you can try
> built them in.

Possibly.

(In reply to Mika Westerberg from comment #33)
> The LPSS on surface3 is the older one so you need to have
> CONFIG_X86_INTEL_LPSS=y to get the LPSS ACPI devices enumerated.
> intel-lpss-* are for Skylake and newer.

It's already set to =y:
$ grep X86_INTEL_LPSS *
kernel-4.4.0-i686.config:CONFIG_X86_INTEL_LPSS=y
kernel-4.4.0-i686-PAE.config:CONFIG_X86_INTEL_LPSS=y
kernel-4.4.0-x86_64.config:CONFIG_X86_INTEL_LPSS=y
Comment 35 Bastien Nocera 2015-11-12 10:13:48 UTC
Created attachment 192871 [details]
dmesg with device probe errors
Comment 36 Mika Westerberg 2015-11-12 10:24:27 UTC
Bastien, can you also attach your .config?
Comment 37 Bastien Nocera 2015-11-12 10:31:00 UTC
(In reply to Mika Westerberg from comment #36)
> Bastien, can you also attach your .config?

That's the .config that's used on this device:
http://paste.fedoraproject.org/289537/47324184
Comment 38 Mika Westerberg 2015-11-12 11:32:23 UTC
I tried your .config (disabled first lots of unnecessary modules) and I can still get working sensor-hub.

In your dmesg there is:
[    0.239432] clk-lpt: probe of clk-lpt failed with error -17

That causes the whole LPSS to fail.

Is this vanilla 4.3 kernel or do you have any patches on top? If not can you please try with vanilla 4.3?
Comment 39 Bastien Nocera 2015-11-12 11:42:02 UTC
(In reply to Mika Westerberg from comment #38)
> I tried your .config (disabled first lots of unnecessary modules) and I can
> still get working sensor-hub.
> 
> In your dmesg there is:
> [    0.239432] clk-lpt: probe of clk-lpt failed with error -17
> 
> That causes the whole LPSS to fail.
> 
> Is this vanilla 4.3 kernel or do you have any patches on top? If not can you
> please try with vanilla 4.3?

You're completely right. I cherry-picked a bunch of patches from:
https://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-3.10/log/?h=valleyisland-io-4.0

You can see all the patches there:
http://paste.fedoraproject.org/289557/32831414

And, yes, there's a bunch of those that look like they will break things.

I'm rebuilding now. Sorry for the waste of time here :/
Comment 40 Bastien Nocera 2015-11-13 14:45:37 UTC
Without those downstream patches it's enumerated, through i2c-hid has problems parsing the reports. I'll report that to linux-input.

Note You need to log in before you can comment on or make changes to this bug.