Bug 55611

Summary: Sony VAIO VPCZ23A4R: PCI bus is not rescanned on docking/undocking
Product: ACPI
Reporter: Alexander E. Patrakov (patrakov)
Component: Config-Hotplug
Assignee: Aaron Lu (aaron.lu)
Status: CLOSED INVALID
Severity: normal
CC: aaron.lu, ying.huang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.8.2
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  Here is what the boot WITH the dock looks like
  Devices in the dock
  Log spam due to docking and rescanning the PCI bus
  PCI bus scan results after hot-plugging the dock
  acpidump (from pmtools) output
  radeon-related panic photo
  Add some debug statement to dsdt table
  dmesg output with debug (kernel 3.8.2)
  dmesg output with debug (kernel 3.9-rc5)
  Call _Q07 in _DCK on dock

Description Alexander E. Patrakov 2013-03-22 16:34:19 UTC
I have a Sony VAIO VPCZ23A4R laptop. It comes with a docking station that contains some PCI Express devices: the second video card, the second Ethernet controller, a Marvell ATA controller with a BluRay writer attached, and some USB controllers.

If I boot the laptop without the docking station and then connect it, the following bugs occur:

1. When docking, the kernel does not understand that it has to rescan the PCI bus.
2. When told to rescan (echo 1 > /sys/class/pci_bus/0000\:00/rescan) it emits a lot of warnings and sometimes panics due to the radeon driver.
3. When undocking, the kernel does not understand that it needs to forget about the PCI devices in the dock station. In fact, undocking fails.

I will attach logs to this bug.
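
For reference, the manual rescan in item 2 amounts to writing "1" to the bus's rescan attribute in sysfs. Below is a minimal Python sketch of that step, assuming root privileges and the sysfs layout from the path quoted above. The helper names rescan_path and rescan_bus are made up for illustration.

```python
def rescan_path(bus, sysfs_root="/sys"):
    """Build the sysfs path of the rescan attribute for a PCI bus such as '0000:00'."""
    return f"{sysfs_root}/class/pci_bus/{bus}/rescan"


def rescan_bus(bus, sysfs_root="/sys"):
    """Ask the kernel to rescan one PCI bus.

    Equivalent to `echo 1 > /sys/class/pci_bus/<bus>/rescan`; needs root.
    """
    with open(rescan_path(bus, sysfs_root), "w") as f:
        f.write("1")
```

Calling rescan_bus("0000:00") as root would trigger the same rescan as the echo command in item 2.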
Comment 1 Alexander E. Patrakov 2013-03-22 16:35:17 UTC
Created attachment 95971 [details]
Here is what the boot WITH the dock looks like
Comment 2 Alexander E. Patrakov 2013-03-22 16:35:45 UTC
Created attachment 95981 [details]
Devices in the dock
Comment 3 Alexander E. Patrakov 2013-03-22 16:37:22 UTC
Created attachment 95991 [details]
Log spam due to docking and rescanning the PCI bus
Comment 4 Alexander E. Patrakov 2013-03-22 16:39:47 UTC
Created attachment 96001 [details]
PCI bus scan results after hot-plugging the dock

PCI bus scan results when hot-plugging the dock differ from the results when booting with the dock already connected. Sometimes this even makes the ethernet controller appear as enp17s0 instead of enp25s0 (could not capture the dmesg of this).
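
The interface name changes because udev's path-based naming encodes the PCI bus number in decimal, so a different bus number after hot-plug yields a different name. The sketch below is a simplified reconstruction of that scheme, not udev's actual code. The helper name pci_to_ifname is hypothetical, and the PCI addresses in the usage note are illustrative values chosen to match the two names, not taken from the attachments.

```python
import re

# Domain:bus:slot.function, all hexadecimal except the function digit (0-7).
_PCI_ADDR = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def pci_to_ifname(addr):
    """Derive an 'enp<bus>s<slot>' style Ethernet name from a PCI address.

    Simplification: the 'f<function>' suffix is added only for a non-zero
    function; real udev also adds it for multi-function devices.
    """
    m = _PCI_ADDR.match(addr)
    if m is None:
        raise ValueError(f"not a PCI address: {addr!r}")
    domain, bus, slot = (int(x, 16) for x in m.group(1, 2, 3))
    func = int(m.group(4))
    name = "en"
    if domain:
        name += f"P{domain}"
    name += f"p{bus}s{slot}"  # bus and slot are printed in decimal
    if func:
        name += f"f{func}"
    return name
```

Under this scheme an Ethernet controller at 0000:19:00.0 would be named enp25s0, while the same controller enumerated at 0000:11:00.0 would become enp17s0.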
Comment 5 Alexander E. Patrakov 2013-03-22 16:46:00 UTC
Created attachment 96011 [details]
acpidump (from pmtools) output
Comment 6 Alexander E. Patrakov 2013-03-22 16:49:57 UTC
Created attachment 96021 [details]
radeon-related panic photo
Comment 7 Alexander E. Patrakov 2013-04-06 13:47:27 UTC
With linux-3.9-rc5 and CONFIG_PCI_REALLOC_ENABLE_AUTO=y, the kernel does not see devices in the dock after hot-plugging it and attempting to rescan.
Comment 8 Aaron Lu 2013-04-09 04:57:53 UTC
Looks like a pci problem? Is there a working kernel?
Thanks.
Comment 9 Alexander E. Patrakov 2013-04-09 05:19:10 UTC
There is no fully working kernel:

3.8.2 sees the devices after being told manually to rescan the PCI bus, but warns a lot and may panic during the rescan

3.9-rc5 does not see the devices even after being told manually to rescan the PCI bus, and thus does not warn or panic

old kernels behave like 3.8.2

Should I attempt to bisect what change caused the 3.9-rc5 kernel to ignore the manual PCI rescan request?

As for "Looks like a pci problem?" - I think there are multiple problems here, a sony-laptop or ACPI problem, a PCI problem, and a radeon problem.

Sony-laptop or ACPI problem: why doesn't the kernel rescan the PCI bus automatically after docking/undocking?

PCI problem: "BAR 14: can't assign mem (size 0x10400000)" and similar messages after manual rescanning.

Radeon problem: "radeon 0000:16:00.0: Fatal error during GPU init" and panic.

What other info is needed?
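
To help split a dmesg log along the lines above, here is a rough Python sketch that sorts lines into the PCI and radeon categories by the messages quoted in this comment. The patterns are assumptions derived only from those two quoted messages, and the function name classify is made up for illustration.

```python
import re

# Patterns taken from the two messages quoted above; other wordings are not covered.
PATTERNS = {
    "pci": re.compile(r"BAR \d+: can't assign (mem|io)"),
    "radeon": re.compile(r"radeon .*Fatal error during GPU init"),
}


def classify(line):
    """Return 'pci', 'radeon', or None for a single dmesg line."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None
```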
Comment 10 Aaron Lu 2013-04-09 05:43:16 UTC
Hi Alex,

I agree with you, and you have explained very clearly, thanks.

I think we can focus on the first problem here:

Sony-laptop or ACPI problem: why doesn't the kernel rescan the PCI bus
automatically after docking/undocking?

This is what the dock module should handle. It may be a dock driver problem or a problem with the ASL code in the ACPI table.

For the other problems (PCI and radeon), you will need to file separate bugs in the appropriate categories.
Comment 11 Alexander E. Patrakov 2013-04-09 05:54:05 UTC
OK, let's focus on the dock problem here. I will file separate bugs for other issues when I return from work.
Comment 12 Aaron Lu 2013-04-11 07:16:04 UTC
Created attachment 98061 [details]
Add some debug statement to dsdt table

Hi,

I've prepared a customized DSDT with some debug statements added. Let's see what happens when you dock the computer after boot.

Please boot with this kernel command line parameter: acpi.aml_debug_output=1.
Comment 13 Aaron Lu 2013-04-11 07:26:22 UTC
BTW, there are two ways to override dsdt:
1 Documentation/acpi/initrd_table_override.txt, override dsdt through initrd.
2 https://lesswatts.org/projects/acpi/overridingDSDT.php, tells how to build the new dsdt directly into the kernel.

If you are going to use method 1, please keep a copy of the original initrd file, so that a mistake in the attached DSDT cannot leave your system unusable.
Comment 14 Alexander E. Patrakov 2013-04-11 15:58:53 UTC
Created attachment 98171 [details]
dmesg output with debug (kernel 3.8.2)

Thank you for detailed instructions. I hope that this dmesg output contains what you asked for.
Comment 15 Alexander E. Patrakov 2013-04-11 16:00:14 UTC
Created attachment 98181 [details]
dmesg output with debug (kernel 3.9-rc5)

Just in case, here is the same debug output from 3.9-rc5.
Comment 16 Alexander E. Patrakov 2013-04-11 16:02:38 UTC
Forgot to say: both dmesgs contain one dock attempt (by connecting the dock cable) and one undock attempt (by pressing the "undock" button on the cable but not disconnecting the cable). Apparently, on undocking, the laptop wants to redock (maybe just to verify success) and fails.
Comment 17 Aaron Lu 2013-04-12 03:10:26 UTC
Thanks for the test.

So on dock, the PCI bridge 0000:00:1c.6 doesn't get notified by the BIOS. I assume this is a BIOS bug, and I can work around it in the DSDT.

On undock, the dock device is notified again, so the kernel tries to dock again, finds that the dock is no longer there, and prints "Unable to dock". This error message is harmless and doesn't cause any problem.
Comment 18 Aaron Lu 2013-04-12 03:28:43 UTC
Created attachment 98311 [details]
Call _Q07 in _DCK on dock

Please test this DSDT. Hopefully the PCI bus will now be rescanned automatically.
Comment 19 Alexander E. Patrakov 2013-04-12 03:55:48 UTC
On 3.9-rc5, your hacked DSDT does not help:

[   80.911787] ACPI: \_SB_.DOCK: docking
[   80.911807] [ACPI Debug]  String [0x04] "_DCK"
[   80.911836] [ACPI Debug]  String [0x04] "DSTS"
[   80.911853] [ACPI Debug]  Integer 0x00000001
[   80.912004] [ACPI Debug]  String [0x0C] "_Q07 in _DCK"
[   80.912110] [ACPI Debug]  String [0x04] "_Q07"
[   80.912122] [ACPI Debug]  String [0x04] "DSTS"
[   80.912143] [ACPI Debug]  Integer 0x00000001
[   80.912165] [ACPI Debug]  String [0x31] "Notify LPMB 0x01, DOCK 0x00, RP07 0x00, PCI0 0x01"
[   80.914950] _handle_hotplug_event_root: Device check notify on \_SB_.PCI0

This helps, but the first line may be redundant (will check later):

echo 1 > /sys/class/pci_bus/0000\:08/rescan
echo 1 > /sys/class/pci_bus/0000\:05/rescan
echo 1 > /sys/class/pci_bus/0000\:08/rescan

There are "can't assign mem" and "can't assign io" messages, but we agreed that they should be in a separate bug that I have not filed yet.

Good news: with 3.9-rc5, there is no radeon panic, the card is just non-functional when docking.
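
For comparing runs, the "Notify ..." debug string above can be turned into a mapping from ACPI object name to value. The Python sketch below assumes the string format is exactly the one printed by the debug DSDT in this report. The function name parse_notify is hypothetical. In the ACPI specification, notification value 0x00 is Bus Check and 0x01 is Device Check, but treating the numbers in this debug string as those notification values is an assumption.

```python
import re


def parse_notify(line):
    """Extract {object_name: value} from an '[ACPI Debug] ... "Notify ..."' line.

    Returns an empty dict when the line carries no Notify debug string.
    """
    m = re.search(r'"Notify ([^"]*)"', line)
    if m is None:
        return {}
    result = {}
    for item in m.group(1).split(","):
        name, value = item.split()
        result[name] = int(value, 16)  # values are printed as 0x..
    return result
```

Applied to the line from the log above, this gives the values 1, 0, 0, 1 for LPMB, DOCK, RP07 and PCI0 respectively.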
Comment 20 Alexander E. Patrakov 2013-04-12 03:59:03 UTC
With 3.9-rc5 and your DSDT, on undocking after a manual PCI rescan, there are many messages like this:

[  559.193225] ACPI: Device does not support D3cold
[  559.193314] ACPI: Device does not support D3cold
[  559.193455] ACPI: Device does not support D3cold
[  559.193557] ACPI: Device does not support D3cold
[  559.193688] ACPI: Device does not support D3cold
[  559.193814] ACPI: Device does not support D3cold
[  559.193942] ACPI: Device does not support D3cold
[  559.194092] ACPI: Device does not support D3cold

The end result is that the PCI devices in the dock don't go away, they just return something like this in lspci -v:

10:02.0 PCI bridge: Intel Corporation Device 151b (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel driver in use: pcieport
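
The "rev ff" and "Unknown header type 7f" fields are what one would expect when config space reads return all ones, which is the usual result for a device that is no longer physically present. A small Python sketch for listing such stale entries in saved lspci -v output follows. It assumes the output format shown above, and the function name dead_devices is made up for illustration.

```python
def dead_devices(lspci_text):
    """Return PCI addresses whose lspci -v entry looks like an absent device."""
    dead = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new device entry: "<addr> <class>: ..."
            current = line.split()[0]
            if "(rev ff)" in line and current not in dead:
                dead.append(current)
        elif current and "Unknown header type 7f" in line and current not in dead:
            dead.append(current)
    return dead
```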
Comment 21 Alexander E. Patrakov 2013-04-12 04:04:30 UTC
With 3.8.2, same result: no automatic rescan, one needs to:

echo 1 > /sys/class/pci_bus/0000\:05/rescan
echo 1 > /sys/class/pci_bus/0000\:08/rescan
Comment 22 Aaron Lu 2013-04-12 04:56:33 UTC
(In reply to comment #19)
> On 3.9-rc5, your hacked DSDT does not help:
> 
> [   80.911787] ACPI: \_SB_.DOCK: docking
> [   80.911807] [ACPI Debug]  String [0x04] "_DCK"
> [   80.911836] [ACPI Debug]  String [0x04] "DSTS"
> [   80.911853] [ACPI Debug]  Integer 0x00000001
> [   80.912004] [ACPI Debug]  String [0x0C] "_Q07 in _DCK"
> [   80.912110] [ACPI Debug]  String [0x04] "_Q07"
> [   80.912122] [ACPI Debug]  String [0x04] "DSTS"
> [   80.912143] [ACPI Debug]  Integer 0x00000001
> [   80.912165] [ACPI Debug]  String [0x31] "Notify LPMB 0x01, DOCK 0x00, RP07
> 0x00, PCI0 0x01"
> [   80.914950] _handle_hotplug_event_root: Device check notify on \_SB_.PCI0

These messages show that on dock, both the PCI bridge 0000:00:1c.6 and the PCI root bridge are notified with a BUS_CHECK. My understanding is that the handler for such a notification should rescan the whole tree starting from the notified device. PCI buses 8-20 belong to PCI bridge 0000:00:1c.6. I'll need to check the handler code to see what it does on such a notification, but I think that belongs to PCI.
Comment 23 Aaron Lu 2013-04-12 07:14:40 UTC
Please add acpiphp.debug=1 to the kernel command line, together with the hacked DSDT, and attach the dmesg after you dock/undock. Thanks.
Comment 24 Alexander E. Patrakov 2013-04-12 07:40:53 UTC
Sorry for being stupid. The acpiphp module was not loaded. However, even with it being loaded manually, docking does not work as expected. This may invalidate some earlier findings.

Dmesg will be attached soon, I am going to rebuild the kernel, with this driver as a non-module.
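
A quick way to catch this situation earlier is to check the module list before testing. The Python sketch below checks text in the /proc/modules format, where the module name is the first field of each line. The function name module_loaded is hypothetical. Note that a driver built into the kernel does not appear in /proc/modules, so a negative result only rules out the loadable-module case.

```python
def module_loaded(name, proc_modules_text):
    """Return True if `name` is the first field of any line in /proc/modules text."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )
```

On a live system this could be called as module_loaded("acpiphp", open("/proc/modules").read()).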
Comment 25 Alexander E. Patrakov 2013-04-12 08:12:34 UTC
OK, so the original bug report was invalid, because the acpiphp driver was not loaded. All dmesgs that you asked for are attached to the (hopefully valid) bug #56501.
Comment 26 Alexander E. Patrakov 2013-04-12 08:15:53 UTC
Just for clarity, the "invalid" status is only about the "does not rescan PCI bus" bug on 3.8.2. The same "does not rescan" bug is valid on 3.9-rc5 (moved to bug #56501). The "can't assign mem / io" bug and the radeon bug are valid as of 3.8.2, but not reported yet. Will do that later today.