Bug 59871

Summary: ata_bay access while undocked regression
Product: ACPI Reporter: Dirk Griesbach (spamthis)
Component: Config-HotplugAssignee: Aaron Lu (aaron.lu)
Status: CLOSED CODE_FIX    
Severity: normal CC: liuj97, rjw, tianyu.lan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: v3.10-rc6 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Register hotplug for ATA dock
Register hotplug for ATA dock and wait for SCSI hotplug work done
FSC S7110 acpidump
Register hotplug for ATA dock for all ATA ACPI handles and wait for SCSI hotplug work done when hot removal
Register hotplug for ATA dock for all ATA ACPI handles and wait for SCSI hotplug work done when hot removal
Register hotplug for ATA dock for all ATA ACPI handles and wait for SCSI hotplug work done when hot removal

Description Dirk Griesbach 2013-06-18 08:12:46 UTC
The FSC S7110 notebook contains an ata_bay/battery_bay. If the bay-CD-drive is
accessed while the bay is undocked, an error like:

| $ eject
| eject: unable to eject, last error: Inappropriate ioctl for device

will be prompted. The first time the undocked device is tried to be accessed,
the operation is stalled for some seconds and the following lines are written
to dmesg:

[  161.293085] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  161.297109] ata5.00: ST_FIRST: !(DRQ|ERR|DF)
[  161.301111] sr 4:0:0:0: CDB: 
[  161.301117] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
[  161.301147] ata5.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
[  161.301147]          res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
[  161.309410] ata5: soft resetting link
[  161.464312] ata5.00: NODEV after polling detection
[  161.464318] ata5.00: revalidation failed (errno=-2)
[  166.464119] ata5: soft resetting link
[  166.620311] ata5.00: NODEV after polling detection
[  166.620317] ata5.00: revalidation failed (errno=-2)
[  171.620119] ata5: soft resetting link
[  171.776295] ata5.00: NODEV after polling detection
[  171.776301] ata5.00: revalidation failed (errno=-2)
[  171.780381] ata5.00: disabled
[  171.780430] ata5: soft resetting link
[  171.936181] ata5: EH complete

Any further access to the device is instant and does not produce any more dmesg
output but only the "Inappropriate ioctl for device" message. Even if the
device is docked again, it can't be accessed but only the same error message
appears without any additional dmesg output.

A bisect indicates commit [30dcf76acc695cbd2fa919e294670fe9552e16e7] "libata:
migrate ACPI code over to new bindings" to be the culprit. Before this,
accessing the undocked device resulted in an "device not found" message and if
the device was docked again, it would work as exppected.
Comment 1 Lan Tianyu 2013-06-18 08:49:34 UTC
Please provide the output of acpidump. This seems not a v3.10-rc6 regression.

Aaron, you will be interested about this bug.
Comment 2 Aaron Lu 2013-06-18 08:53:36 UTC
We are discussing this problem right now in mailing list :-)

Hi Dirk,

Thanks for your bisect, I'll propose a patch to fix this later, stay tuned.
Comment 3 Aaron Lu 2013-06-18 09:13:53 UTC
Created attachment 105191 [details]
Register hotplug for ATA dock

Apply on top of v3.9/v3.10-rcx.
Comment 4 Dirk Griesbach 2013-06-18 10:57:51 UTC
(In reply to comment #3)
> Created an attachment (id=105191) [details]

With this patch (on v3.10-rc6), issuing "echo 1 > /sys/sys/devices/platform/dock.0/undock" results in the following dmesg output:

[  128.262106] ACPI: Device does not support D3cold
[  128.262117] ACPI: \_SB_.PCI0.PATA.PRIM.MAST: undocking
[  128.263923] cdrom: issuing MRW background format suspend
[  128.263966] scsi 4:0:0:0: Oops, 'acpi_handle' corrupt

The last line looks like trouble and indeed: The first undock command is working, the device is docked again and available. A second call to undock results in udev utilising the whole CPU.
Comment 5 Aaron Lu 2013-06-18 14:37:43 UTC
Does v3.5 have the same problem or no problem for v3.5 at all? Thanks.
Comment 6 Jiang Liu 2013-06-18 16:02:07 UTC
We working on solving this issue "[  128.263966] scsi 4:0:0:0: Oops, 'acpi_handle' corrupt", please refer to https://lkml.org/lkml/2013/6/14/428
Patch 2 and patch 3 tries to fix the warning.

I think the failure of second undocking is due to that we forget to unregister dock notification handler when undocking. So please help to unregister dock notification on hot-removal.
Comment 7 Dirk Griesbach 2013-06-18 16:25:48 UTC
(In reply to comment #5)
> Does v3.5 have the same problem or no problem for v3.5 at all? Thanks.

No problems at all. For the sake of completeness the dmesg output for v3.5:

1. echo 1 > undock:
[   73.487023] ata5.00: disabled
[   73.487054] ata5.00: detaching (SCSI 4:0:0:0)
[   73.487150] ACPI: \_SB_.PCI0.PATA.PRIM.MAST - undocking
[   73.488607] cdrom: issuing MRW background format suspend

2. actually removing the drive gives:
[  149.436183] ACPI: \_SB_.PCI0.PATA.PRIM.MAST - docking
[  149.436335] ACPI: Unable to dock!
Which is kind of correct as the removed bay drive cant be docked if it is not in the bay. :-)

3. the device is back in the bay:
[  197.400227] ACPI: \_SB_.PCI0.PATA.PRIM.MAST - docking
[  197.400437] ata5: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
[  197.404493] ata5: ACPI event
[  197.408575] ata5: soft resetting link
[  197.572648] ata5.00: ATAPI: MATSHITAUJ-841Db, 1.00, max UDMA/33
[  197.588520] ata5.00: configured for UDMA/33
[  197.593740] ata5: EH complete
[  197.599600] scsi 4:0:0:0: CD-ROM            MATSHITA UJ-841Db         1.00 PQ: 0 ANSI: 5
[  197.604315] sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
[  197.604542] sr 4:0:0:0: Attached scsi CD-ROM sr0
[  197.604796] sr 4:0:0:0: Attached scsi generic sg1 type 5

4. another undock:
[  257.395349] ata5.00: disabled
[  257.395381] ata5.00: detaching (SCSI 4:0:0:0)
[  257.395988] ACPI: \_SB_.PCI0.PATA.PRIM.MAST - undocking

Interspearsing access to the drive (e.g. eject) between No. 1-4 does not change the big picture nor dmesg output: If the device is docked and back the tray it is working.
Comment 8 Aaron Lu 2013-06-19 05:53:55 UTC
Hi Dirk,

Please attach the output of acpidump, thanks.

(In reply to comment #6)
> We working on solving this issue "[  128.263966] scsi 4:0:0:0: Oops,
> 'acpi_handle' corrupt", please refer to https://lkml.org/lkml/2013/6/14/428
> Patch 2 and patch 3 tries to fix the warning.

It looks to me, we have scheduled ATA port EH to handle hotplug, while ATA port EH will schedule the SCSI hotplug work and then return. So even we have waited for EH to finish, it doesn't mean the actual hot removal of SCSI device performed in SCSI hotplug work is done already. So we will need to wait for the SCSI hotplug task to finish too in the notification handler. The reason it doesn't have problem in pre-v3.6 kernel is, ACPI handle is not bound to SCSI device until the bisected commit. Will prepare a patch to do this later.

> 
> I think the failure of second undocking is due to that we forget to
> unregister
> dock notification handler when undocking. So please help to unregister dock
> notification on hot-removal.

I don't know exactly how ATA dock is supposed to work, but from the code, it seems to be necessary for the notification handler to be installed on the ATA port/device ACPI handle even when there is no real device docked in order to receive notifications, so should we unregister dock notification on hot-removal?

BTW, I suppose ATA dock is mainly for PATA devices? Since SATA support hotplug, the host controller knows when a new device is attached, no need to let ACPI tells it to do the probe thing.
Comment 9 Aaron Lu 2013-06-19 05:55:44 UTC
Created attachment 105311 [details]
Register hotplug for ATA dock and wait for SCSI hotplug work done

Apply on top of v3.10-rc6.
Comment 10 Dirk Griesbach 2013-06-19 06:58:53 UTC
Created attachment 105321 [details]
FSC S7110 acpidump
Comment 11 Aaron Lu 2013-06-19 08:45:36 UTC
Created attachment 105341 [details]
Register hotplug for ATA dock for all ATA ACPI handles and wait for SCSI hotplug work done when hot removal

Please test this one, the previous one has a bug in that if the notebook boots without the device in bay, notification won't be registered.
Comment 12 Dirk Griesbach 2013-06-19 09:55:19 UTC
(In reply to comment #11)
> Please test this one,

First attempt to undock, nothing fancy:
# echo 1 > /sys/devices/platform/dock.0/undock
[   58.742866] BUG: unable to handle kernel paging request at 003800d4
[   58.742987] IP: [<c1131317>] kobject_uevent_env+0x1c/0x36c
[   58.743093] *pdpt = 00000000374bc001 *pde = 0000000000000000 
[   58.743200] Oops: 0000 [#1] SMP 
[   58.743269] Modules linked in: cpufreq_stats loop fuse sg sr_mod cdrom psmouse ata_generic acpi_cpufreq serio_raw mperf ac sky2 processor i2c_i801 fujitsu_laptop battery ata_piix evdev ehci_pci ext4 crc16 mbcache jbd2 xts gf128mul dm_crypt dm_mod hid_generic usbhid hid sd_mod crc_t10dif i915 ahci libahci video thermal_sys i2c_algo_bit libata drm_kms_helper drm uhci_hcd ehci_hcd scsi_mod button usbcore i2c_core usb_common
[   58.744220] CPU: 0 PID: 3448 Comm: bash Not tainted 3.10.0-rc6+ #36
[   58.744322] Hardware name: FUJITSU 0000000000/FJNB1B5, BIOS Version 1.35  09/10/2008
[   58.744447] task: f4a6add0 ti: f712c000 task.ti: f712c000
[   58.744536] EIP: 0060:[<c1131317>] EFLAGS: 00010282 CPU: 0
[   58.744628] EIP is at kobject_uevent_env+0x1c/0x36c
[   58.744708] EAX: 003800c4 EBX: 003800c4 ECX: c134cd89 EDX: 00000002
[   58.744810] ESI: f712dea8 EDI: f8227bb2 EBP: f712de88 ESP: f712de54
[   58.744910]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   58.744998] CR0: 80050033 CR2: 003800d4 CR3: 374b6000 CR4: 000007f0
[   58.745099] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   58.745200] DR6: ffff0ff0 DR7: 00000400
[   58.745263] Stack:
[   58.745300]  f712dea0 00000002 c134cd89 f712debc f712deb3 ff0a0210 003800c4 003800c4
[   58.745482]  f712dea8 f8227bb2 003800c4 f712dea8 f8227bb2 f712dec8 f8227b97 f712dea8
[   58.745664]  00000014 f822dfe7 00000003 f712dea8 00000000 5f594142 4e455645 00333d54
[   58.745845] Call Trace:
[   58.745908]  [<f8227bb2>] ? ata_acpi_uevent.isra.10+0x73/0x73 [libata]
[   58.746029]  [<f8227bb2>] ? ata_acpi_uevent.isra.10+0x73/0x73 [libata]
[   58.746149]  [<f8227b97>] ? ata_acpi_uevent.isra.10+0x58/0x73 [libata]
[   58.746261]  [<c113581b>] ? sprintf+0x17/0x1a
[   58.746338]  [<c1171dde>] ? dock_event+0x94/0xcd
[   58.746419]  [<c117213d>] ? handle_eject_request+0x28/0xbe
[   58.746514]  [<c11721ed>] ? write_undock+0x1a/0x28
[   58.746558]  [<c11721d3>] ? handle_eject_request+0xbe/0xbe
[   58.746558]  [<c11be013>] ? dev_attr_store+0x13/0x20
[   58.746558]  [<c10fd897>] ? sysfs_write_file+0xa8/0xed
[   58.746558]  [<c10fd7ef>] ? sysfs_remove_files+0x1c/0x1c
[   58.746558]  [<c10bc708>] ? vfs_write+0x85/0xfe
[   58.746558]  [<c10bcac5>] ? SyS_write+0x3e/0x67
[   58.746558]  [<c1289985>] ? sysenter_do_call+0x12/0x26
[   58.746558] Code: eb 05 b8 f4 ff ff ff 8d 65 f4 5b 5e 5f 5d c3 55 89 e5 57 56 53 83 ec 28 89 4d cc 8b 0c 95 1c 18 2b c1 89 45 e4 89 55 d0 89 4d d4 <8b> 78 10 85 ff 75 11 8b 40 0c 85 c0 75 f2 b8 ea ff ff ff e9 30
[   58.746558] EIP: [<c1131317>] kobject_uevent_env+0x1c/0x36c SS:ESP 0068:f712de54
[   58.746558] CR2: 00000000003800d4
[   58.763603] ---[ end trace 0d5aa6ce0707cdc7 ]---
Comment 13 Aaron Lu 2013-06-19 13:25:47 UTC
Created attachment 105391 [details]
Register hotplug for ATA dock for all ATA ACPI handles and wait for SCSI hotplug work done when hot removal

Sorry, there is a typo in the previous patch, my bad. Hopefully, this one doesn't have any stupid mistakes left.
Comment 14 Dirk Griesbach 2013-06-19 15:11:56 UTC
(In reply to comment #13)
> Sorry, there is a typo in the previous patch, my bad. Hopefully, this one
> doesn't have any stupid mistakes left.

Looks OK. No Oops and undocking/docking is working.

Thanks!
Comment 15 Aaron Lu 2013-06-21 07:21:35 UTC
Created attachment 105581 [details]
Register hotplug for ATA dock for all ATA ACPI handles and wait for SCSI hotplug work done when hot removal

Minor update:
1 Install notification handlers on both PATA and SATA ACPI handles;
2 Coding style change

Please give it a test, thanks.
Comment 16 Dirk Griesbach 2013-06-21 14:58:17 UTC
(In reply to comment #15)
> Created an attachment (id=105581) [details]
> Please give it a test, thanks.

Works for me.
Comment 17 Aaron Lu 2013-06-27 02:02:08 UTC
commit 44521527be36172864e6e7a6fba4b66e9aa48e40
Author: Aaron Lu <aaron.lu@intel.com>
Date:   Thu Jun 20 09:38:34 2013 +0800

    libata-acpi: add back ACPI based hotplug functionality

entered Linus' tree.