Created attachment 246591 [details] lspci_vvv

Hello,

I recently bought an Acer Aspire Switch Alpha 12 (Model Number: SA5-271) 2-in-1 convertible computer. This computer has an Intel Skylake i5-6200U processor and a Lite-On CV1-8B256 SSD.

I noticed that the kernel intermittently fails to detect the SSD as /dev/sda. The failure can be worked around by changing seemingly unrelated settings in the BIOS (such as clearing the secure boot databases) or with a "dirty hack" in libahci.c (tested on Kernel 4.8.11, https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.8.11.tar.xz ).

When the SSD is not detected, the kernel prints an alert saying "Gave Up waiting for root device. Alert! /dev/disk/by-uuid/ does not exist. Dropping to a shell." Typing "blkid" in the initramfs shell shows no devices either. This is very frustrating, since it can happen more than 50% of the time.

When the SSD is detected, the following relevant lines can be seen in the dmesg output:

[ 1.347021] ahci 0000:00:17.0: version 3.0
[ 1.365569] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 3 ports 6 Gbps 0x7 impl SATA mode
[ 1.367519] ahci 0000:00:17.0: flags: 64bit ncq pm led clo only pio slum part deso sadm sds apst
[ 1.377574] scsi host0: ahci
[ 1.379465] scsi host1: ahci
[ 1.381373] scsi host2: ahci
[ 1.383060] ata1: SATA max UDMA/133 abar m2048@0xb1648000 port 0xb1648100 irq 124
[ 1.384665] ata2: SATA max UDMA/133 abar m2048@0xb1648000 port 0xb1648180 irq 124
[ 1.386305] ata3: SATA max UDMA/133 abar m2048@0xb1648000 port 0xb1648200 irq 124

However, when the SSD is NOT detected:

[ 1.337065] ahci 0000:00:17.0: version 3.0
-> [ 1.343206] ahci 0000:00:17.0: implemented port map (0x7) contains more ports than nr_ports (2), using nr_ports
-> [ 1.351165] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x0 impl SATA mode
[ 1.352323] ahci 0000:00:17.0: flags: 64bit ncq pm led clo only pio slum part deso sadm sds apst
[ 1.355960] scsi host0: ahci
[ 1.357292] scsi host1: ahci
-> [ 1.358405] ata1: DUMMY
-> [ 1.359466] ata2: DUMMY

Note the differences in the marked lines of the dmesg output when the SSD is not detected:
1) nr_ports becomes 2 instead of 3;
2) ATA1 and ATA2 are both DUMMY; and
3) the capability register reports a different number of ports.

Adding the following lines to the function "ahci_save_initial_config" in file libahci.c, line 453, seems to fix this for now:

    if ((cap & 0xC734FF00) == 0xC734FF00) {
        dev_info(dev, "Forcing CAP to 0xC734FF02 and port_map to 0x7!\n");
        hpriv->saved_cap = cap = 0xC734FF02;
        hpriv->saved_port_map = port_map = 0x7;
    }

The version I used is 4.8.11 ( https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.8.11.tar.xz ).

What the code does is force port_map to become 0x7 and saved_cap to become 0xC734FF02. When the SSD is detected, the cap register holds a value of 0xC734FF02, but when detection fails, cap can be either 0xC734FF01 or 0xC734FF00.

I would greatly appreciate it if you could enlighten me on these questions:
- This fix of forcing magic values into the CAP register is very ad hoc and superficial; there should be a better way of doing this. How should I continue digging into the cause? (For example, should I find a way to dump the state changes in the controller?)
- Is the current approach a bad way of solving this? Can it possibly damage the computer?

Thanks!

(PS: Sorry for sending this multiple times. I sent it to the mailing list and to the maintainer listed at the top of the source code, and then realized I should come here to Bugzilla.)
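The marked dmesg lines are consistent with how the low five bits of CAP encode the port count (number of ports minus one). Below is a minimal standalone C sketch of that arithmetic, not the driver code itself: the helper names cap_nr_ports, count_ports and cross_check_port_map are hypothetical, and the cross-check is a paraphrase of what ahci_save_initial_config appears to do in 4.8.11, so treat it as an assumption.

```c
#include <stdint.h>

/* CAP.NP (bits 4:0) holds the number of ports minus one. */
static unsigned int cap_nr_ports(uint32_t cap)
{
    return (cap & 0x1f) + 1;
}

/* Count the bits set in the ports-implemented map. */
static unsigned int count_ports(uint32_t port_map)
{
    unsigned int n = 0;

    for (; port_map; port_map >>= 1)
        n += port_map & 1;
    return n;
}

/*
 * Hypothetical stand-in for the driver's cross-check (assumption):
 * if the map claims more ports than CAP allows, the map is discarded
 * and 0 is returned, which leaves every port looking unimplemented.
 */
static uint32_t cross_check_port_map(uint32_t cap, uint32_t port_map)
{
    if (count_ports(port_map) > cap_nr_ports(cap))
        return 0;
    return port_map;
}
```

With the values from this report, a CAP of 0xC734FF02 decodes to 3 ports and the map 0x7 is kept, while 0xC734FF01 decodes to 2 ports and the map is cleared, which would match the "0x0 impl" line and the DUMMY ports.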
Same issue here, but my laptop has a Kingston SSD, so the issue seems to be with the controller.
copr for fedora 25: https://copr.fedorainfracloud.org/coprs/damianatorrpm/acer_kernel/
To boot with the SSD detected you need to run a custom kernel. I have built one: https://copr.fedorainfracloud.org/coprs/damianatorrpm/acer_kernel/
I have also created a custom Fedora .iso incorporating this kernel: https://drive.google.com/drive/folders/0B_wtRVB2Z4pvWFpwVDUyTWlBcVU

With that, installation is possible, but the bootloader doesn't work. I have tried GRUB in both UEFI and legacy mode (including using chroot after the install to reinstall grub2), and I have also tried rEFInd, systemd-boot and elilo. It is a BIOS bug. So here is the solution:

How to install Fedora 25 (deletes Windows)
1) Download the .iso with my custom kernel.
2) Set the boot mode in the BIOS to legacy, not UEFI.
3) Make a bootable USB stick from the iso you downloaded and boot from it.
4) Open GNOME Disks and completely format the drive (the whole drive, not just the partitions) with MBR. Do not use GPT or it won't work.
5) Install Fedora.

*Note: The custom kernel rpm has Epoch set to 1, which means the kernel will never update back to the one from the standard repo.
Looks like the BIOS is messing up. I can't think of workarounds other than forcing the port map (or CAP) for the affected machines. ahci already contains a bunch of machine-specific workarounds. Can you please create a patch to match your system and apply the necessary workaround? Thanks.
Created attachment 252401 [details] acer-kernel-ahci.patch
Thanks for the patch but you can't match the CAP value and override it. If you run "dmidecode" as root, it will print out a bunch of identification information. "Product Name" in "System Information" or "Base Board Information" is usually a good field to match. This is how other system-specific workarounds are applied too. If you have trouble creating a patch, please attach the dmidecode output. I can do the patch. Thanks!
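The idea of keying the workaround on DMI identification rather than on the CAP value can be sketched in plain userspace C as follows. This is only an illustration of the matching logic, not kernel code: struct dmi_info, field_matches and is_acer_sa5_271 are hypothetical names, and the strings "Acer" and "SA5-271" are assumed values that would need to be checked against the real dmidecode output. The substring comparison mirrors how the kernel's DMI_MATCH compares fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the DMI fields a quirk would check. */
struct dmi_info {
    const char *sys_vendor;
    const char *product_name;
};

/* Substring match, in the spirit of the kernel's DMI_MATCH. */
static bool field_matches(const char *field, const char *needle)
{
    return field != NULL && strstr(field, needle) != NULL;
}

/* "Acer" and "SA5-271" are assumed values; verify them with dmidecode. */
static bool is_acer_sa5_271(const struct dmi_info *info)
{
    return field_matches(info->sys_vendor, "Acer") &&
           field_matches(info->product_name, "SA5-271");
}
```

Matching on identification strings keeps the override from firing on unrelated machines whose controllers happen to report a similar CAP value.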
Created attachment 252461 [details] patch for ahci.c using DMI Match
Created attachment 252471 [details] dmidecode output
Hi all, I created a patch using the DMI_MATCH routine but I'm not sure if I did it correctly so I attached the dmidecode output as well. Thanks!
Sui, generally looks good to me but can you please make the following changes?

* Separate it out into its own function, as other workarounds do.
* Other info messages aren't capitalized. Maybe drop the capitalization here too?
* Please add comments explaining what's going on and link back to this bz.

Once the patch is updated, can you please format the patch according to Documentation/process/submitting-patches.rst? There's a lot in there but it'd basically look like

Subject: ahci: ONE LINE DESCRIPTION

PATCH DESCRIPTION.

Link: http://LINK_TO_THIS_BUG
Signed-off-by: YOUR_NAME <YOUR_EMAIL>
---
ACTUAL PATCH

Thanks!
Looks correct to me. If it triggers correctly on your machine, it should be fine.
Created attachment 253091 [details] proposed patch on January 24
I made the changes and formatted the patch, and also added Damian since he has tested the patch. I found that some existing workarounds are not in their own functions (such as the "MCP65 revision A1 and A2 can't do MSI" one), but I still put the SA5-271 workaround into its own function. Thanks!
Update: There is a BIOS version update here: https://www.acer.com/ac/en/US/content/support-product/6806?b=1
After flashing 1.04 the solution doesn't seem to work anymore. The system seems to hang at boot for a different reason: the ATA1 channel comes up and then goes down again. As a result I downgraded to 1.03.
Correction: What I described in Comment #14 is a rare event, but it can happen with both BIOS versions 1.03 and 1.04. I did some more tests with version 1.04, and the workaround triggers correctly and the computer boots successfully most of the time. The BIOS version seems to be irrelevant to the occurrence of the rare event. I'll keep watching to see if there are any clues to its cause. Sorry about the confusion!
So far I have only tested the original patch (not in its own function). I have been running BIOS 1.04 without any issues for a week or so.
(In reply to Widen-Damian Ivanov from comment #16)
> I have yet only tested the original patch (not in own function).
> I am running BIOS 1.04 without any issues a week now or so.

Hi, I'm back and I think I found what happened in Comment #14. It looks like it's caused by n_ports being set to 1 in this line in function ahci_init_one in file ahci.c:

    n_ports = max(ahci_nr_ports(hpriv->cap), fls(hpriv->port_map));

In this case, only ATA1 is probed; ATA2 is not. The system then fails to find the SSD on ATA1 and decides there is no SSD. Changing hpriv->port_map and hpriv->cap before n_ports is set, so that n_ports becomes 2 or 3, seems to fix this issue. (The current fix sets these two values after n_ports is set.)

The case where n_ports is set to 1 may be triggered by booting into Windows and then rebooting into Linux. I'm using BIOS 1.04. I'll test this fix for some more time to confirm it works reliably.
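The ordering problem described above can be reproduced with a small standalone C sketch. It is not the driver code: struct ctrl, nr_ports_from_cap, fls32, compute_n_ports and force_sa5_271_values are hypothetical names, fls32 is a plain reimplementation of the kernel's fls (1-based index of the highest set bit, 0 for no bits), and the starting state of CAP 0xC734FF00 with an empty port map is an assumption about what the failing boot looks like.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved controller state. */
struct ctrl {
    uint32_t cap;
    uint32_t port_map;
};

/* CAP.NP (bits 4:0) holds the number of ports minus one. */
static unsigned int nr_ports_from_cap(uint32_t cap)
{
    return (cap & 0x1f) + 1;
}

/* 1-based index of the highest set bit; 0 when no bit is set. */
static unsigned int fls32(uint32_t x)
{
    unsigned int r = 0;

    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Mirrors: n_ports = max(ahci_nr_ports(cap), fls(port_map)) */
static unsigned int compute_n_ports(uint32_t cap, uint32_t port_map)
{
    unsigned int a = nr_ports_from_cap(cap);
    unsigned int b = fls32(port_map);

    return a > b ? a : b;
}

/* Force the values this report uses for a working boot. */
static void force_sa5_271_values(struct ctrl *c)
{
    c->cap = 0xC734FF02;
    c->port_map = 0x7;
}
```

Starting from a CAP of 0xC734FF00 and an empty port map, compute_n_ports gives 1 if it runs before the override and 3 if it runs after, which is why the override has to be applied first.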
Created attachment 256241 [details] revised patch on May 6

The SA5-271 workaround is moved before the point where n_ports is set. This seems to make the workaround more stable on a boot that immediately follows a reboot from Windows.
Sui Chen, can you post the patch to linux-ide@vger.kernel.org w/ proper patch description and Signed-off-by and cc me? Thanks!