Bug 217114 - Tiger Lake SATA Controller not operating correctly, failing to populate partitions in /dev
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: Intel Linux
Importance: P1 high
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-02 11:25 UTC by emmi
Modified: 2023-03-05 07:21 UTC
CC List: 3 users

See Also:
Kernel Version: 6.2.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci -vvv on kernel 6.1.12 ASUS Vivobook 15 X513EAN (28.72 KB, text/plain)
2023-03-05 07:15 UTC, Vitalii Solomonov
lspci -vvv on kernel 6.2.1 ASUS Vivobook 15 X513EAN (28.72 KB, text/plain)
2023-03-05 07:15 UTC, Vitalii Solomonov
dmesg on kernel 6.1.12 ASUS Vivobook 15 X513EAN (80.95 KB, text/plain)
2023-03-05 07:16 UTC, Vitalii Solomonov
dmesg on kernel 6.2.1 ASUS Vivobook 15 X513EAN (76.26 KB, text/plain)
2023-03-05 07:16 UTC, Vitalii Solomonov

Description emmi 2023-03-02 11:25:00 UTC
As per the kernel problem reported in https://bbs.archlinux.org/viewtopic.php?id=283906:

Commit 104ff59af73aba524e57ae0fef70121643ff270e seems to have broken Intel Tiger Lake SATA controllers in a way that prevents booting, because the sysroot partition is not found.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=104ff59af73aba524e57ae0fef70121643ff270e
Comment 1 schwagsucks 2023-03-02 17:31:53 UTC
As some people reported in the referenced Arch forum thread, this seems to have started in 6.1.13; 6.1.12 boots as expected.

The problem is that SATA disks are no longer recognized, which is why the reported sysroot partition can't be found.

My primary disk is NVMe, so I can boot as long as I remove all SATA references from my fstab, but I then can't mount the SATA partitions because the devices are not present in /dev.

Any attempt to boot with a SATA disk in fstab results in a boot failure and an emergency shell.
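
For anyone comparing a good and a bad kernel on an affected machine, the "devices are not present in /dev" symptom can also be checked from sysfs. The following is a minimal sketch, not something from the original report: the function name list_sd_disks is made up for illustration, and it assumes that the SATA disks would normally show up as sd* entries under /sys/block (other SCSI-layer disks such as USB drives use the same prefix).

```python
from pathlib import Path


def list_sd_disks(sys_block="/sys/block"):
    """Return the sorted names of sd* block devices found under sys_block.

    The sys_block argument exists so the function can be pointed at a
    test directory; on a real system the default is the usual location.
    """
    root = Path(sys_block)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith("sd"))


if __name__ == "__main__":
    disks = list_sd_disks()
    print("sd* disks:", ", ".join(disks) if disks else "none found")
```

On a kernel showing this bug, the expectation would be that the SATA disk is missing from the list while NVMe devices are unaffected.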
Comment 2 schwagsucks 2023-03-02 19:31:28 UTC
I can provide any details required

My sata controller:
10000:e0:17.0 SATA controller: Intel Corporation Tiger Lake-LP SATA Controller (rev 20) (prog-if 01 [AHCI 1.0])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 146
	Region 0: Memory at 50100000 (32-bit, non-prefetchable) [size=8K]
	Region 1: Memory at 50102800 (32-bit, non-prefetchable) [size=256]
	Region 5: Memory at 50102000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee01000  Data: 0000
	Capabilities: [70] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
	Kernel driver in use: ahci
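
When comparing lspci output across kernels, it can help to pull out only the SATA controller entries and the driver bound to them. The sketch below is illustrative only: find_sata_controllers is a hypothetical helper, and it assumes the text follows the `lspci -v` layout shown above (an unindented "<slot> <class>: <description>" line followed by indented detail lines). Controllers in RAID mode are reported under a different class name and would not be matched.

```python
def find_sata_controllers(lspci_text):
    """Return (slot, description, driver) tuples for SATA controllers.

    driver is None when no "Kernel driver in use:" line follows the device.
    """
    results = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device entry.
            current = None
            slot, _, rest = line.partition(" ")
            dev_class, _, description = rest.partition(": ")
            if dev_class == "SATA controller":
                current = [slot, description, None]
                results.append(current)
        elif current is not None and line.strip().startswith("Kernel driver in use:"):
            current[2] = line.split(":", 1)[1].strip()
    return [tuple(entry) for entry in results]
```

Feeding it the output above would yield one entry for slot 10000:e0:17.0 with the ahci driver.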
Comment 3 Damien Le Moal 2023-03-03 08:04:11 UTC
Is it possible for you to get and post the ata/ahci related messages during a bad boot ?
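
If a kernel log can be saved at all (for example from a boot that reaches a shell with the SATA entries removed from fstab), the ata/ahci lines can be filtered out before posting. This is a rough sketch under stated assumptions: ata_messages and the ATA_LINE pattern are made up for illustration, and the pattern only targets the usual "ataN:", "ataN.NN:", "ahci" and "libata" prefixes, so it may miss related messages.

```python
import re

# Matches typical libata/ahci log prefixes such as "ata1:", "ata1.00:",
# "ahci 0000:00:17.0:" or "libata version ...". This is a heuristic.
ATA_LINE = re.compile(r"\b(ata\d+(\.\d+)?:|ahci\b|libata\b)")


def ata_messages(log_text):
    """Return the lines of log_text that look like ata/ahci kernel messages."""
    return [line for line in log_text.splitlines() if ATA_LINE.search(line)]
```

A typical use would be to save the output of dmesg to a file, read it as text, and print the lines returned by ata_messages.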
Comment 4 Damien Le Moal 2023-03-03 08:10:10 UTC
And can you try booting with libata.force=nolpm to check ?
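
To confirm that the parameter actually reached the kernel on a given boot, the command line can be inspected afterwards. A minimal sketch, assuming the documented libata.force syntax of comma-separated "[ID:]VAL" entries; the helper names libata_force_values and nolpm_requested are hypothetical.

```python
def libata_force_values(cmdline):
    """Return every comma-separated entry of all libata.force= parameters."""
    prefix = "libata.force="
    values = []
    for token in cmdline.split():
        if token.startswith(prefix):
            values.extend(v for v in token[len(prefix):].split(",") if v)
    return values


def nolpm_requested(cmdline):
    """Return True if any libata.force entry asks for nolpm."""
    # Entries may carry a port/device prefix such as "1.00:nolpm";
    # keep only the part after the last colon before comparing.
    return any(v.rpartition(":")[2] == "nolpm" for v in libata_force_values(cmdline))
```

On a running system the string to pass in would normally be the contents of /proc/cmdline.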
Comment 5 emmi 2023-03-03 08:34:28 UTC
(In reply to Damien Le Moal from comment #4)
> And can you try booting with libata.force=nolpm to check ?

As per the forum thread attached, this does not correct the issue (most of the extraneous information is on pages 2 and 3 of that thread)


(In reply to Damien Le Moal from comment #3)
> Is it possible for you to get and post the ata/ahci related messages during
> a bad boot ?

Nope, no sysroot means no console and attempts to load a console prior to root mount fail for me (probably because of sulogin etc being restricted)
Comment 6 Damien Le Moal 2023-03-03 08:42:32 UTC
(In reply to emmi from comment #5)
> (In reply to Damien Le Moal from comment #4)
> > And can you try booting with libata.force=nolpm to check ?
> 
> As per the forum thread attached, this does not correct the issue (most of
> the extraneous information is on pages 2 and 3 of that thread)

Missed that. Will have a look.

> (In reply to Damien Le Moal from comment #3)
> > Is it possible for you to get and post the ata/ahci related messages during
> > a bad boot ?
> 
> Nope, no sysroot means no console and attempts to load a console prior to
> root mount fail for me (probably because of sulogin etc being restricted)

Can you use a serial console to capture the messages ?
Comment 7 emmi 2023-03-03 08:48:14 UTC
(In reply to Damien Le Moal from comment #6)
> (In reply to emmi from comment #5)
> > (In reply to Damien Le Moal from comment #4)
> > > And can you try booting with libata.force=nolpm to check ?
> > 
> > As per the forum thread attached, this does not correct the issue (most of
> > the extraneous information is on pages 2 and 3 of that thread)
> 
> Missed that. Will have a look.
> 
> > (In reply to Damien Le Moal from comment #3)
> > > > Is it possible for you to get and post the ata/ahci related messages
> > > > during a bad boot ?
> > 
> > Nope, no sysroot means no console and attempts to load a console prior to
> > root mount fail for me (probably because of sulogin etc being restricted)
> 
> Can you use a serial console to capture the messages ?

Personally I cannot without disassembling my laptop and likely soldering to test pads, since it's somewhere between an ultrabook and a laptop and thus doesn't have integrated serial connectivity.
Comment 8 Damien Le Moal 2023-03-03 08:51:09 UTC
Ah, OK. This is a laptop... Too bad. These error messages would be really useful for coming up with a better solution than reverting the patch causing the issue. Maybe a screen video? (Disable rhgb or any other graphical boot stuff and add the earlycon kernel parameter. You should be able to see the boot messages and errors.)
Comment 9 emmi 2023-03-03 09:07:53 UTC
(In reply to Damien Le Moal from comment #8)
> ah. OK. This is a laptop... Too bad. These error messages would be really
> useful to come up with a better solution than reverting the patch causing
> the issue. May be a screen video ? (disable rhgb or any other graphic boot
> stuff and add earlycon kernel parameter. You should be able to see the boot
> messages & errors).

I'm not currently able to as I'm not at home, but I'm sure some others would be able to provide that data...
Comment 10 Damien Le Moal 2023-03-03 09:21:55 UTC
I am going to send a revert to Linus & stable now. We can figure out how to correctly enable LPM for this adapter during the 6.3 cycle.
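
For testers who want to report which link power management policy is in effect on a working kernel, the per-host sysfs attribute can be read directly. This is a sketch rather than a diagnostic endorsed in this thread: read_lpm_policies is a hypothetical helper, and it assumes the AHCI hosts expose a link_power_management_policy file under /sys/class/scsi_host/hostN/.

```python
from pathlib import Path


def read_lpm_policies(scsi_host_dir="/sys/class/scsi_host"):
    """Map host name to its link power management policy string.

    Hosts without a link_power_management_policy attribute are skipped.
    """
    policies = {}
    root = Path(scsi_host_dir)
    if not root.is_dir():
        return policies
    for host in sorted(root.iterdir()):
        attr = host / "link_power_management_policy"
        if attr.is_file():
            policies[host.name] = attr.read_text().strip()
    return policies


if __name__ == "__main__":
    for name, policy in read_lpm_policies().items():
        print(f"{name}: {policy}")
```

Comparing this output between kernels with and without the commit could show whether the policy applied to the controller changed.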
Comment 11 Damien Le Moal 2023-03-03 10:33:52 UTC
Revert sent. Probably will be picked up in 6.1.15.
Comment 12 emmi 2023-03-05 00:22:11 UTC
Revert accepted
Comment 13 Vitalii Solomonov 2023-03-05 07:15:21 UTC
Created attachment 303855
lspci -vvv on kernel 6.1.12 ASUS Vivobook 15 X513EAN
Comment 14 Vitalii Solomonov 2023-03-05 07:15:56 UTC
Created attachment 303856
lspci -vvv on kernel 6.2.1 ASUS Vivobook 15 X513EAN
Comment 15 Vitalii Solomonov 2023-03-05 07:16:34 UTC
Created attachment 303857
dmesg on kernel 6.1.12 ASUS Vivobook 15 X513EAN
Comment 16 Vitalii Solomonov 2023-03-05 07:16:51 UTC
Created attachment 303858
dmesg on kernel 6.2.1 ASUS Vivobook 15 X513EAN
Comment 17 Vitalii Solomonov 2023-03-05 07:21:28 UTC
I have an ASUS Vivobook 15 X513EAN laptop and can confirm this. I have / on NVMe and /home on sda. 6.1.12 works fine; on 6.2.0, /dev is not populated with the sda devices.
I'm attaching my full dmesg and lspci output for both kernels. What additional logs are needed?
