Bug 8935 - Kernel incorrectly chooses external drive as /dev/sda on approx. 70% of boots
Summary: Kernel incorrectly chooses external drive as /dev/sda on approx. 70% of boots
Status: REJECTED INVALID
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA (show other bugs)
Hardware: All Linux
Importance: P1 normal
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-24 16:11 UTC by Tristan Schmelcher
Modified: 2007-08-26 23:24 UTC

See Also:
Kernel Version: 2.6.22
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
This is the output of "dmesg" from one of the boots when it made the wrong choice for /dev/sda. (7.56 KB, text/plain)
2007-08-24 16:13 UTC, Tristan Schmelcher
Output of "dmesg" after a boot with the correct /dev/sda choice (25.83 KB, text/plain)
2007-08-26 21:16 UTC, Tristan Schmelcher
This is the output of "dmesg" from one of the boots when it made the wrong choice for /dev/sda. (20.93 KB, text/plain)
2007-08-26 21:18 UTC, Tristan Schmelcher

Description Tristan Schmelcher 2007-08-24 16:11:20 UTC
Most recent kernel where this bug did not occur: Unknown
Distribution: Debian
Hardware Environment: Dell XPS M1710 laptop with Intel Core 2 Duo
Software Environment: Debian Etch/Lenny 2.6.22-1-amd64
Problem Description: My laptop has an internal SATA drive and an external eSATA drive, connected via an ExpressCard-to-eSATA adapter. If the external drive is plugged in when I boot the machine, the kernel incorrectly makes it /dev/sda approximately 70% of the time, and the internal drive becomes /dev/sdb. The initramfs then mounts the external drive on "/root" instead of the internal drive and dies when it can't find "init". If I instead connect the external drive _after_ booting, everything works as normal.

As an uninformed guess, perhaps the non-deterministic choice of /dev/sda stems from the fact that this is an SMP system, so there is a race to see which drive is discovered first?

Steps to reproduce:

1) Get a hardware setup similar to the one I've described.

2) Plug in the eSATA drive.

3) Reboot repeatedly and check which drive ends up as /dev/sda.


The output of lspci for my system is:

$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 0298 (rev a1)
03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd Unknown device 0832
03:01.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 19)
03:01.2 System peripheral: Ricoh Co Ltd Unknown device 0843 (rev 01)
03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 0a)
03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 05)
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit Ethernet PCI Express (rev 02)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
0d:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)

I think the chip that drives the ExpressCard slot is either the "Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)" or the "Ricoh Co Ltd Unknown device 0843". According to the manufacturers' documentation, the former uses a PCIe-style ExpressCard implementation, while the latter is USB-style. (I get very little performance gain compared to a USB connection for the same drive, so it might be the latter.)
Comment 1 Tristan Schmelcher 2007-08-24 16:13:12 UTC
Created attachment 12522 [details]
This is the output of "dmesg" from one of the boots when it made the wrong choice for /dev/sda.
Comment 2 Tristan Schmelcher 2007-08-24 16:17:14 UTC
Btw, the "70%" comes from 10 test boots that I did, 7 of which produced the bug.
Comment 3 Tejun Heo 2007-08-26 19:50:31 UTC
Please post dmesg from a successful boot.
Comment 4 Tristan Schmelcher 2007-08-26 21:16:31 UTC
Created attachment 12554 [details]
Output of "dmesg" after a boot with the correct /dev/sda choice
Comment 5 Tristan Schmelcher 2007-08-26 21:18:49 UTC
Created attachment 12555 [details]
This is the output of "dmesg" from one of the boots when it made the wrong choice for /dev/sda. 

I noticed that my original dmesg post had the beginning cut off. That seems to be a limitation of the dmesg binary found in the initrd. I did it again and this time used the dmesg binary from within my real /dev/sda, which got the whole thing. This is it.
Comment 6 Tristan Schmelcher 2007-08-26 21:21:14 UTC
(Getting the new attachments took 6 boots, 1 of which worked, so the new percentage is 12/16 = 75%.)
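The 75% figure pools the two batches of test boots (bad boots over total boots) rather than averaging the two batch rates. A small sketch of that arithmetic, using only the counts reported in comments 2 and 6; the variable and function names are illustrative:

```python
from fractions import Fraction

# Counts reported in this bug: (boots where the eSATA disk became /dev/sda, total boots)
batches = {
    "initial test boots (comment 2)": (7, 10),
    "boots to collect new dmesg (comment 6)": (5, 6),  # 6 boots, 1 of which worked
}


def failure_rate(bad: int, total: int) -> Fraction:
    """Exact fraction of boots that picked the wrong /dev/sda."""
    return Fraction(bad, total)


# Pool the raw counts, then divide once.
bad_total = sum(bad for bad, _ in batches.values())
boots_total = sum(total for _, total in batches.values())
combined = failure_rate(bad_total, boots_total)

for name, (bad, total) in batches.items():
    print(f"{name}: {bad}/{total} = {float(failure_rate(bad, total)):.0%}")
print(f"combined: {bad_total}/{boots_total} = {float(combined):.0%}")
```

Averaging the two batch rates instead, (70% + 83%) / 2, would give about 77% and over-weight the smaller six-boot batch.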
Comment 7 Tejun Heo 2007-08-26 22:06:56 UTC
Your problem is caused by indeterminate module loading order.  Sometimes ata_piix is loaded before ahci while at other times ahci is loaded first.  This of course results in different detection order and device name assignment.
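A simplified model of why load order matters: SCSI disk names (sda, sdb, ...) are handed out in the order disks are registered, so whichever controller driver loads first claims /dev/sda. The disk-to-driver mapping below is an assumption read from the lspci output above (the ICH7 controller in IDE mode is normally handled by ata_piix, the JMicron AHCI controller by ahci); the model is hypothetical and ignores parallel probing within a driver and names past sdz.

```python
import string
from typing import Dict, List, Tuple

# Assumed mapping of loaded module -> disks behind its controller (hypothetical model).
DISKS_BY_DRIVER: Dict[str, List[str]] = {
    "ata_piix": ["internal SATA disk (ICH7, IDE mode)"],
    "ahci": ["external eSATA disk (JMicron AHCI)"],
}


def assign_names(load_order: List[str]) -> List[Tuple[str, str]]:
    """Give each disk the next free sdX name, in the order its driver loads."""
    letters = iter(string.ascii_lowercase)  # a, b, c, ... (model stops at sdz)
    assigned = []
    for module in load_order:
        for disk in DISKS_BY_DRIVER[module]:
            assigned.append((f"/dev/sd{next(letters)}", disk))
    return assigned


for order in (["ata_piix", "ahci"], ["ahci", "ata_piix"]):
    print("load order:", " then ".join(order))
    for node, disk in assign_names(order):
        print(f"  {node}: {disk}")
```

Only the second ordering matches the failing boots: the eSATA disk becomes /dev/sda, so an initramfs that mounts /dev/sda as root picks the wrong disk.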

* Please file a bug against your distro.  Kernel and drivers aren't the problem here.  Your distro's module loader (probably in initrd) is screwing things up.

* Use mount-by-label or UUID.
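Mounting by label or UUID works because udev maintains symlinks under /dev/disk/by-uuid and /dev/disk/by-label that follow the filesystem rather than the probe order, and /etc/fstab accepts UUID=... and LABEL=... in place of a device node. The helpers below are a hypothetical sketch (the function names are illustrative, not part of any tool) showing how such a stable name resolves to whichever sdX node the disk happened to receive on the current boot.

```python
import os
from pathlib import Path
from typing import Dict


def resolve_stable_name(link_dir: str, key: str) -> str:
    """Return the device node a stable udev link points to.

    link_dir is e.g. /dev/disk/by-uuid or /dev/disk/by-label; key is the
    filesystem UUID or label. Raises FileNotFoundError if no such link exists.
    """
    link = Path(link_dir) / key
    if not link.is_symlink():
        raise FileNotFoundError(f"no stable link {link}")
    return os.path.realpath(link)


def list_stable_names(link_dir: str) -> Dict[str, str]:
    """Map every symlink in link_dir to its resolved target; {} if the dir is absent."""
    d = Path(link_dir)
    if not d.is_dir():
        return {}
    return {p.name: os.path.realpath(p) for p in sorted(d.iterdir()) if p.is_symlink()}


if __name__ == "__main__":
    for key, node in list_stable_names("/dev/disk/by-uuid").items():
        print(f"UUID={key} -> {node}")
```

On an affected boot, the internal drive's UUID would simply resolve to /dev/sdb instead of /dev/sda, and anything that refers to it by UUID keeps working.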
Comment 8 Tristan Schmelcher 2007-08-26 23:24:44 UTC
Thank you, I will communicate that to my distro. Sorry for the noise.
