Bug 16017

Summary: kernel does not reliable detect sata hard drives anymore
Product: IO/Storage Reporter: Juergen Kosel (juergen.kosel)
Component: Serial ATAAssignee: Tejun Heo (tj)
Status: CLOSED INVALID    
Severity: normal CC: maciej.rutecki, rjw, tj
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.34 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 15310    
Attachments: dmesg output if the problem occured

Description Juergen Kosel 2010-05-20 20:29:40 UTC
Created attachment 26468 [details]
dmesg output if the problem occured

Hello,

after the update to linux 2.6.34 (and also 2.6.34-rc6) the kernel fails sometimes to detect the sata hard drives.
With older kernel versions up to 2.6.33 it detects reliable the hard drives.
So it is an hardware related problem. But it is not caused by hardware.

Since the kernel succeeds in more than 50% of the boots, it becomes difficult to find the cause of the problem with git bisect.

Output of lspci:

00:00.0 RAM memory: nVidia Corporation MCP61 Memory Controller (rev a1)
00:01.0 ISA bridge: nVidia Corporation MCP61 LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP61 SMBus (rev a2)
00:01.2 RAM memory: nVidia Corporation MCP61 Memory Controller (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP61 USB Controller (rev a3)
00:02.1 USB Controller: nVidia Corporation MCP61 USB Controller (rev a3)
00:04.0 PCI bridge: nVidia Corporation MCP61 PCI bridge (rev a1)
00:05.0 Audio device: nVidia Corporation MCP61 High Definition Audio (rev a2)
00:06.0 IDE interface: nVidia Corporation MCP61 IDE (rev a2)
00:07.0 Bridge: nVidia Corporation MCP61 Ethernet (rev a2)
00:08.0 IDE interface: nVidia Corporation MCP61 SATA Controller (rev a2)
00:09.0 PCI bridge: nVidia Corporation MCP61 PCI Express bridge (rev a2)
00:0b.0 PCI bridge: nVidia Corporation MCP61 PCI Express bridge (rev a2)
00:0c.0 PCI bridge: nVidia Corporation MCP61 PCI Express bridge (rev a2)
00:0d.0 VGA compatible controller: nVidia Corporation C61 [GeForce 6150SE nForce 430] (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9400 GT] (rev a1)

Output of scripts/ver_linux 
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux acer-ilzleite 2.6.34 #5 SMP Mon May 17 19:33:13 CEST 2010 x86_64 GNU/Linux
 
Gnu C                  4.4.4
Gnu make               3.81
binutils               2.20.1
util-linux             scripts/ver_linux: 23: fdformat: not found
mount                  support
module-init-tools      found
Linux C Library        2.10.2
Dynamic linker (ldd)   2.10.2
Procps                 3.2.8
Console-tools          0.2.3
Sh-utils               8.5
Modules Loaded         ppdev lp autofs4 vboxnetadp vboxnetflt vboxdrv tun powernow_k8 cpufreq_conservative cpufreq_userspace cpufreq_stats cpufreq_powersave nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc nls_utf8 nls_cp437 vfat fat fuse snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd nvidia soundcore snd_page_alloc i2c_nforce2 parport_pc tpm_tis edac_core agpgart tpm i2c_core pcspkr k8temp edac_mce_amd parport tpm_bios evdev processor ext3 jbd mbcache ide_cd_mod cdrom sd_mod crc_t10dif usbhid hid usb_storage ata_generic sata_nv libata scsi_mod ohci_hcd fan ehci_hcd ide_pci_generic floppy forcedeth usbcore thermal nls_base thermal_sys amd74xx ide_core button

Output of dmesg in good case (problem doesn't occur):
[    2.476413] SCSI subsystem initialized
[    2.495964] ohci_hcd 0000:00:02.0: setting latency timer to 64
[    2.495967] ohci_hcd 0000:00:02.0: OHCI Host Controller
[    2.500495] ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
[    2.505165] ohci_hcd 0000:00:02.0: irq 20, io mem 0xfe02f000
[    2.505230] libata version 3.00 loaded.
[    2.566019] usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
[    2.570625] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.575239] usb usb2: Product: OHCI Host Controller
[    2.579848] usb usb2: Manufacturer: Linux 2.6.34 ohci_hcd
[    2.584473] usb usb2: SerialNumber: 0000:00:02.0
[    2.589155] hub 2-0:1.0: USB hub found
[    2.593674] hub 2-0:1.0: 8 ports detected
[    2.598263] sata_nv 0000:00:08.0: version 3.5
[    2.598630] ACPI: PCI Interrupt Link [APSI] enabled at IRQ 23
[    2.603154]   alloc irq_desc for 23 on node 0
[    2.603156]   alloc kstat_irqs on node 0
[    2.603166] sata_nv 0000:00:08.0: PCI INT A -> Link[APSI] -> GSI 23 (level, low) -> IRQ 23
[    2.608667] sata_nv 0000:00:08.0: setting latency timer to 64
[    2.608752] scsi0 : sata_nv
[    2.613451] scsi1 : sata_nv
[    2.618089] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd800 irq 23
[    2.622609] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd808 irq 23
[    2.816017] usb 1-5: new high speed USB device using ehci_hcd and address 3
[    3.042262] usb 1-5: New USB device found, idVendor=0bda, idProduct=0103
[    3.046645] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    3.051036] usb 1-5: Product: USB 2.0 Card Chip
[    3.055296] usb 1-5: Manufacturer: Generic
[    3.059404] usb 1-5: SerialNumber: 050610014271000001
[    3.071766] Initializing USB Mass Storage driver...
[    3.075934] scsi2 : usb-storage 1-5:1.0
[    3.080080] usbcore: registered new interface driver usb-storage
[    3.084144] USB Mass Storage support registered.
[    3.092537] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.104749] ata1.00: ATA-7: HDT722525DLA380, V44OA80A, max UDMA/133
[    3.108889] ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    3.128265] ata1.00: configured for UDMA/133
[    3.132450] scsi 0:0:0:0: Direct-Access     ATA      HDT722525DLA380  V44O PQ: 0 ANSI: 5
[    3.296012] usb 2-4: new low speed USB device using ohci_hcd and address 2
[    3.512026] usb 2-4: New USB device found, idVendor=046d, idProduct=c016
[    3.516296] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    3.520568] usb 2-4: Product: Optical USB Mouse
[    3.524814] usb 2-4: Manufacturer: Logitech
[    3.604041] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.616801] ata2.00: ATA-8: Hitachi HDT721010SLA360, ST6OA31B, max UDMA/133
[    3.621081] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    3.640303] ata2.00: configured for UDMA/133
[    3.644665] scsi 1:0:0:0: Direct-Access     ATA      Hitachi HDT72101 ST6O PQ: 0 ANSI: 5
[    3.654523] usbcore: registered new interface driver hiddev
[    3.657849] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[    3.657894] sd 0:0:0:0: [sda] Write Protect is off
[    3.657896] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    3.657915] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.658034]  sda:
[    3.659635] sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[    3.659676] sd 1:0:0:0: [sdb] Write Protect is off
[    3.659679] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    3.659696] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.659809]  sdb: sda1 sda2 sda3 < sdb1 sdb2 sdb3 sdb4 < sdb5 sda5 sdb6 sda6
[    3.694251] input: Logitech Optical USB Mouse as /devices/pci0000:00/0000:00:02.0/usb2/2-4/2-4:1.0/input/input3
[    3.700373]  sdb7 sda7
[    3.703130] generic-usb 0003:046D:C016.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech Optical USB Mouse] on usb-0000:00:02.0-4/input0
[    3.710985]  sda8 >
[    3.716934] usbcore: registered new interface driver usbhid
[    3.717403] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.726300]  sdb8
[    3.726308] usbhid: USB HID core driver
[    3.745496]  sdb9 >
[    3.750504] sd 1:0:0:0: [sdb] Attached SCSI disk
[    3.759081] ide-cd driver 5.00
[    3.764910] ide-cd: hda: ATAPI 48X DVD-ROM DVD-R/RAM CD-R/RW drive, 2048kB Cache
[    3.769596] Uniform CD-ROM driver Revision: 3.20
[    3.860526] usb 2-6: new low speed USB device using ohci_hcd and address 3
[    4.080029] usb 2-6: New USB device found, idVendor=147a, idProduct=e00d
[    4.084705] usb 2-6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    4.089384] usb 2-6: Product: Acer IR Receiver
[    4.094009] usb 2-6: Manufacturer: Acer IR Receiver
[    4.111095] input: Acer IR Receiver Acer IR Receiver as /devices/pci0000:00/0000:00:02.0/usb2/2-6/2-6:1.0/input/input4
[    4.115927] generic-usb 0003:147A:E00D.0002: input,hidraw1: USB HID v1.10 Device [Acer IR Receiver Acer IR Receiver] on usb-0000:00:02.0-6/input0
[    4.129293] scsi 2:0:0:0: Direct-Access     Generic  2.0 Reader-CF    1.00 PQ: 0 ANSI: 0 CCS
[    4.164790] scsi 2:0:0:1: Direct-Access     Generic  2.0 Reader-SM/xD 1.00 PQ: 0 ANSI: 0 CCS
[    4.200297] scsi 2:0:0:2: Direct-Access     Generic  2.0 Reader-SD    1.00 PQ: 0 ANSI: 0 CCS
[    4.235779] scsi 2:0:0:3: Direct-Access     Generic  2.0 Reader-MS    1.00 PQ: 0 ANSI: 0 CCS
[    4.244412] sd 2:0:0:0: [sdc] Attached SCSI removable disk
[    4.247154] sd 2:0:0:1: [sdd] Attached SCSI removable disk
[    4.253952] sd 2:0:0:2: [sde] Attached SCSI removable disk
[    4.263659] sd 2:0:0:3: [sdf] Attached SCSI removable disk
[    4.539699] PM: Starting manual resume from disk
[    4.572911] kjournald starting.  Commit interval 5 seconds
[    4.572922] EXT3-fs (sda7): mounted filesystem with ordered data mode
[    6.767181] udev: starting version 154
[    7.715774] k8temp 0000:00:18.3: Temperature readouts might be wrong - check erratum #141
[    7.740942] input: PC Speaker as /devices/platform/pcspkr/input/input5
[    7.925894] Linux agpgart interface v0.103
[    7.936740] EDAC MC: Ver: 2.1.0 May 17 2010
[    8.055098] EDAC amd64_edac:  Ver: 3.3.0 May 17 2010
[    8.056263] parport_pc 00:09: reported by Plug and Play ACPI
[    8.056307] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[    8.078386] i2c i2c-0: nForce2 SMBus adapter at 0x1c00
[    8.086002] i2c i2c-1: nForce2 SMBus adapter at 0xf400
[    8.093534] EDAC amd64: This node reports that Memory ECC is currently disabled, set F3x44[22] (0000:00:18.3).
[    8.100990] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[    8.100992]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
[    8.100993]  (Note that use of the override may cause unknown side effects.)
[    8.123412] amd64_edac: probe of 0000:00:18.2 failed with error -22
[    8.964544] nvidia: module license 'NVIDIA' taints kernel.
[    8.971968] Disabling lock debugging due to kernel taint
[    9.547176] nvidia 0000:00:0d.0: enabling device (0000 -> 0002)
[    9.554707] ACPI: PCI Interrupt Link [AIGP] enabled at IRQ 22
[    9.561930] nvidia 0000:00:0d.0: PCI INT A -> Link[AIGP] -> GSI 22 (level, low) -> IRQ 22
[    9.569283] nvidia 0000:00:0d.0: setting latency timer to 64
[    9.569286] vgaarb: device changed decodes: PCI:0000:00:0d.0,olddecodes=io+mem,decodes=none:owns=none
[    9.577322] ACPI: PCI Interrupt Link [APC8] enabled at IRQ 16
[    9.584574]   alloc irq_desc for 16 on node 0
[    9.584576]   alloc kstat_irqs on node 0
[    9.584587] nvidia 0000:02:00.0: PCI INT A -> Link[APC8] -> GSI 16 (level, low) -> IRQ 16
[    9.592068] nvidia 0000:02:00.0: setting latency timer to 64
[    9.592071] vgaarb: device changed decodes: PCI:0000:02:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[    9.599774] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  195.36.24  Thu Apr 22 19:10:14 PDT 2010
[   10.241555] ACPI: PCI Interrupt Link [AAZA] enabled at IRQ 21
[   10.248782] HDA Intel 0000:00:05.0: PCI INT B -> Link[AAZA] -> GSI 21 (level, low) -> IRQ 21
[   10.256071] hda_intel: Disable MSI for Nvidia chipset
[   10.263404] HDA Intel 0000:00:05.0: setting latency timer to 64
[   12.215545] Adding 20972820k swap on /dev/sdb6.  Priority:-1 extents:1 across:20972820k 
[   12.249719] Adding 1461876k swap on /dev/sda6.  Priority:-2 extents:1 across:1461876k 
[   12.480705] EXT3-fs (sda7): using internal journal


Greetings
  Juergen
Comment 1 Juergen Kosel 2010-05-20 20:55:19 UTC
Hello,

after rereading the dmesg output, I believe that the problem is caused by dual core race condition:
Sometimes the USB finds faster the card reader than sata finds the hard drives.
This leads to an unexpected enumaration of had drive partitions and therefore the system fails to mount the partitions.

Is there a recommended solution for this problem?


Greetings
  Juergen
Comment 2 Tejun Heo 2010-05-21 13:19:13 UTC
Oh I see.  Please use LABEL or UUID based mounts like any modern linux distro does.  :-)

Resolving as INVALID.  Thanks.