Most recent kernel where this bug did not occur: 2.6.15
Distribution: Ubuntu Feisty
Hardware Environment: Gigabyte 945GM-S2 motherboard
Software Environment: Linux version 2.6.20-15-generic
Problem Description:
Recent kernel versions don't find any hard drive on my SATA controller (ICH7 chipset). The 2.6.15 kernel finds the hard drive and works perfectly, but every later kernel version I've tried so far fails: either it doesn't find any hard drive (no sd* device in /dev), or (for Edgy) it does find the hard drive (with its partitions), but any attempt to write to it results in a system bug.
I've tried several Linux Live CDs (Edgy, Feisty, Parted Magic, Gutsy alpha 4, Mandriva). They all have recent kernels (i.e., later than 2.6.15), and the hard drive was not found.
So for the moment, I'm using the 2.6.15 kernel (from Dapper) on Ubuntu Feisty, and this works great. I've tried to compile my own kernel from kernel.org, but this didn't work (root device not found at startup, perhaps a bad config).
Please refer to the bug report on Launchpad for the complete dmesg + lspci and system info (Bug ID: 131696): https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131696
Steps to reproduce:
- Start a recent Ubuntu Live CD (kernel 2.6.20 for Feisty) on a Gigabyte 945GM-S2 motherboard.
- The connected SATA hard drive is not found by the ata_piix driver.
Many thanks for your support; please tell me if you need more information on this issue.
Please post full dmesg from the successful boot and the result of 'hdparm -I /dev/sdX' where sdX is the device which fails detection on recent kernels. It would also be nice if you can attach the failed dmesgs here too for easier access. Thanks.
Created attachment 12568 [details] Kernel 2.6.15 dmesg (working controller) Kernel 2.6.15: SATA controller working correctly
Created attachment 12569 [details] Kernel 2.6.20 dmesg (NOT working controller)
Created attachment 12570 [details] lspci -vvx on kernel 2.6.20
Should I check the "Regression" checkbox in the bug description header, since it worked with kernel 2.6.15 but not with 2.6.20 or 2.6.22? Do you know if there is a way to tell whether this regression comes from the kernel itself or from the Ubuntu packaging?
Just an extract of the dmesg on 2.6.20 that might help:
[ 2.893469] SCSI subsystem initialized
[ 2.897624] libata version 2.20 loaded.
[ 2.900296] ata_piix 0000:00:1f.1: version 2.10ac1
[ 2.900312] ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
[ 2.900329] PCI: Setting latency timer of device 0000:00:1f.1 to 64
[ 2.900363] ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
[ 2.906294] ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
[ 2.906313] scsi0 : ata_piix
[ 2.908392] USB Universal Host Controller Interface driver v3.0
[ 3.230801] ata1.00: ATAPI, max UDMA/66
[ 3.395653] ata1.00: configured for UDMA/66
[ 3.396005] scsi1 : ata_piix
[ 3.563402] ATA: abnormal status 0x7F on port 0x00010177
[ 3.570325] scsi 0:0:0:0: CD-ROM PIONEER DVD-RW DVR-108 1.18 PQ: 0 ANSI: 5
[ 3.570585] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
[ 3.570606] ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19
[ 3.570618] PCI: Setting latency timer of device 0000:00:1f.2 to 64
[ 3.570641] ata3: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001d402 bmdma 0x0001e000 irq 19
[ 3.570662] ata4: SATA max UDMA/133 cmd 0x0001d800 ctl 0x0001dc02 bmdma 0x0001e008 irq 19
[ 3.570672] scsi2 : ata_piix
[ 3.740247] ATA: abnormal status 0xD0 on port 0x0001d007
[ 3.740256] scsi3 : ata_piix
[ 3.904942] ATA: abnormal status 0x7F on port 0x0001d807
Gaetan, yeah, it's a kernel regression. I'll prep a debug patch against 2.6.22.5. Please wait a bit.
Created attachment 12586 [details] bug-8944-ata_piix-detection-debug.patch Please apply the patch on top of 2.6.22.5 and report the boot log. Thanks.
I've tried the patch you posted. Here is the result. As I don't know how to capture the boot log to a file with no hard disk, I only have part of the log; I hope I've written down the most relevant part.
So, here is the log for 2.6.22.5 WITHOUT your patch:
SCSI subsystem initialized
ata_piix 0000:00:1f:2: MAP [ P0 P2 IDE IDE ]
ACPI PCI Interrupt 0000:00:1f:2[B] -> GSI 19 (level, low) -> IRQ 19
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UMDA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: SATA max UMDA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
ata2.... [... about CDROM]
My card readers are found on scsi 2:0:0:1 to 2:0:0:3 and are attached to sg1 to sg4:
scsi 2:0:0:0: Attached scsi generic sg1 type 0
scsi 2:0:0:1: Attached scsi generic sg2 type 0
scsi 2:0:0:2: Attached scsi generic sg3 type 0
scsi 2:0:0:3: Attached scsi generic sg4 type 0
sd 2:0:0:0:0 [sda] Attached SCSI removable disk
sd 2:0:0:0:1 [sdb] Attached SCSI removable disk
sd 2:0:0:0:2 [sdc] Attached SCSI removable disk
sd 2:0:0:0:3 [sdd] Attached SCSI removable disk
Then I'm dropped to Busybox because /dev/disk/by-uuid/42344dc7-d9c4-43db-a844-ee654a480fb3 does not exist.
Here is the same boot sequence WITH your patch:
SCSI subsystem initialized
ata_piix 0000:00:1f:2: MAP [ P0 P2 IDE IDE ]
ACPI PCI Interrupt 0000:00:1f:2[B] -> GSI 19 (level, low) -> IRQ 19
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UMDA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: SATA max UMDA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
ata1: XXX: softreset enter
ata1: XXX: about to softreset, devmask=2
ata1: XXX: classify to dev0 ff:ff:ff/ff
ata1: XXX: classify to dev1 aa:26:02/00
ata1: XXX: softreset enter
ata1: XXX: about to softreset, devmask=3
ata1: XXX: classify to dev0 01:14:eb/01
ata1: XXX: classify to dev1 01:00:00/00
ata2.00: ATAPI: PIONEER DVD-RW .....
... exactly the same log about the DVD drive on IDE and finally the card readers.
Hope this helps.
Sorry, I set the bug as Invalid by mistake. I'm reopening the bug.
Created attachment 12671 [details] update-diagnostics-failure-handling.patch If you have another machine, setting up serial console or netconsole works nicely. If not, taking a picture of the screen with a digital camera is usually less painful than writing the messages down. Anyways, what you copied is the relevant part. Please apply the attached patch on top of 2.6.22.5 and report the result. Thanks.
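The netconsole route suggested above needs something listening on the second machine. What follows is only a rough Python sketch of such a listener, not part of any patch in this report. It assumes netconsole's default target port of 6666, and the names `format_record` and `receive` as well as the log file name are made up for illustration.

```python
import socket


def format_record(payload, sender):
    """Turn one received datagram into a log line tagged with the sender's IP."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return "[{}] {}".format(sender[0], text)


def receive(log_path, host="0.0.0.0", port=6666):
    """Append every UDP datagram arriving on host:port to log_path, forever.

    Port 6666 is an assumption (netconsole's default target port); change it
    if the sending kernel was configured with a different one.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "a", encoding="utf-8") as log:
        sock.bind((host, port))
        while True:
            payload, sender = sock.recvfrom(65535)
            log.write(format_record(payload, sender) + "\n")
            log.flush()  # keep the file current in case the sender hangs


# Example (blocks until interrupted with Ctrl-C):
# receive("boot.log")
```

Flushing after every message is deliberate: when the machine under test hangs during boot, the last lines received are usually the interesting ones.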
Here is the new log (digital-camera powered) with both of your patches applied:
SCSI subsystem initialized
ata_piix 0000:00:1f:2: MAP [ P0 P2 IDE IDE ]
ACPI PCI Interrupt 0000:00:1f:2[B] -> GSI 19 (level, low) -> IRQ 19
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UMDA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: SATA max UMDA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
ata1: XXX: softreset enter
ata1: XXX: about to softreset, devmask=2
ata1: XXX: classify dev-582499968 ff:ff:ff/ff
ata1: XXX: classify dev-582499968 aa:26:02/00
ata1: XXX: softreset enter
ata1: XXX: about to softreset, devmask=3
ata1: XXX: classify dev-538033792 01:14:eb/01
ata1: XXX: classify dev-538033792 01:00:00/00
ata2.00: ATAPI: PIONEER DVD-RW .....
Gaetan
Created attachment 12676 [details] diagnostic-failure-debug.patch Please apply the attached patch on top of clean 2.6.22.5 and report the result. You can just attach a jpg file here. No need to write it down. Thanks.
Created attachment 12685 [details] Boot sequence log Here are the photos of the boot sequence with a clean .22 + your last patch.
Created attachment 12749 [details] diagnostic-failure-debug-1.patch Hmm... Weird. The workaround isn't kicking in. I gotta be missing something stupid. Sorry, but can you please try the attached patch one more time? Thanks.
Created attachment 12854 [details] Boot sequence for last patch Sorry for the delay, but here is the log with your patch (I had to apply both of your last two patches; otherwise the latter one would fail to apply).
Hi all, any news on this issue? Thx!
I was traveling. I'll look into it soon. Thanks.
Hope you had better luck than me on your travels... on mine I broke my leg... :'(
Sorry about the long delay. This time, I moved my apartment. Hope your legs are okay. :-) I went through the log again and found something weird. Previously the controller was put into native dual function mode - PATA controller on 1f.1, SATA on 1f.2 while in the recent screen captures, the controller is now in combined single function mode - 1f.1 doesn't exist anymore and primary channel of 1f.2 serves SATA while the secondary serves PATA. Can you please put the controller into the previous mode (dual function native mode) and report boot log with the patch? Thanks.
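The two modes described above can be told apart from the `MAP [...]` line that ata_piix prints in the logs already posted in this report. A rough Python sketch follows. It encodes only what the two MAP lines seen here show (`IDE` entries in combined mode, only `Pn` entries in native mode), so treat it as a heuristic; the function names are hypothetical.

```python
import re

MAP_RE = re.compile(r"MAP \[\s*(.*?)\s*\]")


def map_entries(line):
    """Return the entries between the brackets of an ata_piix 'MAP [...]' line."""
    match = MAP_RE.search(line)
    if match is None:
        raise ValueError("no 'MAP [...]' found in: " + line)
    return match.group(1).split()


def controller_mode(line):
    """Guess the controller mode from a MAP line (heuristic, see lead-in)."""
    entries = map_entries(line)
    if "IDE" in entries:
        # Some slots of the SATA function are routed to PATA devices.
        return "combined"
    if entries and all(re.fullmatch(r"P\d", e) for e in entries):
        # Every slot is a SATA port; PATA lives on its own PCI function.
        return "native"
    return "unknown"


print(controller_mode("ata_piix 0000:00:1f:2: MAP [ P0 P2 IDE IDE ]"))
print(controller_mode("ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]"))
```

Run against the logs in this report, the first line (from the 2.6.22.5 boots) is reported as combined and the second (from the 2.6.20 dmesg) as native.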
Created attachment 13427 [details] Updated error log
Hello. I was very busy recently (buggy left leg needing a lot of attention and requiring many patches to fix it :) ), so sorry for the delay in getting you my great new pictures... So, I've submitted a new log after changing a setting in the BIOS, but I don't know if it will change anything. I hope it was what you mentioned in your comment. I tried many times with all the options available in the BIOS for the SATA controller, and it always resulted in the kernel not recognizing the controller.
Options for "On Chip SATA Mode":
- Disabled
- Auto
- Combined <-- previous
- Enhanced <-- new one
- non Combined
If you have any advice about this, please tell me. I have quite a good background in C development but I've never dug into SATA development; if I can do something, I'll be pleased to do so...
Many thanks
Created attachment 13428 [details] bug-8944-debug-2.patch Okay, please apply the attached patch on top of 2.6.23.1 and report the result. Thanks.
Has there been any update on this bug? I am having the exact same problem with ICH7 hardware. I've isolated it down to the Ubuntu 2.6.15-26-i386 kernel, which boots normally. Any i386 kernel above 2.6.15-29 causes a kernel panic. Additionally, attempting to use 2.6.15-26-i686 for SMP results in a system halt. Your help would be greatly appreciated.
Gaetan, ping. Justin, 2.6.15 is ancient. Can you please roll your own 2.6.23.9 or 2.6.24-rc4 kernel and see whether the problem is still there? If it is, we'll need to test patches to debug it, so having a custom kernel would help a lot. Thanks.
Created attachment 13921 [details] Bug report for last patch I finally managed to patch and compile the 2.6.23.1 kernel, and it does the same thing (system hang at boot time). Here is the log. Could you quickly explain what is going wrong (chipset not found, bad ID somewhere, ...)? Maybe I could try some things here (my C background is okay, with just a few kernel hacks in the past, but never in a SATA driver...). Thanks a lot and sorry for the delay.
Created attachment 13922 [details] bug-8944-debug-3.patch Please try this one. (It's Sunday night here and I'm about to go to bed now so I'll explain it later). Thanks.
Created attachment 13956 [details] New bug report for last patch (wasn't able to get the first messages, hope this is enough...) Thanks.
I'm confused now. So, what exactly isn't being detected here? PATA or SATA? Can you please describe your hardware configuration?
Sorry for the confusion. I've not changed the hardware whatsoever, but the system hangs without any error after that. I think there should be a mechanism to enumerate the partitions and find the "/" filesystem, and this doesn't seem to work. As I read the log (sorry, I was not able to get the first screens due to the page-up limitation; I'll try tonight with everything but the SATA drive disconnected), it seems to fail on IDENTIFY on the PATA drive, and I guess the SATA part was logged before that. Please disregard this log; I'll send a cleaner one tonight :) Thanks so much for your patience on this issue....
Created attachment 13975 [details] New log with all external USB things removed, hope this is much more clear for you. Thanks a lot.
I'm confused which drives are not being detected. Your machine has two PATA ports - ata1 and ata2. ata1.00 (the master slot of the first channel) is connected to PIONEER DVD-RW. The slave slot and whole second channel are empty. Then there are four SATA ports mapped to ata3.00, ata4.00, ata3.01 and ata4.01 in that order. ata3.01 which is mapped to the third SATA port reports that there can be an ATA drive connected to the port but IDENTIFY fails on the device. So, you're missing a harddrive connected to the third SATA port, right? What happens if you move the drive to the first or second SATA port?
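The port-to-device mapping described above can be derived from the `MAP [ P0 P2 P1 P3 ]` line in the posted dmesg. Here is a small Python sketch of that derivation. It assumes the MAP entries are listed as primary master, primary slave, secondary master, secondary slave, and that the SATA function's first channel is ata3 (as in the 2.6.20 log); `sata_port_slots` is a hypothetical helper name.

```python
import re


def sata_port_slots(map_line, first_ata=3):
    """Map SATA port numbers to 'ataN.MM' device names.

    Assumption: MAP entries are ordered primary master, primary slave,
    secondary master, secondary slave. first_ata is the ata number of the
    primary channel (3 in the 2.6.20 log from this report).
    """
    match = re.search(r"MAP \[\s*(.*?)\s*\]", map_line)
    if match is None:
        raise ValueError("no 'MAP [...]' found in: " + map_line)
    slots = {}
    for index, entry in enumerate(match.group(1).split()):
        port = re.fullmatch(r"P(\d)", entry)
        if port is None:
            continue  # 'IDE' and other non-SATA entries carry no port number
        channel, device = divmod(index, 2)
        slots[int(port.group(1))] = "ata{}.{:02d}".format(first_ata + channel, device)
    return slots


slots = sata_port_slots("ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]")
for port in sorted(slots):
    print("SATA port", port, "->", slots[port])
```

Under those assumptions the ports come out as ata3.00, ata4.00, ata3.01 and ata4.01 in port order, so the third SATA port (P2) lands on ata3.01, matching the comment above.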
Hi. You're right in your analysis: I do have a DVD drive on PATA and a hard drive on SATA, and it is the latter drive that is not recognized properly. As you suggested, I moved the SATA drive to SATA port 0 (the first one) and removed all USB and ATA connections to see what happens. I can't save the log on my computer, so I wrote it down and put it below. Things seem to be different. Everything goes the same as before until:
ata1 XXX Softreset reset enter
ata1.00: XXX classify 20:20:20/20
ata1.00: XXX class=0 horkage 0x1 0x0
ata1.00: XXX determined class=5
ata1.00: XXX classify 30:30:30/30
ata1.00: XXX class=0 horkage 0x1 0x1
ata1.00: XXX determined class=5
ata2: XXX softreset enter
ata2.00: XXX classify 7f:7f:7f/7f
ata2.00: XXX class=0 horkage 0x1 0x0
ata2.00: XXX determined class=5
ata2.00: XXX classify 7f:7f:7f/7f
ata2.00: XXX class=0 horkage 0x1 0x1
ata2.00: XXX determined class=5
ata_piix 0000:00:1f.2 MAP [ P0 P2 P1 P3]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19
scsi2: ata_piix
scsi3: ata_piix
[...]
It usually hung here, but now it goes further than before and continues with:
Begin: Running /scripts/local-premount
EXT3-fs: INFO: recovery required on readonly filesystem
EXT3-fs: write access will be enabled during recovery
kjournal starting. Commit interval 5 seconds
EXT3-fs: recovery complete
EXT3-fs: mounted filesystem with ordered data mode
Done.
Done.
Begin: Running /scripts/init-bottom
Done.
* Init: version 2.86 booting
Reading files needed to boot...
Starting preliminary keymap... done
Preparing restricted drivers
Setting the system clock
Starting basic networking
Starting kernel even manager
Loading hardware drivers
[...]
Setting the system check
After that, it displays the following message in a loop:
device-mapper: table 254:2: linear: dm-linear: Device lookup failed
device-mapper: table 254:2: linear: dm-linear: Device lookup failed
device-mapper: table 254:2: linear: dm-linear: Device lookup failed
device-mapper: table 254:2: linear: dm-linear: Device lookup failed
device-mapper: table 254:2: linear: dm-linear: Device lookup failed
device-mapper: table 254:2: linear: dm-linear: Device lookup failed
[...]
So, using SATA 0 for the hard drive did change something, and it seems to see my EXT3-fs partitions. For information, here is my partition table:
/dev/sda1   *      1     12      96358+  83  Linux
/dev/sda2        135  24321  194282077+   5  Extended
/dev/sda4         13    134     979965   82  Linux swap / Solaris
/dev/sda5        135   1958   14651248+  83  Linux
/dev/sda6       1959  24321  179630766   83  Linux
In my opinion, the partitions are discovered correctly but access fails.
The SATA controller is ata3 and 4 (this was where I was confused too :-) so I need to look at the debug log when you move the drive to the first SATA port. It looks like it could just be a faulty drive. The drive is reporting !BSY && !DRQ during IDENTIFY, something is definitely wrong with it. Can you please try it on a different SATA controller (friend's computer, different motherboard, an add-on card...)? Thanks.
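For readers following the status values in this report, here is a small Python sketch that decodes an ATA status byte and checks for the !BSY && !DRQ condition mentioned above. The bit positions for BSY, DRDY, DF, DRQ and ERR are the standard ATA status register bits; the remaining bits are left unnamed on purpose, and the function names are hypothetical.

```python
# Standard ATA status register bits; bits 4, 2 and 1 are intentionally
# left unnamed in this sketch.
STATUS_BITS = (
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x08, "DRQ"),   # data request: device is ready to transfer data
    (0x01, "ERR"),   # error
)


def decode_status(status):
    """Return the names of the known bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


def is_idle_without_data(status):
    """True for the !BSY && !DRQ condition: neither busy nor offering data."""
    return not (status & 0x80) and not (status & 0x08)


# The two "abnormal status" values from the 2.6.20 dmesg in this report:
for value in (0xD0, 0x7F):
    print(hex(value), decode_status(value), is_idle_without_data(value))
```

With the values from the 2.6.20 dmesg, 0xD0 decodes to BSY and DRDY, and 0x7F decodes to DRDY, DF, DRQ and ERR. A status such as 0x50 (DRDY set, BSY and DRQ clear) satisfies the !BSY && !DRQ check.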
I don't know anybody around me with a >2.6.23 kernel. But I can tell you that this drive works perfectly on the very same machine with a 2.6.15 kernel. So the problem can't come from the drive itself or from the SATA controller.
Are you still running 2.6.15? It seems we'll have to add debug messages to 2.6.15 and see how it's working. Thanks.
Yes, my computer currently runs all day on Feisty Fawn but with a 2.6.15-23-386 kernel, and everything works (nvidia acceleration, ...) except for the dual core (only one core is seen by Linux). Please see the log from comment #2 for the dmesg from 2.6.15 with the same hard drive.
Hi Heo, I won't be here for the next month, unfortunately without access to my computer. Sorry for the inconvenience. See you in a few weeks. Happy new year! Gaetan
Happy new year! Gaetan. This one is difficult && you're currently the only one reporting detection problem on ata_piix. Lucky you. :-) I'm ordering the same board now. Let's see if I can reproduce the problem here. I'll report back. Thanks.
I'm back :) Do you have any news on this old boy issue? G.
Great. As I wrote above, I ordered the same board. I got it and tried to reproduce your problem, but my board detects hard drives just fine. I found the BIOS a bit peculiar tho. It has an auto mode which turns controllers on and off and switches modes depending on which connectors hard drives are connected to, and it didn't seem to be doing the right thing all the time. Which mode did you set your BIOS to?
So it seems there should be at least one software and/or hardware configuration that works... unless the problem comes from my particular hard drive... OK, here is the complete BIOS setup I'm using right now:
Standard CMOS Features
  Channel 0 Master : [PIONEER DVD-RW DVD-]
  Channel 0 Slave : [None]
  Channel 2 Master : [ST3200822AS]
Integrated Peripheral
  On Chip PCI IDE : Enabled
  On Chip SATA Mode : Enhanced (already tried Auto and Combined Mode)
  PATA IDE Set to : Ch.0 Master/Slave
  SATA Port0/2 Set to : Ch.2 Master/Slave
  SATA Port1/2 Set to : Ch.3 Master/Slave
All other settings are enabled except Legacy USB Storage Detect and On board LAN Boot ROM.
---
On the first startup screen, the hard drive seems to be connected to Channel 2:
IDE Channel 2 Master : ST3200822AS 3.01
On the second startup screen, there is the following line:
IDE Channel 2, Master Dist HDD S.M.A.R.T. capability.... Disabled
Anyway, thanks a lot for all the effort you're spending on my issue.
Gaetan
I tried to reproduce this one more time today. My configuration is almost identical to yours. The same board, one PATA optical drive connected as master to the IDE channel. One Seagate SATA drive (7200.10), connected to one of the SATA ports (I tried all four) and it gets detected just fine. Ah... so strange. I bet you're still using 2.6.15, right? The log from #2 is not from enhanced mode, it's from combined mode. Can you please post boot log with BIOS setting configured as in comment #42? And you're booting off (loading the kernel from) the harddisk, right? Thanks.
Hello. Thanks for following this issue with so much effort. Some days ago I managed to get a stable configuration by resetting the BIOS and installing the latest Ubuntu (Hardy). It now works completely (with the latest kernel). I don't know whether I should reproduce the issue by finding the incorrect BIOS setup, or whether you want the log/BIOS configuration for information purposes. Please tell me if anyone else reproduces this issue, so I can help them. Anyway, I think you can close this bug, since everything has worked perfectly for some days (and no problem with the latest kernel update last night, to 2.6.24-18 I think). I'd like to thank you a lot; you are really putting a lot of effort into the kernel. Are you doing this in your spare time?
Great that it works now. The BIOS on that board is quirky. I triggered a few strange not-so-reproducible problems playing with add-on cards and BIOS settings so it could be that something funky happened w/ the BIOS. It would be great if you can try various BIOS harddisk settings and verify that all of them work correctly (keep in mind that not all SATA ports are accessible if it's in combined mode). libata is (partially) my baby and SUSE is paying my lazy ass, so I'm just doing my job. :-)