Bug 30962

Summary: Linux installer does not detect my 2TB SATA3 Hard Drive (or sees partitions on it as corrupted)
Product: IO/Storage
Reporter: Mircea Kitsune (sonichedgehog_hyperblast00)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: RESOLVED INSUFFICIENT_DATA    
Severity: blocking
CC: alan, mlord, tj
Priority: P1    
Hardware: All   
OS: Linux   
URL: https://bugzilla.novell.com/show_bug.cgi?id=617288
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Requested tests for debugging the issue
Requested test for debugging the issue
Requested test for debugging the issue

Description Mircea Kitsune 2011-03-12 14:11:13 UTC
This is a very severe bug that I have been experiencing for over a year, and I am desperately trying to get help with it. I was told this is a kernel issue, so I'm posting it here too. I will quote from another bug report that I filed on the matter (please see the original bug at https://bugzilla.novell.com/show_bug.cgi?id=617288 where I've also submitted some test results).

Three months ago I got a Seagate 2TB Baracuda XT, SATA3, 7200rpm, 64MB Hard Drive, which I am now using as my main drive with Windows ( http://www.emag.ro/hard_disk-uri/hdd-seagate-2tb-baracuda-xt-sata3-7200rpm-64mb--pST32000641AS ).

The issue I am experiencing is that OpenSuse 11.2 to 11.4 does not see my hard drive, and when it does see it, the partitions are reported as corrupted. This makes it impossible to install OpenSuse and possibly to browse existing partitions with it.

I am certain that my hard drive is not damaged, nor any other hardware. I'm using this hard drive as my main HDD with Windows 7 64bit, and have never experienced the smallest issue with it (apart from this problem). I have another hard drive as well (400 GB on SATA2) which is detected properly by OpenSuse. Connecting or disconnecting this drive does not change the 2TB drive not working however.

I tried different bios settings, such as setting it to both AHCI and IDE mode, and physical settings like using different SATA ports (tried sata2 and sata3 ports alike). I even tried wiping the HDD to nothing, but still the same issue (note that OpenSuse still said partitions are corrupted while the HDD was completely wiped and had no partitions). I don't have any RAID setup that I'm aware of, and currently run my hard drives in AHCI mode.

I wrote more detail in these forum topics: http://forums.opensuse.org/get-help-here/hardware/437039-opensuse-11-2-does-not-see-my-sata3-hard-disk-partitions-2.html and http://forums.opensuse.org/english/get-technical-help-here/pre-release-beta/440987-major-2tb-sata3-hard-disk-still-not-detectable-11-3-rc1.html I don't know any more about the issue than what I mentioned there, but I can run other tests if that can help. Note that I currently have two NTFS partitions which I use and keep all my data on, so I cannot try anything risky that could damage them and cause me to lose data. I'm also inexperienced with Linux, so I would appreciate detail on what I need to do in order to test or get more info. Here are important quotes from what I wrote in those topics:

------------------------------------------
Hello everyone. I got a new hard drive yesterday, a Seagate Baracuda XT 2000GB SATA3. I'm running it on the SATA3 jmicron chip in AHCI mode, and Opensuse 11.2 is having issues with using this hard drive properly.

The hard drive seems to take a while to detect, for one thing. After it is detected however, I am told I have no hard disk that can be used for installation. The exact message is:

"No hard disks were found for the installation. Please check your hardware!"

After that, I'm told the partitioner can't read the partitions on my hard drive properly:

"The partitioning on disk /dev/sda is not readable by the partitioning tool parted, which is used to change the partition table.

You can use the partitions on disk /dev/sda as they are. You can format them and assign mount points to them, but you cannot add, edit, resize, or rename partitions from that disk with this tool."

I partitioned the new hard drive under Windows, with a partitioning tool called Partition Wizard (free / home version). I'm not sure if hard drives partitioned with this tool are not recognized by OpenSuse's partitioner, or if it's something else.

I already installed Windows on this hard drive, so I can't delete all the partitions I have so far and start all over again (I'm keeping my system a dual-boot between Windows 7 and openSuse 11.2). What can I do so OpenSuse will see and modify my partitions? If it is the partitioning not being understood, is there some sort of tool that can make the partition table of the hard disk linux-readable? (a free Windows program that could do such)
------------------------------------------

------------------------------------------
It is detected correctly in BIOS, and it's not set in RAID but AHCI. I get this issue even if I run it in IDE mode, or on a non-jmicron SATA2 port.

Good idea to try the Windows partitioner, I forgot it actually has one. I decided to wipe the hard disk and reinstall everything again, with a better partitioner, so I'll see if this still happens after.
------------------------------------------

------------------------------------------
OK. After completely wiping my HDD clean using a 5 hour wiping tool then running the installer again, I noticed neither openSuse 11.2 nor openSuse Live KDE 11.3 Milestone4 see my hard disk.

I then partitioned it using the Windows 7 partitioning tool. Booting the live 11.3 sees and mounts my ntfs partitions, but the partitioning tool and 11.2 installer give me the same errors described in the first post.

So my final thought is that openSuse has an issue with my HDD. If the reason isn't sata3, it may be that it's a 2TB drive. Could it be that openSuse does not understand such big hard drives yet?

I'm certain my HDD is not in any way damaged. I just got it and Windows 7 as well as all other files work like a charm, so there's clearly nothing wrong with it imo.

Has anyone else managed to install openSuse 11.2 on a SATA3 HDD and / or a +2TB one?
------------------------------------------

------------------------------------------
Yeah, I use the official copies only, and try to use good media and do all preparations. The installation media is likely not damaged, because both the 11.2 DVD and 11.3 KDE live CD I burned yesterday do the same thing, and the other HDD is always detected properly. Will check the install disk later when I look into this again.

Haven't tried another distro yet, perhaps I will at some point. Configuring the rest of my system now.

And as far as I know I'm not using RAID. All my HDD ports are set to AHCI mode from BIOS.

I shall probably test more, but imho one of the devs should look at this, if they have a similar HDD to test with. When it comes to Linux I'm still a newb, but from everything I have tried and seen, this strongly feels to me like openSuse does not understand the new HDD properly.
------------------------------------------

------------------------------------------
Thanks a lot for the info. I'm not sure if my drive is a '4K physical sector', is there any way to check for that and see?

And now it appears that whether I plug in my sata2 or not, the same issues happen with the new HDD. So from what I'm seeing having the other one doesn't affect the new one not working.

I finished reinstalling my Windows system and copying my data, so I can't do anything risky any more. I'd be glad to do more safe tests and help with fixing this though.
------------------------------------------
Comment 1 Tejun Heo 2011-03-12 14:16:07 UTC
Can you please attach the following?

* The output of "lspci -nn"
* The output of "dmesg" after the said failure happened.
* The output of "hdparm -I /dev/sdX" and "smartctl -a /dev/sdX" where sdX is the failing drive.
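
A minimal sketch of how these outputs could be gathered in one pass, assuming a POSIX shell run as root. The function names diag_commands and collect_diag and the output directory are hypothetical and not part of any tool mentioned here. Redirecting stderr with 2>&1 keeps any error messages in the saved files alongside the normal output.

```shell
#!/bin/sh
# Print the diagnostic commands requested above for one drive, one per line.
diag_commands() {
    dev="$1"
    printf '%s\n' \
        "lspci -nn" \
        "dmesg" \
        "hdparm -I $dev" \
        "smartctl -a $dev"
}

# Run each command and save stdout and stderr to a numbered text file.
# (hypothetical helper; file names are 1.txt, 2.txt, ... in the given directory)
collect_diag() {
    dev="$1"
    outdir="$2"
    mkdir -p "$outdir" || return 1
    n=0
    diag_commands "$dev" | while IFS= read -r cmd; do
        n=$((n + 1))
        # </dev/null stops a command from swallowing the remaining command list
        { echo "\$ $cmd"; sh -c "$cmd" </dev/null 2>&1; } > "$outdir/$n.txt"
    done
}

# Example (as root): collect_diag /dev/sdX /tmp/diag
```

Plain, uncompressed text files like these can be attached directly.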

Thanks.
Comment 2 Mircea Kitsune 2011-03-12 16:24:24 UTC
Created attachment 50672 [details]
Requested tests for debugging the issue
Comment 3 Mircea Kitsune 2011-03-12 16:24:35 UTC
Sure. Posted all of them in an archive here (for all devices), and included the Yast logs as well for more info. Done from OpenSuse 11.4 64bit's install DVD.

A summary of my drives:

sda - My other 400GB hard drive, that works well with Linux. Everything should be fine with this one.

sdb - The problematic 2TB hard drive.

sdc - Probably my DVD drive, or the RAM or a virtual drive.

sdd - USB stick I used to save the console outputs to.

NOTE: Although this wasn't printed in the output for sdb, when I ran the hdparm command on the problematic drive, I got the following message: HDIO_DRIVE_CMD(identify) failed: Input/output error
Comment 4 Tejun Heo 2011-03-12 19:25:43 UTC
Please also attach /var/log/boot.msg.  The dmesg is truncated.
Comment 5 Mircea Kitsune 2011-03-12 21:21:37 UTC
Created attachment 50692 [details]
Requested test for debugging the issue

Here's the boot.msg.
Comment 6 Tejun Heo 2011-03-13 08:38:43 UTC
Okay, the controller which is causing the problem is the marvell one @01:00.0, not Jmicron. The controller looks pretty funky with virtual device and it also has set up HPA for some reason.

A couple of things to try...

* See whether 'libata.force=3Gbps' or 1.5Gbps makes the problem go away.

* Try 'libata.ignore_hpa=0'.
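
For context, libata describes ignore_hpa as 0 = keep BIOS limits and 1 = ignore limits and use the full disk. A small sketch for pulling the HPA-related lines out of a saved kernel log follows; the helper name hpa_lines is hypothetical, and the exact wording of the kernel messages is deliberately not assumed.

```shell
#!/bin/sh
# Print every line of a saved kernel log that mentions HPA (case-insensitive).
hpa_lines() {
    grep -i 'hpa' "$1"
}

# Example: hpa_lines /var/log/boot.msg
```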

If none of the above solves the problem, I don't know.  I personally am not too interested in digging down marvell controller problems.  I don't have access to documentation, hardware or technical contacts and it seems that way for most other libata developers too.  Maybe we should just declare marvell controllers unsupported.

Mark, AFAIK, you're the one with the most contact with marvell, so cc'ing you.

Thanks.
Comment 7 Mircea Kitsune 2011-03-13 13:50:30 UTC
I shall try these settings too. Do I just write them in the boot options field before I startup the installer? Also, does this risk corrupting any data on the hard drive? Just to be sure, as I have my main Windows system and data on it.

But like I mentioned in the description, I tried running the hard drive on both the jmicron SATA ports, and the other controller on the mobo (not sure what that was, but it's a different chipset for sure). One being a sata2 controller and the other sata3. Both had the same issue however. Whereas plugging the 400GB drive into any port / controller always works. So this probably means it's the HDD internal controller, and not the motherboard SATA chipset, given I tried both of these chipsets.

Anyway, I know this is probably a bigger issue to find and fix, and not sure who can do it. But I'd be very grateful if anyone at all could do something about it. I miss using Linux. And given I can't be the only person in the world that has this hard drive (or will get it), many people could be affected by the issue. Hope very much something can be done.
Comment 8 Tejun Heo 2011-03-13 13:54:59 UTC
It's not very likely that the problem is widespread.  If you can reproduce the problem while the disk is connected to other controllers, please do so and attach /var/log/boot.msg and the output of dmesg (as text/plain w/o compression, please. You can concatenate the two files if you like).

As for the parameters, yeah, specifying them from boot loader should work.  You can verify that the parameter was properly specified from the boot log too.  In case of doubt, please post the boot.msg.
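
A minimal sketch of one way to check this from a running system, assuming a POSIX shell. The helper name cmdline_has_param is hypothetical; /proc/cmdline holds the command line the kernel was booted with, and /sys/module/libata/parameters/ignore_hpa shows the value libata is currently using.

```shell
#!/bin/sh
# Succeed if the kernel command line in $1 contains the exact parameter $2.
cmdline_has_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# On the booted system (not run here):
#   cmdline_has_param "$(cat /proc/cmdline)" libata.ignore_hpa=0 && echo "parameter is set"
#   cat /sys/module/libata/parameters/ignore_hpa
```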

Thanks.
Comment 9 Mircea Kitsune 2011-03-13 14:04:33 UTC
I'll try that soon, and post these outputs with the 2TB drive connected to the other controller too.

The issue with dmesg is that I use the 'dmesg > filename' command to print it to a text file. What is the correct version of that command to not compress it, and save it to a text file?
Comment 10 Tejun Heo 2011-03-13 14:22:54 UTC
Oh, what I meant was to not zip it when attaching the output here.  If you want a single log, just do 'cp /var/log/boot.msg dmesg.out; dmesg >> dmesg.out'.
Comment 11 Mark Lord 2011-03-13 14:28:27 UTC
There is way too much verbiage for me to see what the problem is here.
I see lots of mention of "doesn't see my drive" and "I/O errors", but I don't see the actual kernel logs for the "I/O errors".

Perhaps the affected party might post a *concise* summary of what is not working, with evidence, and WITHOUT a zillion links to other articles that then link back to this one again ?

Keep it very short, and to the point, with supporting "dmesg" output (kernel logs).

Thanks
Comment 12 Mircea Kitsune 2011-03-13 16:09:54 UTC
Created attachment 50722 [details]
Requested test for debugging the issue

Just tried the tests suggested here, and finally have good news! I'll start from the beginning:

When trying the commands suggested by Tejun, I got the following results:

libata.force=3Gbps - Nothing changed.

libata.force=1.5Gbps - Nothing changed, just the installer taking longer to load.

libata.ignore_hpa=0 - WORKED! My hard disk and its partitions were finally seen without any issues, and no more I/O errors with hdparm.

I've also attached the boot.msg and dmesg with the problematic drive connected to the other SATA controller. On the motherboard manual, the controller I had it connected to is called GSATA3. The other one I switched to for this test is labeled SATA2, which is the other chipset.

It didn't really change anything, and the disk was still not detected and had I/O errors. The only thing that changed was that I also got an additional message saying:

"The partitioning on disk /dev/sdb is not readable by the partitioning tool parted, which is used to change the partition table.
You can use the partitions on disk /dev/sdb as they are. You can format them and assign mount points to them, but you cannot add, edit, resize or remove partitions from that disk with this tool."

So yeah, it seems that libata.ignore_hpa=0 did it at last :) I can hopefully install using this flag until the issue is solved, though I'm kinda scared of data corruption until more is known about the problem.

@ Mark: The reason I linked other topics and bug trackers is because I discussed different things in each of them. So I thought this would offer as much info as possible. As for concrete results, I did all the tests requested here, and posted all outputs I was asked to. If anything more is needed, please let me know and I'll gladly try to do that too.
Comment 13 Tejun Heo 2011-03-13 16:37:12 UTC
Hmmm.... the same failure even when connected to ich ahci. This actually could be a firmware bug on the drive. HPA unlocking maybe somehow gets the drive confused.  Can you check whether there is a firmware update available?
Comment 14 Mircea Kitsune 2011-03-13 16:40:37 UTC
Ok. I never updated a hard disk firmware before, and like I said I'm afraid of data loss. So this might take a while, but I'll try.
Comment 15 Mircea Kitsune 2011-03-15 16:31:56 UTC
Looked for a HDD firmware update, but couldn't find any. I asked the people at Seagate about it too, and even showed them this issue and what I need it for. They said my drive neither has nor needs any newer firmware.

I updated my BIOS instead. The new rev is said to improve compatibility with SATA3 throughout many things, but it doesn't fix this issue, nor add HPA in the bios menu (someone said some motherboards might have it there).

Let me know if there's anything else I can test and help with. In case this is a Kernel issue and not a bios / firmware bug (which I'm not sure about, given it happens with the Linux installer only).
Comment 16 Tejun Heo 2011-03-16 09:22:38 UTC
Hmmm... okay.  Can you please try the following?

* Boot the machine w/o the hard drive connected to the port.

* Once the machine is booted, connect the hard drive.  dmesg should show a hotplug event and the device will show up as /dev/sdX device.

* Verify IOs to the hard drive works.

* Run the following commands.  Make sure the device is not RW mounted at this point.

  hdparm -N 3907020911 /dev/sdX
  echo - - - > /sys/block/sdX/$(readlink /sys/block/sdX/device)/../../scsi_host/host0/scan

* Check whether IOs to the hard drive works.

* Post the test result and dmesg output.
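
The scan path in the command above is tied to host0, so it only matches when the drive sits on the first SCSI host. A hedged alternative sketch that looks up the owning host first, assuming the resolved sysfs path of the device contains a hostN component; the helper names scsi_host_from_path and rescan_host_of are hypothetical.

```shell
#!/bin/sh
# Extract the SCSI host name ("hostN") from a resolved sysfs device path.
scsi_host_from_path() {
    printf '%s\n' "$1" | sed -n 's|.*/\(host[0-9][0-9]*\)/.*|\1|p'
}

# Rescan the SCSI host that owns block device $1 (e.g. sdc). Needs root.
rescan_host_of() {
    path=$(readlink -f "/sys/block/$1/device") || return 1
    host=$(scsi_host_from_path "$path")
    [ -n "$host" ] || return 1
    echo "- - -" > "/sys/class/scsi_host/$host/scan"
}

# Example (as root): rescan_host_of sdX
```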

Thanks.
Comment 17 Mark Lord 2011-03-16 12:57:00 UTC
Missing a closing parenthesis there.
Comment 18 Mark Lord 2011-03-16 12:57:50 UTC
Nak.. parentheses are okay as is.  Pardon the noise.  :)
Comment 19 Mircea Kitsune 2011-03-16 13:41:35 UTC
I tried those tests too, but this time I was not able to get results, mostly because the hard drive would not reliably be mapped to a /dev/sdX device, depending on when I plugged it in.

If I plugged it into the motherboard after the Hardware Probing step of the installer (while I was in the partitioning menu), nothing would happen and it would not get detected. If I plugged it in at the welcome screen of the installer (first interactive screen after I start it up), it would be mapped to /dev/sdc I think. And if I plugged it in during / before the Grub menu (before entering the installer at all), it would be mapped to /dev/sdb as usual. I did try dmesg each time, but saw nothing special in the output.

I still tried those commands however, as the drive was mapped to /dev/sdc after plugging it in at some point after booting. 'hdparm -N 3907020911 /dev/sdc' gave me the usual I/O error, while 'echo - - - > /sys/block/sdc/$(readlink /sys/block/sdc/device)/../../scsi_host/host0/scan' gave me a "file or folder not found" message.

Let me know if I should re-try this in a different way. Although this looks like a more risky test, and gave me quite a few scares at some point. So I'd prefer to redo this one only if it's very necessary, if that's ok.
Comment 20 Tejun Heo 2011-03-16 13:47:33 UTC
It's not dangerous as long as nothing is mounted rw. As for why the device is not showing up, I can't really tell w/o the kernel log and it would probably be much easier if you have the system installed on a different hard disk.  Can you please get the system installed on a different drive and try again and then post the kernel log?
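
A minimal sketch for confirming that nothing on the drive is mounted read-write, assuming a POSIX shell with awk. The helper name rw_mounts_of is hypothetical; it reads a file in /proc/mounts format and matches by device-name prefix, so sdc also covers sdc1, sdc2 and so on.

```shell
#!/bin/sh
# Print "device mountpoint" for every entry of the mounts file $2 that belongs
# to disk $1 (e.g. sdc) and lists "rw" among its mount options.
rw_mounts_of() {
    awk -v dev="/dev/$1" '
        index($1, dev) == 1 && $4 ~ /(^|,)rw(,|$)/ { print $1, $2 }
    ' "$2"
}

# Example: rw_mounts_of sdX /proc/mounts   (no output means nothing is mounted rw)
```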
Comment 21 Mark Lord 2011-03-16 15:19:17 UTC
You might also/instead be able to collect the log info by booting from a LiveCD rather than from the hard drive.  But if you're not at all comfortable with Linux, then a full install onto a different hard disk will likely be more straightforward.
Comment 22 Mircea Kitsune 2011-03-16 17:41:07 UTC
Reinstalling my system on a different drive is not something I can do now. But if it's clearly safe to do such tests, I don't mind I guess. I do prefer the drive to be always mounted read-only, still.

And booting via live CD is something I can do. But like I mentioned, at least most of the time, a live CD version DOES see my HDD properly (even the partitioning tool if I open it from Yast). It's only the installer I know this to happen with, from what I remember and if I'm correct.

If I get the OpenSuse 11.4 live CD version, what file in what folder should I copy and paste here (or what console output)?
Comment 23 Mircea Kitsune 2011-07-06 20:35:07 UTC
Is there any news on this? It's been almost half a year again, and I worry
that Linux still won't support my hardware at next release. If no one here
knows, maybe email some higher developers about it. I tried to email this to
Linus himself, but no reply... maybe someone else has more luck?