Bug 30962
Summary: | Linux installer does not detect my 2TB SATA3 Hard Drive (or sees partitions on it as corrupted) | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Mircea Kitsune (sonichedgehog_hyperblast00) |
Component: | Serial ATA | Assignee: | Jeff Garzik (jgarzik) |
Status: | RESOLVED INSUFFICIENT_DATA | ||
Severity: | blocking | CC: | alan, mlord, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://bugzilla.novell.com/show_bug.cgi?id=617288 | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: | |
Attachments: |
Requested tests for debugging the issue
Requested test for debugging the issue
Requested test for debugging the issue |
Description
Mircea Kitsune
2011-03-12 14:11:13 UTC
Can you please attach the following?
* The output of "lspci -nn"
* The output of "dmesg" after the failure has happened.
* The output of "hdparm -I /dev/sdX" and "smartctl -a /dev/sdX", where sdX is the failing drive.
Thanks.

Created attachment 50672
Requested tests for debugging the issue
Sure. Posted all of them in an archive here (for all devices), and included the YaST logs as well for more info. Done from openSUSE 11.4 64-bit's install DVD. A summary of my drives:
sda - My other 400GB hard drive, which works well with Linux. Everything should be fine with this one.
sdb - The problematic 2TB hard drive.
sdc - Probably my DVD drive, or the RAM or a virtual drive.
sdd - USB stick I used to save the console outputs to.
NOTE: Although this wasn't printed in the output for sdb, when I ran the hdparm command on the problematic drive, I got the following message:
HDIO_DRIVE_CMD(identify) failed: Input/output error

Please also attach /var/log/boot.msg. The dmesg is truncated.

Created attachment 50692
Requested test for debugging the issue
Here's the boot.msg.
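The diagnostics requested above can be gathered in one pass. The following is a minimal sketch and was not posted in the thread: the function names `save_output` and `collect_diag` and the output file names are hypothetical, and it assumes lspci, hdparm and smartctl are installed and that it is run as root. Each command's output, including any error text, goes into its own uncompressed text file, so a failure such as the HDIO_DRIVE_CMD message is recorded rather than lost.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread.

# save_output FILE COMMAND [ARGS...]
# Runs COMMAND, storing stdout and stderr in FILE. If the command fails,
# its exit status is appended so the failure is visible in the saved log.
save_output() {
    file=$1
    shift
    "$@" > "$file" 2>&1 || echo "[exit status $?] $*" >> "$file"
}

# collect_diag DEVICE OUTDIR
# Collects the outputs asked for in this thread for one drive.
collect_diag() {
    dev=$1
    out=$2
    mkdir -p "$out" || return 1

    save_output "$out/lspci.txt"    lspci -nn
    save_output "$out/dmesg.txt"    dmesg
    save_output "$out/hdparm.txt"   hdparm -I "$dev"
    save_output "$out/smartctl.txt" smartctl -a "$dev"

    # The thread also asks for /var/log/boot.msg; copy it when it is present.
    if [ -r /var/log/boot.msg ]; then
        cp /var/log/boot.msg "$out/boot.msg"
    fi
    return 0
}
```

For the drive discussed here, a call might look like `collect_diag /dev/sdb /tmp/diag`, after which the files in /tmp/diag can be attached as plain text.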
Okay, the controller which is causing the problem is the Marvell one @01:00.0, not JMicron. The controller looks pretty funky with virtual device and it also has set up HPA for some reason. A couple of things to try...
* See whether 'libata.force=3Gbps' or 1.5Gbps makes the problem go away.
* Try 'libata.ignore_hpa=0'.
If none of the above solves the problem, I don't know. I personally am not too interested in digging into Marvell controller problems. I don't have access to documentation, hardware or technical contacts and it seems that way for most other libata developers too. Maybe we should just declare Marvell controllers unsupported. Mark, AFAIK, you're the one with the most contact with Marvell, so cc'ing you. Thanks.

I shall try these settings too. Do I just write them in the boot options field before I start up the installer? Also, does this risk corrupting any data on the hard drive? Just to be sure, as I have my main Windows system and data on it.
But like I mentioned in the description, I tried running the hard drive on both the JMicron SATA ports and the other controller on the mobo (not sure what that was, but it's a different chipset for sure), one being a SATA2 controller and the other SATA3. Both had the same issue, however, whereas plugging the 400GB drive into any port / controller always works. So this probably means it's the HDD's internal controller, and not the motherboard SATA chipset, given I tried both of these chipsets.
Anyway, I know this is probably a bigger issue to find and fix, and I'm not sure who can do it. But I'd be very grateful if anyone at all could do something about it. I miss using Linux. And given I can't be the only person in the world who has this hard drive (or will get it), many people could be affected by the issue. I very much hope something can be done.

It's not very likely that the problem is widespread.
If you can reproduce the problem while the disk is connected to other controllers, please do so and attach /var/log/boot.msg and the output of dmesg (as text/plain w/o compression, please. You can concatenate the two files if you like). As for the parameters, yeah, specifying them from the boot loader should work. You can verify that the parameter was properly specified from the boot log too. In case of doubt, please post the boot.msg. Thanks.

I'll try that soon, and post these outputs with the 2TB drive connected to the other controller too. The issue with dmesg is that I use the 'dmesg > filename' command to print it to a text file. What is the correct version of that command to not compress it, and save it to a text file?

Oh, what I meant was to not zip it when attaching the output here. If you want a single log, just do 'cp /var/log/boot.msg dmesg.out; dmesg >> dmesg.out'.

There is way too much verbiage for me to see what the problem is here. I see lots of mention of "doesn't see my drive" and "I/O errors", but I don't see the actual kernel logs for the "I/O errors". Perhaps the affected party might post a *concise* summary of what is not working, with evidence, and WITHOUT a zillion links to other articles that then link back to this one again? Keep it very short, and to the point, with supporting "dmesg" output (kernel logs). Thanks

Created attachment 50722
Requested test for debugging the issue
Just tried the tests suggested here, and finally have good news! I'll start from the beginning:
When trying the commands suggested by Tejun, I got the following results:
libata.force=3Gbps - Nothing changed.
libata.force=1.5Gbps - Nothing changed, just the installer taking longer to load.
libata.ignore_hpa=0 - WORKED! My hard disk and its partitions were finally seen without any issues, and no more I/O errors with hdparm.
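Whether a boot option actually reached the kernel can also be checked from the running system, because the kernel exposes its command line in /proc/cmdline. The sketch below is only an illustration and was not posted in the thread: the function name `has_boot_param` is hypothetical, and the optional second argument exists so the check can be tried against a saved copy of the command line. Two hedged notes from the kernel's parameter documentation: it describes libata.ignore_hpa=0 as keeping the HPA limits and 1 as ignoring them, and it spells the 3 Gbps link-speed limit for libata.force as 3.0Gbps, so it may be worth confirming in the boot log that the 3Gbps spelling was accepted.

```shell
#!/bin/sh
# has_boot_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a complete, space-separated word on the
# kernel command line. The function name is hypothetical.
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each option on its own line, then look for an exact literal match
    # (-F: fixed string, -x: whole line, -q: no output, only the exit status).
    tr ' ' '\n' < "$file" | grep -Fqx -- "$param"
}
```

For example, `has_boot_param libata.ignore_hpa=0 && echo "option is active"` prints the message only when that exact option was passed at boot.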
I've also attached the boot.msg and dmesg with the problematic drive connected to the other SATA controller. On the motherboard manual, the controller I had it connected to is called GSATA3. The other one I switched to for this test is labeled SATA2, which is the other chipset.
It didn't really change anything, and the disk was still not detected and had I/O errors. The only thing that changed was that I also got an additional message saying:
"The partitioning on disk /dev/sdb is not readable by the partitioning tool parted, which is used to change the partition table.
You can use the partitions on disk /dev/sdb as they are. You can format them and assign mount points to them, but you cannot add, edit, resize or remove partitions from that disk with this tool."
So yeah, it seems that libata.ignore_hpa=0 did it at last :) I can hopefully install using this flag until the issue is solved, though I'm kinda scared of data corruption until more is known about the problem.
@ Mark: The reason I linked other topics and bug trackers is because I discussed different things in each of them. So I thought this would offer as much info as possible. As for concrete results, I did all the tests requested here, and posted all outputs I was asked to. If anything more is needed, please let me know and I'll gladly try to do that too.
Hmmm.... the same failure even when connected to ich ahci. This actually could be a firmware bug on the drive. HPA unlocking maybe somehow gets the drive confused. Can you check whether there is a firmware update available?

Ok. I've never updated a hard disk's firmware before, and like I said I'm afraid of data loss. So this might take a while, but I'll try.

Looked for an HDD firmware update, but couldn't find any. I asked the people at Seagate about it too, and even showed them this issue and what I need it for. They said my drive neither has nor needs any newer firmware. I updated my BIOS instead. The new rev is said to improve compatibility with SATA3 in many respects, but it doesn't fix this issue, nor add HPA in the BIOS menu (someone said some motherboards might have it there). Let me know if there's anything else I can test and help with, in case this is a kernel issue and not a BIOS / firmware bug (which I'm not sure about, given it happens with the Linux installer only).

Hmmm... okay. Can you please try the following?
* Boot the machine w/o the hard drive connected to the port.
* Once the machine is booted, connect the hard drive. dmesg should show a hotplug event and the device will show up as a /dev/sdX device.
* Verify IOs to the hard drive work.
* Run the following commands. Make sure the device is not RW mounted at this point.
  hdparm -N 3907020911 /dev/sdX
  echo - - - > /sys/block/sdX/$(readlink /sys/block/sdX/device)/../../scsi_host/host0/scan
* Check whether IOs to the hard drive work.
* Post the test result and dmesg output.
Thanks.

Missing a closing parenthesis there.

Nak.. parentheses are okay as is. Pardon the noise. :)

I tried those tests too. But this time, I was not able to get the results, mostly because, no matter at what point I plugged the hard drive in, it would still not be mapped to /dev/sdX.
If I plugged it into the motherboard after the Hardware Probing step of the installer (while I was in the partitioning menu), nothing would happen and it would not get detected. If I plugged it in at the welcome screen of the installer (first interactive screen after I start it up), it would be mapped to /dev/sdc, I think. And if I plugged it in during / before the GRUB menu (before entering the installer at all), it would be mapped to /dev/sdb as usual. I did try dmesg each time, but saw nothing special in the output. I still tried those commands however, as the drive was mapped to /dev/sdc after plugging it in at some point after booting. 'hdparm -N 3907020911 /dev/sdc' gave me the usual I/O error, while 'echo - - - > /sys/block/sdc/$(readlink /sys/block/sdc/device)/../../scsi_host/host0/scan' gave me a "file or folder not found" message.
Let me know if I should re-try this in a different way. This looks like a more risky test, though, and gave me quite a few scares at some point. So I'd prefer to redo this one only if it's very necessary, if that's ok.

It's not dangerous as long as nothing is mounted rw. As for why the device is not showing up, I can't really tell w/o the kernel log and it would probably be much easier if you have the system installed on a different hard disk. Can you please get the system installed on a different drive and try again and then post the kernel log?

You might also/instead be able to collect the log info by booting from a LiveCD rather than from the hard drive. But if you're not at all comfortable with Linux, then a full install onto a different hard disk will likely be more straightforward.

Reinstalling my system on a different drive is not something I can do now. But if it's clearly safe to do such tests, I don't mind I guess. I do still prefer the drive to be always mounted read-only. And booting via live CD is something I can do.
But like I mentioned, at least most of the time, a live CD version DOES see my HDD properly (even the partitioning tool if I open it from YaST). It's only the installer I know this to happen with, from what I remember and if I'm correct. If I get the openSUSE 11.4 live CD version, what file in what folder should I copy and paste here (or what console output)?

Is there any news on this? It's been almost half a year again, and I worry that Linux still won't support my hardware at the next release. If no one here knows, maybe email some higher developers about it. I tried to email this to Linus himself, but got no reply... maybe someone else will have more luck? |
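One possible explanation for the "file or folder not found" message from the rescan command quoted earlier in this thread, offered as an assumption rather than anything confirmed here: the quoted path ends in host0, which only exists in the sysfs tree of a disk attached to SCSI host 0, and a hot-plugged disk that shows up as sdc may sit on a different host. A minimal sketch of deriving the host from the disk's resolved sysfs path instead (the function name `scsi_host_of` is hypothetical, and the host number in the usage note is made up for illustration):

```shell
#!/bin/sh
# scsi_host_of SYSFS_PATH
# Prints the hostN component of a resolved sysfs block-device path,
# or prints nothing if the path has no such component.
# The function name is hypothetical.
scsi_host_of() {
    printf '%s\n' "$1" | sed -n 's|.*/\(host[0-9][0-9]*\)/.*|\1|p'
}
```

For example, `scsi_host_of "$(readlink -f /sys/block/sdc)"` might print host3, in which case the rescan would be `echo "- - -" > /sys/class/scsi_host/host3/scan`, run as root and, as advised above, with nothing on the disk mounted read-write.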