Bug 11579 - Libata SATA hard drive detection
Summary: Libata SATA hard drive detection
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Tejun Heo
Reported: 2008-09-16 21:31 UTC by drhaun88
Modified: 2009-05-18 17:10 UTC
Kernel Version: 2.6.24.1
Regression: No


Attachments
Dmesg output for kernel version 2.6.23 (2.49 KB, application/octet-stream)
2008-09-16 21:34 UTC, drhaun88
Dmesg output for kernel version 2.6.24 (1.81 KB, application/octet-stream)
2008-09-16 21:36 UTC, drhaun88
SEMB-sig-debug.patch (523 bytes, patch)
2008-09-17 14:03 UTC, Tejun Heo
Dmesg output post-patch (30.21 KB, application/octet-stream)
2008-09-18 08:22 UTC, drhaun88
Ubuntu LiveCD with Samsung hard drive (30.27 KB, application/octet-stream)
2008-10-08 10:52 UTC, drhaun88
Smart test result (10.76 KB, text/plain)
2008-10-08 12:33 UTC, drhaun88
results of lspci -nn (2.45 KB, text/plain)
2008-10-08 14:59 UTC, Lon Ingram
lspci -nn output (1.96 KB, application/octet-stream)
2008-10-08 20:52 UTC, drhaun88
lspci -nn output (Asus P5K motherboard) (2.70 KB, text/plain)
2009-04-06 15:38 UTC, Lars Wirzenius
libata-workaround-semb-sig.patch.eml (2.67 KB, patch)
2009-04-14 10:36 UTC, Tejun Heo
libata-workaround-semb-sig.patch (2.53 KB, patch)
2009-04-14 21:22 UTC, Tejun Heo

Description drhaun88 2008-09-16 21:31:00 UTC
Latest working kernel version: 2.6.23.17
Earliest failing kernel version: 2.6.24.1
Distribution: Ubuntu
Hardware Environment: Custom PC with Abit KN9 motherboard, nForce 4 chipset, and 2 SATA Western Digital hard drives
Software Environment:
Problem Description: Kernels 2.6.24 and later no longer detect the 2 internal Western Digital SATA hard drives, both WD1600JS-62M, and print a "SEMB device ignored" message on startup. However, a Cavalry external hard drive using eSATA and a Western Digital WD5000AACS-0 hard drive is detected. All drives use the libata and sata_nv drivers.

Kernel version 2.6.23 used libata version 2.21 and sata_nv version 3.5, while 2.6.24 and later use libata version 3.00 and sata_nv version 3.5.

Steps to reproduce:
1. Compile a 2.6.24 kernel on an Abit KN9 motherboard with an Nvidia nForce 4 chipset (also known as CK804), 2 internal SATA Western Digital hard drives, and an external SATA Western Digital hard drive on which Linux is installed.

2. Start Linux.

3. Linux starts, but neither of the internal drives is detected.
Comment 1 drhaun88 2008-09-16 21:34:14 UTC
Created attachment 17827
Dmesg output for kernel version 2.6.23
Comment 2 drhaun88 2008-09-16 21:36:22 UTC
Created attachment 17828
Dmesg output for kernel version 2.6.24
Comment 3 Tejun Heo 2008-09-17 04:21:55 UTC
Can you please try 2.6.26.5?  Thanks.
Comment 4 drhaun88 2008-09-17 12:29:28 UTC
Still the same problem. I've also tried pre-patch versions 2.6.27-rc5 and rc6, with the same results.

I'll admit right now that I know barely anything about how drivers are written, but I did notice that kernels using libata 2.21 work fine and later kernels using libata 3.00 do not. My hypothesis is that the libata change is responsible, but again, I'm not an expert or anything.
Comment 5 Tejun Heo 2008-09-17 14:03:02 UTC
Created attachment 17842
SEMB-sig-debug.patch

Can you please try the attached patch?  Thanks.
Comment 6 drhaun88 2008-09-17 19:19:01 UTC
Attempting to apply the patch gave me this:

test@test-desktop:~$ sudo patch < /usr/src/SEMB-sig-debug.patch
can't find file to patch at input line 5
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
|index 5ba96c5..b58c2c3 100644
|--- a/drivers/ata/libata-core.c
|+++ b/drivers/ata/libata-core.c
--------------------------
File to patch: 
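
The message above is what patch prints when it cannot find the target file relative to the current directory. With no -p option at all, patch keeps only the base name of the path in the diff header; with -pN it strips N leading path components. Git-style diffs like this one prefix paths with a/ and b/, so they are normally applied with "patch -p1" from the top of the kernel source tree that is being built. A minimal Python sketch of that path handling follows (strip_components is a hypothetical helper written for illustration, not part of patch itself):

```python
from pathlib import PurePosixPath
from typing import Optional


def strip_components(header_path: str, count: Optional[int]) -> str:
    """Mimic how patch picks a file name from a diff header path.

    count=None models running patch without -p (base name only);
    count=N models -pN (drop the first N leading components).
    """
    parts = PurePosixPath(header_path).parts
    if count is None:
        return parts[-1]
    if count >= len(parts):
        raise ValueError("strip count removes the whole path")
    return str(PurePosixPath(*parts[count:]))


header = "a/drivers/ata/libata-core.c"
print(strip_components(header, None))  # name searched for in the current directory
print(strip_components(header, 1))     # path relative to the kernel source root
```

In the session above no -p option was given and the command was run from the home directory, so patch had no file it could match.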
Comment 7 drhaun88 2008-09-17 20:43:16 UTC
Ok, correction, I was trying to patch my currently running kernel, silly me. I was able to patch and recompile, and it worked like a charm!

On another note, could a permanent fix be merged into the main kernel, so that I don't have to patch and recompile every time a new kernel comes out?

Anyway, it seems to work just fine, thanks a bunch!
Comment 8 Tejun Heo 2008-09-18 01:39:12 UTC
Can you please post the boot log from the patched kernel?  I'm surely gonna merge the fix upstream but I first need to find out what's going on.  Your controller probably is reporting SEMB signature for ATA devices and that's why the detection got broken.  I hope this is from the controller not the drive.  The drives are directly connected to the connectors, right?  Does your board have some special IO feature - say integrated PMP or hardware RAID, etc...?
Comment 9 drhaun88 2008-09-18 08:22:41 UTC
Created attachment 17859
Dmesg output post-patch

My drives are all plugged directly into the motherboard. I'm also not running any special IO features.
Comment 10 Tejun Heo 2008-09-28 19:20:28 UTC
Can you please try to connect a different drive and see whether the SEMB message is triggered?  This is the first report of this problem and I hope it's caused by the controller instead of the drive.  Thanks.
Comment 11 Lon Ingram 2008-10-02 11:47:13 UTC
It may be unrelated, but I have been fighting a problem installing Ubuntu on the Western Digital Caviar SE WD1600AAJS.  While I had no trouble detecting the drive, I couldn't install any of the following: Ubuntu Intrepid A6, Fedora 10B, Arch or Xubuntu.  All failed with similar symptoms at partition time.  I had the same problem with two separate drives of this model.  I just successfully installed Intrepid A6 on a Hitachi drive.  Please see the following for syslog and partman logs:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/276558
Comment 12 drhaun88 2008-10-03 07:46:49 UTC
(In reply to comment #10)
> Can you please try to connect a different drive and see whether the SEMB
> message is triggered?  This is the first report of this problem and I hope
> it's caused by the controller instead of the drive.  Thanks.

Unfortunately, I don't have access to another SATA hard drive. Several months ago, though, I tried an Ubuntu 8.04 LiveCD with a 2.6.24 kernel on a friend's machine, and it returned the same kind of error. I'm not sure what kind of chipset that machine uses, but I do know it had a Maxtor hard drive.
Comment 13 drhaun88 2008-10-03 07:47:24 UTC
A "SEMB device ignored" error, that is.
Comment 14 Tejun Heo 2008-10-03 17:22:50 UTC
How do you know it was a SEMB problem?  There are myriads of different ways hard drive detection can go wrong and this is the first report of a SEMB problem, so I doubt it's a widespread problem.  Is it possible for you to take out the hard drive and bring it to another machine with a different chipset and see whether the same problem exists?

Thanks.
Comment 15 Tejun Heo 2008-10-03 17:30:14 UTC
drhaun88, your drive is aborting every IO command it receives.  It looks like a dying drive to me.  Is the drive usable?  Can you please boot a live CD and run "smartctl -a /dev/sdX" and report the result?
Comment 16 drhaun88 2008-10-08 10:51:12 UTC
(In reply to comment #14)
> How do you know it was a SEMB problem?  There are myriads of different ways
> hard drive detection can go wrong and this is the first report of a SEMB
> problem, so I doubt it's a widespread problem.  Is it possible for you to
> take out the hard drive and bring it to another machine with a different
> chipset and see whether the same problem exists?
>
> Thanks.

I was able to borrow a hard drive from a friend, a 750GB SATA II Samsung HD753LJ, which was read just fine by a LiveCD using a 2.6.24 kernel. I'm attaching the dmesg output as well.
Comment 17 drhaun88 2008-10-08 10:52:49 UTC
Created attachment 18216
Ubuntu LiveCD with Samsung hard drive
Comment 18 drhaun88 2008-10-08 12:32:28 UTC
(In reply to comment #15)
> drhaun88, your drive is aborting every IO command it receives.  It looks
> like a dying drive to me.  Is the drive usable?  Can you please boot a live
> CD and run "smartctl -a /dev/sdX" and report the result?

I ran a self-test on one of my hard drives, and the results are attached. I don't think my drives are dying, however. They don't make any abnormal noises, and they work nearly flawlessly with my current Windows and Linux installs. The only symptom I've noticed is that they can't be read by any Linux kernel from version 2.6.24 onward.
Comment 19 drhaun88 2008-10-08 12:33:07 UTC
Created attachment 18219
Smart test result
Comment 20 Tejun Heo 2008-10-08 14:44:10 UTC
(In reply to comment #18)
> (In reply to comment #15)
> > drhaun88, your drive is aborting every IO command it receives.  It looks
> > like a dying drive to me.  Is the drive usable?  Can you please boot a
> > live CD and run "smartctl -a /dev/sdX" and report the result?
>
> I ran a self-test on one of my hard drives, and the results are attached.
> I don't think my drives are dying, however. They don't make any abnormal
> noises, and they work nearly flawlessly with my current Windows and Linux
> installs. The only symptom I've noticed is that they can't be read by any
> Linux kernel from version 2.6.24 onward.

Aieee... That comment (#15) was directed at Lon Ingram.  Sorry about the confusion.
Comment 21 Tejun Heo 2008-10-08 14:45:07 UTC
Can you please post the result of "lspci -nn"?
Comment 22 Lon Ingram 2008-10-08 14:51:52 UTC
I've already RMAed both drives, unfortunately.  As I mentioned above and in the Ubuntu bug, I experienced the exact same symptoms on two identical, brand-new drives.  Neither drive was usable on any distro that I tried.
Comment 23 Tejun Heo 2008-10-08 14:56:06 UTC
Can you please still post the result of "lspci -nn"?
Comment 24 Lon Ingram 2008-10-08 14:59:25 UTC
Created attachment 18228
results of lspci -nn
Comment 25 Tejun Heo 2008-10-08 15:07:48 UTC
Aieee... I was asking drhaun this time.  :-)  Anyways, I agree those drives should have been RMAed, so did you get drives of a different model or ones of the same model?
Comment 26 Lon Ingram 2008-10-08 15:13:20 UTC
(In reply to comment #25)
> Aieee... I was asking drhaun this time.  :-)  Anyways, I agree those drives
> should have been RMAed, so did you get drives of a different model or ones
> of the same model?

I picked up a Hitachi at Fry's and installed Ubuntu and Fedora 10b on it with no problem.  I'll probably get another of those.  I don't plan to buy that model of WD again.
Comment 27 drhaun88 2008-10-08 20:52:38 UTC
Created attachment 18231
lspci -nn output

Here's my lspci -nn output
Comment 28 Tejun Heo 2008-10-13 00:50:17 UTC
Thanks.  It seems the drive is reporting a 69/96 signature for some reason.  I'll ask around.  Thanks.
Comment 29 Tejun Heo 2008-10-13 00:57:11 UTC
Oops, make that 3c/c3.
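
For reference, the two pairs of numbers are device signatures: after reset, a SATA device leaves a signature in the taskfile, and libata looks at the LBA mid and LBA high bytes to decide what kind of device it is. A rough Python sketch of that mapping follows, assuming the signature values used by libata's ata_dev_classify(). The table and function names are illustrative only, and the real C code takes a taskfile structure rather than two integers:

```python
# (LBA mid, LBA high) signature bytes mapped to a device class.
# Values as understood from libata's ata_dev_classify(); the names below
# are illustrative and do not exist in the kernel.
SIGNATURES = {
    (0x00, 0x00): "ATA",    # plain ATA disk
    (0x14, 0xEB): "ATAPI",  # packet device such as an optical drive
    (0x69, 0x96): "PMP",    # port multiplier
    (0x3C, 0xC3): "SEMB",   # SATA enclosure management bridge
}


def classify(lbam: int, lbah: int) -> str:
    """Return the device class for a signature, or "UNKNOWN"."""
    return SIGNATURES.get((lbam, lbah), "UNKNOWN")


for sig in [(0x69, 0x96), (0x3C, 0xC3)]:
    print("%02x/%02x -> %s" % (sig[0], sig[1], classify(*sig)))
```

Under this table, 69/96 would indicate a port multiplier and 3c/c3 an enclosure management bridge, which is consistent with the "SEMB device ignored" message in the original report.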
Comment 30 Tejun Heo 2008-10-13 00:58:27 UTC
Any chance you can sell the drive to me so that I can play with it myself?  I wanna try it on different controllers.  I can pay for the shipping cost + replacement cost via paypal.  Thanks.
Comment 31 drhaun88 2008-10-16 20:18:11 UTC
(In reply to comment #30)
> Any chance you can sell the drive to me so that I can play with it myself?  I
> wanna try it on different controllers.  I can pay for the shipping cost +
> replacement cost via paypal.  Thanks.
> 

Possibly. I've actually got 2 of the drives. I'd be willing to sell you both of them, since if I kept one, Linux wouldn't be able to read off of it anyway. It might be a bit, since I'd have to order the drive and find the time to reinstall everything. My other question would be one of price. They were about $60 each when I bought them 2 years ago. I'd be looking at a 320GB replacement, which would be about $60, so might $60 for both drives be reasonable?
Comment 32 Juan Manuel 2008-10-21 05:30:50 UTC
I have the same problem.
Comment 33 Tejun Heo 2008-10-21 20:14:33 UTC
Yeap, $60 + shipping sounds fair enough.  I'll send my shipping address to you via email.  Thanks.
Comment 34 drhaun88 2008-10-21 20:42:36 UTC
(In reply to comment #33)
> Yeap, $60 + shipping sounds fair enough.  I'll send my shipping address to
> you via email.  Thanks.

Ok, I'll send it off as soon as I can. It might be a bit, though, since I still have to buy a replacement hard drive and transfer all my data, but I shall do so as quickly as possible.
Comment 35 Lars Wirzenius 2009-04-06 15:36:22 UTC
I have four WD SATA hard disks (WD2500AAJS) that exhibit the "SEMB device ignored" problem. If I change the code to return ATA_DEV_ATA, the disks work. Other SATA disks do work, using the same controller (on motherboard), and even the same cables, so I assume it is not the controller but the disks. I'd be glad to provide additional testing or information, since I really want these disks to work.
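
The change described above amounts to overriding the class chosen from the signature. A gentler variant, sketched here purely as an illustration and not as a description of any patch attached to this bug, would treat a SEMB signature as a tentative ATA device and fall back only if IDENTIFY fails. The names resolve_class and identify_as_ata are hypothetical:

```python
from typing import Callable


def resolve_class(sig_class: str, identify_as_ata: Callable[[], bool]) -> str:
    """Decide the final device class from the signature-based class.

    sig_class is the class derived from the reset signature.
    identify_as_ata is a hypothetical callback that issues IDENTIFY to the
    device as if it were an ATA disk and reports whether it succeeded.
    """
    if sig_class != "SEMB":
        # Other signatures are trusted as they are; no probing needed.
        return sig_class
    # A disk that wrongly reports the SEMB signature still answers IDENTIFY.
    return "ATA" if identify_as_ata() else "SEMB_UNSUPPORTED"


# A drive with the SEMB signature that answers IDENTIFY is used as a disk.
print(resolve_class("SEMB", lambda: True))
# A real enclosure bridge fails IDENTIFY and stays ignored.
print(resolve_class("SEMB", lambda: False))
```

The point of the fallback is that drives with a correct signature are unaffected, while a genuine enclosure management bridge is still skipped.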
Comment 36 Lars Wirzenius 2009-04-06 15:38:20 UTC
Created attachment 20831
lspci -nn output (Asus P5K motherboard)
Comment 37 Tejun Heo 2009-04-14 10:36:53 UTC
Created attachment 20970
libata-workaround-semb-sig.patch.eml

The attached patch works around the problem.  Patch posted upstream.
Comment 38 Tejun Heo 2009-04-14 10:37:13 UTC
Resolving as CODE_FIX.  Thanks.
Comment 39 Tejun Heo 2009-04-14 21:22:55 UTC
Created attachment 20986
libata-workaround-semb-sig.patch

The patch had a stupid bug.  Updated version attached.
Comment 40 Lars Wirzenius 2009-05-18 17:10:29 UTC
The patch works, and some variant of it has been included in the Ubuntu jaunty-proposed kernel, which I'm running on my desktop.
