Bug 6163 - serial ata fails to detect gigabyte i-ram (GC Ramdisk)
Summary: serial ata fails to detect gigabyte i-ram (GC Ramdisk)
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: i386 Linux
Importance: P2 normal
Assignee: Jeff Garzik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-04 09:11 UTC by Justin Miller
Modified: 2007-06-15 19:45 UTC

See Also:
Kernel Version: 2.6.17-rc3
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
experience with I-RAM and several raid controllers (180 bytes, text/html)
2007-01-02 16:07 UTC, Daniel Feenberg

Description Justin Miller 2006-03-04 09:11:43 UTC
Most recent kernel where this bug did not occur: Unknown

Distribution: gentoo

Hardware Environment: Amd64 platform, Nforce3 based serial ata chipset

Software Environment: 

Problem Description:

I've installed a gigabyte i-ram device.  This device emulates (in hardware) a
sata hard-drive.  Here is the product page:

http://www.gigabyte.com.tw/Products/Storage/Default.aspx

Anyway, it shows up in the BIOS as a regular hard drive.  

I did a little bit of debugging and found that in ata_dev_try_classify, the code:

	/* see if device passed diags */
	if (err == 1)
		/* do nothing */ ;
	else if ((device == 0) && (err == 0x81))
		/* do nothing */ ;
	else
		return err;

With the i-ram, this always hits the "else" branch and returns err (which is 0)
without attempting to classify the drive.  I think what might have been intended
was:

	/* see if device passed diags */
	if ((err != 0 && err != 1) &&
	    !((device == 0) && (err == 0x81)))
		return err;

This will cause the function to return early if error is not 0, not 1 and if the
(device == 0 && err == 0x81) condition is not true.

I changed the code to this new set of conditions, and it correctly detects the
i-ram device now.
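
To make the difference between the two checks concrete, here is a minimal sketch that pulls each one out into a standalone predicate. The function names `original_rejects` and `proposed_rejects` are hypothetical and exist only for this illustration; they are not part of libata. Each returns 1 where the kernel code would return early and 0 where it would go on to classify the device.

```c
/* Hypothetical helper: the check as it stands in ata_dev_try_classify. */
static int original_rejects(unsigned int device, unsigned char err)
{
	if (err == 1)
		return 0;	/* device passed diags */
	else if ((device == 0) && (err == 0x81))
		return 0;	/* special case accepted for device 0 */
	else
		return 1;	/* everything else, including err == 0 */
}

/* Hypothetical helper: the proposed check, which also lets err == 0 through. */
static int proposed_rejects(unsigned int device, unsigned char err)
{
	return (err != 0 && err != 1) &&
	       !((device == 0) && (err == 0x81));
}
```

With err == 0x00, which is what the i-ram reports, `original_rejects` returns 1 and `proposed_rejects` returns 0. For every non-zero err value the two predicates agree, so the only behavioural change is that a zero error register no longer stops classification.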

I don't know much about kernel development, and even less about sata hardware
drivers, so it's quite probable that the code change I've suggested is incorrect.
I'm willing to test any changes that a real kernel developer makes to address
this problem.


Steps to reproduce:

Well, I would imagine this could occur with other devices, but the i-ram is the
only sata drive that I own.
Comment 1 Andrew Morton 2006-03-16 15:00:53 UTC
bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6163
> 
>            Summary: serial ata fails to detect gigabyte i-ram (GC Ramdisk)
>     Kernel Version: 2.6.15
>             Status: NEW
>           Severity: normal
>              Owner: jgarzik@pobox.com
>          Submitter: millerju@gmail.com

AFAICT this problem is still in there, and it's due to this device's
->tf_read() unexpectedly returning zero in tf.feature.  (Perhaps it never
filled it in at all).

Is this a kernel bug, or a device bug?

If it's a device bug, is Justin's proposal to accept a zero in tf.feature
acceptable?


Comment 2 Tejun Heo 2006-03-17 00:12:40 UTC
On Thu, Mar 16, 2006 at 03:03:09PM -0800, Andrew Morton wrote:
> bugme-daemon@bugzilla.kernel.org wrote:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=6163
> > 
> >            Summary: serial ata fails to detect gigabyte i-ram (GC Ramdisk)
> >     Kernel Version: 2.6.15
> >             Status: NEW
> >           Severity: normal
> >              Owner: jgarzik@pobox.com
> >          Submitter: millerju@gmail.com
> > 
> 
> AFAICT this problem is still in there, and it's due to this device's
> ->tf_read() unexpectedly returning zero in tf.feature.  (Perhaps it never
> filled it in at all).
> 
> Is this a kernel bug, or a device bug?
> 

It is a device bug.  The device is actually telling us that it has
failed internal diagnostics and is unusable.

> If it's a device bug, is Justin's proposal to accept a zero in tf.feature
> acceptable?

I think it is, as we have plenty of other protections, including the
class signature and the whole configuration process (IDENTIFY and so
on).  In addition, some drivers (AHCI/sil24) accidentally don't check
the diagnostic result in tf.feature (the Error register) after reset,
and we have yet to hear any complaints about that.
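
As a rough illustration of the class-signature protection mentioned above, the sketch below classifies a device from the signature it leaves in the LBA mid/high registers after reset. The enum and function names are hypothetical, and only the two standard signatures (ATA and ATAPI) are covered; the real libata classifier may recognise more cases.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum dev_class { DEV_UNKNOWN, DEV_ATA, DEV_ATAPI };

/* Classify a device from the post-reset signature in lbam/lbah. */
static enum dev_class classify_by_signature(uint8_t lbam, uint8_t lbah)
{
	if (lbam == 0x00 && lbah == 0x00)
		return DEV_ATA;		/* standard ATA signature */
	if (lbam == 0x14 && lbah == 0xeb)
		return DEV_ATAPI;	/* standard ATAPI signature */
	return DEV_UNKNOWN;		/* no recognised signature */
}
```

A device that reports a bogus diagnostic code but a valid signature would still be classified here, and would still have to answer IDENTIFY correctly before it is used.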

But, then again, this i-ram thingie seems to be the first ATA device
EVER to report an incorrect diagnostic result.  So both sides of the
argument seem fairly evenly balanced.  Jeff, what do you think?

Comment 3 Anonymous Emailer 2006-03-20 17:47:43 UTC
Reply-To: jeff@garzik.org

Andrew Morton wrote:
> AFAICT this problem is still in there, and it's due to this device's
> ->tf_read() unexpectedly returning zero in tf.feature.  (Perhaps it never
> filled it in at all).
> 
> Is this a kernel bug, or a device bug?
> 
> If it's a device bug, is Justin's proposal to accept a zero in tf.feature
> acceptable?

It would be useful to dump the entire taskfile to see if it's all zero.

I would REALLY rather not blindly accept a device that's reporting its 
failed diagnostics (tf.feature == 0).

I've got one of these i-ram cards on order (they sound damned spiffy!), 
and so should be able to see how it behaves.

	Jeff


Comment 4 Tejun Heo 2006-03-21 03:38:42 UTC
Jeff Garzik wrote:
> Andrew Morton wrote:
>> AFAICT this problem is still in there, and it's due to this device's
>> ->tf_read() unexpectedly returning zero in tf.feature.  (Perhaps it never
>> filled it in at all).
>>
>> Is this a kernel bug, or a device bug?
>>
>> If it's a device bug, is Justin's proposal to accept a zero in tf.feature
>> acceptable?
> 
> It would be useful to dump the entire taskfile to see if it's all zero.

Jure reported the entire TF in a previous thread[1], which looked like this:

ata_dev_try_classify: dev=0, TF 00 00:01:01:00:00 a0 50

The TF is ordered as 'ctl feat:nsect:lbal:lbam:lbah dev cmd', so the 
feature register is zero.
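
To read such a dump mechanically, a small parser along these lines could split the line into its fields. This is only a sketch: `struct tf_dump` and `parse_tf_dump` are hypothetical names, and the only layout assumed is the 'ctl feat:nsect:lbal:lbam:lbah dev cmd' ordering given above.

```c
#include <stdio.h>

/* Hypothetical struct; fields appear in the same order as in the dump. */
struct tf_dump {
	unsigned int ctl, feature, nsect, lbal, lbam, lbah, device, command;
};

/* Parse "ctl feat:nsect:lbal:lbam:lbah dev cmd" (all fields in hex).
 * Returns 0 on success, -1 if the line does not match that layout. */
static int parse_tf_dump(const char *line, struct tf_dump *tf)
{
	int n = sscanf(line, "%x %x:%x:%x:%x:%x %x %x",
		       &tf->ctl, &tf->feature, &tf->nsect, &tf->lbal,
		       &tf->lbam, &tf->lbah, &tf->device, &tf->command);

	return n == 8 ? 0 : -1;
}
```

Fed the string "00 00:01:01:00:00 a0 50", this fills in feature = 0x00, nsect = 0x01, lbal = 0x01, lbam = 0x00, lbah = 0x00, device = 0xa0 and command = 0x50.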

> 
> I would REALLY rather not blindly accept a device that's reporting its 
> failed diagnostics (tf.feature == 0).

I can't think of any simple workaround currently other than ignoring 
the feature register. Oh, and just to remind you, AHCI and sil24 don't 
check the diagnostic code currently.

> 
> I've got one of these i-ram cards on order (they sound damned spiffy!), 
> and so should be able to see how it behaves.
> 

Cool, although I've always been a bit skeptical about those 
DRAM-masquerading-as-disk devices (from the PATA days); they seem to me 
like resources spent in the wrong place. But, yeah, cool. :-)

Maybe it's high time to add some standard mechanism to control queue 
plugging from the LLDD, as NCQ and cheap SSD devices are becoming popular.

Thanks.

Comment 5 K. Paden 2006-04-10 08:38:18 UTC
Suse Linux detects the I-RAM correctly if I use it through a 3ware 8506-4 SATA
RAID controller. If I use my Tyan MOBO's native SATA controller, Linux does not
see it.
Comment 6 K. Paden 2006-04-25 10:14:38 UTC
I tried both of the suggestions above. Neither worked for me. The first
suggestion did nothing. The second, it seems, just confused the
"placement??/location??" subsystem: the 2.6.16.5 kernel saw my 3ware
9550SX-4 twice, as sda and sdb. I am using a Tyan S2895 with the Nvidia chipset.
Comment 7 Jeff Garzik 2006-04-28 16:14:47 UTC
I just received my gigabyte i-RAM in the mail, after ages on backorder from CDW.

I verified on sata_sil that the i-RAM does not appear under 2.6.17-rc3, but does
appear if I change the ata_dev_try_classify() code as described in the initial
comment.

However, I am seeing massive data corruption.  "mke2fs -cc /dev/sdc1" fails to
complete.  fdisk works, and successfully retains data, but "mke2fs" followed
immediately by "e2fsck" shows errors.  "mke2fs" followed by "mount" and "iozone"
likewise indicate the data is garbage.

It's possible my RAM is bad, so don't draw any conclusions just yet...
Comment 8 Justin Miller 2006-04-29 07:04:46 UTC
I've also noticed data corruption.  I've tried different filesystems (just to
see if that made a difference), but nothing so far has helped.  I'm not sure
whether it's an error with the card or with the ram; it could also be some
driver issue.  

Sometime this week I plan on pulling the ram out, and putting it into my
motherboard and running the memtest86 checks on it.  That should tell me whether
or not the ram is bad.
Comment 9 K. Paden 2006-04-29 20:09:50 UTC
Don't bother switching out the RAM. When I was using it on my 3ware RAID
controller, I was getting data corruption also. 
Comment 10 Guenther Thomsen 2006-08-25 16:45:51 UTC
I recently purchased such a card (Gigabyte i-RAM 1.3) too and had difficulties
getting it to work (one of my PCs even locks hard during BIOS POST as long as
the RAM disk is attached to the internal SATA controller). I just got a Vantec
UGT-ST200 SATA adaptor card, with which I can use the RAM disk in Windows XP. I
then applied the hack listed earlier to vanilla linux 2.6.18-rc4 and am now able
to use it in Linux too. 

I haven't experienced any FS corruption yet, even though I threw a bunch of
tests at it, including exploding tar files in a loop, postmark and bonnie++
tests, some of them simultaneously. I verified also that it would retain the
content while the host was off (~10h on PCI stand-by power only + ~0.5h off the
grid). So far it looks solid. 

Guenther
Comment 11 Alan 2006-09-11 07:06:14 UTC
Have a card here and am playing now. My take is that we can both handle the tf
feature problem *and* educate the vendor. 

   if (tf.feature == 0) {
     printk(KERN_ERR "%s: device reports internal diagnostics failure. "
            "This may indicate a real drive fault or a faulty SATA emulation. "
            "Contact your SATA device vendor for advice.\n", ...);
   }


Should cause the desired pain in the ideal places.
Comment 12 Daniel Feenberg 2007-01-02 16:07:17 UTC
Created attachment 9996
experience with I-RAM and several raid controllers

There are compatibility issues between certain SATA controllers and the I-RAM
card - the problem may not be fixable from the Linux side. The original poster
of this report could perhaps try to confirm that the I-RAM is visible to
another operating system.
Comment 13 Justin Miller 2007-01-02 17:51:51 UTC
Sorry, I only have linux on my machine.  I've finally finished my degree, so I
should have some free time to test things out again.  It's been a while, maybe
things have fixed themselves? (probably not, but I can always hope)
Comment 14 Justin Miller 2007-01-11 19:20:17 UTC
I've just tested my i-Ram again.  Under kernel 2.6.19-gentoo-r3 the card is
automatically detected.  However, I still have file system corruption.  I've
tried ext2 and vfat; both of them become corrupt after a short period of use.

Running the badblocks program on the drive results in a huge number of blocks
being returned as bad (522950).  The output looks something like this:
2261737
3669984
3669985
3669986
3669987
.....
4192929
4192930
4192931
4192932

Comment 15 Brian Menges 2007-05-25 13:05:18 UTC
I was just able to successfully install Gentoo 2007.0 from live disk using
command line.  I did have to manually partition the iram before installation,
and install to only the iram instead of electing to have additional partitions
upon install (these may be gentoo issues).  Kernel used was 2.6.19-r5 emerged
from the live cd installation.  dvd 2007.0 did not install for some reason, and
kept giving errors.
Comment 16 Alan 2007-06-05 09:45:25 UTC
Make sure you have up to date firmware. We now handle the failed detection
caused by poor emulation.
Comment 17 Jeremy Cole 2007-06-05 10:11:58 UTC
Great!

What kernel version will this patch be represented in?
Comment 18 Jeff Garzik 2007-06-05 10:21:30 UTC
It's been upstream for a little while... not sure which kernel version.
Comment 19 Oddbj 2007-06-05 11:05:44 UTC
The i-Ram seems to be working great in recent kernels (>= 2.6.18, I believe). I
have been using an i-Ram as my main system disk for a few months now without a
hitch. For those of you having problems with filesystem corruption etc., I
suggest trying RAM from Gigabyte's "Recommended Memory List"
(http://www.gigabyte.com.tw/Support/Peripherals/FAQ_Model.aspx?FAQID=2152&ProductID=2076).
I went with two Kingston KVR400X64C3A/1G, which worked straight out of the box.
As per this post: http://www.nerdnos.net/?newsarchives it seems the i-Ram runs
into some trouble when trying to use "high density dimms", but I can't find any
more info on this matter.
Comment 20 Brian Menges 2007-06-05 11:15:50 UTC
Just as the last post stated, yeah, I went with Kingston Hyper-X RAM as I know
Kingston to be rather standard when it comes to memory, and often well suited
for applications such as this.

Yes, kernels 2.6.19-r3+ seem to have the ability to detect this device.  Watch
out for the Gentoo 2007.0 LiveCD installation though, there are a number of bad
e-builds on there that need work, and I've got several tickets open with them. 
May have already been patched and new e-builds available at sources.gentoo.org
so when the install pops up that failure screen, you can replace the .ebuild
file with the one from the cvs online, and it will install just fine - just
takes longer than it should.

I'd like to see other flavors of linux jump on the 2.6.18-r2+ band wagon, as
Gentoo presently is the only build that contains this kernel, therefore detects
it at install time as a writable device.
Comment 21 Justin Miller 2007-06-15 19:45:41 UTC
I switched my ram to the kingston ram listed on the manufacturer's web page.  It appears to be working fine now.
