Most recent kernel where this bug did not occur: Unknown

Distribution: gentoo

Hardware Environment: Amd64 platform, Nforce3 based serial ata chipset

Software Environment:

Problem Description:

I've installed a gigabyte i-ram device. This device emulates (in hardware) a sata hard-drive. Here is the product page:

http://www.gigabyte.com.tw/Products/Storage/Default.aspx

Anyway, it shows up in the BIOS as a regular hard drive.

I did a little bit of debugging and found that in ata_dev_try_classify, the code:

	/* see if device passed diags */
	if (err == 1)
		/* do nothing */ ;
	else if ((device == 0) && (err == 0x81))
		/* do nothing */ ;
	else
		return err;

always hits the "else" condition and returns 0 without attempting to classify the drive. I think what might have been intended was:

	/* see if device passed diags */
	if ((err != 0 && err != 1) &&
	    !((device == 0) && (err == 0x81)))
		return err;

This causes the function to return early only if err is neither 0 nor 1 and the (device == 0 && err == 0x81) condition is false.

I changed the code to this new set of conditions, and it correctly detects the i-ram device now.

I don't know much about kernel development, and even less about sata hardware drivers, so it's quite probable that the code change I've suggested is incorrect. I'm willing to test any changes that a real kernel developer makes to address this problem.

Steps to reproduce:

Well, I would imagine this could occur with other devices, but the i-ram is the only sata drive that I own.
bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6163
>
>            Summary: serial ata fails to detect gigabyte i-ram (GC Ramdisk)
>     Kernel Version: 2.6.15
>             Status: NEW
>           Severity: normal
>              Owner: jgarzik@pobox.com
>          Submitter: millerju@gmail.com
>
> [...]
AFAICT this problem is still in there, and it's due to this device's ->tf_read() unexpectedly returning zero in tf.feature. (Perhaps it never filled it in at all.)

Is this a kernel bug, or a device bug?

If it's a device bug, is Justin's proposal to accept a zero in tf.feature acceptable?
On Thu, Mar 16, 2006 at 03:03:09PM -0800, Andrew Morton wrote:
> bugme-daemon@bugzilla.kernel.org wrote:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=6163
> > [...]
>
> AFAICT this problem is still in there, and it's due to this device's
> ->tf_read() unexpectedly returning zero in tf.feature. (Perhaps it never
> filled it in at all.)
>
> Is this a kernel bug, or a device bug?
>

It is a device bug. The device is actually telling us that it has failed internal diagnostics and is unusable.

> If it's a device bug, is Justin's proposal to accept a zero in tf.feature
> acceptable?

I think it is, as we have plenty of other protections, including the class signature and the whole configuration process (IDENTIFY and so on). In addition, some drivers (AHCI/sil24) accidentally don't check the diagnostic result in tf.feature (the Error register) after reset, and we have yet to hear any complaints about that.

But, then again, this i-ram thingie seems to be the first ATA device EVER to report an incorrect diagnostic result. So both sides of the argument seem to be pretty evenly balanced.

Jeff, what do you think?
Reply-To: jeff@garzik.org

Andrew Morton wrote:
> AFAICT this problem is still in there, and it's due to this device's
> ->tf_read() unexpectedly returning zero in tf.feature. (Perhaps it never
> filled it in at all.)
>
> Is this a kernel bug, or a device bug?
>
> If it's a device bug, is Justin's proposal to accept a zero in tf.feature
> acceptable?

It would be useful to dump the entire taskfile to see if it's all zero.

I would REALLY rather not blindly accept a device that's reporting its failed diagnostics (tf.feature == 0).

I've got one of these i-ram cards on order (they sound damned spiffy!), and so should be able to see how it behaves.

	Jeff
Jeff Garzik wrote:
> Andrew Morton wrote:
>> AFAICT this problem is still in there, and it's due to this device's
>> ->tf_read() unexpectedly returning zero in tf.feature. (Perhaps it never
>> filled it in at all.)
>>
>> Is this a kernel bug, or a device bug?
>>
>> If it's a device bug, is Justin's proposal to accept a zero in tf.feature
>> acceptable?
>
> It would be useful to dump the entire taskfile to see if it's all zero.

Jure reported the entire TF in a previous thread[1], which looked like...

ata_dev_try_classify: dev=0, TF 00 00:01:01:00:00 a0 50

where the TF is ordered as 'ctl feat:nsect:lbal:lbam:lbah dev cmd', so the feature register is zero.

>
> I would REALLY rather not blindly accept a device that's reporting its
> failed diagnostics (tf.feature == 0).

I can't think of any simple workaround currently other than ignoring the feature register. Oh, and just to remind you, AHCI and sil24 don't check the diagnostic code currently.

>
> I've got one of these i-ram cards on order (they sound damned spiffy!),
> and so should be able to see how it behaves.
>

Cool, although I've always been a bit skeptical about those DRAM-masquerading-as-disk devices (from PATA days); they seem to me to be resources spent in the wrong place. But, yeah, cool. :-)

Maybe it's high time to add some standard mechanism to control queue plugging from the LLDD, as NCQ and cheap SSD devices are becoming popular.

Thanks.
SUSE Linux detects the I-RAM correctly if I use it through a 3ware 8506-4 SATA RAID controller. If I use my Tyan motherboard's native SATA controller, Linux does not see it.
I tried both of the suggestions above. Neither worked for me. The first suggestion did nothing; the second, it seems, just confused the "placement??/location??" subsystem. In effect, the 2.6.16.5 kernel saw my 3ware 9550SX-4 twice, as both sda and sdb. I am using a Tyan S2895 with the Nvidia chipset.
I just received my Gigabyte i-RAM in the mail, after ages on backorder from CDW. I verified on sata_sil that the i-RAM does not appear under 2.6.17-rc3, but does appear if I change the ata_dev_try_classify() code as described in the initial comment.

However, I am seeing massive data corruption. "mke2fs -cc /dev/sdc1" fails to complete. fdisk works and successfully retains data, but "mke2fs" followed immediately by "e2fsck" shows errors. "mke2fs" followed by "mount" and "iozone" likewise indicates the data is garbage. It's possible my RAM is bad, so don't draw any conclusions just yet...
I've also noticed data corruption. I've tried different filesystems (just to see if that made a difference), but nothing so far has helped. I'm not sure whether it's an error with the card or with the RAM; it could also be a driver issue. Sometime this week I plan on pulling the RAM out, putting it into my motherboard and running the memtest86 checks on it. That should tell me whether or not the RAM is bad.
Don't bother switching out the RAM. When I was using it on my 3ware RAID controller, I was getting data corruption also.
I recently purchased such a card (Gigabyte i-RAM 1.3) too and had difficulties getting it to work (one of my PCs even locks hard during BIOS POST as long as the RAM disk is attached to the internal SATA controller). I just got a Vantec UGT-ST200 SATA adaptor card, with which I can use the RAM disk in Windows XP. I then applied the hack listed earlier to vanilla Linux 2.6.18-rc4 and am now able to use it in Linux too. I haven't experienced any FS corruption yet, even though I threw a bunch of tests at it, including extracting tar files in a loop, postmark and bonnie++ tests, some of them simultaneously. I also verified that it retains its contents while the host is off (~10h on PCI standby power only + ~0.5h off the grid). So far it looks solid.

Guenther
Have a card here and am playing now. My take is that we can both handle the tf.feature problem *and* educate the vendor.

	if (tf.feature == 0) {
		printk(KERN_ERR "%s: device reports internal diagnostics failure. This may indicate a real drive fault or a faulty SATA emulation. Contact your SATA device vendor for advice.\n", ...);
	}

That should cause the desired pain in the ideal places.
Created attachment 9996: experience with I-RAM and several RAID controllers

There are compatibility issues between certain SATA controllers and the I-RAM card; the problem may not be fixable from the Linux side. The original poster of this report could perhaps try to confirm that the I-RAM is visible to another operating system.
Sorry, I only have Linux on my machine. I've finally finished my degree, so I should have some free time to test things out again. It's been a while; maybe things have fixed themselves? (Probably not, but I can always hope.)
I've just tested my i-Ram again. Under kernel 2.6.19-gentoo-r3 the card is automatically detected. However, I still have filesystem corruption. I've tried ext2 and vfat; both become corrupt after a short period of use. Running the badblocks program on the drive reports a huge number of bad blocks (522950). The output looks something like this:

2261737
3669984
3669985
3669986
3669987
.....
4192929
4192930
4192931
4192932
I was just able to successfully install Gentoo 2007.0 from the live disk using the command line. I did have to manually partition the i-RAM before installation, and install only to the i-RAM instead of electing to have additional partitions during the install (these may be Gentoo issues). The kernel used was 2.6.19-r5, emerged from the live CD installation. The 2007.0 DVD did not install for some reason, and kept giving errors.
Make sure you have up-to-date firmware. We now handle the failed detection caused by the poor emulation.
Great! What kernel version will this patch be represented in?
It's been upstream for a little while... not sure which kernel version.
The i-Ram seems to be working great in recent kernels (>= 2.6.18, I believe). I have been using an i-Ram as my main system disk for a few months now without a hitch. For those of you having problems with filesystem corruption etc., I suggest trying RAM from Gigabyte's "Recommended Memory List" (http://www.gigabyte.com.tw/Support/Peripherals/FAQ_Model.aspx?FAQID=2152&ProductID=2076). I went with two Kingston KVR400X64C3A/1G, which worked straight out of the box. As per this post: http://www.nerdnos.net/?newsarchives it seems the i-Ram runs into some trouble when trying to use "high density DIMMs", but I can't find any more info on this matter.
Just as the last post stated, yeah, I went with Kingston HyperX RAM, as I know Kingston to be rather standard when it comes to memory, and often well suited for applications such as this. Yes, kernels 2.6.19-r3+ seem to be able to detect this device.

Watch out for the Gentoo 2007.0 LiveCD installation though; there are a number of bad ebuilds on there that need work, and I've got several tickets open with them. They may already have been patched, with new ebuilds available at sources.gentoo.org, so when the install pops up that failure screen, you can replace the .ebuild file with the one from the online CVS, and it will install just fine; it just takes longer than it should.

I'd like to see other flavors of Linux jump on the 2.6.18-r2+ bandwagon, as Gentoo presently is the only distribution that contains this kernel, and therefore detects the device at install time as a writable device.
I switched my RAM to the Kingston RAM listed on the manufacturer's web page. It appears to be working fine now.