Bug 6093

Summary: ieee1394 storage disconnects abruptly
Product: Drivers Reporter: Kevin (kevkim55)
Component: IEEE1394 Assignee: Stefan Richter (stefanr)
Status: REJECTED DUPLICATE    
Severity: normal CC: bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.13-15 Subsystem:
Regression: --- Bisected commit-id:
Attachments: syslog

Description Kevin 2006-02-18 04:43:28 UTC
Most recent kernel where this bug did not occur: 2.6.13-15
Distribution: SuSE 10.0
Hardware Environment: Intel P4M, Texas Instruments ieee1394 link on BenQ 
JoyBook DH3000
Software Environment:
Problem Description: External storage connected thru' ieee1394 disconnects 
unexpectedly.

Steps to reproduce:
A Maxtor 300 gigs HD connected by means of ieee1394. Of the 11 partitions, two 
reiserfs partitions are mounted by Hotplug which triggers a custom script 
whenever the device is powered/connected. The rest are NTFS, which are 
accessed thru' autofs. 
Whenever I access this disk, say to compile a package on one of the reiserfs 
partitions or to read from or write to it, the command that I used, be it 
make, cp or whatever, just hangs. Doing a 'cat /proc/scsi/scsi' tells me that 
the device Maxtor OneTouch II is attached. But accessing device nodes under 
the dev directory (/dev/sda[1-11]) fails because these nodes get removed for 
a reason I don't understand, and that's why I'm filing this bug report. 
The only option to get this device back online is to reboot, as neither 
unloading and re-inserting the scsi_mod, sd_mod, ieee1394 and sbp2 modules 
nor repowering/reconnecting the device brings it online. 
This happens most of the time, if not all the time. I can tell one thing 
though: there was nothing in common between the times when the storage device 
worked without getting disconnected. OK, what I'm trying to say is that this 
might not be reproducible under the exact same conditions. Sometimes it works 
fine but most of the time it just vanishes.
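
The symptom described above (the device still listed in /proc/scsi/scsi while
its /dev nodes are gone) could be checked with something like the following.
This is only a minimal sketch: the helper names are hypothetical, the sample
text (including the Rev value) is made up for illustration, and the exact
layout of /proc/scsi/scsi may differ between kernels.

```python
import os
import re

# Illustrative /proc/scsi/scsi content; the Rev value is invented.
SAMPLE = (
    "Attached devices:\n"
    "Host: scsi0 Channel: 00 Id: 00 Lun: 00\n"
    "  Vendor: Maxtor   Model: OneTouch II      Rev: 023g\n"
    "  Type:   Direct-Access                    ANSI SCSI revision: 06\n"
)

def parse_proc_scsi(text):
    """Return a list of (vendor, model) pairs found in the given text."""
    pattern = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:")
    return [m.groups() for m in pattern.finditer(text)]

def missing_nodes(paths, exists=os.path.exists):
    """Return the device node paths that do not exist."""
    return [p for p in paths if not exists(p)]

if __name__ == "__main__":
    try:
        with open("/proc/scsi/scsi") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the illustrative sample
    print("attached:", parse_proc_scsi(text))
    # Node names are an assumption based on /dev/sda[1-11] in the report.
    nodes = ["/dev/sda"] + ["/dev/sda%d" % n for n in range(1, 12)]
    print("missing nodes:", missing_nodes(nodes))
```

If the device is listed as attached while most nodes are reported missing,
that matches the state described in this report.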
Comment 1 Stefan Richter 2006-02-18 06:48:59 UTC
Please clarify which is the last kernel version that works for you (test this
kernel again to make sure it is not a hardware fault) and which kernel version
fails.

Provide /var/log/syslog (or wherever your distro logs kernel messages) from a
failing session.
Comment 2 Andrew Morton 2006-02-18 12:45:19 UTC

Begin forwarded message:

Date: Sat, 18 Feb 2006 04:43:33 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 6093] New: ieee1394 storage disconnects abruptly


http://bugzilla.kernel.org/show_bug.cgi?id=6093

           Summary: ieee1394 storage disconnects abruptly
    Kernel Version: 2.6.13-15
            Status: NEW
          Severity: normal
             Owner: io_scsi@kernel-bugs.osdl.org
         Submitter: kevkim55@yahoo.com



Comment 3 Kevin 2006-02-24 11:03:21 UTC
Created attachment 7467 [details]
syslog

Please find the syslog attached. 
I had been using the USB interface to connect to the drive in question. Only
after installing SUSE did I switch over to ieee1394. I do use this disk in
WinXP and never had any problem with it. So I believe there's something amiss
with the linux ieee1394 driver. 
NB: The problem just got worse. A few secs after accessing the disk I get the
following error message:
ieee1394: sbp2 command error.
Fearing that I would lose the data on this disk, I have fallen back to the USB
interface which is working flawlessly.
Comment 4 Stefan Richter 2006-02-25 14:21:48 UTC
There are no messages from ieee1394, ohci1394 or pcilynx, sbp2, scsi_mod, sd_mod
in your log. Please test the disk with a read-only mounted filesystem. This will
certainly let you reproduce the problems but will protect the filesystem from
corruption. I need a syslog with messages from sbp2 and the other relevant
drivers to understand the nature of the problem.
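
To check whether a log contains any messages from these drivers before
attaching it, a filter along the following lines may help. It is a rough
sketch, not a definitive tool: the keyword list is an assumption (message
prefixes vary between kernel versions, and sd_mod messages often mention
"SCSI" rather than the module name), and the function name is hypothetical.

```python
# Keywords are assumptions; adjust them to the prefixes your kernel prints.
KEYWORDS = ("ieee1394", "ohci1394", "pcilynx", "sbp2", "scsi")

def driver_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, case-insensitively."""
    lowered = tuple(k.lower() for k in keywords)
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in lowered)
    ]

if __name__ == "__main__":
    # Small in-memory demo; the second line is an invented unrelated entry.
    demo = [
        "kernel: ieee1394: sbp2 command error.\n",
        "cron: unrelated entry\n",
    ]
    for line in driver_lines(demo):
        print(line)
```

For a real log, pass an open file object, for example
driver_lines(open("/var/log/messages", errors="replace")); the path depends
on the distribution.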

BTW, to be able to recover from offlined devices after sbp2 errors without
reboot, you need a kernel based on 2.6.13.4 or later. I don't know if SuSE's
kernel 2.6.13-15 contains 2.6.13.4's bugfixes for sbp2 module unloading.

Thanks for assisting to (hopefully) fix this.
Comment 5 Kevin 2006-04-01 21:15:34 UTC
After I received these sbp2 errors, I switched over to the USB interface. 
Lately, the external disk stopped working due to some hardware faults that 
developed for some mysterious reason (first thing that comes to my mind is a 
manufacturing defect!). I had to pull the disk out of the casing and use a 
USB to IDE adapter, which worked for a couple of days and then simply trashed 
my disk. I spent a couple of hours trying to recover and then the adapter 
stopped functioning! I'm totally pissed off and have given up on the 
recovery. I'd have to buy another caddy enclosure for the disk to try and 
reproduce these errors, and I don't see it happening before July. 

Thanks for your help following up.
Comment 6 Stefan Richter 2006-04-02 01:58:21 UTC
Are you sure it's the interface hardware and not the disk which is faulty? If
your distribution keeps old syslogs around, you could search them for SCSI
errors, for example "sense key: Medium Error", which may give an idea about the
kind of fault.
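
A search over current and rotated logs might look like the sketch below. The
glob patterns, the regular expression and the function names are assumptions
for illustration; the wording of SCSI error messages differs between kernel
versions, so the pattern may need adjusting.

```python
import glob
import gzip
import re

# Assumed pattern, based on the "sense key: Medium Error" example above.
SENSE_RE = re.compile(r"sense key:?\s*medium error", re.IGNORECASE)

def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")

def find_matches(paths, pattern=SENSE_RE):
    """Return (path, line number, line) for every matching line."""
    hits = []
    for path in paths:
        try:
            with open_log(path) as f:
                for lineno, line in enumerate(f, 1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
        except (OSError, EOFError):
            continue  # unreadable or truncated file: skip it
    return hits

if __name__ == "__main__":
    # Log locations are assumptions; they vary by distribution.
    candidates = sorted(
        glob.glob("/var/log/messages*") + glob.glob("/var/log/syslog*")
    )
    for path, lineno, line in find_matches(candidates):
        print("%s:%d: %s" % (path, lineno, line))
```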
Comment 7 Stefan Richter 2006-04-02 03:00:14 UTC
PS: The logs usually reside in /var/log/.
Comment 8 Kevin 2006-04-02 04:08:49 UTC
The logs contain no relevant information that could point at the problem. Like 
I said, the USB to IDE controller is to blame for trashing my disk, as it in 
some way failed to translate/convert the commands/data to the disk. Directly 
reading the contents of the disk using dd revealed something very unusual and 
weird. The data that was written to the disk was actually written at an 
offset, i.e. the MBR didn't start at the beginning of the sector but at an 
offset of a few bytes. Or at least, this is how the controller forwarded data 
to the OS. As the MBR was offset by a few bytes, linux would show the disk as 
not containing any partitions. Writing the correct MBR from backup didn't work 
as expected. To get the disk online again, I had to turn off the power to the 
external disk, disconnect the USB cable, and then power on the disk and 
reconnect the USB cable. Interestingly, doing dd on the MBR would then fetch 
the correct MBR and the disk would work right for a few minutes, then it'd 
just become unavailable! Just to make sure it is not the disk itself, which I 
dread the most and hate to accept, I plugged another IDE disk into the "USB to 
IDE controller". Nope, it didn't work. Thus I draw my conclusion that it is 
the controller which malfunctioned and the disk itself, though corrupt, is all 
right.
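
The kind of check described here, whether the MBR sits where it should, can
be sketched as follows. A valid MBR ends with the boot signature bytes 0x55
0xAA at offsets 510 and 511 of the first 512-byte sector. The helper names
are hypothetical, and the demo uses synthetic data rather than a real disk;
note that the byte pair can also occur by chance elsewhere in the data, so a
hit is a hint, not proof.

```python
SECTOR_SIZE = 512
SIGNATURE = b"\x55\xaa"  # MBR boot signature, expected at bytes 510-511

def has_mbr_signature(sector):
    """True if the first sector ends with the MBR boot signature."""
    return len(sector) >= SECTOR_SIZE and sector[510:512] == SIGNATURE

def signature_offsets(data):
    """Return every byte offset at which the signature occurs in data."""
    offsets = []
    pos = data.find(SIGNATURE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(SIGNATURE, pos + 1)
    return offsets

def read_start(path, count=4 * SECTOR_SIZE):
    """Read the first bytes of a device or image (may require root)."""
    with open(path, "rb") as f:
        return f.read(count)

if __name__ == "__main__":
    # Synthetic demo: a zero-filled sector with a signature, then the same
    # data shifted by 7 bytes to mimic an MBR read at an offset.
    good = bytes(510) + SIGNATURE
    shifted = bytes(7) + good
    print(has_mbr_signature(good))
    print(has_mbr_signature(shifted[:SECTOR_SIZE]))
    print([off - 510 for off in signature_offsets(shifted)])
```

On a real system the data would come from something like
read_start("/dev/sda"), where the device name is an assumption.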
The above doesn't answer the sbp2 command errors, as when I received these 
sbp2 errors the disk was inside its proprietary enclosure and was working fine 
under WinXP.
Although what I'm trying to ask is off topic, I'm just curious to know - what 
driver would I require to operate the USB disk? The driver that is meant for 
the IDE chip present on the disk, or the one for the SATA controller chip 
inside the "USB to IDE adapter"? The adapter came with a driver CD which had 
drivers for Win9x and claimed that no drivers are required for WinXP. The 
problem described above, MBR and other data being written with an offset, was 
present in WinXP too. I installed a driver meant for Win2K which had the disk 
working all right for a few hours, then it started again with the symptoms 
described above. The manual also said that the product should work with linux 
with no drivers required. A bit of googling and I learned from the website of 
the chip manufacturer "Silicon Image" that one needs to compile the SATA 
driver for the SiI18x chipset that comes with the standard kernel 
distribution. I haven't tried this one yet and I don't feel encouraged to go 
ahead and try it. If the partition table read from the disk is not the 
correct partition table, how is the driver for the SiI chip going to help? 
Does it translate the data, or does it pass the right/required commands to 
the controller chip that'd make it read/pass the data in the required format 
or whatever? Does the chip driver achieve what the chip itself cannot 
achieve? I'm dead sure I just fell victim to one of those cheap/fake, 
ill-architected and crappy controllers and crapped my data. I think I'll buy 
Vantec's external enclosure, and like I said earlier it is not gonna happen 
before June.
Yep, the logs didn't say anything useful about the whole scenario that 
occurred when I connected the disk thru' the controller I mentioned. I'll go 
back and check the entire contents of the logs to see if those messages I 
received in the past are still there.
Thanks again for your help.
Comment 9 Stefan Richter 2006-04-02 05:01:42 UTC
> The above doesn't answer the sbp2 command errors, as when I received
> these sbp2 errors the disk was inside its proprietary enclosure and was
> working fine under WinXP.

Perhaps it was the same as in bug 1872 which is alas a widely observed problem
of sbp2. I am slowly working on it.

> Although, what I'm trying to ask is off topic, I'm just curious to
> know - What driver would I require to operate the USB disk ?

A driver for the USB host controller (e.g. ehci-hcd for USB 2.0) and the driver
for USB storage devices (usb-storage, which requires scsi_mod and other SCSI
drivers like sd_mod). Similar to SBP-2 for FireWire, there is only one protocol
for storage devices on USB which is why these devices can be used with built-in
drivers of all PC OSs. However there are many different flavors of buggy
implementations of USB storage devices. Therefore Linux' drivers contain an
impressive range of device-specific workarounds. But it is a cat-and-mouse game
to get all required workarounds figured out and shipped with new Linux releases.
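
Whether the drivers named above are currently loaded can be checked against
/proc/modules, for instance with the sketch below. The function names are
hypothetical. Two caveats apply: /proc/modules spells module names with
underscores (ehci-hcd appears as ehci_hcd), and drivers built into the kernel
image are not listed there at all, so a "missing" result is not conclusive.

```python
# Module names as they appear in /proc/modules (underscores, not hyphens).
REQUIRED = ("ehci_hcd", "usb_storage", "scsi_mod", "sd_mod")

def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }

def missing_modules(proc_modules_text, required=REQUIRED):
    """Return the required modules that are not listed as loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]

if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("not listed as loaded:", missing_modules(text))
```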

Comment 10 Stefan Richter 2006-04-23 10:46:20 UTC
Judging from the initial description, I assume this bug is the same as bug 1872.

*** This bug has been marked as a duplicate of 1872 ***