Most recent kernel where this bug did not occur: 2.6.13-15
Distribution: SuSE 10.0
Hardware Environment: Intel P4M, Texas Instruments ieee1394 link on BenQ JoyBook DH3000
Software Environment:
Problem Description: External storage connected through ieee1394 disconnects unexpectedly.
Steps to reproduce: A Maxtor 300 GB HD is connected by means of ieee1394. Of its 11 partitions, two reiserfs partitions are mounted by Hotplug, which triggers a custom script whenever the device is powered/connected. The rest are NTFS and are accessed through autofs. Whenever I access this disk, say to compile a package on one of the reiserfs partitions or to read from or write to it, the command I used (make, cp or whatever) just hangs. Running 'cat /proc/scsi/scsi' tells me that the device Maxtor OneTouch II is attached. But accessing the device nodes under the dev directory (/dev/sda[1-11]) fails because these nodes get removed for a reason I don't understand, and that's why I'm filing this bug report. The only way to get the device back online is to reboot: neither unloading and re-inserting the scsi_mod, sd_mod, ieee1394 and sbp2 modules nor repowering/reconnecting the device brings it back online. This happens most of the time, if not all the time. The times when the storage device worked without getting disconnected had nothing in common, so this might not be reproducible under the exact same conditions. Sometimes it works fine, but most of the time it just vanishes.
Please clarify which is the last kernel version that works for you (test this kernel again to make sure it is not a hardware fault) and which kernel version fails. Provide /var/log/syslog (or wherever your distro logs kernel messages) from a failing session.
Begin forwarded message:
Date: Sat, 18 Feb 2006 04:43:33 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 6093] New: ieee1394 storage disconnects abruptly

http://bugzilla.kernel.org/show_bug.cgi?id=6093
Summary: ieee1394 storage disconnects abruptly
Kernel Version: 2.6.13-15
Status: NEW
Severity: normal
Owner: io_scsi@kernel-bugs.osdl.org
Submitter: kevkim55@yahoo.com
Created attachment 7467 [details] syslog
Please find the syslog attached. I had been using the USB interface to connect to the drive in question. Only after installing SUSE did I switch over to ieee1394. I also use this disk under WinXP and never had any problem with it. So I believe there's something amiss with the Linux ieee1394 driver. NB: The problem just got worse. A few seconds after accessing the disk I get the following error message: ieee1394: sbp2 command error. Fearing that I would lose the data on this disk, I have fallen back to the USB interface, which is working flawlessly.
There are no messages from ieee1394, ohci1394 or pcilynx, sbp2, scsi_mod, sd_mod in your log. Please test the disk with a read-only mounted filesystem. This will certainly let you reproduce the problems but will protect the filesystem from corruption. I need a syslog with messages from sbp2 and the other relevant drivers to understand the nature of the problem. BTW, to be able to recover from offlined devices after sbp2 errors without reboot, you need a kernel based on 2.6.13.4 or later. I don't know if SuSE's kernel 2.6.13-15 contains 2.6.13.4's bugfixes for sbp2 module unloading. Thanks for assisting to (hopefully) fix this.
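The log filtering asked for above could be sketched roughly as follows. The driver names come from the comment; the matching rule (whole-word match anywhere in the line) and the function name relevant_lines are assumptions made for illustration, not something any of these drivers or the distribution prescribes.

```python
import re

# Driver names as listed in the comment above.
DRIVERS = ("ieee1394", "ohci1394", "pcilynx", "sbp2", "scsi_mod", "sd_mod")

# Assumption: a whole-word match anywhere in the line is good enough.
PATTERN = re.compile(r"\b(?:" + "|".join(DRIVERS) + r")\b")


def relevant_lines(lines):
    """Yield the log lines that mention one of the drivers, newline stripped."""
    for line in lines:
        if PATTERN.search(line):
            yield line.rstrip("\n")
```

To use it, open the log file (for example the /var/log/syslog mentioned earlier in this thread) and pass the file object to relevant_lines(), printing whatever it yields.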
After I received these sbp2 errors, I switched over to the USB interface. Lately the external disk stopped working due to some hardware fault that developed for a mysterious reason (the first thing that comes to my mind is a manufacturing defect!). I had to pull the disk out of its casing and use a USB to IDE adapter, which worked for a couple of days and then simply trashed my disk. I spent a couple of hours trying to recover it and then the adapter stopped functioning! I'm totally pissed off and have given up on the recovery. I'd have to buy another caddy enclosure for the disk to try and reproduce these errors, and I don't see that happening before July. Thanks for your help following up.
Are you sure it's the interface hardware and not the disk which is faulty? If your distribution keeps old syslogs around, you could search them for SCSI errors, for example "sense key: Medium Error", which may give an idea about the kind of fault.
PS: The logs usually reside in /var/log/.
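A minimal sketch of such a search over old logs, including gzip-compressed rotated ones. The file name pattern "messages*" is an assumption (distributions name and compress their rotated logs differently), and find_sense_errors is a hypothetical helper name.

```python
import gzip
from pathlib import Path

NEEDLE = "sense key"  # matches e.g. "sense key: Medium Error"


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_sense_errors(log_dir="/var/log", pattern="messages*"):
    """Return (file name, line) pairs for lines that mention a SCSI sense key."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        with open_log(path) as log:
            for line in log:
                # Case-insensitive, since the capitalisation may vary.
                if NEEDLE in line.lower():
                    hits.append((path.name, line.rstrip("\n")))
    return hits
```

Reading /var/log usually requires root privileges, and logs compressed with something other than gzip would need a different opener.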
The logs contain no relevant information that could point at the problem. Like I said, the USB to IDE controller is to blame for trashing my disk, as it somehow failed to translate the commands/data passed to the disk. Reading the contents of the disk directly with dd revealed something very unusual. The data had been written to the disk at an offset, i.e. the MBR didn't start at the beginning of the sector but a few bytes in. Or at least, this is how the controller forwarded the data to the OS. As the MBR was offset by a few bytes, Linux would show the disk as not containing any partitions. Writing the correct MBR from backup didn't work as expected. To get the disk online again, I had to turn off the power to the external disk, disconnect the USB cable, then power on the disk and reconnect the USB cable. Interestingly, doing dd on the MBR would then fetch the correct MBR and the disk would work all right for a few minutes, then it would just become unavailable! To make sure it is not the disk itself, which is what I dread the most and would hate to accept, I plugged another IDE disk into the "USB to IDE controller". Nope, it didn't work. Thus I conclude that it is the controller which malfunctioned and that the disk itself, though corrupted, is all right. The above doesn't explain the sbp2 command errors, as when I received these sbp2 errors the disk was inside its proprietary enclosure and was working fine under WinXP. Although what I'm about to ask is off topic, I'm just curious to know: what driver would I require to operate the USB disk? The driver meant for the IDE chip present on the disk, or one for the SATA controller chip inside the "USB to IDE adapter"? The adapter came with a driver CD which had drivers for Win9x and claimed that no drivers are required for WinXP. The problem described above, the MBR and other data being written at an offset, was present in WinXP too.
I installed a driver meant for Win2K, which had the disk working all right for a few hours before the symptoms described above started again. The manual also said that the product should work with Linux with no drivers required. After a bit of googling I learned from the website of the chip manufacturer, Silicon Image, that one needs to compile the SATA driver for the SiI18x chipset that comes with the standard kernel distribution. I haven't tried this yet and I don't feel encouraged to go ahead and try it. If the partition table read from the disk is not the correct partition table, how is the driver for the SiI chip going to help? Does it translate the data, or does it pass the right commands to the controller chip to make it read/pass the data in the required format, or whatever? Does the chip driver achieve what the chip itself cannot? I'm dead sure I just fell victim to one of those cheap/fake, ill-architected and crappy controllers and trashed my data. I think I'll buy Vantec's external enclosure and, like I said earlier, it is not gonna happen before June. Yep, the logs didn't say anything useful about what occurred when I connected the disk through the controller I mentioned. I'll go back and check the entire contents of the logs to see if those messages I received in the past are still there. Thanks again for your help.
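The offset MBR described in this comment could be checked for with a small script along these lines. It relies on the fact that an MBR sector ends with the boot signature bytes 0x55 0xAA at offsets 510 and 511. Everything else is an assumption made for illustration: the function names, the idea of looking only for a forward shift, and the 64-byte search window. It is a heuristic sketch, not a recovery tool.

```python
SECTOR_SIZE = 512
SIGNATURE = b"\x55\xaa"                  # MBR boot signature
EXPECTED = SECTOR_SIZE - len(SIGNATURE)  # offset 510 in a healthy sector 0


def find_mbr_shift(data, max_shift=64):
    """Return how many bytes the boot signature sits past offset 510.

    0 means the signature is where it belongs; None means no signature
    was found within max_shift bytes after the expected position.
    """
    window = data[EXPECTED:EXPECTED + max_shift + len(SIGNATURE)]
    pos = window.find(SIGNATURE)
    return None if pos < 0 else pos


def read_start(path, count=2 * SECTOR_SIZE):
    """Read the first bytes of a device node or of an image made with dd."""
    with open(path, "rb") as dev:
        return dev.read(count)
```

For example, find_mbr_shift(read_start("/dev/sda")) run as root would return 0 for a normally placed MBR; working on a dd image instead of the live device is the safer choice. Because the two signature bytes can also occur by chance in other data, a non-zero result is only a hint.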
> The above doesn't answer the sbp2 command errors as, when I received
> these sbp2 errors, the disk inside its proprietary enclosure and was
> working fine under WinXP.

Perhaps it was the same as in bug 1872 which is alas a widely observed problem of sbp2. I am slowly working on it.

> Although, what I'm trying to ask is off topic, I'm just curious to
> know - What driver would I require to operate the USB disk ?

A driver for the USB host controller (e.g. ehci-hcd for USB 2.0) and the driver for USB storage devices (usb-storage, which requires scsi_mod and other SCSI drivers like sd_mod). Similar to SBP-2 for FireWire, there is only one protocol for storage devices on USB, which is why these devices can be used with built-in drivers of all PC OSs. However, there are many different flavors of buggy implementations of USB storage devices. Therefore Linux' drivers contain an impressive range of device-specific workarounds. But it is a cat-and-mouse game to get all required workarounds figured out and shipped with new Linux releases.
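Whether the drivers named above are currently loaded as modules can be checked against /proc/modules, for instance with a sketch like the one below. The helper names are hypothetical. Note that /proc/modules lists module names with underscores (ehci_hcd, usb_storage), and that drivers compiled into the kernel do not appear there at all, so a "missing" entry is not proof that the driver is absent.

```python
# Names as they appear in /proc/modules (hyphens are shown as underscores).
REQUIRED = ("ehci_hcd", "usb_storage", "scsi_mod", "sd_mod")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules content."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(proc_modules_text, required=REQUIRED):
    """Return the required modules that are not listed as loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]
```

Typical use would be to read /proc/modules into a string and pass it to missing_modules().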
Judging from the initial description, I assume this bug is the same as bug 1872.
*** This bug has been marked as a duplicate of 1872 ***