Bug 15203 - Disconnects on heavy load
Summary: Disconnects on heavy load
Status: CLOSED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_ieee1394
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-01 01:13 UTC by Jörg Sommer
Modified: 2010-02-09 16:48 UTC

See Also:
Kernel Version: 2.6.33-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel log (52.19 KB, text/plain)
2010-02-01 01:13 UTC, Jörg Sommer

Description Jörg Sommer 2010-02-01 01:13:25 UTC
Created attachment 24843
kernel log

Hi,

I made a backup of my filesystem and copied the contents to an external disk attached via FireWire. After some data had been written, the kernel lost the connection to the drive and the copying stopped. I had to unplug the device and plug it in again to access the disk. I think I can reproduce this by copying a large amount of data to the disk.
Comment 1 Anonymous Emailer 2010-02-01 23:31:26 UTC
Reply-To: stefanr@s5r6.in-berlin.de

>> http://bugzilla.kernel.org/show_bug.cgi?id=15203
>>  --> (http://bugzilla.kernel.org/attachment.cgi?id=24843)

According to the attachment, the device is a Maxtor OneTouch 500 GB,
perhaps based on an LSI bridge.  About 15 minutes after the disk was
probed and the filesystem was mounted, with no further SCSI or 1394 log
entries in between, the node vanishes:

> [ 2261.346212] EXT4-fs (dm-0): mounted filesystem with ordered data mode
> [ 3211.921958] ieee1394: Error parsing configrom for node 0-00:1023
> [ 3211.922080] ieee1394: Node paused: ID:BUS[0-00:1023]  GUID[0010b9021147235d]
> [ 3214.937877] ieee1394: Node removed: ID:BUS[0-00:1023]  GUID[0010b9021147235d]
> [ 3214.938098] sd 6:0:0:0: [sda] Unhandled error code
> [ 3214.938107] sd 6:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> [ 3214.938115] sd 6:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 11 58 a1 c4 00 02 00 00
> [ 3214.938133] end_request: I/O error, dev sda, sector 291021252
> [ 3214.938144] Buffer I/O error on device sda, logical block 72755313
[...]
> [ 3216.119736] EXT3-fs error (device sda): ext3_find_entry: reading directory #9091097 offset 0
> [ 3216.125503] EXT3-fs error (device sda): ext3_find_entry: reading directory #9091097 offset 0
> [ 3220.388195] __journal_remove_journal_head: freeing b_committed_data
> [ 3220.449074] sd 6:0:0:0: [sda] Stopping disk
> [ 3220.449193] sd 6:0:0:0: [sda] START_STOP FAILED
> [ 3220.449201] sd 6:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> [ 3220.714957] ieee1394: Node added: ID:BUS[0-00:1023]  GUID[0010b9021147235d]
> [ 3220.722737] scsi7 : SBP-2 IEEE-1394
> [ 3221.725340] ieee1394: sbp2: Logged into SBP-2 device
> [ 3221.725408] ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]
> [ 3221.728653] scsi 7:0:0:0: Direct-Access     Maxtor   OneTouch         0121 PQ: 0 ANSI: 4

From the point of view of the drivers, the device disabled itself at
3211.921958.  About nine seconds later, at 3220.714957, the device had
enabled itself again, reset the bus, and was consequently discovered
again by the drivers.
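
The two intervals mentioned above can be checked directly against the
kernel timestamps.  The following is only a minimal Python sketch: the
helper name kernel_time is made up for illustration, and the three log
lines are copied from the excerpt quoted above.

```python
import re

# Matches the "[ seconds.microseconds]" prefix that printk puts on each line.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_time(line):
    """Return the kernel timestamp of a log line in seconds, or None."""
    match = TIMESTAMP.search(line)
    return float(match.group(1)) if match else None


# Last log entry before the failure, the first error, and the rediscovery.
last_entry = kernel_time(
    "[ 2261.346212] EXT4-fs (dm-0): mounted filesystem with ordered data mode")
node_lost = kernel_time(
    "[ 3211.921958] ieee1394: Error parsing configrom for node 0-00:1023")
node_added = kernel_time(
    "[ 3220.714957] ieee1394: Node added: ID:BUS[0-00:1023]")

idle_minutes = (node_lost - last_entry) / 60
outage_seconds = node_added - node_lost

print(f"quiet period before the node vanished: {idle_minutes:.1f} min")
print(f"time until the node was seen again:    {outage_seconds:.1f} s")
```

The regular expression only accepts a bracketed decimal number, so the
later "[0-00:1023]" on the same line is not mistaken for a timestamp.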

This can only mean that either the hardware is unstable (this may be
just one of the components or the entire combination of disk enclosure
and bridge board, FireWire cable, and FireWire controller in the PC) or
that the firmware is buggy.

Jörg, please reconfigure the kernel to use the newer firewire-ohci +
firewire-core + firewire-sbp2 drivers instead of ohci1394 + ieee1394 +
sbp2.  Perhaps the newer drivers handle this better, but I doubt it.  If
the newer drivers bring no substantial improvement, please attach a
kernel log with their failure messages.
Comment 2 Anonymous Emailer 2010-02-01 23:31:34 UTC
Reply-To: stefanr@s5r6.in-berlin.de

For the list's information.  If you want to respond on the list and on
bugzilla, add Cc: bugzilla-daemon@bugzilla.kernel.org and keep the
[Bug ...] in the subject.

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15203
> 
>            Summary: Disconnects on heavy load
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.33-rc4
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IEEE1394
>         AssignedTo: drivers_ieee1394@kernel-bugs.osdl.org
>         ReportedBy: joerg@alea.gnuu.de
>         Regression: No
Comment 3 Jörg Sommer 2010-02-03 17:16:30 UTC
The firewire-sbp2 driver suffers from the same problem.

Feb  3 17:44:46 ibook kernel: [18048.266475] firewire_ohci: Added fw-ohci device 0002:20:0e.0, OHCI version 1.10
Feb  3 17:44:47 ibook kernel: [18048.766781] firewire_core: created device fw0: GUID 000a95fffeb23f48, S400
Feb  3 17:44:47 ibook kernel: [18048.779537] scsi6 : SBP-2 IEEE-1394
Feb  3 17:44:47 ibook kernel: [18048.779778] firewire_core: created device fw1: GUID 0010b9021147235d, S400
Feb  3 17:44:47 ibook kernel: [18048.779793] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Feb  3 17:44:47 ibook kernel: [18048.979935] firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
Feb  3 17:44:48 ibook kernel: [18048.983272] scsi 6:0:0:0: Direct-Access     Maxtor   OneTouch         0121 PQ: 0 ANSI: 4
Feb  3 17:44:48 ibook kernel: [18048.985219] sd 6:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Feb  3 17:44:48 ibook kernel: [18048.985972] sd 6:0:0:0: [sdb] Write Protect is off
Feb  3 17:44:48 ibook kernel: [18048.985983] sd 6:0:0:0: [sdb] Mode Sense: 2d 08 00 00
Feb  3 17:44:48 ibook kernel: [18048.987002] sd 6:0:0:0: [sdb] Got wrong page
Feb  3 17:44:48 ibook kernel: [18048.987011] sd 6:0:0:0: [sdb] Assuming drive cache: write through
Feb  3 17:44:48 ibook kernel: [18048.989597] sd 6:0:0:0: [sdb] Got wrong page
Feb  3 17:44:48 ibook kernel: [18048.989608] sd 6:0:0:0: [sdb] Assuming drive cache: write through
Feb  3 17:44:48 ibook kernel: [18048.989619]  sdb: unknown partition table
Feb  3 17:44:48 ibook kernel: [18049.020862] sd 6:0:0:0: [sdb] Got wrong page
Feb  3 17:44:48 ibook kernel: [18049.020876] sd 6:0:0:0: [sdb] Assuming drive cache: write through
Feb  3 17:44:48 ibook kernel: [18049.020886] sd 6:0:0:0: [sdb] Attached SCSI disk
Feb  3 17:45:17 ibook kernel: [18078.601471] kjournald starting.  Commit interval 5 seconds
Feb  3 17:45:17 ibook kernel: [18078.607698] EXT3-fs (sdb): using internal journal
Feb  3 17:45:17 ibook kernel: [18078.607729] EXT3-fs (sdb): mounted filesystem with writeback data mode
Feb  3 18:00:54 ibook kernel: [19015.905198] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Feb  3 18:00:59 ibook kernel: [19021.283668] firewire_core: phy config: card 0, new root=ffc1, gap_count=5
Feb  3 18:01:00 ibook kernel: [19022.406139] firewire_sbp2: fw1.0: orb reply timed out, rcode=0x00
Feb  3 18:01:00 ibook kernel: [19022.406152] firewire_sbp2: fw1.0: failed to reconnect
Feb  3 18:01:01 ibook kernel: [19022.605714] firewire_sbp2: fw1.0: error status: 0:10
Feb  3 18:01:01 ibook kernel: [19022.606118] firewire_sbp2: fw1.0: logged in to LUN 0000 (0 retries)
Feb  3 18:01:01 ibook kernel: [19022.840913] sd 6:0:0:0: [sdb] Unhandled error code
Feb  3 18:01:01 ibook kernel: [19022.840929] sd 6:0:0:0: [sdb] Result: hostbyte=0x02 driverbyte=0x00
Feb  3 18:01:01 ibook kernel: [19022.840938] sd 6:0:0:0: [sdb] CDB: cdb[0]=0x2a: 2a 00 31 2a 97 34 00 00 08 00
Feb  3 18:01:01 ibook kernel: [19022.840956] end_request: I/O error, dev sdb, sector 824874804
Feb  3 18:01:01 ibook kernel: [19022.840967] Buffer I/O error on device sdb, logical block 206218701

Therefore, I guess the device is broken. Do you think it's possible to add a workaround that prevents this?
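
For anyone reading the SCSI error lines in the two logs, the CDB and hostbyte values can be decoded by hand. The sketch below is illustrative only: decode_write10 and HOST_BYTES are names made up for this example, the hostbyte names are taken from the kernel's SCSI headers as I understand them, and only the codes seen in this report plus DID_OK are listed.

```python
# Host byte codes as named in the Linux SCSI headers (partial table).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",  # seen with ohci1394 + ieee1394 + sbp2
    0x02: "DID_BUS_BUSY",    # seen with firewire-sbp2
}


def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as hex bytes; return (lba, blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) command")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


for label, cdb_hex, hostbyte in [
    ("sda, old stack", "2a 00 11 58 a1 c4 00 02 00 00", 0x01),
    ("sdb, new stack", "2a 00 31 2a 97 34 00 00 08 00", 0x02),
]:
    lba, blocks = decode_write10(cdb_hex)
    print(f"{label}: WRITE(10) lba={lba} blocks={blocks} "
          f"host={HOST_BYTES[hostbyte]}")
```

The decoded addresses agree with the "end_request: I/O error" sector numbers in the logs (291021252 and 824874804), so in both cases the failing command was an ordinary write that was in flight when the device dropped off the bus.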
Comment 4 Anonymous Emailer 2010-02-06 13:34:01 UTC
Reply-To: stefanr@s5r6.in-berlin.de

> Therefore, I guess the device is broken. Do you think it's possible to add a
> workaround that prevents this?

Check whether a firmware update is available for this device.  Also,
some FireWire disk models from a major HDD manufacturer (I don't
remember which one) were sold with bad cables.  If you can get hold of
another cable, try that.
Comment 5 Jörg Sommer 2010-02-07 19:22:38 UTC
Thanks for your reply. I think you can close this bug report.
